Learning Mobility Flows from Urban Features with Spatial Interaction Models and Neural Networks*


Gevorg Yeghikyan (Scuola Normale Superiore, Pisa, Italy), [Link]@[Link]
Felix L. Opolka (University of Cambridge, Cambridge, UK), flo23@[Link]
Mirco Nanni (ISTI-CNR, Pisa, Italy), [Link]@[Link]
Bruno Lepri (FBK, Trento, Italy), lepri@[Link]
Pietro Liò (University of Cambridge, Cambridge, UK), pl219@[Link]

*To appear in the Proceedings of the 2020 IEEE International Conference on Smart Computing (SMARTCOMP 2020).

arXiv:2004.11924v1 [[Link]] 24 Apr 2020

Abstract—A fundamental problem of interest to policy makers, urban planners, and other stakeholders involved in urban development projects is assessing the impact of planning and construction activities on mobility flows. This is a challenging task due to the different spatial, temporal, social, and economic factors influencing urban mobility flows. These flows, along with the influencing factors, can be modelled as attributed graphs with both node and edge features characterising locations in a city and the various types of relationships between them. In this paper, we address the problem of assessing origin-destination (OD) car flows between a location of interest and every other location in a city, given their features and the structural characteristics of the graph. We propose three neural network architectures, including graph neural networks (GNN), and conduct a systematic comparison between the proposed methods and state-of-the-art spatial interaction models, their modifications, and machine learning approaches. The objective of the paper is to address the practical problem of estimating potential flow between an urban development project location and other locations in the city, where the features of the project location are known in advance. We evaluate the performance of the models on a regression task using a custom dataset of attributed car OD flows in London. We also visualise the model performance by showing the spatial distribution of flow residuals across London.

Index Terms—urban mobility flows, spatial interaction models, graph neural networks, urban computing

I. INTRODUCTION

Planning and managing city and transportation infrastructures requires understanding the relationship between urban mobility flows and the spatial, structural, and socio-economic features associated with them. An extensive literature addresses this problem, ranging from the classical gravity model and its modifications [1], [2] to the more recent spatial econometric interaction models [3] and the non-parametric radiation models [4] that attempt to characterise cross-sectional origin-destination (OD) flow matrices. Furthermore, various neural network-based models have been proposed for predicting temporal OD flow matrices [5], [6]. However, because they model OD flow matrices in their entirety, these works do not address the problem of assessing flows between a specific location and every other location in the city, given all other flows, other location characteristics, and information on the dyadic relations between those locations.

More specifically, the motivation for this task comes from a scenario in which it is necessary to assess the impact of an urban development project on the OD flows in and out of the project's location. Examples of such scenarios include retail location choice and the prediction of consumer spatial behaviour, which have been approached with the Huff model and its modifications [7]. These models, however, suffer from a series of drawbacks related mostly to overly restrictive assumptions. In this paper, we take a different approach and focus on the problem of evaluating OD flows in and out of a location of interest. By modelling urban flows as attributed graphs in which the nodes represent locations in a city (each node is described by a vector of features such as population density, Airbnb prices, available parking areas, etc.) and the edges represent the car flows between them (each edge is described by a vector of features such as road distance, average travel time, average speed, etc.), this work aims to offer an instrument for assessing flows between a specific location and all other locations in the city.

Since a rigorous experimental setting would have required difficult-to-obtain longitudinal data of OD flows before and after the completion of an urban development project, we set up a quasi-experimental setting. We randomly select locations in a city and use the flows associated with them as a test set, and we attempt to find a function that takes the urban features describing city locations and the remaining flows as input and predicts the flows in the test set as output.

In sum, our paper makes the following contributions:
• We propose three neural network architectures for predicting car flows between a location of interest and every other location in a city. Two of the models use graph convolutional layers that pool information from geographical or topological neighbourhoods around relevant nodes to incorporate more information (Section V).
• We evaluate and compare our models on a custom dataset of aggregate OD car flows in London, containing node and edge features (Section VI).
• We show that the proposed neural network models outperform well-known spatial interaction and machine learning models. A comparison among the neural network models reveals that graph convolutions do not substantially improve prediction performance on the formulated task (Sections IV, VI).
• We describe our custom dataset and make it publicly available along with the code for this study (Section III).

II. RELATED WORK

The problem of estimating human flows between locations in a geographical space was first addressed by [1] through a family of spatial interaction models and subsequently extended by [2]. Spatial interaction models, extensively used to estimate human mobility flows and trip demand between locations as a function of the location features, have become an acknowledged method for modelling geographical mobility in transportation planning [8], [9], commuting [10], and spatial economics [11]. Spatial interaction models are usually calibrated via Ordinary Least Squares (OLS) regression, which assumes normally distributed data. However, OD flows are usually not normally distributed: they are count data and contain a large number of zero flows. This makes the setting incompatible with OLS estimation and requires either a Poisson model or, in the presence of over-dispersion, a Negative Binomial Regression (NB) model [12].

Another major concern in this modelling scenario is the presence of complex interactions, often caused by spatial dependencies and non-stationarity. The former arise from spill-over effects from a location to its neighbourhoods, while the latter is caused by the influence of independent variables varying across space. These issues have been addressed in the literature by spatial autocorrelation and geographically weighted modelling techniques [13], [3], [14], [12].

Another approach within the spatial interaction modelling paradigm is the Huff model and its extensions [7]. Originally developed mainly for retail location choice and turnover prediction, these models represent a probabilistic formulation of the gravity model. The Huff model considers OD flows as proportional to the relative attractiveness and accessibility of the destination compared to other competing destinations. The probability $P_{ij}$ of a consumer at location $i$ choosing to shop at a retail location $j$ is framed as:

$$P_{ij} = \frac{A_j^{\alpha} D_{ij}^{-\beta}}{\sum_{k=1}^{n} A_k^{\alpha} D_{ik}^{-\beta}}, \tag{1}$$

where $A_j$ is a measure of attractiveness of retail location $j$, such as its area or a linear combination of different features, $D_{ij}$ is the distance between locations $i$ and $j$, and $\alpha$ and $\beta$, estimated from empirical observations, are the attractiveness and distance decay parameters, respectively.

Along with traditional gravity methods, the Huff model and its variations have found their way into numerous applications, including the location selection of movie theaters [15] and of a university campus [16], or the analysis of spatial access to health care [17].

However, these models suffer from overly restrictive assumptions, such as considering the ratio of the probabilities of an individual selecting two alternatives as being unaffected by the introduction of a third alternative. Although the competing destinations model [18] has overcome this, it has the disadvantage of considering either spatial agglomeration or competition effects, ignoring the fact that they can coexist in the same location. Even though a number of extensions to the Huff model and the gravity framework in general have been proposed to overcome spatial non-stationarity and to include a larger array of features affecting the flows [19], [20], this family of models, along with the non-parametric radiation and population-weighted opportunities models, has been shown to fall short of high predictive capacity, particularly at the city scale [21], [22], [23].

More recently, machine learning, particularly a Random Forest approach, has shown promising results in reconstructing inter-city OD flow matrices [24]. However, its performance on intra-urban flow data remains to be tested.

Moreover, as already mentioned, the discussed models address the problem of modelling the OD flow matrix as a whole and have to be adapted to our specific task of estimating flows between a specific location and all other locations, given the other flows in the city, the location features, and the features describing the dyadic relations between them.

The problem of estimating OD flows has also been addressed with neural network methods [25]. As flows are most naturally modelled by graphs, most work has focused on the use of graph neural networks for flow estimation. An early neural network model for graph-structured data was suggested in [26]. Later work has specifically focused on generalising Convolutional Neural Networks from the domain of regular grids to the domain of irregular graphs [27], [28]. One of the most commonly used graph neural network models is the Graph Convolutional Neural Network (GCN) proposed in [29].

Graph neural networks have previously been applied to urban planning tasks. In [5], they have been used to predict the flow of bikes within a bike sharing system. Unlike in our model, flows are modelled there as node-level features, which requires a different neural network model and does not allow predicting flows between specific pairs of nodes. Although [30] uses graph neural networks to predict flows between parts of a city, their model operates on spatio-temporal data and focuses on the temporal aspect of the data. Beyond flow prediction, a graph neural network model for building site selection has been proposed in [31]. A broader overview of machine learning methods applied to the task of urban flow prediction is given in [32]. In this work, we define neural network models that make use of stationary node and edge features, and we compare different neural network architectures based on fully connected networks and graph neural networks.

III. DATA DESCRIPTION

We publicly release¹ a custom dataset of aggregate origin-destination (OD) flows of private cars in London, augmented with feature data describing city locations and the dyadic relations between them.

¹ Dataset will be released at [Link]. Code available at [Link]/FelixOpolka/Mobility-Flows-Neural-Networks.

The workflow of building the dataset is as follows:
1) The urban territory has been subdivided into $n$ Cartesian grid cells of size 500 × 500 m, and each such square cell is considered a node in the graph.
2) The GPS trajectories of around 10,000 cars spanning a period of one year, provided by a car insurance company for research purposes, have been superimposed on the grid, and trip origins and destinations have been extracted (Figure 3a).
3) The OD network has been built from the extracted origin-destination pairs by aggregating the flow counts over a year (Figure 3b). Since the aggregation spans such a long time period, the OD matrix is approximately symmetric; it has therefore been converted into a symmetric matrix by averaging it with its transpose.
4) The node features have been built by engineering 35 features from various open sources [33], [34], [35] and from the GPS data. These features include population density, average Airbnb prices, parking areas, areas covered by residential buildings, number of restaurants, bars, banks, museums, road network density, average radius of gyration, etc. per cell. Examples of node features and their spatial distribution are visualised in Figure 1.
5) Similarly, the edge features encode information on 12 dyadic relations such as network distance, average time, average speed, temporal correlation between car incidence in cells, public transport connections, etc. The detailed attribute description is provided with the dataset.

Fig. 1: Examples of node (cell) features: (a) average Airbnb listing prices; (b) proportion of grid cell area allotted to industrial activity; (c) number of museums and galleries per grid cell. Darker colours indicate higher values.

[Plot: p(x) versus flow count on log-log axes, showing the flow data and the fitted power law.]
Fig. 2: Log-log plot of the probability distribution of the OD flows fitted with a power-law distribution $p(x) \propto x^{-\alpha}$; the fitted log-log slope is −2.088, i.e. $\alpha \approx 2.088$.

IV. PROBLEM STATEMENT

In this section, we describe the problem we are addressing and state definitions of important terms.

We define a weighted attributed graph $G = (V, E, W, X_v, X_e)$ with feature information associated with both nodes and edges. More specifically, $V$ is the set of $n$ nodes, and $E = \{e_{ij} = (i, j) : i, j \in V\}$ represents the set of $m$ edges in graph $G$. Furthermore, $W \in \mathbb{R}^{n \times n}$ is the weighted adjacency matrix, essentially the OD matrix, with $W_{ij} \geq 0 \; \forall i, j \in V$ corresponding to the flow between cells $i$ and $j$. Additionally, we denote the node feature matrix as $X_v \in \mathbb{R}^{n \times p}$, where $p$ is the number of node features. The edge feature matrix, on the other hand, is denoted as $X_e \in \mathbb{R}^{m \times k}$, where $k$ is the number of edge features.

The urban mobility flow network $T$ is a weighted undirected attributed graph whose nodes are 500 × 500 m city grid cells and whose edges are the aggregate flows between them. The nodes and edges are additionally augmented by the feature vectors described in detail in Section III. Furthermore, each edge $e_{ij}$ in the urban mobility flow network $T$ is associated with a target (or ground truth) flow $w_{ij}$, which is the corresponding entry in the weighted adjacency matrix $W$ of $T$. It represents the aggregate mobility flow between cell (node) $i$ and cell (node) $j$ in the network.

In our prediction setting, we are given the urban mobility flow network $T = (V, E, W, X_v, X_e)$ and a node of interest $i$ for which the target flows $W_{i1}, \ldots, W_{in}$ are unknown. Hence, we aim to learn a mapping $f : \{V, E, W, X_v, X_e\} \rightarrow \mathbb{R}^n$ from the urban mobility flow network to the missing flows, i.e. $[W_{i1}, \ldots, W_{in}] = f(i, W, X_v, X_e) \; \forall i \in V$. In other words, the aim is to predict the missing target flows (Figure 3c), given the features of node $i$ and the rest of the graph.
Fig. 3: (a) Car GPS trajectories over grid cells in London. (b) Origin-destination (OD) flow network in London. (c) Target flows between a node of interest and every other node.

V. METHODOLOGY

In the following, we describe three neural network models that are trained to predict the unknown flows in the urban mobility flow network $T$. When a model makes a prediction for the flow associated with an edge going from a node of interest to another node in the graph, it can use all node and edge features in the graph, as these features are available even for nodes of interest, i.e. sites of prospective urban development projects. Furthermore, it may use the ground truth flows for edges that are not connected to a node of interest. In a practical situation, these correspond to the flows between existing locations in the city, for which flow information is available.

The first neural network architecture is a fully connected neural network operating on the features of the target edge and the features of its two incident nodes. More specifically, when predicting the flow for target edge $e_{ij}$, we concatenate the node features $x_{v_i}$ and $x_{v_j}$ of the incident nodes, as well as the corresponding edge features $x_{e_{ij}}$. The concatenated vector

$$\bar{x} = [x_{v_i}, x_{e_{ij}}, x_{v_j}] \tag{2}$$

is passed into a fully connected neural network with ReLU non-linearities, defined as $\mathrm{ReLU}(z_j) = \max(0, z_j)$, where $z_j$ is the $j$-th output of the linear transformation. Each fully connected layer is followed by batch normalisation [36] and dropout [37] to counter overfitting. We refer to this model as FCNN.

The second model builds upon the FCNN model through the additional use of graph convolutions to generate embeddings of node neighbourhoods. We use a graph convolutional neural network (GCN) [29] to generate node embeddings $h_i, h_j$ for the two nodes incident to the target edge $e_{ij}$. GCN layers extend fully connected layers with an additional neighbourhood aggregation step before the non-linearity. The layer applies a linear transformation to all node features $h_i^{(l-1)}$ in the graph and then, for each node, computes a weighted average of the resulting representations at the central node and in the 1-hop neighbourhood of the central node:

$$z_i^{(l)} = \sum_{j \in N(i) \cup \{i\}} \frac{1}{\sqrt{(d_i + 1)(d_j + 1)}} \, h_j^{(l-1)} \Theta, \tag{3}$$

where $\Theta \in \mathbb{R}^{D^{(l-1)} \times D^{(l)}}$ is a learned weight matrix, $N(i)$ refers to the 1-hop neighbourhood of node $i$, and $d_i$ denotes the degree of node $i$. This aggregation scheme is followed by a non-linearity and can be written more compactly using matrix multiplication as

$$H^{(l)} = \mathrm{ReLU}\left(\tilde{D}^{-\frac{1}{2}} \tilde{W} \tilde{D}^{-\frac{1}{2}} H^{(l-1)} \Theta\right), \tag{4}$$

where $\tilde{W} = W + I$ and $\tilde{D}$ is the degree matrix of $\tilde{W}$. Equation 4 defines a graph convolutional layer, and multiple such layers can be stacked to form a multi-layer graph neural network. A GNN with $k$ layers allows us to compute embeddings encoding node feature information from within a $k$-hop neighbourhood.

For the second model, we apply multiple graph convolutions as defined above on the flow-weighted geographical adjacency matrix $W^{\mathrm{geo}}$, where $W^{\mathrm{geo}}_{ij}$ is non-zero if and only if node $i$ is in the geographical neighbourhood of node $j$, in which case $W^{\mathrm{geo}}_{ij} = W_{ij}$, i.e. the flow between $i$ and $j$. The resulting node embeddings $h_i, h_j \in \mathbb{R}^D$ for the two nodes incident to edge $e_{ij}$ are added to the representation of $\bar{x}$ (see Equation 2) after the first fully connected layer:

$$h_{ij}^{(1)} = \phi_1 \, \mathrm{FCN}(\bar{x}) + \phi_2 \left[\mathrm{GNN}(x_i) + \mathrm{GNN}(x_j)\right], \tag{5}$$

where $\phi_1, \phi_2$ are learned weighting coefficients. We note that both mentions of $\mathrm{GNN}(\cdot)$ refer to the same sequence of graph convolutional layers. We then feed $h_{ij}^{(1)}$ into a number of fully connected layers, again with dropout and batch normalisation, such that the resulting model contains the same number of fully connected layers as the FCNN model. We call the resulting model GNN-geo.

Finally, we evaluate a third model, denoted by GNN-flow, which is equivalent to GNN-geo except that graph convolutions are performed using the flow-based adjacency matrix $W^{\mathrm{flow}} = W$, where $W^{\mathrm{flow}}_{ij}$ is the flow between $i$ and $j$. Hence, the adjacency matrix used by GNN-flow contains additional edges compared with the one used by GNN-geo. A visualisation of the model architectures is given in Figure 4.
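Equation 4 can be made concrete with a dense, dependency-free sketch. It is our illustration rather than the authors' implementation: it assumes a small dense weighted adjacency matrix such as $W^{\mathrm{geo}}$ or $W^{\mathrm{flow}}$, takes the diagonal of $\tilde{D}$ to be the row sums of $\tilde{W} = W + I$ (the weighted degrees), and applies a single weight matrix $\Theta$. The helper names `matmul` and `gcn_layer` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (r x k) and a (k x c) matrix."""
    return [
        [sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
        for r in range(len(a))
    ]


def gcn_layer(w: Matrix, h: Matrix, theta: Matrix) -> Matrix:
    """One graph convolution: ReLU(D~^(-1/2) W~ D~^(-1/2) H Theta), with W~ = W + I."""
    n = len(w)
    # Add self-loops so that each node also aggregates its own representation.
    w_tilde = [[w[r][c] + (1.0 if r == c else 0.0) for c in range(n)] for r in range(n)]
    # Diagonal of D~: the (weighted) degree of each node in W~.
    deg = [sum(row) for row in w_tilde]
    # Symmetric normalisation of each entry by sqrt(d_r * d_c).
    norm = [
        [w_tilde[r][c] / math.sqrt(deg[r] * deg[c]) for c in range(n)]
        for r in range(n)
    ]
    z = matmul(matmul(norm, h), theta)
    return [[max(0.0, v) for v in row] for row in z]


# Two connected nodes with 2-dimensional features and an identity Theta.
w = [[0.0, 1.0], [1.0, 0.0]]
h = [[1.0, -2.0], [3.0, 0.0]]
theta = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(w, h, theta))  # [[2.0, 0.0], [2.0, 0.0]]
```

Stacking $k$ calls of `gcn_layer`, each with its own $\Theta$, yields embeddings that draw on the $k$-hop neighbourhood; a production implementation would use sparse matrices instead of nested lists.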
[Diagram: the features $x_{v_i}$, $x_{e_{ij}}$, $x_{v_j}$ are concatenated (||) into $\bar{x}$ and passed through an FCN (weighted by $\phi_1$); GCN embeddings of $v_i$ and $v_j$ (weighted by $\phi_2$) are added (+); further FCNs output $\hat{y}_{ij}$.]
Fig. 4: Overview of the neural network model architectures. When predicting the flow for edge $e_{ij}$, all three models concatenate the corresponding edge features $x_{e_{ij}}$ and the node features $x_{v_i}, x_{v_j}$ of the incident nodes. The resulting vector is fed into a single fully connected layer. In the case of the GNN-based models GNN-geo and GNN-flow, the network also performs graph convolutions on the neighbourhoods of $v_i$ and $v_j$ and computes a weighted sum of both neighbourhood embeddings and the edge embedding. A further set of fully connected layers maps the sum to the predicted flow $\hat{y}_{ij}$. The FCNN model skips the addition step and does not perform graph convolutions.
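The forward pass summarised in Figure 4 and Equations 2 and 5 can likewise be sketched for a single target edge. This is our simplification, not the published model: it omits batch normalisation and dropout, takes the GCN embeddings $h_i$ and $h_j$ as precomputed inputs, uses hand-picked rather than learned weights, and assumes a linear output layer; `dense` and `predict_flow` are hypothetical names. Omitting the embeddings reproduces the FCNN variant, which skips the addition step.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, bias)


def dense(x: Vector, layer: Layer, relu: bool = True) -> Vector:
    """Fully connected layer y = Wx + b, optionally followed by ReLU."""
    weights, bias = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out] if relu else out


def predict_flow(
    x_vi: Vector,
    x_eij: Vector,
    x_vj: Vector,
    first: Layer,
    rest: Sequence[Layer],
    h_i: Optional[Vector] = None,
    h_j: Optional[Vector] = None,
    phi1: float = 1.0,
    phi2: float = 1.0,
) -> float:
    """Predict the flow for edge e_ij (FCNN variant if h_i or h_j is omitted)."""
    x_bar = x_vi + x_eij + x_vj  # list concatenation, as in Equation 2
    h = dense(x_bar, first)  # first fully connected layer, FCN(x_bar)
    if h_i is not None and h_j is not None:
        # Equation 5: weighted sum of the edge representation and node embeddings.
        h = [phi1 * a + phi2 * (b + c) for a, b, c in zip(h, h_i, h_j)]
    for k, layer in enumerate(rest):
        # ReLU on hidden layers only; the last layer is a linear read-out.
        h = dense(h, layer, relu=k < len(rest) - 1)
    return h[0]


first = ([[1.0, 0.0, 0.0], [0.0, 1.0, -1.0]], [0.0, 0.0])
rest = [([[1.0, 1.0]], [0.5])]
print(predict_flow([1.0], [2.0], [3.0], first, rest))  # 1.5 (FCNN)
print(predict_flow([1.0], [2.0], [3.0], first, rest,
                   h_i=[1.0, 1.0], h_j=[0.0, 2.0], phi1=0.5, phi2=0.5))  # 3.0
```

Sharing one `h_i`/`h_j` producer for both endpoints mirrors the note that both mentions of $\mathrm{GNN}(\cdot)$ in Equation 5 refer to the same graph convolutional layers.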

The graph-based models GNN-geo and GNN-flow require flow information for the adjacency matrices. While this is readily available for edges between two regular nodes, we have to approximate the flow between a regular node $i$ and a node of interest $j$. This is done by taking the average of the flows from node $i$ to each node in the neighbourhood of $j$, i.e.

$$\tilde{W}_{ij} = \frac{1}{|N(j)|} \sum_{k \in N(j)} W_{ik}. \tag{6}$$

We note that even though the FCNN does not use graph convolutions and hence does not qualify as a common graph neural network, it does use graph structure information by concatenating specifically the features $x_{v_i}, x_{v_j}$ of the nodes incident to the target edge $e_{ij}$.

All models output the flow corresponding to the target edge $e_{ij}$ and are trained to minimise the mean squared error between the predicted and the actual flow. More details on the experimental setup are provided in Section VI-C.

VI. EXPERIMENTS

We evaluate the described models on the London dataset described in Section III. In the following, we describe the goodness-of-fit metrics we use to measure model performance, the baseline methods we compare our models to, and the experimental setup.

A. Goodness-of-fit measures

Mean absolute error (MAE). Let $\hat{y}_{ij}$ be the predicted flow between $i$ and $j$, and $y_{ij}$ the ground truth flow; then

$$\mathrm{MAE} = \frac{1}{|E|} \sum_i \sum_j \left| y_{ij} - \hat{y}_{ij} \right|. \tag{7}$$

Binned MAE. Due to the highly skewed distribution of the flow data, the vast majority of flows have a small flow count, with only a handful of flows having a very large value (see Figure 2). Because of this, the total MAE will be biased downwards. To account for this, we additionally measure the MAE of all models within 4 bins with the boundaries 0, 10, 100, 1000, and 10000, i.e. $[0, 10)$, $[10, 10^2)$, $[10^2, 10^3)$, and $[10^3, 10^4)$, corresponding to $\mathrm{MAE}_0$, $\mathrm{MAE}_1$, $\mathrm{MAE}_2$, and $\mathrm{MAE}_3$, respectively. Finally, we define the MAE bin mean as

$$\text{Bin mean MAE} = \frac{\mathrm{MAE}_0 + \mathrm{MAE}_1 + \mathrm{MAE}_2 + \mathrm{MAE}_3}{4}, \tag{8}$$

where $\mathrm{MAE}_i$ refers to the MAE of the $i$-th bin.

Mean absolute percentage error (MAPE). To display the model accuracy relative to the ground-truth flow values, we further use the mean absolute percentage error, defined as

$$\mathrm{MAPE} = 100 \times \frac{1}{|E|} \sum_i \sum_j \left| \frac{y_{ij} - \hat{y}_{ij}}{y_{ij}} \right|. \tag{9}$$

Sorensen similarity index. We use a modified version of the Sorensen similarity index (SSI), which has been extensively used in spatial interaction modelling [23], [38], and is defined as

$$\mathrm{SSI} = \frac{1}{|E|} \sum_i \sum_j \frac{2 \min(y_{ij}, \hat{y}_{ij})}{y_{ij} + \hat{y}_{ij}}. \tag{10}$$

It takes on values between 0 and 1, with values closer to 1 denoting a better fit.
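The measures in Equations 7 to 10 are straightforward to compute once predictions and ground-truth flows are aligned edge by edge. The following is a minimal sketch, not the evaluation code used for the paper. It assumes flat lists of flows (so averages are over the listed edges), evaluates MAPE only where $y_{ij} > 0$ (as for the largest bin in Table II), counts a pair with $y_{ij} = \hat{y}_{ij} = 0$ as a perfect match in the SSI (a convention Equation 10 leaves open), and returns NaN for an empty bin.

```python
import math
from typing import List, Sequence

BIN_BOUNDARIES = (0.0, 10.0, 100.0, 1000.0, 10000.0)  # [0,10), [10,10^2), ...


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Equation 7: mean absolute error over all listed edges."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def binned_mae(y, y_hat, boundaries=BIN_BOUNDARIES) -> List[float]:
    """MAE_0 ... MAE_3: MAE over edges whose true flow lies in [lo, hi)."""
    result = []
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        pairs = [(a, b) for a, b in zip(y, y_hat) if lo <= a < hi]
        if pairs:
            result.append(mae([a for a, _ in pairs], [b for _, b in pairs]))
        else:
            result.append(math.nan)  # no test edge falls into this bin
    return result


def bin_mean_mae(y, y_hat) -> float:
    """Equation 8: unweighted mean of the per-bin MAEs."""
    per_bin = binned_mae(y, y_hat)
    return sum(per_bin) / len(per_bin)


def mape(y, y_hat) -> float:
    """Equation 9: mean absolute percentage error (requires every y > 0)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


def ssi(y, y_hat) -> float:
    """Equation 10: mean Sorensen similarity between true and predicted flows."""
    terms = [1.0 if a + b == 0 else 2.0 * min(a, b) / (a + b) for a, b in zip(y, y_hat)]
    return sum(terms) / len(terms)


y = [5.0, 50.0, 500.0, 5000.0]
y_hat = [4.0, 60.0, 400.0, 5500.0]
print(binned_mae(y, y_hat))  # [1.0, 10.0, 100.0, 500.0]
print(round(mape(y, y_hat), 6))  # 17.5
```

Here each bin holds a single edge, so the bin mean MAE coincides with the total MAE; on the skewed London data the two differ sharply, which is the reason for reporting both.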
Common part of commuters. Further, we use a similar metric, the common part of commuters (CPC), used specifically for mobility OD flow networks [38]:

$$\mathrm{CPC} = \frac{2 \sum_{i,j=1}^{n} \min(y_{ij}, \hat{y}_{ij})}{\sum_{i,j=1}^{n} y_{ij} + \sum_{i,j=1}^{n} \hat{y}_{ij}}. \tag{11}$$

This measure takes on the value 0 when the flows in the two networks completely differ, and 1 when they are in perfect agreement.

Common part of links. Finally, to measure the degree to which the topological structure of the original network has been reconstructed, we use the common part of links (CPL) [39], defined as

$$\mathrm{CPL} = \frac{2 \sum_{i,j=1}^{n} \mathbb{1}_{y_{ij} > 0} \cdot \mathbb{1}_{\hat{y}_{ij} > 0}}{\sum_{i,j=1}^{n} \mathbb{1}_{y_{ij} > 0} + \sum_{i,j=1}^{n} \mathbb{1}_{\hat{y}_{ij} > 0}}, \tag{12}$$

where $\mathbb{1}_A$ is the indicator function of condition $A$. The common part of links measures the proportion of links present in both the observed and the predicted networks, i.e. with $y_{ij} > 0$ and $\hat{y}_{ij} > 0$. It takes on the value zero if the two networks have no common links and one if the networks are topologically equivalent.

B. Baseline models

In this study, we compare the proposed models to the following baselines, using the same experimental setup for all models:
• Doubly constrained gravity model (DC-GM): The classical gravity model with a power-law decay has several formulations with respect to preserving the total in- or out-flows during model calibration: unconstrained, origin-constrained, destination-constrained, and doubly constrained. Here we take the latter.
• Huff model: A probabilistic formulation of the gravity model described in Section II.
• Poisson regression: An instance of the Generalized Linear Modelling framework, in which the dependent variable, being count data, is assumed to be drawn from a Poisson distribution.
• Negative Binomial regression (NB): A generalization of Poisson regression in which the restrictive assumption that the mean and the variance of the dependent variable are equal is loosened.
• Spatial Autoregressive Model (SAM): An extension of the Generalized Linear Modelling framework that accounts for spatial dependence among the flows by using spatial lags represented by spatial weight matrices built from observed data [3].
• Generalised hypergeometric ensemble multilayer network regression (gHypE): This recent random graph approach [40] provides a statistical ensemble of all possible flow networks under the constraints of preserving in- and out-flows from each node, as well as respecting pairwise flow propensities of nodes. The multilayer network regression considers these propensities as latent variables, inferred from the edge features describing the dyadic relations between city locations. As opposed to conventional regression methods, this method intrinsically respects the network constraints.
• Random Forest regression (RF): We follow the approach proposed in [24], aimed at predicting inter-city mobility flows with a set of attributes describing each city, and adapt it to our problem of intra-city flow prediction. Following the described method, we use a Random Forest approach with eXtreme Gradient Boosting (XGBoost) [41], with 5-fold cross-validation, model and feature selection, and hyperparameter tuning.

C. Experimental setup

For training and evaluating the three proposed models, we divide the dataset into a training, a validation, and a test set of edges, containing 70%, 10%, and 20% of the edges, respectively. To construct the test set, we randomly select nodes in the graph and add their incident edges to the test set. We ensure that an equal number of edges falls into each of the four bins split by flow magnitude: once a bin is full, no more edges that would fall into this bin are added to the test set. We use the same procedure to construct the validation set. Nodes in the validation and test sets are considered nodes of interest, while nodes in the training set are considered regular nodes.

We train all models on the same training set. To address the imbalance between flows of different magnitude, we resample the data such that each bin contains the same number of samples. We perform a hyperparameter search to determine the optimal dimension of the intermediate representations (i.e. the outputs of the GCN and fully connected layers), the dropout rate, and the number of fully connected and GCN layers. We select models based on the bin mean MAE (see Equation 8) achieved on the validation set. The selected models have a total of four fully connected layers, and the GNN-based models use a single GCN layer. We use a dimensionality of 32 for intermediate representations, and the dropout rate is set to 0.5. We train for a total of 110 epochs using the Adam optimiser [42] with a batch size of 256 and a learning rate of 0.01. We reduce the learning rate by a factor of ten after 50 epochs and every 15 epochs after that. We stop training early once the performance of the model no longer improves in terms of bin mean MAE on the validation set.

We have also experimented with other types of graph neural network layers, including GAT layers [43], GIN layers [44], and Jumping Knowledge layers [45]. We did not find these layers to improve performance on the validation data set and hence preferred the conceptually simpler GCN layers.
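The node-based, bin-balanced hold-out procedure and the training-set resampling described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: edges are keyed by hypothetical `(node, node)` tuples, flows at or above the last bin boundary are assigned to the top bin, the sketch targets a fixed number of edges per bin (`per_bin`) rather than the 70/10/20 proportions directly, and "resampling" is interpreted as oversampling each bin with replacement up to the size of the largest bin, since the text does not say whether bins are over- or undersampled.

```python
import random
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[int, int]
BIN_BOUNDARIES = (0.0, 10.0, 100.0, 1000.0, 10000.0)


def bin_index(flow: float) -> int:
    """Return k such that flow lies in [BIN_BOUNDARIES[k], BIN_BOUNDARIES[k+1])."""
    for k in range(len(BIN_BOUNDARIES) - 1):
        if BIN_BOUNDARIES[k] <= flow < BIN_BOUNDARIES[k + 1]:
            return k
    return len(BIN_BOUNDARIES) - 2  # assumption: larger flows join the top bin


def select_holdout(
    flows: Dict[Edge, float],
    per_bin: int,
    exclude: FrozenSet[Edge] = frozenset(),
    seed: int = 0,
) -> Tuple[List[int], Set[Edge]]:
    """Pick random nodes of interest and hold out their incident edges until
    every flow bin holds per_bin edges; a full bin accepts no further edges."""
    rng = random.Random(seed)
    nodes = sorted({v for edge in flows for v in edge})
    rng.shuffle(nodes)
    counts = [0] * (len(BIN_BOUNDARIES) - 1)
    chosen_nodes: List[int] = []
    held_out: Set[Edge] = set()
    for node in nodes:
        if min(counts) >= per_bin:
            break  # every bin is full
        chosen_nodes.append(node)
        for edge in sorted(flows):
            if node in edge and edge not in exclude and edge not in held_out:
                k = bin_index(flows[edge])
                if counts[k] < per_bin:
                    held_out.add(edge)
                    counts[k] += 1
    return chosen_nodes, held_out


def balance_by_bin(edges: List[Edge], flows: Dict[Edge, float], seed: int = 0) -> List[Edge]:
    """Oversample every non-empty bin (with replacement) to the largest bin size."""
    rng = random.Random(seed)
    bins: Dict[int, List[Edge]] = {}
    for edge in edges:
        bins.setdefault(bin_index(flows[edge]), []).append(edge)
    target = max(len(members) for members in bins.values())
    balanced: List[Edge] = []
    for k in sorted(bins):
        members = bins[k]
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


flows = {(0, 1): 5.0, (0, 2): 50.0, (1, 2): 500.0, (1, 3): 5000.0,
         (2, 3): 7.0, (0, 3): 70.0, (2, 4): 3.0, (3, 4): 40.0}
test_nodes, test_edges = select_holdout(flows, per_bin=1)
train = [e for e in sorted(flows) if e not in test_edges]
balanced = balance_by_bin(train, flows)
```

A validation set can be drawn the same way with `exclude=frozenset(test_edges)`. To approximate a 20% test split, `per_bin` would be set to roughly a twentieth of the number of edges, which is possible only if each bin contains enough edges.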
• Generalised hypergeometric ensemble multilayer net-
work regression (gHypE): This recent random graph ap- VII. R ESULTS
proach [40] provides a statistical ensemble of all possible We compare our models to the baseline ones in terms of
flow networks under the constraints of preserving in- and MAE in Table I. We find that all three neural network models
out-flows from each node, as well as respecting pairwise outperform all the spatial interaction models (DC-GM, Huff,
flow propensities of nodes. The multilayer network re- Poisson, NB, SAM) as well as gHypE and XGBoost in terms
gression considers these propensities as latent variables,

of total MAE by a large margin. Crucially, the MAEs per bin reveal that the neural network models achieve high accuracy across bins relative to the magnitude of flows. The neural networks therefore perform well not only on small flows, which are highly overrepresented in the dataset.

We also observe that there is no clear difference in performance between the three neural-network-based models. Surprisingly, the graph neural networks (GNN-geo, GNN-flow) do not outperform the fully connected neural network FCNN. This indicates that node neighbourhood information does not result in stronger predictive performance for this dataset and prediction task. We stress, however, that while FCNN is not a graph neural network, it does use graph structural information by concatenating edge features with the features of the incident nodes. Furthermore, previous work on mobility flow prediction has omitted an explicit comparison of GNNs to fully connected neural networks. It therefore remains unclear whether GNNs offer a predictive advantage in the urban mobility setting.

Finally, we compare the neural network models to the baselines in terms of SSI, MAPE of the largest bin, CPL, and CPC (Table II). These results also confirm that the neural network models fit the data better than the state-of-the-art baselines.

To further illustrate the effectiveness of the GNN models, we represent the MAE residuals on the London diagrammatic maps in Figure 5. These representations show the difference between predicted and ground-truth flows between the locations in the test set. We compare the state-of-the-art XGBoost model with our GNN-flow model and observe that the latter results in spatially smoother residuals.

Fig. 5: MAE residuals of flows associated with test nodes: (a) GNN-geo; (b) XGBoost.

TABLE I: Comparison of model performance in terms of mean absolute error grouped by flow magnitude. "Bin mean" is the unweighted mean of the four per-bin MAEs.

Model | MAE total | [0; 10) | [10; 10²) | [10²; 10³) | [10³; 10⁴) | Bin mean
DC-GM | 167.58 | 64.88 | 170.45 | 881.98 | 2176.35 | 823.42
Huff | 122.89 | 48.21 | 99.86 | 511.41 | 1476.72 | 534.05
Poisson | 106.74 | 40.69 | 88.56 | 475.23 | 1261.41 | 466.47
NB | 92.62 | 33.02 | 76.96 | 431.44 | 1087.12 | 407.14
SAM | 75.09 | 19.31 | 61.53 | 395.01 | 989.30 | 366.29
gHypE | 58.11 | 9.02 | 53.10 | 346.96 | 832.26 | 310.34
XGBoost | 31.59 ± 5.88 | 2.61 ± 0.89 | 45.12 ± 11.06 | 228.96 ± 39.96 | 549.83 ± 84.79 | 206.63 ± 34.18
FCNN | 12.55 ± 0.91 | 0.33 ± 0.08 | 28.97 ± 4.93 | 161.12 ± 22.36 | 408.88 ± 36.59 | 149.82 ± 13.65
GNN-geo | 13.34 ± 2.51 | 0.52 ± 0.40 | 31.63 ± 9.68 | 161.32 ± 9.09 | 422.04 ± 25.70 | 153.88 ± 9.74
GNN-flow | 15.35 ± 4.23 | 0.63 ± 0.62 | 38.66 ± 16.65 | 170.06 ± 17.41 | 458.05 ± 64.56 | 166.85 ± 16.39

TABLE II: Comparison of model performance in terms of MAPE, SSI, CPL, and CPC.

Model | SSI | MAPE [10³; 10⁴) | CPL | CPC
DC-GM | 0.39 | 162.59 | 0.38 | 0.49
Huff | 0.48 | 106.91 | 0.56 | 0.54
Poisson | 0.46 | 102.10 | 0.57 | 0.54
NB | 0.54 | 91.03 | 0.62 | 0.56
SAM | 0.59 | 66.65 | 0.68 | 0.58
gHypE | 0.62 | 52.99 | 0.79 | 0.60
XGBoost | 0.67 ± 0.02 | 40.90 ± 5.85 | 0.86 ± 0.02 | 0.61 ± 0.01
FCNN | 0.71 ± 0.00 | 27.16 ± 2.23 | 1.0 ± 0.00 | 0.69 ± 0.01
GNN-geo | 0.70 ± 0.01 | 27.06 ± 1.65 | 1.0 ± 0.00 | 0.68 ± 0.04
GNN-flow | 0.71 ± 0.02 | 30.67 ± 4.18 | 1.0 ± 0.01 | 0.65 ± 0.05

VIII. CONCLUSION

In this paper, we formulated and addressed the problem of learning urban mobility flows between a location of interest and every other location in the city, given the array of socio-economic and structural features describing each location and the pairwise dyadic relations between them. We proposed
three novel neural network architectures, using fully connected and graph convolutional layers, and compared them to a set of strong baseline models. We find that the neural network models achieve state-of-the-art performance and outperform the baselines by a large margin.

In fulfilment of the stated objective, our work has direct utility to urban planners and policy makers in offering a technique for assessing mobility flows between an urban development project location and other locations in the city.

ACKNOWLEDGMENT

F.L.O. acknowledges funding from the Huawei Hisilicon Studentship at the Department of Computer Science and Technology of the University of Cambridge. This work is partially funded by the EU H2020 programme under Grant Agreement No. 780754, “Track & Know”.

REFERENCES

[1] A. G. Wilson, “A family of spatial interaction models, and associated developments,” Environment and Planning A, vol. 3, no. 1, pp. 1–32, 1971.
[2] A. S. Fotheringham and M. E. O’Kelly, Spatial interaction models: formulations and applications, vol. 1. Kluwer Academic Publishers Dordrecht, 1989.
[3] J. P. LeSage and R. K. Pace, “Spatial econometric modeling of origin-destination flows,” Journal of Regional Science, vol. 48, no. 5, pp. 941–967, 2008.
[4] F. Simini, M. C. González, A. Maritan, and A.-L. Barabási, “A universal model for mobility and migration patterns,” Nature, vol. 484, no. 7392, p. 96, 2012.
[5] D. Chai, L. Wang, and Q. Yang, “Bike flow prediction with multi-graph convolutional networks,” in Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 397–400, ACM, 2018.
[6] F. Toqué, E. Côme, M. K. El Mahrsi, and L. Oukhellou, “Forecasting dynamic public transport origin-destination matrices with long-short term memory recurrent neural networks,” in 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pp. 1071–1076, IEEE, 2016.
[7] D. L. Huff, “A probabilistic analysis of shopping center trade areas,” Land Economics, vol. 39, no. 1, pp. 81–90, 1963.
[8] S. Erlander and N. F. Stewart, The gravity model in transportation analysis: theory and extensions, vol. 3. VSP, 1990.
[9] J. de Dios Ortuzar and L. G. Willumsen, Modelling transport. John Wiley & Sons, 2011.
[10] D. P. McArthur, G. Kleppe, I. Thorsen, and J. Ubøe, “The spatial transferability of parameters in a gravity model of commuting flows,” Journal of Transport Geography, vol. 19, no. 4, pp. 596–605, 2011.
[11] R. Patuelli, A. Reggiani, S. P. Gorman, P. Nijkamp, and F.-J. Bade, “Network analysis of commuting flows: A comparative static approach to German data,” Networks and Spatial Economics, vol. 7, no. 4, pp. 315–331, 2007.
[12] L. Zhang, J. Cheng, and C. Jin, “Spatial interaction modeling of OD flow data: Comparing geographically weighted negative binomial regression (GWNBR) and OLS (GWOLSR),” ISPRS International Journal of Geo-Information, vol. 8, no. 5, p. 220, 2019.
[13] A. S. Fotheringham, C. Brunsdon, and M. Charlton, Geographically weighted regression: the analysis of spatially varying relationships. John Wiley & Sons, 2003.
[14] A. R. da Silva and T. C. V. Rodrigues, “Geographically weighted negative binomial regression incorporating overdispersion,” Statistics and Computing, vol. 24, no. 5, pp. 769–783, 2014.
[15] P. Davis, “Spatial competition in retail markets: movie theaters,” The RAND Journal of Economics, vol. 37, no. 4, pp. 964–982, 2006.
[16] G. Bruno and G. Improta, “Using gravity models for the evaluation of new university site locations: A case study,” Computers & Operations Research, vol. 35, no. 2, pp. 436–444, 2008.
[17] N. Wan, B. Zou, and T. Sternberg, “A three-step floating catchment area method for analyzing spatial access to health services,” International Journal of Geographical Information Science, vol. 26, no. 6, pp. 1073–1089, 2012.
[18] A. S. Fotheringham, “A new set of spatial-interaction models: the theory of competing destinations,” Environment and Planning A: Economy and Space, vol. 15, no. 1, pp. 15–36, 1983.
[19] M. De Beule, D. Van den Poel, and N. Van de Weghe, “An extended Huff-model for robustly benchmarking and predicting retail network performance,” Applied Geography, vol. 46, pp. 80–89, 2014.
[20] Y. Li and L. Liu, “Assessing the impact of retail location on store performance: A comparison of Wal-Mart and Kmart stores in Cincinnati,” Applied Geography, vol. 32, no. 2, pp. 591–600, 2012.
[21] A. P. Masucci, J. Serras, A. Johansson, and M. Batty, “Gravity versus radiation models: On the importance of scale and heterogeneity in commuting flows,” Physical Review E, vol. 88, no. 2, p. 022812, 2013.
[22] X. Liang, J. Zhao, L. Dong, and K. Xu, “Unraveling the origin of exponential law in intra-urban human mobility,” Scientific Reports, vol. 3, p. 2983, 2013.
[23] X.-Y. Yan, C. Zhao, Y. Fan, Z. Di, and W.-X. Wang, “Universal predictability of mobility patterns in cities,” Journal of The Royal Society Interface, vol. 11, no. 100, p. 20140834, 2014.
[24] G. Spadon, A. C. de Carvalho, J. F. Rodrigues-Jr, and L. G. Alves, “Reconstructing commuters network using machine learning and urban indicators,” Scientific Reports, vol. 9, no. 1, pp. 1–13, 2019.
[25] M. Lorenzo and M. Matteo, “OD matrices network estimation from link counts by neural networks,” Journal of Transportation Systems Engineering and Information Technology, vol. 13, no. 4, pp. 84–92, 2013.
[26] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” Trans. Neur. Netw., vol. 20, no. 1, pp. 61–80, 2009.
[27] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and locally connected networks on graphs,” in International Conference on Learning Representations (ICLR2014), CBLS, April 2014, 2014.
[28] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, (Red Hook, NY, USA), pp. 3844–3852, Curran Associates Inc., 2016.
[29] T. N. Kipf and M. Welling, “Semi-Supervised Classification with Graph Convolutional Networks,” in Proceedings of the 5th International Conference on Learning Representations, ICLR ’17, 2017.
[30] X. Wang, Z. Zhou, F. Xiao, K. Xing, Z. Yang, Y. Liu, and C. Peng, “Spatio-temporal analysis and prediction of cellular traffic in metropolis,” IEEE Transactions on Mobile Computing, vol. 18, no. 9, pp. 2190–2202, 2019.
[31] D. Zhu and Y. Liu, “Modelling spatial patterns using graph convolutional networks,” in 10th International Conference on Geographic Information Science (GIScience 2018), 2018.
[32] P. Xie, T. Li, J. Liu, D. Shengdong, Y. Xin, and J. Zhang, “Urban flows prediction from spatial-temporal data using machine learning: A survey,” arXiv preprint arXiv:1908.10218, 2019.
[33] OpenStreetMap contributors, “Planet dump retrieved from [Link].” [Link], 2017.
[34] Murray Cox, “Inside Airbnb retrieved from [Link] [Link].” [Link], 2019.
[35] Transport for London, “Open data retrieved from [Link] for/open-data-users/.” [Link], 2019.
[36] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ICML’15, pp. 448–456, [Link], 2015.
[37] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[38] M. Lenormand, S. Huet, F. Gargiulo, and G. Deffuant, “A universal model of commuting networks,” PLoS ONE, vol. 7, no. 10, 2012.
[39] M. Lenormand, A. Bassolas, and J. J. Ramasco, “Systematic comparison of trip distribution laws and models,” Journal of Transport Geography, vol. 51, pp. 158–169, 2016.
[40] G. Casiraghi, “Multiplex network regression: How do relations drive interactions?,” arXiv preprint arXiv:1702.02048, 2017.
[41] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794, 2016.
[42] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (Y. Bengio and Y. LeCun, eds.), 2015.
[43] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” in International Conference on Learning Representations, 2018.
[44] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?,” in International Conference on Learning Representations, 2019.
[45] K. Xu, C. Li, Y. Tian, T. Sonobe, K.-i. Kawarabayashi, and S. Jegelka, “Representation learning on graphs with jumping knowledge networks,” in Proceedings of the 35th International Conference on Machine Learning (J. Dy and A. Krause, eds.), vol. 80 of Proceedings of Machine Learning Research, pp. 5453–5462, 2018.