This invention generally relates to the technical field of intelligent traffic technology (mining and prediction of urban traffic spatio-temporal data), and more particularly, to a multi-modal data prediction method based on a causal Markov model.
An urban traffic system is a multi-modal complex system formed by mutually associating and aggregating a plurality of subsystems. The travel modes correspond to different means of travel, such as bicycles, taxis, buses and passenger cars, and together they meet a variety of travel requirements while giving urban residents a range of choices. With the improvement of city infrastructure and the rise in consumption levels, the number of vehicles in a city continuously increases, so that the urban traffic infrastructure and its associated management fail to meet the travel requirements of urban residents, resulting in serious urban traffic congestion. The conflict between the uneven distribution of travel demands and insufficient road resources is a major cause of traffic congestion. Therefore, a comprehensive analysis of travel demands is the key to solving this problem. The multi-modal traffic flow accurately reflects the travel demands and describes the health of a traffic system, which allows the relevant authorities to make traffic management strategies according to the traffic flow under different conditions, thereby ensuring smooth traffic operation. Thus, multi-modal traffic prediction is crucial for urban traffic management, as it provides traffic guidance and important data support for making traffic management strategies.
The purpose of multi-modal traffic prediction is to synchronously predict the future traffic of multiple transportation modes based on the input historical flow data. Presently, most conventional traffic prediction methods merely predict the traffic flow in a single mode, such as the bicycle traffic flow or the road speed. These methods achieve only a partial observation of the traffic system and fail to reflect the real-world situation. Although such methods can be applied to various traffic predictions, each mode requires a separate prediction model, which consumes a lot of resources and fails to form an end-to-end unified framework. In recent years, multi-modal joint traffic prediction has gradually attracted the attention of researchers. These methods normally use the multi-modal traffic flow to expand the channel dimension of the input features, or fuse the traffic features of the various modes in the model, with the purpose of implicitly extracting the spatio-temporal relationships of the multi-modal traffic data. However, these methods lack a description of causal relationships. Moreover, expanding the dimension of the data features introduces a large number of confounding factors instead of improving the predictive ability of the model, resulting in poor prediction performance.
Presently, conventional multi-modal traffic prediction methods excessively emphasize the spatio-temporal correlations in traffic data, ignoring physical concepts affecting the generation of traffic data and causal relationships among these concepts. Under different conditions, the spatio-temporal correlations are considered unstable and a spurious correlation may exist.
The fundamental cause of the variation of the multi-modal traffic flow lies in the physical concepts (the regional attraction factor, the demand factors of different traffic modes and the traffic speed factor) affecting the generation of the flow data and in the causal relationships among these concepts. Moreover, paying excessive attention to the correlations may lead to unstable prediction results for the multi-modal traffic flow. Therefore, the present invention provides a multi-modal data prediction method based on a causal Markov model, which starts from the generation process of the multi-modal traffic flow and divides the core physical concepts affecting the generation of flow data into three groups: 1) the attraction factors of the region in different time periods, 2) the demand factors of different traffic modes selected under different conditions, and 3) the traffic speed factor affected by the number of road vehicles. According to the present invention, the causal representation of the physical concepts is learned from the conditional information and the multi-modal traffic data, and the causal relationship among the physical concepts is further explored, so that an accurate prediction of the multi-modal traffic flow is achieved.
The multi-modal data prediction method based on a causal Markov model of the present invention is used for predicting the multi-modal traffic flow, thereby allowing relevant authorities to make traffic management strategies according to the traffic flow in different conditions. The method of the present invention comprises the following steps:
Step 1: collecting the regional data and traffic data of a research region, and constructing a causal graph of a causal Markov process;
First, obtaining the regional division, regional point of interest information, weather information and multi-modal traffic data of a research region, wherein the multi-modal traffic data includes the shared bicycle order data, taxi order data, bus order data and road traffic speed data;
Subsequently, taking the time position information, the regional point of interest information and the weather information as conditional feature variables, and taking the regional attraction factor, the bicycle demand factor, the taxi demand factor, the bus demand factor and the traffic speed factor as physical concept variables;
Finally, constructing a causal graph of a causal Markov process, and taking the bicycle traffic flow, the taxi traffic flow, the bus traffic flow and the regional speed of the sub-regions as traffic data observation variables; generating physical concept variables at a current time step from conditional feature variables at a current time step and physical concept variables at a previous time step, and then predicting traffic data observation variables at the current time step; describing a generation process of multi-modal traffic data observation variables using a joint distribution of physical concept variables and traffic data observation variables, and decomposing the joint distribution into a prior distribution of physical concept variables and a generation distribution of traffic data observation variables; describing the process of extracting physical concept variables from conditional feature variables and multi-modal traffic data using the posterior distribution of physical concept variables;
Step 2: building a causal Markov model by using a neural network, and solving the causal Markov process;
The causal Markov model comprises a prior network, a posterior network, a causal propagation module and a generation network; the prior network learns the prior distribution of the physical concept variables in the traffic system by using the input conditional feature variables; the posterior network learns the variational posterior distribution of the physical concept variables by using the input conditional feature variables and the multi-modal traffic data, and obtains an approximately real posterior distribution of the physical concept variables; both the prior network and the posterior network comprise a graph gated recurrent unit and share a causal propagation module; the causal propagation module inputs a causal representation of the physical concept variables, propagates the causal effect by using a learnable causal graph, and outputs a causal representation of the physical concept variables after the causal effect is propagated; the generation network inputs a causal representation of physical concept variables and outputs corresponding multi-modal traffic data observation variables;
Step 3: collecting historical data in a research region to train the causal Markov model, deploying the trained model on a traffic management system, and predicting the future bicycle traffic flow, taxi traffic flow, bus traffic flow and regional speed according to the historical bicycle traffic flow, taxi traffic flow, bus traffic flow and regional speed of each sub-region, thereby providing early warning of road congestion and assisting relevant authorities in making corresponding traffic management strategies.
Compared with the prior art, the present invention has the following advantages:
Drawings and detailed embodiments are combined hereinafter to further elaborate the technical solution of the present invention.
Due to the existence of spurious correlations among data in different conditions, excessively focusing on spatio-temporal correlations within these data may lead to unstable prediction results. Therefore, the present invention provides a multi-modal data prediction method based on a causal Markov model starting from the generation principle of multi-modal traffic data. As shown in
The following three steps are used to illustrate the implementation steps of the multi-modal data prediction method based on the causal Markov model and the effect verification of the method of the present invention:
Step 1: collecting regional data and multi-modal traffic data of a region to be studied, quantifying the data, constructing a causal graph of a causal Markov process among variables, and defining a joint distribution and a posterior distribution of physical concepts;
In an embodiment of the present invention, a region in Beijing is used as a research region, wherein a street map of the region is obtained and a regional POI is obtained therefrom, the weather data of the region is obtained from a weather station, and the shared bicycle order data, taxi order data, bus order data and road traffic speed data of the region are obtained from a traffic system; the order data is allocated to each sub-region of the research region to form multi-modal traffic flow data of each sub-region, and a mean value of the traffic speeds of vehicles on roads in each sub-region is calculated to form the sub-region speed data; the multi-modal traffic flow data and the regional speed data are collectively called the multi-modal traffic data;
In the present invention, the time position information, the regional POI information and the weather information are regarded as conditional feature variables, the regional attraction factor, the bicycle demand factor, the taxi demand factor, the bus demand factor and the traffic speed factor are regarded as physical concept variables, and the bicycle traffic flow, the taxi traffic flow, the bus traffic flow and the regional speed are regarded as multi-modal traffic data observation variables; according to the present invention, a causal graph among these variables is constructed, and a joint distribution among the variables and a posterior distribution of the physical concepts are defined; the joint distribution among the variables includes a prior distribution and a generation distribution, which are used to describe the generation process of the multi-modal traffic flow; the posterior distribution of the physical concept variables is used to describe the process of extracting a physical concept causal representation from the conditional information and multi-modal traffic data. Step 1 further comprises steps 1.1-1.3:
Step 1.1: constructing a causal graph of the causal Markov process:
As shown in
wherein pθ represents the probability distribution, θ in formula (1) represents a system model parameter, and T represents a complete time sequence length, wherein the first item pθ(zt|zt-1, Ct) represents a prior distribution of a physical concept variable, representing a natural physical rule existing in the traffic system, which is a prior knowledge that is not affected by the current observation variable, wherein the second item pθ(xt|zt) represents a generation distribution, showing a process of generating observation data by an observation variable under the influence of a physical concept, wherein the generation distribution may be further decomposed into generation distributions of various traffic modes, which is defined as follows:
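$$
p_\theta(x_t \mid z_t) = p_\theta(x_t^{bike} \mid z_t^{bike}) \cdot p_\theta(x_t^{taxi} \mid z_t^{taxi}) \cdot p_\theta(x_t^{bus} \mid z_t^{bus}) \cdot p_\theta(x_t^{v} \mid z_t^{v}) \tag{2}
$$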
Step 1.3: defining a posterior distribution among variables, wherein the posterior distribution is used to describe the potential factors influencing the generation of observation data under the condition that the external environment and the observation variable of the current system are known, which is specifically defined as follows:
Step 2: building a causal Markov model by using a neural network, and solving the causal Markov process in step 1, wherein the main purpose is to infer a causal representation of a potential core physical concept from the external environment and the observation variable of the current system; this step further comprises steps 2.1-2.3:
Due to the high difficulty of obtaining the real posterior distribution of the physical concepts pθ(zt<T|xt<T, Ct<T), the present invention calculates the variational posterior distribution qϕ(zt<T|xt<T, Ct<T) based on a variational auto-encoder (VAE) framework, thereby obtaining an approximately real posterior distribution, wherein ϕ represents a variational model parameter;
As shown in
As shown in
Step 2.1: establishing a posterior network by using the conditional information and multi-modal traffic data, wherein its purpose is to obtain an approximately real posterior distribution of a physical concept variable through learning a variational posterior distribution by using a neural network; as shown in
The graph gated recurrent unit (GraphGRU): the generation process of the multi-modal traffic flow satisfies the Markov property, and the evolution of physical concept variables is an inherent driving force for the spatio-temporal dependence of the multi-modal traffic observation data; therefore, the present invention provides a graph gated recurrent unit to model the evolution process of the system states at the current moment and the previous moment, and capture the spatio-temporal dependency into potential physical concept variables, which is defined as follows:
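$$
\begin{aligned}
s_t^{po,i} &= FC(C_t \,\|\, x_t^{i}) \\
r_t^{po,i} &= \sigma\big(W_r^{i} \star_G (s_t^{po,i} \,\|\, z_{t-1}^{po,i}) + b_r^{i}\big) \\
u_t^{po,i} &= \sigma\big(W_u^{i} \star_G (s_t^{po,i} \,\|\, z_{t-1}^{po,i}) + b_u^{i}\big) \\
\tilde{h}_t^{po,i} &= \tanh\big(W_h^{i} \star_G (s_t^{po,i} \odot z_{t-1}^{po,i}) + b_h^{i}\big) \\
z_t^{po,i} &= u_t^{po,i} \odot z_{t-1}^{po,i} + (1 - u_t^{po,i}) \odot \tilde{h}_t^{po,i}
\end{aligned} \tag{4}
$$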
wherein $i \in \{poi, bike, taxi, bus, v\}$ represents different traffic modes, wherein $t$ represents the t-th moment, namely, a collection time step, wherein $s_t^{po,i}$ represents an input feature of the i-th traffic mode, which is obtained by concatenating the conditional information $C_t \in \mathbb{R}^{N \times C}$
$$
W \star_G (X) + b = \big(I + D^{-1/2} G D^{-1/2}\big) X W + b \tag{5}
$$
wherein $G \in \mathbb{R}^{N \times N}$ represents an adjacency matrix of regional distances, wherein $D$ represents the diagonal degree matrix of the adjacency matrix with $D_{ii} = \sum_j G_{ij}$, wherein $X$ represents the input data, and wherein $I$ represents an identity matrix;
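By way of a non-limiting illustration, the graph convolution of formula (5) may be sketched in plain Python as follows. The helper names `matmul` and `graph_conv`, the three-region toy adjacency matrix and the weight values are assumptions introduced for this sketch only and are not part of the described embodiment, which uses learned parameters in a deep learning framework.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def graph_conv(G, X, W, b):
    """Compute (I + D^{-1/2} G D^{-1/2}) X W + b, following formula (5)."""
    n = len(G)
    deg = [sum(row) for row in G]                      # D_ii = sum_j G_ij
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    # Normalised propagation matrix with the identity (self-loop) term added.
    P = [[(1.0 if i == j else 0.0) + inv_sqrt[i] * G[i][j] * inv_sqrt[j]
          for j in range(n)] for i in range(n)]
    out = matmul(matmul(P, X), W)
    return [[v + b[j] for j, v in enumerate(row)] for row in out]

# Toy example (assumed values): 3 sub-regions in a chain, 2 input features, 1 output feature.
G = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0], [1.0]]
b = [0.0]
H = graph_conv(G, X, W, b)
```

In this sketch each sub-region keeps its own features through the identity term and additionally receives degree-normalised features from its neighbouring sub-regions.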
The causal propagation module: physical concept variables are naturally causally related, and therefore the semantic representation of physical concepts should also be causally related; the present invention provides a causal propagation module for propagating causal effects among variables based on a learned causal relationship, which is defined as follows:
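$$
f^{-1}(z_t) = A^{T} f^{-1}(z_t) + \varepsilon \;\Rightarrow\; z_t = f\big[(I - A^{T})^{-1}\varepsilon\big] \in \mathbb{R}^{N \times 5 \times d} \tag{6}
$$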
wherein $A \in \mathbb{R}^{5 \times 5}$ represents an adjacency matrix of the causal relationships among the physical concept variables, wherein element $A_{ij}$ represents the causal effect of variable $i$ on variable $j$, wherein $z_t = \{z_t^{poi}, z_t^{bike}, z_t^{taxi}, z_t^{bus}, z_t^{v}\}$ represents the physical concept variables, wherein the superscript $T$ represents a matrix transpose, wherein $\varepsilon \sim N(0, I)$ represents random Gaussian noise, wherein $I \in \mathbb{R}^{5 \times 5}$ represents an identity matrix, and wherein $f(\cdot)$ represents any invertible transformation function, wherein the present invention uses an affine transformation function with parameters, which is defined as follows:
wherein $x$ represents an input vector of the function, wherein $\alpha, \beta \in \mathbb{R}$ are learnable scalar parameters; as shown in
The generation of a variational posterior distribution: according to the present invention, fully connected layers are used to extract the mean μtpo and variance σtpo of the variational posterior distribution from the causal representation of the physical concepts output by the causal propagation module, and the generation process of the variational posterior distribution qϕ(zt|zt-1,xt,Ct) is as follows:
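$$
\begin{aligned}
\mu_t^{po} &= FC_\mu(z_t^{po}) \\
\sigma_t^{po} &= FC_\sigma(z_t^{po}) \\
q_\phi(z_t \mid z_{t-1}, x_t, C_t) &\sim N(\mu_t^{po}, \sigma_t^{po})
\end{aligned} \tag{8}
$$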
wherein FCμ and FCσ represent two fully connected layers, wherein N(μtpo, σtpo) represents a Gaussian distribution with a mean value of μtpo and a variance of σtpo, wherein by sampling from the variational posterior distribution, a causal representation of the posterior physical concepts may be obtained;
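By way of a non-limiting illustration, sampling from a Gaussian distribution with a given mean and variance may be sketched as follows. The sketch assumes the reparameterisation commonly used with variational auto-encoders (a sample equals the mean plus the standard deviation times standard Gaussian noise), which is not spelled out in the present description; the function name `sample_gaussian` and the numerical values are likewise assumptions.

```python
import math
import random

def sample_gaussian(mu, var, rng):
    """Draw z = mu + sqrt(var) * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]

rng = random.Random(0)            # fixed seed so that the sketch is repeatable
mu = [0.5, -1.0, 2.0]             # assumed mean vector
var = [0.0, 0.0, 0.0]             # zero variance: the sample collapses onto the mean
z_det = sample_gaussian(mu, var, rng)
z_rand = sample_gaussian(mu, [1.0, 1.0, 1.0], rng)
```

Writing the sample as a deterministic function of the mean, the variance and an independent noise term is what allows gradients to flow through the sampling step during training.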
Step 2.2: existing VAE research generally treats the prior distribution as an independent standard Gaussian distribution, and due to the lack of inductive bias, such unsupervised causal representation learning cannot ensure the identifiability of the causal representation; to improve the causal identifiability of the model, the present invention uses the conditional information to establish a prior network, whose purpose is to model the physical rules that naturally govern the physical concepts in the system and to approximate these rules with a learnable distribution; the present invention lets the prior network and the posterior network supervise each other, which makes the prior network better match the natural rules of the physical concepts while helping the posterior network to identify the causal representation of the physical concepts; the structure of the prior network resembles that of the posterior network, each of which is composed of a graph gated recurrent unit and a causal propagation module;
The graph gated recurrent unit: the prior network merely inputs the conditional information of the current system, which is defined as follows:
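$$
\begin{aligned}
s_t^{pr,i} &= FC(C_t) \\
r_t^{pr,i} &= \sigma\big(W_r^{i} \star_G (s_t^{pr,i} \,\|\, z_{t-1}^{pr,i}) + b_r^{i}\big) \\
u_t^{pr,i} &= \sigma\big(W_u^{i} \star_G (s_t^{pr,i} \,\|\, z_{t-1}^{pr,i}) + b_u^{i}\big) \\
\tilde{h}_t^{pr,i} &= \tanh\big(W_h^{i} \star_G (s_t^{pr,i} \odot z_{t-1}^{pr,i}) + b_h^{i}\big) \\
z_t^{pr,i} &= u_t^{pr,i} \odot z_{t-1}^{pr,i} + (1 - u_t^{pr,i}) \odot \tilde{h}_t^{pr,i}
\end{aligned} \tag{9}
$$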
wherein the meaning of the symbols in the prior network is the same as that in the posterior network, wherein $z_t^{pr,i} \in \mathbb{R}^{N \times d}$ represents a prior physical concept variable of the i-th traffic mode at the t-th moment, wherein the superscript pr represents the prior network;
The causal propagation module: the prior network and the posterior network share one causal propagation module; in the present invention, the causal relationship is considered a stable natural phenomenon, which does not vary with time or space; therefore, the causal graph is globally shared, and the causal effect is propagated by using formula (6);
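By way of a non-limiting illustration, the propagation of formula (6) may be sketched as follows for one scalar value per physical concept variable. The sketch assumes that the causal graph is acyclic and that the invertible function takes the affine form f(y) = αy + β; the function name `propagate` and the two edge weights are hypothetical and are not taken from the described embodiment, in which the representation of each variable is a tensor and the graph is learned.

```python
def propagate(A, eps, alpha=1.0, beta=0.0):
    """Solve y = A^T y + eps, then apply the assumed affine map f(y) = alpha*y + beta."""
    k = len(A)
    y = list(eps)
    # For an acyclic graph A is nilpotent, so k sweeps reach the exact fixed point.
    for _ in range(k):
        y = [eps[j] + sum(A[i][j] * y[i] for i in range(k)) for j in range(k)]
    return [alpha * v + beta for v in y]

names = ["poi", "bike", "taxi", "bus", "v"]
A = [[0.0] * 5 for _ in range(5)]   # A[i][j]: causal effect of variable i on variable j
A[0][1] = 0.5    # hypothetical effect: regional attraction -> bicycle demand
A[1][4] = -0.2   # hypothetical effect: bicycle demand -> traffic speed
eps = [1.0, 0.0, 0.0, 0.0, 0.0]     # a unit shock on the regional attraction factor only
z = propagate(A, eps)
```

In this sketch a shock on the regional attraction factor reaches the bicycle demand factor directly and the traffic speed factor through the bicycle demand factor, which is the behaviour obtained from the closed form (I − Aᵀ)⁻¹ε.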
The generation of the prior distribution: the mean μtpr and variance σtpr of the prior distribution are extracted from the causal representation of the prior physical concepts output by the causal propagation module, and the generation process of the prior distribution pθ(zt|zt-1, Ct) is as follows:
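$$
\begin{aligned}
\mu_t^{pr} &= FC_\mu(z_t^{pr}) \\
\sigma_t^{pr} &= FC_\sigma(z_t^{pr}) \\
p_\theta(z_t \mid z_{t-1}, C_t) &\sim N(\mu_t^{pr}, \sigma_t^{pr})
\end{aligned} \tag{10}
$$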
wherein FCμ and FCσ represent two fully connected layers, wherein N(μtpr, σtpr) represents a Gaussian distribution with a mean value of μtpr and a variance of σtpr; by sampling from the prior distribution, the causal representation of the prior physical concepts may be obtained;
Step 2.3: the generation distribution at the moment t is shown in formula (2); the present invention uses two fully connected layers to simulate the process of generating the multi-modal traffic observation data from physical concept variables; according to different types of the physical concept variables, the generation network generates traffic data observation variables corresponding to different modes;
The reconstruction process: as shown in
The prediction process: the prior network merely matches the prior distribution by using the conditional information at the current moment, which is irrelevant to the multi-modal traffic data at the current moment; therefore, when the prior physical concept variables are used to generate the multi-modal traffic data, the output is a prediction result;
Step 3: presetting a dataset D, learning the causal Markov model provided by the present invention, and then predicting the multi-modal traffic data of each sub-region in the research region by using the trained causal Markov model;
When inferring a causal representation, first, learning a variational posterior distribution of physical concepts from the historical conditional information and the multi-modal traffic data by a posterior network; modeling the natural physical rules existing in a traffic system from the historical conditional information, and learning a prior distribution of the physical concept variables by a prior network; subsequently, using KL divergence to regularize the distance between the variational posterior distribution and the prior distribution, thereby enabling the variational posterior distribution and the prior distribution to fully extract the useful information in the data; subsequently, sampling a causal representation of physical concepts from a variational posterior distribution; finally, reconstructing the input multi-modal traffic data by using a generation network and extracting a causal representation of the physical concept variables from the data using a variational auto-encoder; when predicting the multi-modal traffic data, first, using a prior network to infer a causal representation of physical concepts from the future conditional information at a future moment; finally, using the generation network to decode the causal representation of a future moment, and generating a multi-modal traffic flow at the future moment as the prediction result;
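By way of a non-limiting illustration, the KL divergence used to regularize the distance between the variational posterior distribution and the prior distribution may be sketched as follows. The sketch assumes that both distributions are Gaussian with independent dimensions, for which the divergence has a well-known closed form; the function name `kl_diag_gaussians` and the numerical values are assumptions.

```python
import math

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) summed over independent dimensions."""
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total

# Identical posterior and prior: the divergence vanishes.
kl_same = kl_diag_gaussians([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0])
# Unit-variance Gaussians whose means differ by one: the divergence is one half.
kl_shift = kl_diag_gaussians([0.0], [1.0], [1.0], [1.0])
```

The divergence is zero only when the two distributions coincide, so minimising it draws the posterior and the prior towards each other.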
When training the model, the purpose of using the variational auto-encoder (VAE) is to minimize the KL divergence of the variational posterior distribution and the real posterior distribution of the data, wherein the derivation process is as follows:
wherein $D_{KL}[A \,\|\, B]$ represents the KL divergence between distribution A and distribution B, wherein according to the aforesaid formula, the evidence lower bound of the variational auto-encoder may be further derived, and the learning process of the causal Markov model may be converted into maximizing the variational lower bound over the dataset D, wherein the derivation process is as follows:
wherein the formula (13) is a loss function of the causal Markov model, wherein the first item Eq
Three real datasets, namely, the traffic flow dataset of an urban region in Beijing, the urban road speed dataset and the external environment dataset of the urban region are used in embodiment 1 of the present invention. The details of the dataset fields are shown in Table 1.
The traffic flow dataset of an urban region in Beijing contains order records for three traffic modes (bicycles, buses and taxis) from Jun. 1, 2021 to Dec. 31, 2021. The dataset contains the following information: pick-up time, drop-off time, pick-up longitude, pick-up latitude, drop-off longitude and drop-off latitude. The research region is divided into 175 non-overlapping sub-regions. The inflow and outflow of each traffic mode in all sub-regions are counted.
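By way of a non-limiting illustration, the counting of inflow and outflow per sub-region may be sketched as follows. The sketch assumes that each order has already been assigned a pick-up sub-region and a drop-off sub-region, that a pick-up counts as outflow and a drop-off counts as inflow, and that counts are grouped into 30-minute slots; the function names, the tuple layout and the two sample orders are hypothetical.

```python
from collections import Counter
from datetime import datetime

SLOT_MINUTES = 30  # assumed slot length, matching the 30-minute interval of the embodiment

def slot_of(ts):
    """Map a timestamp to (date, index of its 30-minute slot within the day)."""
    return ts.date(), (ts.hour * 60 + ts.minute) // SLOT_MINUTES

def count_flows(orders):
    """orders: iterable of (pickup_time, pickup_region, dropoff_time, dropoff_region)."""
    outflow, inflow = Counter(), Counter()
    for pick_t, pick_r, drop_t, drop_r in orders:
        outflow[(pick_r, *slot_of(pick_t))] += 1   # trip leaves the pick-up sub-region
        inflow[(drop_r, *slot_of(drop_t))] += 1    # trip arrives in the drop-off sub-region
    return inflow, outflow

# Two made-up orders from sub-region 3 to sub-region 7.
orders = [
    (datetime(2021, 6, 1, 8, 5), 3, datetime(2021, 6, 1, 8, 25), 7),
    (datetime(2021, 6, 1, 8, 20), 3, datetime(2021, 6, 1, 8, 40), 7),
]
inflow, outflow = count_flows(orders)
```

Running the same counting once per traffic mode yields one inflow series and one outflow series per mode and sub-region.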
The urban road speed dataset contains speed records of vehicles on the main roads in the urban region from Jun. 1, 2021 to Dec. 31, 2021. The present invention uses the average speed of each road segment within each region to represent the regional speed every 30 minutes.
The external environment dataset of the urban region collects corresponding meteorological information, time position and POI data as the conditional information. The present invention divides the dataset at intervals of 30 minutes to obtain 11753 samples. The present invention uses the historical data in 3 hours to predict data for the next 30 minutes, wherein 80% of the data is used for training, 10% of the data is used for validating, and the rest is used for testing.
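By way of a non-limiting illustration, the construction of samples from 3 hours of history (six 30-minute steps) and a 30-minute prediction target, followed by an 80%/10%/10% split, may be sketched as follows. The sketch assumes that the split is made in chronological order, which the present description does not state; the function names and the stand-in series are assumptions.

```python
def make_windows(series, history=6, horizon=1):
    """Pair each run of `history` steps with the `horizon` steps that follow it."""
    samples = []
    for start in range(len(series) - history - horizon + 1):
        past = series[start:start + history]
        future = series[start + history:start + history + horizon]
        samples.append((past, future))
    return samples

def chronological_split(samples, train=0.8, val=0.1):
    """Split in time order so that later samples are never used for training."""
    n_train = int(len(samples) * train)
    n_val = int(len(samples) * val)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

series = list(range(106))            # stand-in for 106 half-hourly observations
samples = make_windows(series)       # 3 hours of history = 6 steps of 30 minutes
train_set, val_set, test_set = chronological_split(samples)
```

With the stand-in series, each sample pairs six consecutive observations with the single observation that immediately follows them.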
The present invention uses the PyTorch deep learning framework to perform the whole experiment on a workstation equipped with an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory. The number of feature channels in the causal Markov model is set to d=64, the batch size is set to 64, and the learning rate is set to 0.001. The present invention adopts the Adam optimizer and a multi-step learning rate decay strategy. The number of channels of the conditional information is cc=83, namely, the time position feature dimension is 56, the POI feature dimension is 5, and the weather feature dimension is 22. The time position and the weather type adopt one-hot encoding. The time position includes the day of the week, the time point of the day and whether it is a holiday. Except for the weather type, Z-score standardization is applied to the weather features. For POI features, the total number of various POIs (including schools, hospitals, restaurants, office areas and shopping areas) in each sub-region is counted.
The present invention uses the root mean square error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE) to evaluate the performance of the model, which are defined as follows:
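By way of a non-limiting illustration, the encoding of the time position and the Z-score standardization may be sketched as follows. The sketch assumes that the 56-dimensional time position feature is composed of a one-hot day of the week (7), a one-hot 30-minute slot of the day (48) and a holiday flag (1); this decomposition is consistent with the stated dimension but is not given in the present description, and the function names and sample values are assumptions.

```python
from datetime import datetime

def encode_time_position(ts, is_holiday):
    """One-hot day of week (7) + one-hot 30-minute slot (48) + holiday flag (1)."""
    day = [0.0] * 7
    day[ts.weekday()] = 1.0                       # Monday = 0 ... Sunday = 6
    slot = [0.0] * 48
    slot[(ts.hour * 60 + ts.minute) // 30] = 1.0  # 48 half-hour slots per day
    return day + slot + [1.0 if is_holiday else 0.0]

def z_score(values):
    """Standardise a list of numbers to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

feat = encode_time_position(datetime(2021, 6, 1, 8, 30), is_holiday=False)
temps = z_score([10.0, 20.0, 30.0])               # made-up temperature readings
```

Under this assumed layout, the time position vector contains exactly two non-zero entries on an ordinary day and three on a holiday.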
wherein $\hat{X}_i$ represents the predicted results, $X_i$ represents the true results of the data, and $N$ represents the number of regions. To achieve consistency, the present invention deploys the same environment, loss function, traffic data and external factors (namely, time factors and weather information) for all models. In the present invention, the causal Markov model is compared with advanced methods for traffic flow prediction, and the final average results are shown in Table 2.
The present invention evaluates performance using the mean absolute error (MAE), the root mean square error (RMSE) and the mean absolute percentage error (MAPE). For the sake of fairness, the present invention takes the same conditional information, traffic flow and traffic speed as inputs for all models. Table 2 shows the overall prediction performance as the average MAE, RMSE and MAPE over three independent experiments, and the prediction results for each mode are shown in
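By way of a non-limiting illustration, the three evaluation metrics may be sketched as follows using their standard definitions. The sketch assumes that entries whose true value is zero are skipped when computing the MAPE, which the present description does not specify; the function names and the sample values are assumptions.

```python
import math

def rmse(pred, true):
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def mape(pred, true):
    """Mean absolute percentage error; entries with a zero true value are skipped."""
    pairs = [(p, t) for p, t in zip(pred, true) if t != 0]
    return 100.0 * sum(abs((p - t) / t) for p, t in pairs) / len(pairs)

true = [10.0, 20.0, 40.0]   # made-up true values for three regions
pred = [12.0, 18.0, 40.0]   # made-up predictions for the same regions
scores = (rmse(pred, true), mae(pred, true), mape(pred, true))
```

The RMSE penalises large errors more strongly than the MAE, while the MAPE expresses the error relative to the magnitude of the true value.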
To verify the effect of the key components of the causal Markov model used in the present invention, an ablation experiment is performed as follows:
For the posterior network and prior network, four variants are designed:
The performances of all variant models are listed in Table 3:
It can be seen from Table 3 that, due to the lack of spatio-temporal dependencies, the performances of variants 1 and 2 are the lowest. The performance of variant 3 indicates the necessity of the conditional information; a model lacking the conditional information degrades into a common sequence variational auto-encoder. In variant 4, the prior network is deleted. The purpose of the prior network is to capture the stable principles of the physical concepts, whereas the function of the posterior network is to obtain separated causal representations from the observation data and the conditional information. Without the supervision of the prior network, a model collapse may easily occur, resulting in the failure to obtain a stable and effective causal representation. As shown in
Number | Date | Country | Kind |
---|---|---|---|
202211357946.4 | Nov 2022 | CN | national |