The present disclosure relates generally to prediction methods using volatile historical time series data possessing sharp and sudden peaks and valleys, and particularly real-time traffic prediction systems and methods for volatile road occupancy data.
Time-series-based prediction is an important area of focus in numerous applications. Time-series based prediction means predicting a type of information in the future, using historical values of the same type of information. Time-series-based prediction goes by many names and covers an enormous range of applications. Some common application areas include: financial prediction (e.g. predicting the value of a stock in the future based on the history and current value of the stock), traffic prediction e.g. (predicting the traffic speed in the future on a road segment based on the current and historical speeds on that road segment), retail sales prediction (e.g. predicting the amount of retail sales for a chain of stores given their current and historical sales levels), and many more.
For example, accurate short-term forecasting of traffic variables is essential for intelligent transportation systems applications, such as real-time route guidance and advanced traveler information systems. Hence, numerous modeling approaches have been proposed, including both nonparametric and parametric models.
Traffic forecasting models are usually evaluated on data from arterials and freeways, which are admittedly less variable than data from urban networks and not subject to the effects of traffic lights. In urban networks, neighborhood relationships and the definitions of spatial weight matrices for space-time parametric frameworks, are not straightforward; some locations may not be clearly upstream or downstream a given location. Furthermore, detectors can be dense in an urban network, so that locations with useful predictive information may be hard to identify; this again affects the construction of spatial weight matrices used in space-time modeling schemes. Erroneous and missing data are expected to be more frequent in urban networks, which makes essential the implementation of robust estimation procedures.
In order to achieve an acceptably good level of prediction accuracy on urban occupancy data, a new method needs to be developed.
A prediction modeling system and method for implementing forecasting models that involve numerous measurement locations, e.g., urban occupancy (road traffic) data.
The method involves a data volatility reduction technique based on computing a congestion threshold for each prediction location, and use that threshold in a filtering scheme. Through the use of this technique, significant accuracy gains are achieved and at virtually no loss of important information to the end user.
In one aspect, there is provided a method of predicting comprising: receiving a first time-series data set having one or more values for each time point to be predicted, receiving a second time-series data set of one or more values per time point with correlation to the first time-series data, estimating a functional relationship between the first time-series data and the second time-series data, for each value, over a multiplicity of time points, determining an extremal or other specified value of the functional relationship is determined of the second time-series data as a function of the first time-series data; modifying the first time-series data based on the extremal or other specified value so that first time-series data values beyond it are set to the value of the extremal or other specified solution, and predicting a future state of the first time-series data based on the modified first time-series data, wherein as programmed processing unit performs the receiving first and second time-series data, the estimating, the determining, the modifying and the predicting.
In a further aspect, there is provided a system for predicting comprising: a memory storage device, a processor in communications with the memory storage device, wherein the computer system performs a method to: receive a first time-series data set having one or more values for each time point to be predicted, receive a second time-series data set of one or more values per time point with correlation to the first time-series data, estimate a functional relationship between the first time-series data and the second time-series data, for each value, over a multiplicity of time points, determine an extremal or other specified value of the functional relationship is determined of the second time-series data as a function of the first time-series data, modify the first time-series data based on the extremal or other specified value so that first time-series data values beyond it are set to the value of the extremal or other specified solution, and predict a future state of the first time-series data based on the modified first time-series data.
In a further aspect, a computer program product is provided for performing operations. The computer program product includes a storage medium readable by a processing circuit and storing instructions run by the processing circuit for running a method. The method is the same as listed above.
Various objects, features and advantages of the present invention will become apparent to one skilled in the art, in view of the following detailed description taken in combination with the attached drawings, in which:
In fields or applications in which time-series-data is used by prediction models, there exist alternate time-series data that bears some correlation to the time-series data being predicted. As examples, the time-series data on the price of a stock may be related to macro-economic indicators; the traffic speed on a road segment is related to the traffic flow on that road segment; the amount of ice cream sales in a location may be related to the weather at that location.
A system and method is now described that leverages at least one alternate time-series data to improve the prediction accuracy of a first time-series data of interest. Broadly, there is redefined the data of interest via a projection to one or more values based on the relationship of that data to a different time-series data. The new, projected time-series data therefore has a lower volatility, while still capturing the important aspects of the information of interest. As a result of the lower volatility, prediction quality is improved by any state of the art prediction algorithm.
Generally,
The method uses a time-series data of one or more values for each time point to be predicted, and uses a second set of time-series data of one or more values per time point with correlation to the first time-series data. In one embodiment, the method includes estimating a functional relationship between the first time-series data and the second time-series data, for each value, over a multiplicity of time points. Further, the method includes determining an extremal or quantile value of the functional relationship of the second time-series data as a function of the first time-series data. The method then includes modifying the first time-series data based on the value of the prior extremal or quantile solution, in terms of the first time-series data, so that values beyond it are set to the value of the extremal or the quantile solution. The quantile value may be, for example, the first point in the second time-series data at which a given percent of the values fall below that quantile. Note that in a related traffic flow prediction example described herein below with respect to
For example, in
The system and method thus leverages an auxiliary or secondary time-series data source as a projection pre-processing step to any traffic prediction method employed. The resulting projected data leads to increased prediction accuracy while maintaining the salient aspects of the original data set as required, for rexample, by traffic management and route guidance applications.
There is now described an example prediction method that considers a time-series data of interest to be traffic occupancy levels on a road network. Traffic occupancy levels are typically detector-specific (a typical detector is an induction loop: an electromagnetic detection system which uses a moving magnet to induce an electrical current in a nearby wire) but may also be link-specific, and range from 0 to 100, for example, representing the percent of time that the detector is occupied by a vehicle in a pre-defined period of time (e.g. 5 min). When the source of the traffic occupancy data is an inductive loop detector, the occupancy measurement will be specific to that detector. If the source of the traffic occupancy data covers a road segment, e.g. through individual vehicle counts over a segment or some other form of traffic data collection, the occupancy level may represent an average occupancy over a link, or road segment. Traffic occupancy levels on a road network are typically updated in real-time, e.g. every 5 minutes, and as such constitute a time-series-based data stream.
The prediction system and method is useful to be able to predict traffic occupancy into the near-term future (e.g., 15 minutes, 30 minutes, etc. in advance for purposes of traffic regulation and traffic information and route guidance. Many algorithms are used for traffic prediction (see, e.g. Min and Wynter, 2011 and references therein). Traffic occupancy levels are known to be highly volatile and therefore difficult to predict using any known prediction algorithm.
Thus, in an exemplary embodiment, the system and method described herein define a relationship between traffic occupancy data (first time-series data) and another data stream, in this case, traffic volumes (alternate time-series data). Traffic volume data is produced in real-time like traffic occupancy data, e.g., usually on a same update frequency (e.g., every 5 min).
The importance of forecasted occupancy levels is significant for numerous applications from traffic management and signal timing adjustment to route guidance tools. Indeed, occupancy data is often available at or near signalized intersections where such applications are required.
The relationship linking real traffic volume to traffic occupancy is roughly in the form of a quadratic function as shown below in
However, in spite of the benefits accrued by using a state-of-the-art prediction methodology on many types of traffic data, occupancy levels pose a particular challenge to traffic prediction models. This is due to a number of different factors, but the high volatility of the occupancy data on urban networks is a significant one. In particular,in view of
Consider, for example,
In practice, however, in an urban road network, the occupancy levels on the far right of the distribution (e.g., see
Because the principal difficulty in achieving acceptable prediction accuracy on occupancy data stems from the volatility of the data on the right side of the distribution, the system and method herein is implemented to reduce the volatility while still maintaining the important signal in the original data. As described above, the signal needed from the data is primarily the type of state as well as the transition phase between uncongested and fully congested.
Thus, a valid volatility reduction procedure for the traffic occupancy data is provided. With that in hand, a prediction methodology may be applied (re-applied) to a new data feed, ŷ, with improved prediction performance.
The proposed approach involves a type of low-pass filtering where the cutoff threshold should be defined precisely by the point at which the fully congested state is achieved. In other words, it is sufficient for a transport management center to know that (i) either a current or predicted state is/will be fully congested, or (ii) the actual or predicted occupancy level, if it is/will be below the fully congested state. Hence a purely categorical model is not sufficient. Using a cutoff filter which is too low would negate the benefit of the occupancy prediction and a value too high would not reduce volatility sufficiently to achieve acceptable prediction accuracy.
Input to the method is the identification of the threshold level τ, at which the congested state is achieved, for every detector, s, with enough accuracy to maintain the critical occupancy level in the transition phase, yet reduce volatility enough to permit accurate prediction.
In general, a congestion threshold is a function of numerous parameters including road geometry, the location of traffic signals, etc. and can be complex to model precisely as shown in
For the prediction method, there is defined the functions qs(ys(t)) where q(t) is the volume (second or alternate or auxiliary time series data) and the occupancy is y(t) (first time series data) and s represents the detector(s), e.g., detector location(s) or network link for which a traffic condition(s) is/are sought to be forecasted. Here, for example purposes, use is made of the volume and occupancy data from detectors in the example city (e.g. Lyon, France). Due to the high variability of the data, two robust estimation approaches for qs(ys(t)) were tested. Both methods make use of parametric quantile regression, defined as solving an expression as follows:
Quantile regression is beneficial in this setting, and offers different results from a mean regression because of the asymmetry of the conditional density and the influence of the dispersion of the flow values as occupancy increases. In this setting, ζ are second-order functions with zero intercept. In one embodiment, ρ=0.5 which computes a median regression. In a second embodiment, a more conservative approach is taken and estimates the outer envelope of the data. In one embodiment, there is used ρ=0.9 to represent the 90th quantile as a proxy for the outer envelope.
Hence, only one projection threshold is needed, above which higher traffic occupancies are projected to the threshold. The threshold in this case represents the level at which the congested traffic state is reached. It is important to have predictions of the traffic occupancy for various purposes, but if the traffic state is considered “congested” then it is enough to know that it is “congested” and the precise occupancy level at or after that point is not of use. On the other hand, it is very important to know the occupancy level before that point of congestion so that control action can be taken in a timely fashion.
Therefore, the use of the alternate time series data is to enable the establishment of the congestion threshold for each detector. The real-time and historical occupancy data are then projected to that threshold for all values equal to or above the threshold. Prediction is performed in the new, projected data. Because the data exhibits less volatility, prediction quality is in general considerably improved, independently of the prediction technique employed.
The next step in the method involves obtaining the argmax, τs:=argmaxqs(ys) , of each calibrated curve, for every detector, s. Hence, τs represents the occupancy level at which the fully congested state occurs at detector s. Then, the congestion threshold method performs a unidimensional projection of the occupancy level onto that threshold according to the following expression:
ŷ
s
={y
s,τs}−,
where {·}− is the min operation, i.e., the minimum of the two values within the { }.
For example,
τ=argmax q(y)
ŷ
s(t)=min{y(t),τ)
The corresponding volume time series data obtained from the detector s for the same example time period is shown in the plot 100 of
Thus, alternately stated, the computer-implemented system and method herein transforms continuous variables and the corresponding forecasts (irrespective of the model used to produce them) to hybrid continuous-ordinal variables, by projecting values larger (or smaller) than location-specific (congestion) thresholds to these thresholds. For example, after a threshold in occupancies is reached, forecasts are as accurate as long as they are equal or larger than this threshold.
The method thus computes ŷs as the new filtered occupancy data for every detector s. Prediction of occupancy using the ŷs makes use of the prediction method described herein above. Comparative results are now presented.
More particularly,
The pair of bar charts in
Thus, the system and method leverages at least one alternate time-series data to improve the prediction accuracy of a first time-series data of interest. The method redefines the data of interest via a projection to one or more values based on the relationship of that data to a different time-series data. The new, projected time-series data therefore has a lower volatility, while still capturing the important aspects of the information of interest. As a result of the lower volatility, prediction quality is improved by any state of the art prediction algorithm.
The method is applicable to perform accurate predictions for all times of time series data, e.g., financial data. In general, financial data, such as stock prices, are highly volatile. However, in many cases it is not necessary to predict accurately the full range of stock ticker prices, but only the price in between one or two thresholds. For example, if stops are put in place wherein a stock would be bought if the price falls to some level or sold if it rises to some level, then it would be useful to predict the stock price in between those levels but not necessarily above or below those levels. In order to use the present methods, a secondary source of data would be needed to determine what those levels should be, and then the financial data would be projected from below to the lower level and/or from above to the higher level. The prediction algorithm would then be run on the projected data.
In one embodiment, a predictive modeling strategy employed divides traffic dynamics into two basic components: a location specific daily profile and a term that captures the deviation of a measurement from that profile. For traffic volumes, a daily profile is expected to be shaped as an asymmetric “M” whereas for speeds as an asymmetric “W”. Let d be the day-of-the-week index, s the location index and t the time-of-day index. The overall model structure for a traffic variable y is governed by equation 1) as follows:
y
d,s(t)=μd,s(t)+xd,s(t) (1)
where d=1, . . . , D, s=1, . . . , S, and t=1 . . . , T. S represents the number of locations for which traffic conditions are sought to be forcasted, and T is the total number of time intervals per day. D may be less than seven if there is sufficient evidence of similarity of traffic dynamics for two (or more) days of the week.
The profile μd,s captures the daily trend and can be viewed as a baseline forecasting model that is based only on historical data and neglects information from the recent past of the process. μd,s can be obtained by some form of weighted average that weighs more heavily recent historical data, principal component analysis, wavelet based decomposition or by an exponential smoothing filter. Decompositions are adopted very frequently in time-series analysis and within the context of short-term traffic forecasting are expected to lead to superior performance compared to models applied directly to traffic variables.
The second stage of the modeling procedure concentrates on the dynamics of the (short-term) deviation from the historical daily profile and adopts a regime-switching modeling framework. Specifically, for each location s a space-time threshold autoregressive model is adopted to account for transient behavior according to equation 2) as follows:
where
for rd,s=1 . . . , Rd,s+1 and a convention is used such that T0=0 and
The index rd,s specifies the operating regime. The thresholds
separate and characterize different regimes and in general may differ for different locations in the road network and different days of the week. In one embodiment, the number of thresholds and their magnitude are unknown quantities that need to be estimated.
The above predictive equation contains an intercept term that varies with location, traffic-regime within a day and day of the week. Ns is the number of neighboring locations of s that may provide useful information (at some previous time instances) with regard to short-term forecasting performance and p is the autoregressive order (maximum time-lag) of the model. Hence the first sum in (2) contains information on the recent past of the location of interest whereas the second sum contains information from its neighbors. The α's are unknown coefficients that need to be estimated; the statistically significant ones in the second sum signify which temporal lags of a neighboring location are expected to provide useful information with regard to short-term forecasting. The i in the expression (t−i) refers to the time lag, i.e. a time stamp prior to time t in terms of a number of periods. For instance, if i=2, then t−i is two time periods prior to time t. Finally, ε is assumed to be a martingale difference sequence with respect to the history of the time series up to time t−1; hence, it is assumed a serially uncorrelated (but not necessarily independent) sequence and its variance is not restricted to be equal across regimes.
The above model defines a threshold regression per measurement location, with an unknown number of regimes. Time-of-day is the threshold variable that defines subsamples in which the regression relationship is stable. Within regime rd,s, (2) is a linear regression model that can be estimated using existing methods such as minimizing the least squares deviation (OLS, also known as the L2 norm) or the least absolute deviation (LAD, also known as the L1 norm). However, direct estimation is expected to be inefficient as a fraction of the predictors will not contribute significantly to the predictive power of the model. Furthermore, direct estimation may be problematic (the variances of the estimated coefficients may be unacceptably high) or even infeasible due to multi-collinearity, especially when p and Ns are large.
In one embodiment, estimation and model selection per regime take place simultaneously for each location, using lasso penalized regression which enforces sparse solutions in problems with large numbers of predictors. Lasso is a constrained version of ordinary estimation methods and at the same time a widely used automatic model building procedure. Given a loss function g(.), lasso penalized regression within regime rd,s can be phrased as minimizing the criterion according to equation 3) as follows:
where, given that historical traffic data from Dw past weeks are available, for lad-lasso
whereas for conventional lasso
The second component of the sum is the lasso penalty term which shrinks coefficients toward the origin and tends to discourage models with large numbers of marginally relevant predictors. In one embodiment, the intercept αd,s is ignored in the lasso penalty, whose strength is determined by the positive tuning constant λ.
In one embodiment, the use of penalized estimation allows considerable flexibility with regard to the specification of matrices that define neighboring relationships in a road network. Using a modeling framework similar to those known in the art, different such matrices per regime and per time-lag of the model are defined at a pre-processing stage which would have been tedious for large S. By using a “lasso” technique there is defined a matrix that contains all neighboring associations that are relevant to the chosen autoregressive order. The automatic model selection feature of lasso shrinks towards zero the coefficients that correspond to non-significant time-lags of measurements taken at neighboring locations to the one modeled.
The gains resulting from implementing this prediction method come at the cost of a substantially increased number of predictors in the linear specification. The influential ones are identified by a two-step penalized estimation scheme, namely adaptive least absolute shrinkage and selection operator (LASSO); for recent applications of penalized estimation in transportation problems, the reader may consult.
In the forecasting experiments models estimated can be combined using: (i) the adaptive LASSO which performs L1-penalized minimization of squared residuals and (ii) the adaptive LAD-LASSO which produces L1-penalized least absolute deviation estimators. The latter are essentially median regression estimates which have been found to be particularly effective in terms of forecasting performance when response variables possess skewed response distributions that may contain outliers
It is understood that the congestion threshold calculations may be used in conjunction with other prediction methods in addition to the approach described herein above. For example, simpler methods as well may be appropriate, e.g., simple extrapolations from historical data (such as averages of values of the traffic parameter in the past), other statistical methods, be they linear regression or nonlinear methods such as neural networks, etc.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more tangible computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The tangible computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a system, apparatus, or device running an instruction. The computer readable medium excludes only a propagating signal.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a system, apparatus, or device running an instruction.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. The computer readable medium excludes only a propagating signal.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may run entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which run via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which run on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more operable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be run substantially concurrently, or the blocks may sometimes be run in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The embodiments described above are illustrative examples and it should not be construed that the present invention is limited to these particular embodiments. Thus, various changes and modifications may be effected by one skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.