Method for Predicting Travel Destinations Based on Historical Data

FIELD OF THE INVENTION

The present invention relates generally to predicting travel destinations, and in particular to basing the predictions on historical data.

BACKGROUND OF THE INVENTION

Navigation systems are replacing paper maps and charts to help drivers and captains navigate through unfamiliar areas to unfamiliar destinations. Most navigation systems include a global positioning system (GPS) to determine an exact location of a vehicle, boat or plane. As an advantage, data in navigation systems can be continuously updated, augmented with additional en-route information, and easily transferred between systems.

Typically, a destination is set by the operator or a passenger. The destination can be based on a location name, address, telephone number, a geographical point selected from a list of pre-registered destinations, and the like. The knowledge of a particular route, in conjunction with status and environment data, e.g., traffic and weather, can be used to assist the operator in navigating to a particular destination.

U.S. Pat. No. 7,233,861 describes a method for predicting destinations and receiving vehicle position data. The vehicle position data include a current trip that is compared to a previous trip to predict a destination for the vehicle. A path to the destination can also be suggested.

U.S. Patent Publication 20110238289 describes a navigation device and method for predicting the destination of a trip. The method determines starting parameters including starting point, starting time and date of the trip. A prediction algorithm is generated by using information of a trip history.

U.S. Patent Publication 20130166096 describes a predictive destination entry system for a vehicle navigation system to aid in obtaining a destination for the vehicle. The navigation system uses a prior driving history or habits. This information is used for making predictions for the current destination desired by a user of the vehicle. The information can be segregated into distinct user profiles and can include the vehicle location, previous driving history of the vehicle, previous searching history of a user of the vehicle, or sensory input relating to one or more characteristics of the vehicle.

SUMMARY OF THE INVENTION

The embodiments of the invention provide a method in a navigation system, for predicting travel destinations according to a history of destinations. A model used for the prediction incorporates a database of destinations, which can include favorite, i.e., most probable, destinations for a user.

The model also uses a context that can include features such as a current time of day, day of week, current location, current direction, past location, weather, and so on. The model infers the destination even when the destination is not known precisely.

Specifically, a method predicts destinations during travel by determining, based on feature vectors representing current states of the travel, probabilities of categories of the destinations using a predictive model based on previous states of the travel. A subset of the categories with the highest probabilities is output for user selection.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow diagram of a method for predicting travel destinations based on historical data according to embodiments of the invention;

FIG. 2 is a hierarchical destination category prediction model according to embodiments of the invention; and

FIG. 3 is a destination category prediction model with destination category dependencies according to embodiments of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Introduction

The embodiments of our invention provide a method in a navigation system, for predicting travel destinations according to a history of travel activity. In the examples described herein, the travel is performed by a vehicle. However, it is understood that other modes of travel can also be predicted by the methods described herein. The methods can be performed in a processor connected to memory and input/output interfaces by buses. Output devices can include displays or speakers to indicate the destinations to a user. Inputs can include location trajectories from a global positioning system (GPS), and input devices such as touch screens, keyboards and voice recognition systems can be used to select a specific destination.

Method Overview

The method acquires navigation data 101, (vehicle) system bus data 102, weather data 103, and derived data. Some of the derived data can be obtained from the vehicle navigation system, vehicle buses, and weather data 101-103. The navigation system can include a GPS, as well as a wireless internet connection to various information servers. A vehicle bus is defined as any specialized internal communications network that interconnects components inside a vehicle (e.g., automobile, bus, train, industrial or agricultural vehicle, ship, or aircraft). The data are synchronized 110, and features are extracted 120 as feature vectors 121. Each feature vector collectively represents a previous state of the travel for some past time.

Training

During a training phase 155, which can be one-time, intermittent, periodic or continuous, the features are stored in a training database 151. The training also maintains a destination database 150 containing the locations, addresses, names, identifiers, and categories associated with specific destinations, such as businesses, government facilities, residences, landmarks, and other geographically located entities. Such destination databases can also be located on a server. The destination categories can contain any semantic information relevant to destination selection, such as its type, quality, availability, and so on.

During the training, probabilities of destination categories are inferred 153. That is, the probabilities are associated with categories of destinations. Probabilities of destination categories should not be confused with the identities of the destinations as usually found in prior art systems. The training also determines 152 observed trajectories during travel. In cases where the actual destination of the user is not known via the navigation user interface, the observed trajectories are used to infer probabilities associated with each destination and its associated categories. The trajectories and the probabilistically inferred destinations and destination categories obtained during training are used to construct a predictive model 160.

Operation

During operation, similar features of current states of actual travel are acquired in real-time and processed by a predictive procedure 130 to obtain probabilities 131 of destinations, destination categories, and related actions, such as telephoning the destination. The predicted destinations, categories, and actions with the highest probabilities, e.g., the highest three, are displayed 140 or presented to the user as selections by other means on an output user interface 141, such as speech output. The number of selections with highest probabilities displayed can be user specified. Then, the user can select 142 a destination, destination category, or action using an input user interface, and routing information or a trajectory 143 can then be generated during the travel to the selected destination.
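As a minimal sketch of the selection step described above, the following Python fragment picks the k most probable entries from a table of predicted probabilities. The function name `top_k` and the example labels and values are illustrative assumptions and are not part of the described system.

```python
def top_k(probabilities, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical output of the predictive procedure for one state of travel.
predicted = {"coffee shop": 0.41, "work": 0.32, "gas station": 0.15,
             "grocery store": 0.08, "home": 0.04}
choices = top_k(predicted, k=3)
```

Here k plays the role of the user-specified number of selections to present.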

Theoretical Justification

The invention is based on the intuition that travelers exhibit regularity in their destination sequences, e.g.

home→drink/snack→work→store→home.

The embodiments of this invention take as input features derived from the current and past trajectories, such as the previous destinations, destination categories, as well as the time of day, day of week, status of the trip, direction of travel, and so on. Prediction is treated as an inference task with variables representing the destination, and destination categories, as well as the ultimate location of arrival. When only the arrival location is observed, the training algorithm can infer the destination and destination categories as hidden variables.

Simplified Model

In a pair of random variables {x, s}, x represents the feature vector and s represents a location, e.g., the longitude and latitude of an end point of a segment of a trip.

The feature vector x=[x₁, . . . , x_F] includes a trajectory identification (ID), a segment ID for each trajectory, a point ID for each segment, elevation, time, speed, and direction, and possibly their statistics, e.g., average, mean, deviation, etc., collectively a state of travel.

We infer a multinomial category (or “genre”) z ∈ {1, . . . , C} of a destination d, which is a multinomial variable that indexes possible destinations from the destination database, or a “favorite” destination obtained from a user.

We formulate this as a multinomial logistic regression model:

$p(z \mid x) = \frac{\exp(λ_{z}^{T} φ(x))}{Z_{Λ}},$

where Λ=[λ₁, . . . , λ_C]^T, the λ_z are weights, Z_Λ=Σ_z exp(λ_z^T φ(x)) is the normalization constant, and φ(x) is a vector-valued function of our input features x. Depending on the type of inference, we can also use a multinomial probit regression model, which is similar to multinomial logistic regression but may be more convenient for sampling-based methods.
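The multinomial logistic regression above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `category_probabilities`, the weight values, and the use of φ(x)=x are hypothetical, and the maximum score is subtracted before exponentiation purely for numerical stability, which does not change the resulting probabilities.

```python
import math


def category_probabilities(weights, phi_x):
    """Softmax over categories: p(z | x) = exp(lambda_z . phi(x)) / Z."""
    scores = [sum(l * f for l, f in zip(lam_z, phi_x)) for lam_z in weights]
    top = max(scores)  # shared shift for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z_norm = sum(exps)  # normalizer Z_Lambda, up to the shared shift
    return [e / z_norm for e in exps]


# Hypothetical weights for C = 3 categories and F = 2 features.
LAMBDA = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
probs = category_probabilities(LAMBDA, [2.0, 0.5])
```

The returned list holds one probability per category and sums to one.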

We assume that after the user has selected a category z with high probability, the user will most likely select a destination d from that category:

$p(d \mid z) \propto \begin{cases} 1 & z \in \mathrm{cat}(d) \\ 0 & \text{otherwise} \end{cases},$

where cat(d) is the set of categories identified with the destination d. This is a uniform multinomial over destinations consistent with the category z.

We assume that the user parks the vehicle at location s near the selected d. This can be modeled as

p(s|d)=N(s;loc(d),Σ),

where Σ=σ²I₂, σ is the standard deviation of the distance a person parks away from their destination, and loc(d)=[d_lat, d_lon]^T is the location (latitude and longitude) of the point of interest d.

Model Training

For training 155 such a model 160, we consider, for example, pairs (x_i, s_i), where x_i is in the middle of a segment. An objective function to train is

$\begin{aligned} \max_{Λ} \sum_{i} \log p(x_{i}, s_{i}) &= \sum_{i} \log \sum_{z_{i}, d_{i}} p(z_{i} \mid x_{i}) \, p(d_{i} \mid z_{i}) \, p(s_{i} \mid d_{i}) \qquad (1) \\ &= \sum_{i} \log \sum_{z_{i}} \frac{\exp(λ_{z_{i}}^{T} φ(x_{i}))}{\sum_{z} \exp(λ_{z}^{T} φ(x_{i}))} \, \frac{1}{|\mathcal{D}_{i}(z_{i})|} \sum_{d_{i} \in \mathcal{D}_{i}(z_{i})} \mathcal{N}(s_{i}; \mathrm{loc}(d_{i}), Σ) \qquad (2) \end{aligned}$

where we only sum over the set of destinations

D_i(z_i)={d_i: |loc(d_i)−s_i|<5σ, and z_i ∈ cat(d_i)},

because p(d|z) and/or p(s|d) are zero or relatively small outside this set.
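To make the training objective concrete, the following Python sketch evaluates the contribution of a single training pair: the category probabilities are combined with a uniform choice among nearby destinations of each category and a Gaussian parking model. The function names, the data layout, and the use of planar coordinates (rather than latitude and longitude) are assumptions made for illustration only.

```python
import math


def gaussian_2d(s, mean, sigma):
    """Isotropic 2-D Gaussian density N(s; mean, sigma^2 I_2)."""
    sq = (s[0] - mean[0]) ** 2 + (s[1] - mean[1]) ** 2
    return math.exp(-sq / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def sample_log_likelihood(p_z, s, destinations, sigma):
    """log sum_z p(z|x) (1/|D(z)|) sum_{d in D(z)} N(s; loc(d), sigma^2 I_2).

    p_z maps category -> p(z | x); destinations is a list of
    (location, set_of_categories) pairs in planar coordinates (assumed).
    """
    total = 0.0
    for z, pz in p_z.items():
        # Candidate set D(z): destinations of category z within 5 sigma of s.
        near = [loc for loc, cats in destinations
                if z in cats
                and math.hypot(loc[0] - s[0], loc[1] - s[1]) < 5.0 * sigma]
        if near:
            total += pz * sum(gaussian_2d(s, loc, sigma) for loc in near) / len(near)
    return math.log(total) if total > 0.0 else float("-inf")


# Hypothetical destinations and category probabilities.
DESTINATIONS = [((0.0, 0.0), {"cafe"}),
                ((100.0, 0.0), {"cafe", "store"}),
                ((0.0, 3.0), {"store"})]
value = sample_log_likelihood({"cafe": 0.7, "store": 0.3}, (0.0, 1.0),
                              DESTINATIONS, sigma=2.0)
```

Summing this quantity over all training pairs gives the objective that is maximized with respect to the weights.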

Regularization Approach

Logistic regression benefits from some form of L₁ and/or L₂ regularization. Transforming the features to a lower-dimensional subspace can also improve generalization performance.

The transformation model is

$p(z \mid x) = \frac{\exp(λ_{z}^{T} A φ(x))}{Z_{A, Λ}},$

where A is a (R×F) matrix that is shared for all classes and all users. Usually, R<F to perform dimensionality reduction.

The corresponding objective function is

$\begin{aligned} \max_{A, Λ} \sum_{i} \log p(x_{i}, s_{i}) &= \sum_{i} \log \sum_{z_{i}, d_{i}} p(z_{i} \mid x_{i}) \, p(d_{i} \mid z_{i}) \, p(s_{i} \mid d_{i}) \qquad (3) \\ &= \sum_{i} \log \sum_{z_{i}} \frac{\exp(λ_{z_{i}}^{T} A φ(x_{i}))}{\sum_{z} \exp(λ_{z}^{T} A φ(x_{i}))} \, \frac{1}{|\mathcal{D}_{i}(z_{i})|} \sum_{d_{i} \in \mathcal{D}_{i}(z_{i})} \mathcal{N}(s_{i}; \mathrm{loc}(d_{i}), Σ) \qquad (4) \end{aligned}$

We add L₁ and L₂ regularization so that the objective function becomes

$\begin{aligned} \max_{A, Λ} \; & \sum_{i} \log \sum_{z_{i}} \frac{\exp(λ_{z_{i}}^{T} A φ(x_{i}))}{\sum_{z} \exp(λ_{z}^{T} A φ(x_{i}))} \, \frac{1}{|\mathcal{D}_{i}(z_{i})|} \sum_{d_{i} \in \mathcal{D}_{i}(z_{i})} \mathcal{N}(s_{i}; \mathrm{loc}(d_{i}), Σ) \qquad (5) \\ & - α \sum_{z} \|λ_{z}\|_{1} - β \sum_{z} \|λ_{z}\|_{2} \qquad (6) \end{aligned}$

where α=0.5 and β=0.5 are optimal for the regularization of the parameters of the model. We do not add regularizers to A.
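The effect of the two penalty terms can be illustrated with a short Python sketch. The function name is hypothetical, and the sketch assumes that the L₁ term is the sum of absolute weight values and that the L₂ term is the unsquared Euclidean norm of each weight vector; a squared L₂ norm is an equally common choice that the text does not rule out.

```python
import math


def regularized_objective(log_likelihood, weights, alpha=0.5, beta=0.5):
    """Subtract L1 and L2 penalties on each lambda_z from the log-likelihood."""
    l1 = sum(sum(abs(v) for v in lam_z) for lam_z in weights)
    l2 = sum(math.sqrt(sum(v * v for v in lam_z)) for lam_z in weights)
    return log_likelihood - alpha * l1 - beta * l2


# Hypothetical log-likelihood value and weights for two categories.
objective = regularized_objective(-10.0, [[3.0, -4.0], [0.0, 0.0]])
```

Consistent with the text, no penalty is applied to the projection matrix A in this sketch.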

Probit Model for Category Prediction

Instead of modeling p(z|x) using logistic regression, we find it useful to use probit regression, which can be easier to handle from a generative model point of view. We use an auxiliary variable Y ∈ R^(C×N) onto which we regress with data x and the parameters (regressors) w ∈ R^(C×N). Assuming a conventional noise model ε ~ N(0,1) gives y_ci=w_c φ(x_i)+ε, with w_c the 1×N row vector of class c regressors and φ(x_i) the N×1 column vector of inner products for the ith element, which leads to the following Gaussian probability distribution:

p(y_ci|w_c,φ(x_i))=N(w_cφ(x_i),1).

The link from the auxiliary variable y_ci to the discrete target category of interest z_i ∈ {1, . . . , C} is

z_i=j, if y_ij>y_ij′, ∀ j′≠j,

and by the following marginalization

p(z_i=j|w,φ(x_i))=∫p(z_i=j|y_i)p(y_i|w,φ(x_i))dy_i,

where p(z_i=j|y_i) is a delta function, results in the multinomial probit likelihood

$p(z_{i} = j \mid w, φ(x_{i})) = E_{p(u)} \left\{ \prod_{j' \neq j} Φ(u + (w_{j} - w_{j'}) φ(x_{i})) \right\},$

where E is the expectation taken with respect to the standard normal distribution p(u)=N(0,1), and Φ is the normal cumulative distribution function.
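The expectation in the multinomial probit likelihood has no closed form for more than two categories, but it can be approximated by sampling u. The following Python sketch uses plain Monte Carlo averaging; the function names, the sample count, and the fixed seed are illustrative assumptions, and other quadrature or sampling schemes could be used instead.

```python
import math
import random


def normal_cdf(t):
    """Standard normal cumulative distribution function Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def probit_probabilities(w, phi_x, n_samples=20000, seed=0):
    """Monte Carlo estimate of E_u[prod_{j' != j} Phi(u + (w_j - w_j') phi(x))]."""
    rng = random.Random(seed)
    means = [sum(a * b for a, b in zip(w_c, phi_x)) for w_c in w]
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # u ~ N(0, 1)
    probs = []
    for j, m_j in enumerate(means):
        acc = 0.0
        for u in draws:
            term = 1.0
            for k, m_k in enumerate(means):
                if k != j:
                    term *= normal_cdf(u + m_j - m_k)
            acc += term
        probs.append(acc / n_samples)
    return probs


# Hypothetical two-category example with a single feature.
two_class = probit_probabilities([[1.0], [0.0]], [1.0])
```

For two categories the estimate can be checked against the closed form Φ((w₁−w₂)φ(x)/√2), which follows because the difference of two independent unit-variance noise terms has variance two.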

Category Prediction Model

Recall, we have {x_i, s_i}_(i=1)^N, where x_i ∈ R^D is the D-dimensional feature vector and s_i is the location of the end point. We want to predict the category for each time i. For each category, we can construct either a linear classifier or a non-linear classifier. For the linear case, φ(x_i)=x_i, and for the non-linear case, φ(x_i)=[K(x_i,x₁), K(x_i,x₂), . . . , K(x_i,x_N)], where K(.,.) is a kernel function.
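The two choices of φ can be sketched as follows in Python. The Gaussian (RBF) kernel and its width parameter `gamma` are assumptions made for illustration; the text only requires some kernel function K.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def feature_map(x, training_points=None, kernel=rbf_kernel):
    """Return phi(x): x itself (linear case) or its kernel values (non-linear case)."""
    if training_points is None:
        return list(x)
    return [kernel(x, x_n) for x_n in training_points]


# Hypothetical training inputs x_1, x_2.
X_TRAIN = [[0.0, 0.0], [1.0, 0.0]]
phi_linear = feature_map([1.0, 2.0])
phi_kernel = feature_map([0.0, 0.0], X_TRAIN)
```

In the non-linear case the length of φ(x_i) equals the number of training points N, which is why a sparsity-inducing prior on the regressors is useful.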

The regressors w_ic follow a conventional normal distribution with zero mean and variance α_ic⁻¹, where α_ic follows a Gamma distribution with hyperparameters τ, v. By setting τ, v to be sufficiently small values, e.g., (<10⁻⁵), only a small subset of the regressors w_nc are non-zero, subsequently leading to sparsity.

We assume that for each category c, there is a unique distribution μ_c over the destinations {d̂_n}_(n ∈ L_c), where L_c is the set of inferred destinations 153 whose categories include c, and d̂_n denotes the destination indexed as n.

The model of the final destination d_i is obtained from a multinomial-Dirichlet (Dir) distribution. With the assumption that one parks the vehicle near the destination, we model s_i using a Gaussian distribution with mean at the location of the selected destination d_i and variance σ²I₂, where σ² can be fixed or further given an inverse-Gamma prior probability.

FIG. 2 shows our model graphically with the variables as described herein and summarized as follows:

$\begin{aligned} y_{ic} &\sim \mathcal{N}(w_{c}^{T} φ(x_{i}), 1) \\ w_{c} &\sim \mathcal{N}(0, α_{ic}^{-1}) \\ α_{ic} &\sim \mathrm{Gamma}(τ, v) \\ z_{i} &= c, \text{ if } y_{ic} > y_{ij}, \; \forall j \neq c \\ d_{i} &\sim \sum_{n \in \mathcal{L}_{z_{i}}} μ_{z_{i} n} δ_{\hat{d}_{n}} \\ μ_{c} &\sim \mathrm{Dir}\left(\frac{γ}{|\mathcal{L}_{c}|}, \dots, \frac{γ}{|\mathcal{L}_{c}|}\right) \\ s_{i} &\sim \mathcal{N}(\mathrm{loc}(d_{i}), σ^{2} I_{2}) \\ σ^{2} &\sim \mathrm{InverseGamma}(a_{0}, b_{0}) \end{aligned}$

We can also learn the parameters for each categorized destination as a user preference. However, this may take more training data to learn. In this case, we need to include a hierarchy of information about the categorized destinations to constrain this further. For example, we can have a “genre” g, and a “name” or “brand” b, (e.g., “starbucks” versus “dunkin donuts”), and the actual destination d, e.g., a particular “starbucks” at a particular address. We can formulate these as a tree structure: c→g→b→d, and the relationships can be deterministic: b ∈ brand(d), g ∈ genre(b), c ∈ cat(g).

We formulate these as sets in case there is more than one tag associated with each item, but in general each item in the tree has a single parent. This way the user's preferences for genres and brand names can be included without having to learn parameters at the level of actual locations d.

$p(b \mid g) \propto \begin{cases} π_{b \mid g} & g \in \mathrm{genre}(b) \\ 0 & \text{otherwise} \end{cases}.$

We can also include other users' data, so we can formulate a global prior

p(π)=Dir(π;γ),

to constrain these probabilities.

Location Prediction

If we want to predict the next location to which a user will travel from previous locations, then we can consider clustering locations to reduce the complexity of inference. We use a discrete set of clustered regions, r ∈ R. We can infer the current region r_i given previous regions using an N-gram based Markov model p(r_i|r_i−1, r_i−2, . . . , r_i−n+1), where n is the order of the Markov model, and an N-gram is the sequence of regions r_i, r_i−1, r_i−2, . . . , r_i−n+1. N-gram models can be smoothed to provide probabilities for unseen N-grams.
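A small Python sketch of such an N-gram model over regions is given below. Additive (pseudo-count) smoothing is used here only as one simple example of a smoothing method; the text does not specify which smoothing is intended, and the function names and the example trip sequence are hypothetical.

```python
from collections import Counter, defaultdict


def train_ngram(region_sequence, n=2):
    """Count how often each region follows each (n-1)-region history."""
    counts = defaultdict(Counter)
    for i in range(n - 1, len(region_sequence)):
        history = tuple(region_sequence[i - n + 1:i])
        counts[history][region_sequence[i]] += 1
    return counts


def ngram_probability(counts, history, region, regions, pseudo_count=1.0):
    """Additively smoothed estimate of p(r_i | r_{i-1}, ..., r_{i-n+1})."""
    seen = counts.get(tuple(history), Counter())
    total = sum(seen.values()) + pseudo_count * len(regions)
    return (seen[region] + pseudo_count) / total


# Hypothetical sequence of visited regions.
TRIPS = ["home", "cafe", "work", "store", "home", "cafe", "work", "home"]
REGIONS = sorted(set(TRIPS))
COUNTS = train_ngram(TRIPS, n=2)
p_cafe_after_home = ngram_probability(COUNTS, ["home"], "cafe", REGIONS)
```

With smoothing, a history that never occurred in training still yields a valid (here uniform) distribution over regions.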

We can also consider a model in which users travel to nearby regions:

$p(r_{i} \mid r_{i-1}) = \frac{\mathcal{N}(\mathrm{loc}(r_{i}) \mid \mathrm{loc}(r_{i-1}), Σ_{region})}{\sum_{r_{i}'} \mathcal{N}(\mathrm{loc}(r_{i}') \mid \mathrm{loc}(r_{i-1}), Σ_{region})}.$
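The distance-based transition model can be sketched in Python as follows. The sketch assumes an isotropic covariance Σ_region=σ²I₂ and planar region coordinates, neither of which is stated in the text; the Gaussian normalization constant is omitted because it cancels in the ratio.

```python
import math


def nearby_transition(prev_region, region_locations, sigma_region):
    """p(r_i | r_{i-1}) proportional to a Gaussian in the distance between regions."""
    prev = region_locations[prev_region]
    weights = {}
    for name, loc in region_locations.items():
        sq = (loc[0] - prev[0]) ** 2 + (loc[1] - prev[1]) ** 2
        weights[name] = math.exp(-sq / (2.0 * sigma_region ** 2))
    norm = sum(weights.values())  # sum over all candidate regions r_i'
    return {name: w / norm for name, w in weights.items()}


# Hypothetical region centers.
REGION_LOCATIONS = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 0.0)}
from_a = nearby_transition("A", REGION_LOCATIONS, sigma_region=1.0)
```

Regions closer to the previous region receive a larger share of the probability mass.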

We can also consider combining these via an auxiliary random variable o_i, which indicates whether the user travels to nearby locations, or via the Markov dynamics above:

$p(r_{i} \mid o_{i}, r_{i-1}) = \begin{cases} \frac{\mathcal{N}(\mathrm{loc}(r_{i}) \mid \mathrm{loc}(r_{i-1}), Σ_{region})}{\sum_{r_{i}'} \mathcal{N}(\mathrm{loc}(r_{i}') \mid \mathrm{loc}(r_{i-1}), Σ_{region})} & o_{i} = 1 \\ π_{r_{i} \mid r_{i-1}} & \text{otherwise} \end{cases}.$

Combining this with a prior probability p(o_i), and assuming that the r_i are observed, we can optimize the objective function to learn π_(r_i|r_i−1):

$\begin{aligned} \log p(r_{i}) &= \sum_{i} \log \sum_{o_{i}} p(o_{i}) \, p(r_{i} \mid o_{i}, r_{i-1}) \qquad (7) \\ &\geq \sum_{i} \sum_{o_{i}} q(o_{i}) \left( \log p(o_{i}) + \log p(r_{i} \mid o_{i}, r_{i-1}) - \log q(o_{i}) \right). \qquad (8) \end{aligned}$

Because of the redundancy between the two components, it may not work well to learn p(o_i), and may be better to set it using cross-validation, or to place a Dirichlet prior probability on it to favor a uniform distribution.
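The lower bound in the objective above becomes tight when q(o_i) is set to the posterior probability of o_i given the observed transition, which is the step an expectation-maximization style procedure would alternate with updating π. The text does not name a specific optimizer, so the following Python sketch, including the function name, is only an illustration of that property for a single transition.

```python
import math


def posterior_and_bound(p_o, p_r_nearby, p_r_markov):
    """Return q(o) = p(o | r_i, r_{i-1}) and the lower bound evaluated at that q.

    p_o is the prior p(o_i = 1); p_r_nearby and p_r_markov are
    p(r_i | o_i = 1, r_{i-1}) and p(r_i | o_i = 0, r_{i-1}).
    """
    joint = [(1.0 - p_o) * p_r_markov, p_o * p_r_nearby]  # indexed by o = 0, 1
    marginal = sum(joint)
    q = [j / marginal for j in joint]
    bound = sum(q_o * (math.log(j) - math.log(q_o))
                for q_o, j in zip(q, joint) if q_o > 0.0)
    return q, bound


# Hypothetical probabilities for one observed transition.
q, bound = posterior_and_bound(p_o=0.5, p_r_nearby=0.2, p_r_markov=0.6)
```

In this example the bound equals the log of the marginal transition probability, 0.5·0.2+0.5·0.6=0.4.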

Discriminative Model for Region Prediction

It may be difficult to combine other context features, such as time of day and so on, in an N-gram model for region prediction. As an alternative, we can use a classifier-based approach such as logistic regression or the probit regression model described above. In this case, we can define p(r_i|x_i) in a similar way as p(z_i|x_i). The feature vector x_i in this case contains features representing the previous regions r_i−1, r_i−2, . . . , r_i−n+1, in addition to any other features used for category prediction.

Location Dependence for Destination Category Selection

We can also model the dependency between the predicted region r, predicted category z, and the destination d. The region prediction and category prediction can be combined through a destination likelihood as follows:

$p(d_{i} \mid r_{i}, z_{i}) \propto \begin{cases} \mathcal{N}(\mathrm{loc}(d_{i}) \mid \mathrm{loc}(r_{i}), Σ_{dest}) & z_{i} \in \mathrm{cat}(d_{i}) \\ 0 & \text{otherwise} \end{cases}.$
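As an illustrative sketch, the destination likelihood can be normalized over the destinations of the predicted category as shown below in Python. The isotropic covariance Σ_dest=σ²I₂, the planar coordinates, and all names are assumptions made for the example.

```python
import math


def destination_posterior(region_loc, category, destinations, sigma_dest):
    """Normalize p(d | r, z) over destinations whose categories include z."""
    scores = {}
    for name, (loc, cats) in destinations.items():
        if category in cats:  # destinations outside the category get zero mass
            sq = (loc[0] - region_loc[0]) ** 2 + (loc[1] - region_loc[1]) ** 2
            scores[name] = math.exp(-sq / (2.0 * sigma_dest ** 2))
    norm = sum(scores.values())
    return {name: v / norm for name, v in scores.items()} if norm > 0.0 else {}


# Hypothetical destination database entries: name -> (location, categories).
DESTS = {"cafe_a": ((0.0, 0.0), {"cafe"}),
         "cafe_b": ((2.0, 0.0), {"cafe"}),
         "shop": ((0.0, 0.0), {"store"})}
posterior = destination_posterior((0.0, 0.0), "cafe", DESTS, sigma_dest=1.0)
```

Only destinations that match the category and lie close to the predicted region receive appreciable probability.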

Destination Database Dependency

We can have more than one destination database 150 and the databases can have different importance in determining user destinations. In particular, users can have a collection of “favorite” destinations. Here, we treat these as a database of destinations that has a higher prior probability than those from a generic database. Therefore, we use a multinomial random variable f_i ~ Mult(λ) that indicates the database selected by the user for predicting a destination for trip segment i. To implement the selection of the destination database, we define the set L_c,k as the library of the destinations from database k whose categories include c. Then

$d_{i} \sim \sum_{n \in L_{z_{i}, f_{i}}} μ_{z_{i} n} δ_{\hat{d}_{n}},$

where d̂_n denotes the destination indexed as n.

The data are assumed to be distributed according to the model:

destination index probability λ ~ Dirichlet(η)

variance parameter σ² ~ InverseGamma(c₀, d₀)

destination probability μ_c ~ Dirichlet(γ)

regressor w_c ~ N(0, α_c⁻¹I_N), with α_c ~ Gamma(a₀, b₀)

For each point i=1, . . . , N

- destination database index f_i ~ Multinomial(λ)
- latent variable y_ic ~ N(w_c^T φ(x_i), 1)
- index z_i=c if y_ic>y_ij, ∀ j≠c
- destination

$d_{i} \sim \sum_{n \in L_{z_{i}, f_{i}}} μ_{z_{i} n} δ_{\hat{d}_{n}}$

- parking location s_i ~ N(loc(d_i), σ²I₂)

FIG. 3 shows a destination category prediction model with destination database dependency with variables as defined herein.
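The per-segment part of the generative model can be illustrated by ancestral sampling, as in the Python sketch below. The parameters are treated as fixed rather than drawn from their priors, coordinates are planar, and all names, databases, and values are hypothetical; the sketch is meant only to show the order in which the variables depend on one another.

```python
import random


def sample_segment(phi_x, w, lam, library, mu, locations, sigma, rng):
    """Draw (f, z, d, s) for one trip segment by ancestral sampling."""
    # Database index f_i ~ Multinomial(lambda).
    f = rng.choices(range(len(lam)), weights=lam)[0]
    # Latent scores y_ic ~ N(w_c . phi(x_i), 1); the category is the arg max.
    y = [rng.gauss(sum(a * b for a, b in zip(w_c, phi_x)), 1.0) for w_c in w]
    z = max(range(len(y)), key=lambda c: y[c])
    # Destination d_i drawn from the library L_{z,f} with weights mu_z.
    candidates = library[(z, f)]
    d = rng.choices(candidates, weights=[mu[z][n] for n in candidates])[0]
    # Parking location s_i ~ N(loc(d_i), sigma^2 I_2).
    s = tuple(rng.gauss(coord, sigma) for coord in locations[d])
    return f, z, d, s


# Hypothetical parameters: categories 0 = cafe, 1 = store;
# databases 0 = favorites, 1 = generic.
W = [[2.0, 0.0], [0.0, 2.0]]
LAM = [0.7, 0.3]
LIBRARY = {(0, 0): ["fav_cafe"], (0, 1): ["cafe_a", "cafe_b"],
           (1, 0): ["fav_store"], (1, 1): ["store_a"]}
MU = {0: {"fav_cafe": 0.6, "cafe_a": 0.2, "cafe_b": 0.2},
      1: {"fav_store": 0.5, "store_a": 0.5}}
LOCATIONS = {"fav_cafe": (0.0, 0.0), "cafe_a": (5.0, 1.0), "cafe_b": (2.0, 7.0),
             "fav_store": (9.0, 9.0), "store_a": (4.0, 4.0)}

rng = random.Random(0)
f, z, d, s = sample_segment([1.0, 0.0], W, LAM, LIBRARY, MU, LOCATIONS, 0.05, rng)
```

Giving the favorites database a larger entry in λ is how the higher prior probability of favorite destinations enters the model.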

Unsupervised Region Modeling

In the above model, regions are treated as pre-defined locations derived either by tiling the geographic space, or clustering destinations and/or locations frequently traveled to by users. It is a reasonable extension to consider the spatial distribution over destination locations as a region model. In this case, the locations of the regions can be learned in the context of the model in an unsupervised way.

Trajectory Modeling

In the above model, location prediction is based on region history. The prediction can also be based on geographic features including direction of travel, road segments, distance along route, ease of navigation to destinations given current route and map information, and traffic information. Such modeling is a reasonable extension of the method, to improve prediction and generalization to new locations.

Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
