The present invention relates to a system, method and non-transitory computer-readable medium for context modeling of user behavior and machine learning of the user behavior in order to optimize user behavior across users, context, and content with different kinds of behaviors.
User behavior is commonly defined as the user's predictable pattern for a given context. Thus, context modeling plays a critical role in user behavior learning. Context modeling can be used in user segmentation, recommendation, and other contextual and personal services. For next-trip prediction, it would be valuable to model the temporal and spatial information with rich semantics based on user behavior, so that the user's next trip can be predicted anywhere and at any time.
Machine learning can be used to learn user behavior from data, and thus to predict a next destination, trip and route. With machine learning, a vehicle can be preconditioned before a trip, a user can be notified of the right time to leave based on real-time traffic, and setup of navigation can be assisted right before the trip. Further, relevant traffic information can be updated while driving along a route, and relevant information about the predicted destination can be suggested, e.g., parking, alternative routes and last-mile information. In contextual and personal services, it is highly desirable to model the context, e.g., temporal and spatial information, in a way that captures user behavior, and then to build a predictive model. The present invention proposes an effective algorithmic framework for context modeling based on user behavior in which prediction is the objective.
The most commonly used context information is time and location, also called temporal and spatial information. This context information is difficult to model, however, because the data is sparse in a large temporal and spatial space. An association rule mining approach has been applied to predict a user's trip, but it has a limited ability to predict the user's next destination because it lacks contextual modeling that covers all times and locations, as well as their semantics.
Sentiance has developed a deep learning based solution that is trained to encode geo-spatial relations and semantic similarities describing a location's surroundings. See, e.g., https://www.sentiance.com/2018/05/03/venue-mapping/. A challenge with this solution, however, is that the location is semantically modeled by geographical similarity without taking the user's interaction and behavior into account, which limits the quality of the user experience.
Though there are some known instances of context embedding, such as location embedding, known forms of context embedding do not provide an algorithmic framework for context and content modeling that aims to optimize user behavior across users, context, and content with different kinds of behaviors. The present invention provides such an algorithmic framework. With context modeling, a system is able to better understand the user's context and behavior, and to provide and improve the contextual and personal experience, e.g., recommendation, prediction, user segmentation, and other contextual and personal services. The approach proposed herein is a novel and effective algorithmic solution for modeling rich semantics of context and content across users with different kinds of behaviors.
Other objects, advantages and novel features of the present invention will become apparent from the following detailed description of one or more preferred embodiments when considered in conjunction with the accompanying drawings.
User behavior is defined as taking a certain action on certain content in a given context, e.g., driving to a destination for a given date, time, location, etc., or purchasing an item for a given date, time, location and other context or desire. User interactions are further described below.
An interaction record is an item in the interaction set I={I1, I2, . . . , IQ}, where In (1≤n≤Q) denotes a kind of user interaction such as departure, arrival, purchase, etc. Content O is the set of items to which the user interaction is applied. Let O={O1, O2, . . . , OK}, where On (1≤n≤K) represents a content such as an item.
With regard to context, given a contextual feature set F={f1, f2, . . . , fP}, a context Ci is a group of contextual feature-value pairs, i.e., Ci={(x1:v1), (x2:v2), . . . , (xL:vL)}, where xn∈F and vn is the value for xn (1≤n≤L).
A user behavior record (instance) ri=<Ii, Oi, Ci> is composed of a user interaction, a content and a context, drawn from the sets denoted I, O and C, respectively. The user behavior matrix R=(r1 r2 . . . rT)t is a sequence of user behavior records ordered by timestamp. User modeling learns a user pattern from the user behavior matrix R.
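By way of illustration only, the definitions above can be sketched as a simple data structure. The class name `BehaviorRecord`, the `timestamp` field, and the example values are assumptions introduced for this sketch and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BehaviorRecord:
    """One record r_i = <I_i, O_i, C_i>, plus a timestamp used for ordering."""
    interaction: str         # element of the interaction set, e.g. "arrival"
    content: str             # element of the content set, e.g. a destination
    context: Dict[str, str]  # contextual feature-value pairs (x_n: v_n)
    timestamp: float         # hypothetical field: seconds since some epoch


def build_behavior_sequence(records: List[BehaviorRecord]) -> List[BehaviorRecord]:
    """Return the user's behavior records ordered by timestamp."""
    return sorted(records, key=lambda r: r.timestamp)


# Illustrative records, deliberately given out of order.
records = [
    BehaviorRecord("arrival", "office", {"day_of_week": "Mon", "hour": "9"}, 200.0),
    BehaviorRecord("departure", "home", {"day_of_week": "Mon", "hour": "8"}, 100.0),
]
R = build_behavior_sequence(records)
```

In this sketch R is a plain ordered list; the numeric (H, T)-size form of R is obtained only after the records are embedded.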
Feature modeling is further described below. User behavior is defined as taking a certain action on certain content in a given context. The user interaction I, content O, and context C are all modeled to construct the feature modeling layer, which consists of the raw input.
To enable a quantitative analysis of user interaction, content and context, representation learning is applied to generate an “embedding” vector for each object. An embedding is a mapping of a discrete categorical variable to a vector of continuous numbers. The semantic distance or similarity between different objects, such as two locations, two users, two words or sentences, or even two timestamps, can be determined from the embedding. Normally, an embedding can be trained in a data-driven framework to preserve the semantic meaning of objects.
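As a minimal sketch of how similarity may be determined from embeddings, the following compares hypothetical location vectors by cosine similarity. The three-dimensional vectors and the choice of cosine similarity are assumptions for illustration; the disclosure does not prescribe a particular similarity measure.

```python
import math
from typing import Dict, List

# Hypothetical 3-dimensional embeddings; real embeddings would be learned.
embedding: Dict[str, List[float]] = {
    "home": [0.9, 0.1, 0.0],
    "office": [0.1, 0.9, 0.0],
    "cafe": [0.2, 0.8, 0.1],
}


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


sim_office_cafe = cosine_similarity(embedding["office"], embedding["cafe"])
sim_home_office = cosine_similarity(embedding["home"], embedding["office"])
```

With these illustrative values, "office" is closer to "cafe" than to "home".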
For a trip prediction task, for example, we assume we have the user interaction set {I1, I2, . . . , IQ}, content set {O1, O2, . . . , OK}, and context set {C1, C2, . . . , CP}. The embedding matrix of the raw features can be modeled as follows:
E({I})=[[I1,1,I1,2, . . . ,I1,H], . . . ,[IQ,1,IQ,2, . . . ,IQ,H]]
E({O})=[[O1,1,O1,2, . . . ,O1,H], . . . ,[OK,1,OK,2, . . . ,OK,H]]
E({C})=[[C1,1,C1,2, . . . ,C1,H], . . . ,[CP,1,CP,2, . . . ,CP,H]]
where H is the pre-defined feature size of the embedding vector, and Q, K, and P are the sizes of the user interaction, content, and context sets, respectively. Both the time and location information associated with the user behavior may be represented, for example, in a 128-dimensional space.
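For illustration, the three embedding matrices may be set up as (Q, H)-, (K, H)- and (P, H)-size tables with one row per object. The sizes chosen for Q, K and P, the random initialization range, and the helper name `init_embedding` are assumptions of this sketch; only H=128 is taken from the example above.

```python
import random
from typing import List

Matrix = List[List[float]]


def init_embedding(num_objects: int, feature_size: int, rng: random.Random) -> Matrix:
    """Create a (num_objects, feature_size) matrix of small random values."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(feature_size)]
            for _ in range(num_objects)]


# Hypothetical sizes: Q interaction kinds, K content items, P context values.
Q, K, P, H = 3, 50, 24, 128
rng = random.Random(0)  # fixed seed so the sketch is reproducible

E_I = init_embedding(Q, H, rng)  # E({I}): one H-dimensional row per interaction
E_O = init_embedding(K, H, rng)  # E({O}): one row per content item
E_C = init_embedding(P, H, rng)  # E({C}): one row per context value

content_vector = E_O[7]  # lookup by index returns an H-dimensional vector
```

The rows start as random values and would be adjusted during training.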
The context embedding is further described below. In the feature embedding layer, the goal is to construct the feature embedding for a behavior record r, given raw input consisting of indices into the interaction embedding I, content embedding O, and context embedding C. Let a tuple (q, k, p) be the input, where each element is the lookup index of user interaction I, content O, and context C, respectively. The feature embedding is represented by a (3, H)-size matrix where each row is a (1, H)-size vector extracted from the corresponding embedding matrix based on the index q, k, or p, where H is, for example, 128. For simplicity, we use one embedding for content and one for context, but this can be extended to multiple context and content embeddings. We apply a linear transformation to generate the behavior record r as the feature embedding as follows:
r=concatenateaxis=1(E(Iq),E(Ok),E(Cp))×w+b
where concatenateaxis=1 ( ) is an operation that concatenates the three (1, H)-size vectors along axis 1 to generate a (3, H)-size matrix, and w and b are linear transformation parameters that need to be trained. This feature is illustrated in the accompanying drawings.
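The lookup and linear transformation can be sketched as follows. This sketch makes an assumption about shapes that the text does not state explicitly: the three looked-up vectors are joined end to end into one vector of length 3H, w has shape (3H, H), and b has length H, so that r is an H-dimensional vector compatible with the (H, T)-size matrix R. The function name and toy values are likewise illustrative.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def feature_embedding(E_I: Matrix, E_O: Matrix, E_C: Matrix,
                      q: int, k: int, p: int,
                      w: Matrix, b: Vector) -> Vector:
    """Look up rows q, k, p, concatenate them, and apply x * w + b."""
    x = E_I[q] + E_O[k] + E_C[p]  # list concatenation, length 3H
    if len(w) != len(x) or any(len(row) != len(b) for row in w):
        raise ValueError("w must have shape (3H, H) and b must have length H")
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


# Toy example with H = 2; all values are illustrative, not trained.
H = 2
E_I = [[1.0, 0.0]]
E_O = [[0.0, 1.0]]
E_C = [[1.0, 1.0]]
# This w simply sums the three looked-up vectors component by component.
w = [[1.0 if i % H == j else 0.0 for j in range(H)] for i in range(3 * H)]
b = [0.5, -0.5]

r = feature_embedding(E_I, E_O, E_C, 0, 0, 0, w, b)
```

In practice w and b would be learned jointly with the embedding matrices rather than fixed by hand.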
The behavior embedding is further described below. Given the user's behavior records r obtained through behavior modeling, the user behavior R consists of a sequence of user behavior records ordered by timestamp. Assuming the user has T behavior records, we concatenate all r 501 along axis t to generate a (H, T)-size matrix 502: R=(r1 r2 . . . rT)t, as shown in the accompanying drawings.
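A minimal sketch of this stacking step is given below, assuming each behavior record has already been embedded as a vector of length H and that the records are supplied in timestamp order. The helper name `stack_records` is hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def stack_records(records: List[Vector]) -> Matrix:
    """Place T record vectors of length H as the columns of an (H, T) matrix."""
    if not records:
        raise ValueError("at least one behavior record is required")
    H = len(records[0])
    if any(len(r) != H for r in records):
        raise ValueError("all behavior records must have the same length H")
    return [[r[h] for r in records] for h in range(H)]


# Three illustrative records (T = 3) with H = 2.
R = stack_records([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Each column of R then corresponds to one behavior record, and the column order preserves the time order.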
The supervised training is further described below. Once we have the (H, T)-size user behavior matrix R, we can apply a network based on a self-attention mechanism and train it by optimizing the objective function for a given task. The self-attention mechanism, introduced by Google in 2017, can learn a representation (embedding) of sequential input. In natural language understanding, for example, this framework lets surrounding words give context to and influence one another in attention modeling. Using the self-attention mechanism, the calculation of the user embedding layer is simplified as follows:
f(Q, K_i) = Q^T K_i
a_i = softmax(f(Q, K_i)) = exp(f(Q, K_i)) / Σ_j exp(f(Q, K_j))
Attention(Q, K, V) = Σ_i a_i V_i
where Q, K, and V represent the query, key, and value, respectively, which are concepts used in the attention mechanism, and here Q=K=V=R. After calculation, the output is a (H, H)-size matrix that represents the personal feature used for supervised training for a given task. Thus, we have a fixed-length personal feature that can be fed into the downstream task training. In our experiment, we apply this approach to destination prediction.
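The three formulas above can be sketched in plain Python as follows. This is an illustrative dot-product self-attention with Q=K=V, in which each behavior record is treated as one sequence position, so the output has one H-dimensional row per record. It does not reproduce the (H, H)-size output described above, whose exact construction is not specified here, and it omits any learned projection weights.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: exp(s_i) / sum_j exp(s_j)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(rows: List[Vector]) -> List[Vector]:
    """Self-attention where the same T vectors serve as Q, K and V."""
    output = []
    for query in rows:
        # f(Q, K_i) = Q^T K_i for every key K_i
        scores = [sum(a * b for a, b in zip(query, key)) for key in rows]
        weights = softmax(scores)  # a_i
        # Attention = sum_i a_i V_i, computed per feature dimension
        output.append([sum(w * value[h] for w, value in zip(weights, rows))
                       for h in range(len(query))])
    return output


# Two illustrative behavior records with H = 2.
attended = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

For the first record, the scores are [1, 0], so its attention weights are e/(e+1) and 1/(e+1).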
As described below, telemetry and annotation are used as an example application of the above-described modeling. Because the data is associated with a user ID, user permission is required in order to address user privacy and the General Data Protection Regulation (GDPR). In this example, all data is transmitted directly into cloud storage. Data is gathered from mobile devices (e.g., mobile phones) and vehicles. The location from mobile phones (e.g., iOS, Android phones) is represented as a “Geofence” including latitude/longitude (lat/lon) and a timestamp of entering and leaving a given location. Location tracking of the mobile devices may include, for example, lat/lon and a timestamp, updated upon significant movement or at a sampled time interval.
Telemetry and location may also be obtained from vehicles, wherein location tracking may include, for example, lat/lon and a timestamp, updated upon significant movement or at a sampled time interval. The telemetry data may be annotated. For example, all data may be annotated by the source (phone or vehicle), user and vehicle ID if applicable, timestamp, and the like.
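One possible form of an annotated location sample, and of the rule that triggers an update upon significant movement or at a sampled time interval, is sketched below. The field names, the 100 m movement threshold, the 300 s interval, and the use of the haversine great-circle distance are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocationSample:
    source: str                # annotation: "phone" or "vehicle"
    user_id: str               # annotation: user ID
    vehicle_id: Optional[str]  # annotation: vehicle ID, if applicable
    lat: float
    lon: float
    timestamp: float           # seconds since some epoch


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def should_update(prev: LocationSample, cur: LocationSample,
                  min_move_m: float = 100.0, max_interval_s: float = 300.0) -> bool:
    """Emit an update on significant movement or once the interval has elapsed."""
    moved = haversine_m(prev.lat, prev.lon, cur.lat, cur.lon) >= min_move_m
    elapsed = cur.timestamp - prev.timestamp >= max_interval_s
    return moved or elapsed


prev = LocationSample("vehicle", "user-1", "vehicle-1", 48.0, 11.0, 0.0)
same_place_soon = LocationSample("vehicle", "user-1", "vehicle-1", 48.0, 11.0, 10.0)
moved_far = LocationSample("vehicle", "user-1", "vehicle-1", 48.01, 11.0, 10.0)
same_place_later = LocationSample("vehicle", "user-1", "vehicle-1", 48.0, 11.0, 600.0)
```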
According to another example, a predicted trip may be determined based on the above-described modeling. In one use case, a request and a response by application programming interface (API) are provided. This may include smart search, in which the destination(s) a user searches in the app are recommended and/or smart navigation, in which top possible destinations are recommended when the user starts to drive, based on the user and user's context. Alternatively, the user behavior matrix can be used to recommend music, movies, apps, and the like to particular users.
In another use case, a notification by push to the user's device(s) is implemented. For example, a time-to-leave notification, informing the user when to leave in order to arrive at a desired time in view of real-time traffic changes, may be pushed to the user. Also, smart preconditioning, in which the user is reminded to precondition their vehicle or the vehicle is automatically preconditioned (e.g., heating/cooling) 30 minutes before the trip, may be implemented.
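The timing of these two notifications can be illustrated with simple date arithmetic. In this sketch the travel time is assumed to be supplied by some real-time traffic source that is not modeled here, and the function names are hypothetical; the 30-minute lead is taken from the example above.

```python
from datetime import datetime, timedelta


def time_to_leave(desired_arrival: datetime, travel_time: timedelta) -> datetime:
    """Latest departure time that still meets the desired arrival time."""
    return desired_arrival - travel_time


def precondition_time(departure: datetime,
                      lead: timedelta = timedelta(minutes=30)) -> datetime:
    """Time at which to start (or remind about) vehicle preconditioning."""
    return departure - lead


# Illustrative values: arrive at 09:00 with a 40-minute estimated drive.
arrival = datetime(2024, 1, 15, 9, 0)
leave_at = time_to_leave(arrival, timedelta(minutes=40))
precondition_at = precondition_time(leave_at)
```

If the traffic estimate changes, recomputing `time_to_leave` with the new travel time yields an updated notification time.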
According to an experiment performed by the inventors, the proposed model was deployed on a trip pattern prediction task, which predicts the location a user will visit at a certain time given his/her trip history. The dataset includes user location tracking, including driving. The raw features of the experiment are user ID, location_gps_grid_ID, and timestamp. The data covers 200 users and 5000 locations, obtained by segmenting the map into a 200 m×200 m grid, over a 6-month period. We assume the user interactions for user u to be Iu={(visit location i0 at time t0), . . . , (visit location iT at time tT)}. In the training set, we use the first k visits of Iu to predict the (k+1)-th visit, where the data contains both the location i and the timestamp t of each visit. In the test set, we use the first n−1 visits to predict the last one.
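One way to read this split is sketched below: the last visit of each user is held out as the test target, and training examples are formed from every prefix of the remaining visits. Holding the last visit out of the training prefixes is an assumption of this sketch, as are the type alias `Visit` and the function name.

```python
from typing import List, Tuple

Visit = Tuple[int, float]  # (location grid ID, timestamp)
Example = Tuple[List[Visit], Visit]  # (history, next visit to predict)


def make_examples(visits: List[Visit]) -> Tuple[List[Example], Example]:
    """Build prefix-based training examples and one held-out test example."""
    if len(visits) < 3:
        raise ValueError("need at least three visits to form both splits")
    history = visits[:-1]  # first n-1 visits
    # Training: the first k visits predict the (k+1)-th visit.
    train = [(history[:k], history[k]) for k in range(1, len(history))]
    # Test: the first n-1 visits predict the last visit.
    test = (history, visits[-1])
    return train, test


# Illustrative visit sequence for one user, already ordered by time.
visits = [(10, 1.0), (20, 2.0), (10, 3.0), (30, 4.0)]
train_examples, test_example = make_examples(visits)
```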
We model the context feature by introducing three different types of object embedding, for location, hour of day, and day of week, respectively. We report the top-N matching accuracy, which is widely used to measure the performance of recommendation systems. The metric returns 1 for top-N matching if the recommended item is within the top N results of a set of M items ranked by the predicted preference of a user. The results are shown below.
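The context indices and the top-N metric can be sketched as follows. The sketch derives hour of day and day of week from a Unix timestamp in UTC, which is an assumption (a deployed system would more plausibly use the user's local time), and averages the per-example 0/1 top-N match over all test examples. The example rankings are illustrative and are not experimental results.

```python
from datetime import datetime, timezone
from typing import Dict, Sequence


def context_indices(timestamp: float) -> Dict[str, int]:
    """Map a Unix timestamp to lookup indices for the two temporal embeddings."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return {"hour_of_day": dt.hour, "day_of_week": dt.weekday()}  # Monday = 0


def top_n_accuracy(ranked_predictions: Sequence[Sequence[int]],
                   targets: Sequence[int], n: int) -> float:
    """Fraction of examples whose true item is among the top n ranked items."""
    if len(ranked_predictions) != len(targets) or not targets:
        raise ValueError("need one non-empty ranking per target")
    hits = sum(1 for ranked, target in zip(ranked_predictions, targets)
               if target in ranked[:n])
    return hits / len(targets)


# Illustrative rankings of location IDs, best first, and the true locations.
ranked = [[5, 2, 9], [1, 4, 7], [3, 8, 6]]
truth = [2, 7, 0]
acc_top1 = top_n_accuracy(ranked, truth, 1)
acc_top2 = top_n_accuracy(ranked, truth, 2)
acc_top3 = top_n_accuracy(ranked, truth, 3)
```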
The results show the promising potential of deploying the user behavior learning framework to support downstream prediction tasks.
In another exemplary embodiment of the present invention, a non-transitory computer-readable medium is encoded with a computer program that performs the above-described method. Common forms of non-transitory computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
The present invention provides a number of significant advantages over conventional systems and methods. The present invention provides the predicted trip algorithm framework that supports notification-based and request-based use cases and scenarios. Due to the embedding framework, the information provided to the user is enriched by temporal and spatial context modeling by measuring the similarity (e.g., home or office) among different users from different places, and by inferring improved predictions from time, location and trip.
The present invention provides an algorithm that can achieve the following benefits: low complexity, which improves online computation for the service; improved user experience by leveraging personal context to achieve better predictive performance; and a technical solution to address data sparsity. Further, the predictive performance is improved by addressing the fine granularity of personal context and the coarse granularity of data sparsity. The algorithm according to the present invention is able to extend to other contexts, behaviors and services. Additionally, the present invention enables further smart services, e.g., smart search, smart preconditioning, smart charging, and time-to-leave, in addition to the predicted trip for smart navigation. For the predicted destination, arrival time and route, the present invention is able to provide the user with richer information about traffic and happenings along the route, reserve parking, restaurants, etc., around the destinations, as well as provide deal recommendations, a local pedestrian map for the last mile of a trip or a connected transportation service, and the like.
The foregoing disclosure has been set forth merely to illustrate the invention and is not intended to be limiting. Since modifications of the disclosed embodiments incorporating the spirit and substance of the invention may occur to persons skilled in the art, the invention should be construed to include everything within the scope of the appended claims and equivalents thereof.