The data-rich environment of the Internet gives companies the ability to engage with greater numbers of potential customers through Internet-accessible websites. A better understanding of how users interact with the navigable content of these websites can lead to better website design, increased sales, and/or other goals. Although attempts have been made to predict user actions in certain contexts, in many real-world situations users are faced with many potential actions simultaneously, which increases complexity and can lead to poor prediction results.
The details of one or more implementations for multitask user behavior prediction are set forth in the accompanying figures and the detailed description below.
User interaction with navigable content, such as interconnected web pages of a website, can indicate a user's intent and purpose. Accurately modeling user interaction with navigable content may enable better site design, leading to increased user engagement. A content service, such as a website, platform, and/or the like, may utilize prediction logic that predicts future user actions to, inter alia, serve users with relevant content. Although these types of predictions may be capable of increasing content relevancy, in real-world settings, users are often faced with multiple potential actions simultaneously. As described in further detail herein, the disclosed prediction logic jointly models multiple, sequential user actions and can utilize such models to simultaneously predict multiple user interactions from sequential user interaction metadata (also referred to herein as user-interaction data streams, or istreams).
Consider, for example, a website that includes and/or is coupled to prediction logic configured to, inter alia, capture user interactions with navigable content of the website (e.g., webpages). The prediction logic may acquire istream data that includes a sequence or stream of user actions captured as respective users traverse webpages of the website. The istream data may correspond to multiple potential actions available to users at respective webpages, such as "conversion actions," duration of visits to the respective webpages, navigation to other webpages, and so on. As used herein, a "conversion" or "conversion action" refers to a desired or target user action, such as: user submission of a form, click-through, user engagement, user interaction with an engagement component (e.g., an instant messaging component), and/or the like.
The prediction logic uses istream data to, inter alia, model dependencies between multiple actions. For example, the time duration a user spends at a webpage may be indicative of the interest of the user in content of the webpage, which in turn informs how likely the user is to perform a conversion action. The prediction logic may also model how user actions evolve as users traverse respective paths through the navigable content of the website. By jointly modeling multiple, sequential user actions, the prediction logic can develop a multi-dimensional model of future user behavior that yields a better understanding of user intent and preferences, resulting in improved prediction performance for high-interest tasks (e.g., conversion actions). In the example above, accurate prediction of conversion actions may be used to adapt navigable content of the website to produce increased conversion rates.
The computing system 102 may include a processor 103, memory 104, non-transitory storage 105, human-machine interface (HMI) component(s) 106, a data interface 107, and so on. The processor 103 may include any suitable processing component(s) including, but not limited to: a processor chip, processing circuitry, a processing unit, a central processing unit (CPU), a general-purpose processor, an application-specific integrated circuit (ASIC), programmable processing elements, a Field Programmable Gate Array (FPGA), and/or the like. The memory 104 may include any suitable memory component(s) and/or device(s) including, but not limited to: volatile memory, non-volatile memory, random access memory (RAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), cache memory, and/or the like. The non-transitory storage 105 may include any suitable non-transitory, persistent, and/or non-volatile storage component(s) and/or device(s) including, but not limited to: a non-transitory storage device, a persistent storage device, an internal storage device, an external storage device, a remote storage device, Network Attached Storage (NAS) resources, a magnetic disk drive, a hard disk drive (HDD), a solid-state storage device (SSD), a Flash memory device, and/or the like. The HMI components 106 may include, but are not limited to: input devices, output devices, input/output (I/O) devices, visual output devices, display devices, monitors, touch screens, a keyboard, gesture input devices, a mouse, a haptic feedback device, an audio output device, a neural interface device, and/or the like. The data interface 107 may include any suitable data and/or communication component(s), interface(s), and/or device(s), including, but not limited to: I/O ports, I/O interconnects, I/O interfaces, communication ports, communication interconnects, communication interfaces, network ports, network interconnects, network interfaces, and/or the like.
The data interface 107 may couple the computing system 102 to a network 108. The network 108 may include any suitable electronic communication network(s) and/or combination of networks, including, but not limited to: wired networks, wireless networks, a local area network (LAN), a wide area network (WAN), a virtual private network (VPN), Internet Protocol (IP) networks, Transmission Control Protocol/Internet Protocol (TCP/IP) networks, the Internet, and/or the like. The data interface 107 may be further configured to couple the computing system 102 to users 109 through and/or by use of respective user devices, such as client computing devices, user computing devices, mobile computing devices, mobile communication devices, smartphones, and/or the like.
The computing system 102 may include, implement, and/or be coupled to prediction logic 110. In some implementations, portions of the prediction logic 110 (and/or one or more component(s) thereof) are implemented by use of resources of the computing system 102. Portions of the prediction logic 110 can be configured to operate on and/or by use of the processor 103 of the computing system 102, utilize memory 104 of the computing system 102, and so on. Portions of the prediction logic 110 may be implemented and/or realized by executable instructions maintained within a non-transitory storage medium, such as, for example, the non-transitory storage 105 of the computing system 102. Alternatively, or in addition, portions of the prediction logic 110 may be implemented and/or realized by hardware components, such as application-specific processing hardware, an ASIC, an FPGA, dedicated memory resources, and/or the like.
The prediction logic 110 can be configured to jointly model multiple, sequential user actions pertaining to navigable content 120, such as a website or the like. The navigable content 120 may be hosted by the computing system 102, and users 109 may access and/or interact with the navigable content 120 through the network 108. Alternatively, or in addition, the navigable content 120 (and/or portions thereof) may be hosted by another network-accessible service or system (not illustrated).
The prediction logic 110 receives istream data 124 that includes, inter alia, a sequence of user actions, each corresponding to a respective content item (i) of the navigable content 120 (e.g., a respective webpage i) at a respective time or timestamp (t). The prediction logic 110 includes and/or is coupled to a multi-task machine-learning and/or machine-learned (ML) engine 130. The ML engine 130 includes and/or is coupled to an ML model 132 trained to simultaneously predict multiple user actions based on and/or in response to user interaction sequences, such as sequences of user actions encoded within the received istream data 124.
The ML engine 130 may develop, train, refine, and/or otherwise maintain an ML model 132 that includes and/or is embodied by an RNN 232. The RNN 232 may be trained by use of a training dataset 229 that includes, inter alia, a plurality of training istreams 224. The training istreams 224 may encode and/or be derived from istreams 124 (e.g., sequences of actions captured during traversal of the navigable content 120 by respective users 109).
The ML engine 130 may be configured to train the RNN 232 to jointly model multiple, sequential user actions (as indicated by interaction metadata 228 of the training istreams 224). The RNN 232 may be trained using any suitable technique or mechanism. In one example, incoming users are observed while accessing the navigable content 120 (website) and istream data 124 are captured during respective navigation sessions. The istream data 124 may be captured over a plurality of different user sessions, times, operating conditions, and/or the like. The training dataset 229 may be formed and/or derived from the captured istream data 124. The training dataset 229 may, therefore, include a plurality of training istreams 224, each including and/or being derived from one or more captured istreams 124. The training dataset 229 may be split into three subsets: a training set, a validation set, and a test set. The training set may include about 80% of the training istreams 224 of the training dataset 229, and the validation and test sets may include about 10% each. The training istreams 224 may be assigned to the subsets according to any suitable mechanism or criteria (e.g., may be randomly selected). The ML engine 130 may train the RNN 232 to replicate user actions of the training dataset 229 and may test and/or validate the trained RNN 232 by use of training istreams 224 of the test and/or validation sets. In some examples, the ML engine 130 trains the RNN 232 to implement a binary prediction and/or classification task; more specifically, to predict the probability that an istream 124 will result in a target action, such as a conversion action. Alternatively, the ML engine 130 may train the RNN 232 to predict subsequent user behavior, such as a sequence of future user actions that may or may not result in the user 109 performing a target action.
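For illustration, the approximately 80/10/10 dataset split described above may be sketched as follows in Python; the function name and the representation of the training istreams 224 as a list are hypothetical, and the exact split mechanism used by the ML engine 130 may differ.

```python
import random

def split_dataset(istreams, train_frac=0.8, val_frac=0.1, seed=0):
    """Randomly partition training istreams into train/validation/test
    subsets (approximately 80/10/10, as described above)."""
    rng = random.Random(seed)
    shuffled = istreams[:]             # copy so the input order is preserved
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder (~10%)
    return train, val, test
```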
The ML engine 130 may utilize the trained RNN 232 to determine prediction data 134. The ML engine 130 receives live, real-world istream data 124, inputs the istream data 124 into the RNN 232, and causes the RNN 232 to produce corresponding prediction data 134. In some implementations, the prediction data 134 includes a probability that the user 109 associated with the istream data 124 will subsequently perform a target action, such as a conversion action (a target probability 135). Alternatively, or in addition, the prediction data 134 may predict one or more subsequent actions in the input istream 124 (istream predictions 244). The istream predictions 244 may be represented and/or quantified using any suitable mechanism.
The graph 220 includes nodes 221 representing a set of webpages P={p1, p2, . . . , pm} of the navigable content 120 (a website). Edges 222 of the graph 220, ϵ={e1, e2, . . . , eq}, represent links and/or references between respective webpages. A given webpage (pi) may be associated with a set of adjacent webpages, which can be encoded within an adjacency list Ai={a1, a2, . . . , ak} for the webpage pi.
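For illustration, the graph 220 may be represented with adjacency lists as in the following minimal Python sketch; the webpage names are hypothetical.

```python
# Nodes (P) of graph 220: hypothetical webpage identifiers.
pages = ["home", "products", "pricing", "contact"]

# Adjacency lists (A_i) encoding the outgoing links (edges) of each webpage p_i.
adjacency = {
    "home": ["products", "pricing", "contact"],
    "products": ["home", "pricing"],
    "pricing": ["home", "contact"],
    "contact": ["home"],
}

# The adjacency list for a given webpage p_i:
print(adjacency["products"])  # -> ['home', 'pricing']
```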
The prediction logic 110 may include and/or be coupled to an agent 320 that monitors and/or captures istream data 124 pertaining to user navigation. The agent 320 may monitor interactions of users 109 at a content server 302, such as the computing system 102, an external content server or service, and/or the like. Alternatively, or in addition, the content server 302 can be configured to push istream data 124 to the prediction logic 110.
The ML engine 130 can use the istream data 124 to construct a dataset 229 to train the ML model 132 (a training dataset 229) and/or predict future actions of one or more users 109, as disclosed herein. The agent 320 may acquire information pertaining to navigation of the website by users 109, U={u1, u2, . . . , un}, including user actions Y={Y1,1, . . . , Yn,t} performed by respective users 109 (ui) at respective webpages (pi) at respective times (t). The user actions may be maintained within entries 225 of the istream data 124. The entries 225 may be arranged in a time sequence. The entries 225 may include interaction metadata 228 specifying actions performed by a user 109 at specified webpages (pi) at respective times (t) of a time sequence. The interaction metadata 228 may include tuples Yi,t=⟨yi,t, di,t, pi,t+1⟩, where yi,t indicates whether the user performed a conversion action at webpage pi at time t, di,t is the duration of the visit to webpage pi, and pi,t+1 is the next webpage visited.
The agent 320 may be further configured to capture a sequence of entries 225 pertaining to navigation of respective users 109, and arrange the entries 225 into istreams 124 for the respective users; e.g., a set of n istreams 124, S={s1, s2, . . . , sn}, each including a sequence of user actions performed while navigating through webpages of the website by a respective user 109.
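For illustration, a minimal Python sketch of an entry 225 carrying the interaction metadata 228 tuple is shown below; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InteractionEntry:
    """One entry 225 carrying a tuple Y_i,t = <y_i,t, d_i,t, p_i,t+1>."""
    converted: bool    # y_i,t: whether a conversion action was performed
    duration: float    # d_i,t: time spent at the current webpage
    next_page: str     # p_i,t+1: the webpage navigated to next

# An istream s_i is a time-ordered sequence of entries for one user.
IStream = List[InteractionEntry]

example_istream: IStream = [
    InteractionEntry(converted=False, duration=12.4, next_page="products"),
    InteractionEntry(converted=False, duration=30.1, next_page="pricing"),
    InteractionEntry(converted=True, duration=45.0, next_page="contact"),
]
```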
The ML engine 130 is configured to receive input istream data 324 and produce probability estimates for multiple tasks (prediction data 134) by use of the ML model 132. The prediction data 134 may include a tuple of probabilities of future user actions Ŷn,t+f={Ŷ1,t+1, . . . , Ŷn,t+f}, where Ŷn,t+f=⟨P(yi,t+f), P(di,t+f), P(pi,t+f+1)⟩ and f denotes the number of future time periods predicted by the ML model 132. The prediction data 134 may include a plurality of conditional and/or joint probabilities for respective actions, each indicating a respective predicted value of the action and a corresponding probability. As disclosed in further detail herein, the prediction data 134 may include: a conversion prediction 340, which may include a predicted value of a conversion action {0, 1} being performed on a specified webpage (pi) at time (t) and a probability of the predicted conversion action; a duration prediction 342, which may include a predicted value for the duration the user 109 will remain at the specified webpage (pi) at (t) and a corresponding probability; a next page prediction 344, which may include a predicted value for the next webpage visited by the user 109 and a probability of the corresponding prediction; and so on.
In some implementations, webpages may be identified and/or represented by one-hot vectors having a same distance from one another; e.g., a same Euclidean distance (√2) from each other. At each time t, the ML engine 130 inputs a one-hot representation of pi,t∈ℝ|P| (where the length of Si may vary from user 109 to user 109). The input pi,t, however, may be extremely sparse and, as such, be a poor representation of relationships between webpages. More specifically, the one-hot representations may fail to incorporate webpage connectivity and/or adjacency lists into next page predictions 344. The ML engine 130 may improve prediction performance by, inter alia, including an embedding layer 330 configured to learn lower-dimensional webpage representations, which may obviate the need for inefficient, less accurate one-hot representations. The embedding layer 330 can learn lower-dimensional webpage representations (embedding vector(s) 331), ei,t∈ℝδ, through linear mapping, as follows: ei,t=Wembed pi,t, where Wembed∈ℝδ×|P| is a matrix of learnable parameters.
The lower dimensional webpage representations (embedding vectors 331) may be included in the input istream data 324 utilized within other layers of the ML engine 130. As disclosed herein, the embedding vectors 331 may enable the ML engine 130 to incorporate structural characteristics of the navigable content 120 (and/or graph 220) into the ML model 132.
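For illustration, the linear embedding described above may be sketched as follows in Python/NumPy; the values of |P| and δ are assumptions, and multiplying by a one-hot vector is equivalent to selecting a column of Wembed.

```python
import numpy as np

rng = np.random.default_rng(0)
num_pages = 50   # |P|: number of webpages (assumed value)
delta = 16       # embedding dimension (assumed value)

# Learnable parameters W_embed with shape (delta, |P|).
W_embed = rng.normal(scale=0.1, size=(delta, num_pages))

def embed_page(page_index: int) -> np.ndarray:
    """Compute e_i,t = W_embed @ p_i,t for a one-hot page vector p_i,t."""
    p = np.zeros(num_pages)
    p[page_index] = 1.0
    return W_embed @ p  # equivalent to W_embed[:, page_index]
```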
The LSTM layer 332 includes one or more LSTM cells 333, each configured to produce a hidden state vector 335 (ht) from the embedding vector ei,t and the hidden state vector ht−1 of the previous time step, as follows:

int=a(Win ei,t+Zin ht−1+bin)

ft=a(Wf ei,t+Zf ht−1+bf)

ct=ft·ct−1+int·tanh(Wc ei,t+Zc ht−1+bc)

ot=a(Wo ei,t+Zo ht−1+bo)

ht=ot·tanh(ct)
In the equations above, a(⋅) denotes a nonlinear activation function, tanh(⋅) represents the hyperbolic tangent, and Zj, Wj, bj are learnable parameters, where j∈{in, f, c, o}. The matrix Zj maps the previous hidden state ht−1 to gate j, Wj learns the relationship to the embedding vector ei,t, and bj is the bias term.
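For illustration, one step of the LSTM cell 333 implementing the equations above may be sketched as follows in NumPy, assuming the sigmoid function as the activation a(⋅); the parameter-dictionary layout is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(e_t, h_prev, c_prev, params):
    """One step of the LSTM cell 333. `params` holds the learnable
    W_j, Z_j, b_j for j in {"in", "f", "c", "o"} (layout assumed)."""
    W, Z, b = params["W"], params["Z"], params["b"]
    in_t = sigmoid(W["in"] @ e_t + Z["in"] @ h_prev + b["in"])  # input gate
    f_t = sigmoid(W["f"] @ e_t + Z["f"] @ h_prev + b["f"])      # forget gate
    c_t = f_t * c_prev + in_t * np.tanh(W["c"] @ e_t + Z["c"] @ h_prev + b["c"])
    o_t = sigmoid(W["o"] @ e_t + Z["o"] @ h_prev + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                    # hidden state
    return h_t, c_t
```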
The hidden state vector 335 (ht) produced by the LSTM cell 333 may be provided to an output layer 336 of the ML model 132, which may be configured to predict actions yi,t, di,t, and pi,t+1 (predictions 340, 342, and/or 344) based, at least in part, on the hidden state vector 335 (ht) and current interaction metadata 228-t. The output layer 336 may be configured to model the joint likelihood of a plurality of user actions using any suitable technique or mechanism. In some implementations, the output layer 336 factors the joint likelihood into a product of conditional probabilities, as follows:

P(yi,t, di,t, pi,t+1)=P(yi,t|ht)P(di,t|yi,t, ht)P(pi,t+1|di,t, yi,t, ht)   Eq. 1
The conditional probabilities may include a conversion prediction 340, including a prediction that the user 109 will subsequently perform a conversion action (0 or 1) and/or a probability thereof, which may be obtained as follows:
P(ŷi,t|ht)=σ(Wform ht+bform)   Eq. 2
In Eq. 2, σ(⋅) is the sigmoid function. The resulting estimate of ŷi,t may be used to determine predictions for other actions, such as a duration prediction 342, P(di,t), a next webpage prediction 344, P(pi,t+1), and so on, per Eq. 1.
To improve prediction of other tasks, the ML engine 130 can embed the conversion prediction 340 into a distributed representation (γi,t). The distributed representation may eliminate the presence of a binary component in the feature vector(s) used to predict other conditional probabilities, such as the duration prediction 342 (di,t), the next page prediction 344 (pi,t+1), and so on. The ML engine 130 may produce the distributed representation of the conversion prediction 340 as an embedded vector, or embedded conversion prediction vector 341 (γi,t), γi,t=Ω ŷi,t, where Ω is a matrix of learnable parameters and ŷi,t∈ℝ2×1 is the vector of conversion predictions 340.
The embedded conversion prediction vector 341 is concatenated with the hidden state vector 335 (ht) to produce a feature vector 337 (βt) for regression of the duration prediction 342, as follows: βt=[ht; γi,t].
A Gaussian distribution with mean μ and unit variance may be assumed for P(di,t), such that P(di,t|ŷi,t, ht)˜N(μ, 1), where μ=WdurationT βt+bduration. The ML engine 130 can be further configured to generate duration prediction(s) 342 from the dot product of the weight and feature vectors for page duration, as follows: d̂i,t=WdurationT βt+bduration.
The ML engine 130 may then concatenate the normalized duration prediction 342 (d̂i,t) with the hidden state vector 335 (ht) and the embedded conversion prediction vector 341 (γi,t) to obtain a feature vector 339 (πt) for use in generating the next page prediction 344: πt=[ht; γi,t; d̂i,t]. The ML engine 130 may calculate a softmax of the feature vector 339 and weight matrix over the adjacency list of the current webpage (pi) with a softmax adjacency function: p̂i,t+1=softmaxadj(Wpage πt+bpage), where softmaxadj normalizes over only the webpages in the adjacency list Ai for the webpage (pi). The ML engine 130 may, therefore, encode structure of the graph 220 (navigable content 120) into the sequential predictive ML model 132.
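For illustration, the output layer 336 heads, including the adjacency-restricted softmax, may be sketched as follows in NumPy; the parameter names and shapes are assumptions consistent with the equations above.

```python
import numpy as np

def softmax_adj(logits, adjacency_mask):
    """Normalize only over webpages in the adjacency list A_i (boolean mask);
    non-adjacent webpages receive probability zero."""
    masked = np.where(adjacency_mask, logits, -np.inf)
    masked = masked - masked[adjacency_mask].max()  # numerical stability
    exp = np.where(adjacency_mask, np.exp(masked), 0.0)
    return exp / exp.sum()

def output_heads(h_t, params, adjacency_mask):
    """Sequentially produce the conversion, duration, and next page
    predictions per Eq. 1; parameter names/shapes are assumptions."""
    # Conversion head (Eq. 2): sigmoid over W_form h_t + b_form.
    p_conv = 1.0 / (1.0 + np.exp(-(params["W_form"] @ h_t + params["b_form"])))
    y_hat = np.array([1.0 - p_conv, p_conv])   # two-element conversion vector
    gamma = params["Omega"] @ y_hat            # embedded conversion vector 341
    # Duration head: regression over beta_t = [h_t; gamma].
    beta = np.concatenate([h_t, gamma])
    d_hat = params["W_duration"] @ beta + params["b_duration"]
    # Next page head: adjacency-restricted softmax over pi_t = [h_t; gamma; d_hat].
    pi_t = np.concatenate([h_t, gamma, [d_hat]])
    p_next = softmax_adj(params["W_page"] @ pi_t + params["b_page"], adjacency_mask)
    return p_conv, d_hat, p_next
```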
The ML engine 130 can iteratively generate hidden state vectors (ht) 335 and corresponding predictions 340, 342, and 344 for the time sequence of entries 225 included in the input istream 324 (each entry 225 corresponding to a respective time of the time sequence, from t=1 to t=T). The hidden state vectors 335 (ht) for respective times t of the time sequence may incorporate hidden state vectors 335 (ht−1) of previous times t−1 of the time sequence. In some implementations, predictions 340, 342, and/or 344 generated in response to the entry 225 corresponding to the last time of the time sequence included in the input istream 324 are incorporated into the prediction data 134 output by the ML engine 130.
As disclosed herein, the ML engine 130 can be configured to train the ML model 132 using, inter alia, a training dataset 229 that includes a plurality of training istreams 224. The ML model 132 may be trained to maximize the conditional likelihood for the multiple actions (conversion action, duration, and next page), as follows:

max Π(i,t) P(yi,t|ht)P(di,t|yi,t, ht)P(pi,t+1|di,t, yi,t, ht)
To facilitate training, the ML engine 130 may be configured to minimize the negative log likelihood (suppressing the negative sign below for notational purposes). The ML engine 130 may also insert a tuning parameter λ=[λ1, λ2, λ3] to control the contribution of each task to the overall loss, as follows:

L=Σ(i,t) [λ1 log P(yi,t|ht)+λ2 log P(di,t|yi,t, ht)+λ3 log P(pi,t+1|di,t, yi,t, ht)]
The ML engine 130 can employ cross-entropy loss for P(yi,t) and P(pi,t+1), and mean squared error loss for P(di,t), to minimize the expression above. In some implementations, the ML engine 130 incorporates an L2 penalty over the weights for pi,t+1, which may be controlled by a hyperparameter (α). These additional terms act to regularize the next page prediction task and, in turn, improve generalization performance, as follows:

L=Σ(i,t) [λ1 log P(yi,t|ht)+λ2 log P(di,t|yi,t, ht)+λ3 log P(pi,t+1|di,t, yi,t, ht)]−α∥Wpage∥2
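For illustration, the weighted multitask loss for a single time step may be sketched as follows; the function signature and the eps stabilizer are hypothetical, and the sign convention here is the minimized (negative log likelihood) form.

```python
import numpy as np

def multitask_loss(p_conv, y_true, d_hat, d_true, p_next, next_true,
                   W_page, lam=(1.0, 1.0, 1.0), alpha=1e-4, eps=1e-12):
    """Weighted negative log likelihood for one time step: cross-entropy for
    the conversion and next page tasks, squared error for duration, plus an
    L2 penalty (alpha) over the next-page weights."""
    conv_loss = -(y_true * np.log(p_conv + eps)
                  + (1 - y_true) * np.log(1 - p_conv + eps))
    dur_loss = (d_hat - d_true) ** 2              # mean squared error term
    next_loss = -np.log(p_next[next_true] + eps)  # cross-entropy, true next page
    l2 = alpha * np.sum(W_page ** 2)              # regularizes next page task
    return lam[0] * conv_loss + lam[1] * dur_loss + lam[2] * next_loss + l2
```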
Training the ML model 132 may include iteratively applying training istream data 224, evaluating errors between prediction data 134 produced by the ML model 132 and actual user actions of the training istreams 224, and adjusting, refining, and/or optimizing weights of the embedding layer 330, LSTM layer 332, and/or LSTM cell(s) 333 accordingly.
The hidden state h1 335-1 determined for the first time in the sequence (t=1) may be used to determine the hidden state 335-2 (h2) and/or predictions for the next entry 225-2 in the time sequence (t=2). Evaluation of the next entry 225-2 of the input istream 324 (t=2) may include determining an embedded vector 331-2, a hidden state vector (h2) 335-2 (by a second instance of the LSTM cell 333-2), a conversion prediction 340-2, a duration prediction 342-2, a next page prediction 344-2, and so on. A last instance of the LSTM cell 333-T may produce a last hidden state vector 335-T (hT) from the hidden state vector (hT−1) 335-(T−1) of the next-to-last time of the sequence and the embedded vector 331-T generated from the last entry 225-T. The hidden state vector 335-T (hT) may be used to generate a conversion prediction 340-T, a duration prediction 342-T, and a next page prediction 344-T for t=T, and so on. The predictions 340, 342, and/or 344 determined for the last time of the time sequence may be incorporated into the prediction data 134 output by the ML engine 130 in response to the input istream data 324.
The ML engine 130 may include and/or be coupled to conversion prediction logic 540 configured to generate conversion predictions 340-t from hidden state vectors (ht) 335-t and to embed the conversion predictions 340-t into embedded conversion prediction vectors 341 (γi,t=Ω ŷi,t, where Ω is a matrix of learnable parameters and ŷi,t∈ℝ2×1 is the vector of conversion predictions 340).
The ML engine 130 further includes and/or is coupled to duration prediction logic 542 configured to derive duration predictions 342-t from feature vectors 337 produced by, inter alia, concatenating the hidden state vector (ht) 335-t with the embedded conversion prediction vector 341, as disclosed herein. Next page prediction logic 544 may be configured to generate feature vectors 339 by, inter alia, concatenating the hidden state vector (ht) 335-t with the embedded conversion prediction vector 341 and duration prediction 342-t, as disclosed herein.
Example methods are described in this section with reference to the flow charts and flow diagrams of the accompanying figures.
At 604, an embedding layer 330 of the ML engine 130 determines embedding vectors 331 for respective entries 225 of the istream 124. The embedding layer 330 may be trained to learn lower-dimensional webpage representations (embedding vectors 331), ei,t∈ℝδ, through linear mapping, as follows: ei,t=Wembed pi,t, where Wembed∈ℝδ×|P| is a matrix of learnable parameters. At 604, the embedding layer 330 may determine respective embedding vectors 331-t for respective times t of the time sequence represented by the istream 124 and/or entries 225-t of the istream 124.
At 606, the ML engine 130 generates hidden state vectors (ht) 335-t for respective times t (and/or corresponding entries 225-t) based on the embedded vectors 331-t generated at 604 and hidden state vectors (ht−1) 335-t-1 of previous times t−1 of the time sequence. The hidden state vectors (ht) 335-t may be generated by an LSTM cell 333 of an LSTM layer 332, as disclosed herein. At 606, the LSTM layer 332 may determine respective hidden state vectors 335-t for respective times t of the time sequence represented by the istream 124 and/or respective entries 225-t of the istream 124.
At 608, the ML engine 130 generates a plurality of predictions 340, 342, and 344, each corresponding to a respective task of a plurality of tasks and/or possible actions. The ML engine 130 may generate a conversion prediction 340-t, a duration prediction 342-t, and a next page prediction 344-t. The conversion prediction 340-t may be generated by conversion prediction logic 540 of the ML engine 130, the duration prediction 342-t may be generated by duration prediction logic 542 of the ML engine 130, and the next page prediction 344-t may be generated by next page prediction logic 544 of the ML engine 130, as disclosed herein. At 608, predictions 340-T, 342-T, and/or 344-T determined for a last time T of the time sequence (and/or a last entry 225-T of the istream 124) may be incorporated into prediction data 134 output in response to the istream 124.
In some implementations, 604, 606, and 608 may be implemented iteratively, with 604, 606, and 608 being implemented for the entry 225-1 (t=1), a next entry 225-2 in the time sequence (t=2), and so on, to the last entry 225-T (t=T) in the time sequence.
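For illustration, the iterative application of 604, 606, and 608 may be sketched as follows; the model object bundling the embedding, LSTM, and output-head functions from the earlier sketches is hypothetical.

```python
import numpy as np

def predict_istream(istream, model):
    """Apply 604 (embed), 606 (LSTM hidden state), and 608 (predictions)
    to each entry 225-t in time order; the predictions for the last entry
    become the output prediction data 134. `model` is a hypothetical object
    bundling the functions sketched earlier."""
    h = np.zeros(model.hidden_size)
    c = np.zeros(model.hidden_size)
    predictions = None
    for entry in istream:                 # entries 225-1 ... 225-T
        e = model.embed(entry.page_index)            # 604: embedding vector 331-t
        h, c = model.lstm_step(e, h, c)              # 606: hidden state 335-t
        predictions = model.output_heads(h, entry.adjacency_mask)  # 608
    return predictions                    # predictions 340-T, 342-T, 344-T
```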
At 704, an ML engine 130 utilizes a trained ML model 132 to determine predictions for each of the plurality of actions at respective times of the time sequence, including determining hidden state vectors 335 corresponding to the respective times. The ML engine 130 may determine a conversion prediction 340, duration prediction 342, and next page prediction 344 for respective times of the time sequence (and/or respective entries 225).
At 706, the ML engine 130 incorporates hidden state vectors 335 determined for previous times of the time sequence (e.g., hidden state vectors (ht−1) 335-t−1) into predictions determined for subsequent times of the time sequence (e.g., predictions based on hidden state vectors (ht) 335-t). Predictions determined for a last time T of the time sequence may be incorporated into prediction data 134 output in response to the sequence of entries 225.
In some implementations, methods 600 and/or 700 may further include using prediction data 134 to implement interventions to drive improved conversion rates. The prediction data 134 may identify paths through the navigable content 120 that are more likely to result in conversion actions. A content server 402 (and/or content manager) may utilize the prediction data 134 to adapt navigable content 120 to lead users 109 to paths corresponding to higher conversion rates (higher conversion predictions 340). Alternatively, or in addition, the content server 402 can use prediction data 134 produced in response to istream data 124 associated with a current user session to dynamically adapt the navigable content 120 to increase the likelihood that the user 109 will perform a conversion action (e.g., modify design and/or content of one or more webpages, and/or the like).
Although the subject matter has been described in language specific to structural features and/or methodological operations, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific examples, features, or operations described herein, including orders in which they are performed. Moreover, although implementations for multitask behavior prediction with content encoding have been described in language specific to certain features and/or methods, the subject of the appended claims is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as example implementations for multitask behavior prediction with content encoding.
The present disclosure claims priority to U.S. Provisional Patent Application Ser. No. 62/854,970 filed May 31, 2019, the disclosure of which is incorporated by reference herein in its entirety.