The invention relates to user interactions in online services. More particularly, the invention relates to identification of points in a user Web journey where the user is more likely to accept an offer for interactive assistance.
Users commonly initiate visits to the websites of one or more organizations, where the visits seek to make purchases, to locate information about goods or services, to initiate customer service support requests, to compare product information, and so on. To improve user experiences, such organizations typically enhance these Web interaction progressions, or journeys, by offering interaction services to the users. The interaction services can include invitations for Web-based chats, customized product searches, etc. The invitations can be offered at any point in the Web journey. While some users find the invitations for chats, searches, and so on to be helpful, other users find the invitations distracting, disruptive, invasive, or even annoying. As a result, the organizations have sought to classify the users by their likelihood to accept an invitation and to identify at what point in the Web journey a chat invitation should be initiated.
Current approaches to such classification and identification use a set of rules that decide when to offer interactive assistance to a user. These rules are created manually by investigating the data. One disadvantage of such an approach is that it is neither data-driven nor automatic, i.e. good rules can only be created after a significant investment of manual effort. As a consequence, the approach is not scalable. Also, sizeable manual effort must be dedicated to formulating rules for each platform. Working on platforms where user behavior changes over time requires that this manual effort be invested multiple times to formulate new rules to account for such changes.
Embodiments of the invention accurately identify those points in a user's website journey where an invitation for an interactive session may be offered to users, e.g. those points at which an invitation made to a user may have a higher propensity to be accepted by the user. Embodiments of the invention provide an approach that is data-driven and automatic. A technique is provided that, given ample data regarding visits to a website, and data regarding offers of interactive assistance made and responses to such offers, learns to identify accurately those points in the user's journey where such offers may be made. For the current user, offers made at these points are highly likely to be accepted. This approach bypasses the need for manual analysis that previous approaches require. In embodiments of the invention, a model provided in accordance with this technique need only be re-trained on new data to account for changing user behavior or a change in the website. As a result, the herein disclosed technique is highly scalable and convenient.
Users typically interact with one or more organizations to make purchases, obtain product information, initiate customer service queries, and so on. The users connect to one or more organizational websites and then make a journey on those sites to obtain the desired information. Embodiments of the invention monitor user journey information to classify the users by their likelihood of accepting invitations for interactive services at any given point in the Web journey. The interaction services include Web-based chats, voice chats, customized searches, and so on. The offerings of the interactive services are based on the classifications of the users and on identifying points in the Web journey at which certain classes of users have a high propensity for accepting the invitation. The classifications are based on a support vector machine (SVM). Offers are made to classifications of users who have a high propensity to accept, and are not made to classifications of users who have a low propensity to accept. The invitation acceptance rates are monitored and stored. The stored acceptance rate data is analyzed and used to modify classification models.
Once the user connects to a website, the Web server monitors the journey of the user. The journey of the user can include the link of a website that has led to the current website, the sequence of pages visited by the user on the website, time spent by the user on these pages, and so on.
Based on the user's journey and the user's characteristics, the Web server uses a support vector machine (SVM) to classify the user into a specific class. These characteristics are, for example, the location from which the user visits; the time at which the user visits; the user's OS, browser, device, or ISP; re-direction by another website; whether the user is a repeat visitor; search terms used on a search engine to come to this website; extensions added to the user's browser; etc. This information is gathered from the various HTTP/Web requests that the user's machine makes to access the website.
In machine learning, SVMs are supervised learning models having associated learning algorithms that analyze data and recognize patterns. SVMs are used, for example, for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a line such that, on each side, the gap between the line and the nearest points on that side is maximized. In cases where a perfect separation of points from different categories by a line is not possible, the SVM seeks the best possible such line. New examples are then mapped into that same space and predicted to belong to a category based on which side of the line they fall on.
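As a minimal sketch of the last step, the side of the separating line on which a new example falls can be read from the sign of a linear decision function. The weight vector `W`, the offset `B`, and the two-feature examples are illustrative assumptions and do not come from any trained model.

```python
from typing import Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b; its sign tells which side of the line x falls on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 according to the side of the separating line."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical separating line x1 + x2 - 3 = 0 (assumed for illustration).
W, B = (1.0, 1.0), -3.0
```

With these assumed values, `predict(W, B, (3.0, 2.0))` falls on the positive side and `predict(W, B, (0.5, 1.0))` on the negative side.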
In embodiments of the invention, the SVM uses a rational kernel for classification. Rational kernels define a general kernel framework based on weighted finite-state transducers or rational relations to extend kernel methods to the analysis of variable-length sequences or, more generally, weighted automata. The rational kernel and a corresponding weighted transducer are created offline (see
In embodiments of the invention, the Web server uses past history to learn a model. Past history comprises the details of the user from a past visit, such as the browser history from a previous visit, location, etc., and the user's journey-related details, such as which pages were visited, the order in which they were visited, how much time was spent per page, whether the user made a purchase, whether the user chatted, etc. In an embodiment of the invention, the past history forms the training data on which the SVMs are trained to learn a model.
SVMs are extremely robust classifiers for binary classification problems when the points to be separated are linearly separable. Their utility is extended to non-linearly separable data by using kernels that implicitly map data to a higher dimension where such data are more likely to be linearly separable. In spaces with more than two dimensions, the separating boundary is called a hyperplane, which is a generalization of the notion of a line. Here, data is not linearly separable if it is not possible to find a hyperplane separating points belonging to the different categories.
Based on the class into which the user is placed, the Web server makes a decision to offer the user an invitation for an interaction. If the user is placed into the class of users who may refuse an invitation for an interaction at this point, the Web server does not offer an invitation to the user. If the user is placed into the class of users who may accept an invitation for an interaction at this point in time, the Web server offers an invitation to the user. The invitation may be, for example, an offer to chat with an agent, where the chat may be any of a text-based chat or a voice-based chat.
After the invitation is offered to the user, the Web server monitors the user's response and stores the user's response in a suitable location. The Web server can thereafter apply the user's response for further analysis: the response of the user becomes part of the data that is used for updating and/or re-training the model. For example, a user may accept or reject an offered chat. The accepts and rejects are stored along with the various other information gathered. During updating of the model, this data serves as additional examples that help the model understand at what points in what types of journeys a chat is likely to be accepted or rejected.
Once the user connects to a website, the controller 22 monitors the user's journey. In embodiments of the invention, this is done by JavaScript that captures user interactions on the various webpages of the website and that sends the information to a server. Examples of data captured via monitoring are the URLs of the pages visited, the sequence in which the pages are visited on a website, whether certain buttons are clicked, time spent on various pages, etc.
Based on the user's journey and user's characteristics received from the controller, the classification engine 21 uses a support vector machine (SVM) to classify the user into a specific class. As discussed above, the SVM uses a rational kernel that is constructed offline (see
In embodiments of the invention, the classification engine uses past history and actions taken, so far, by the user in the current session to perform the classification. Based on the class into which the user is placed, the controller 22 makes a decision to offer an invitation to the user for an interaction. If the user is placed into the class of users who may refuse an invitation for an interaction at this point, the controller does not offer an invitation to the user. If the user is placed into the class of users who may accept an invitation for an interaction at this point in time, the controller offers an invitation to the user. In embodiments of the invention, the invitation is an offer to chat with an agent, where the chat may be any of a text-based chat or a voice-based chat.
After offering the invitation to the user, the controller monitors the user's response and stores the user's response in the database 24. The controller may apply the user's response for future analysis.
Once the user connects (301) to a website, the Web server 11 monitors (302) the user's journey. The Web server uses the rational kernel, constructed offline, with the SVM to classify (303) the user into a specific class.
The Web server performs (304) a check into which class the user is placed. If the user is placed into the class of users who may refuse an invitation for an interaction at this point, the Web server does not offer (305) an invitation to the user. If the user is placed into the class of users who may accept an invitation for an interaction at this point in time, the Web server offers (306) an invitation to the user. In embodiments of the invention, the invitation is an offer to chat with an agent, where the chat may be any of a text-based chat or a voice-based chat.
After the invitation is offered to the user, the Web server monitors (307) the user's response and stores (308) the user's response in a suitable location. The Web server may then apply the user's response for future analysis.
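The steps (303) through (308) can be sketched as a single function. This is an illustrative outline only: the class labels, the `classify` and `offer_invitation` callables, and the list used as storage are hypothetical stand-ins for the SVM, the invitation mechanism, and the storage location.

```python
from typing import Callable, Dict, List, Optional

ACCEPT, REFUSE = "likely_accept", "likely_refuse"  # hypothetical class labels


def handle_visit(
    journey: List[str],
    classify: Callable[[List[str]], str],
    offer_invitation: Callable[[], bool],
    store: List[Dict],
) -> Optional[bool]:
    """Classify the journey so far (303), check the class (304), offer or
    withhold an invitation (305/306), and record the response (307/308)."""
    user_class = classify(journey)
    if user_class != ACCEPT:
        return None                      # 305: no invitation is offered
    accepted = offer_invitation()        # 306: offer; 307: observe the response
    store.append({"journey": list(journey), "accepted": accepted})  # 308
    return accepted
```

The stored records pair each journey with the observed response, which is the form of data that later re-training would consume.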
The techniques disclosed herein may be applied at multiple points during a user journey. In some embodiments of the invention, application of such techniques is event triggered, for example when a user visits a Web page. Events can be the user visiting a page, the user clicking on a particular button, the user pulling down a dropdown, etc. The techniques herein disclosed may also be applied on a page-by-page basis, i.e. on every page visited by the user.
While a user is browsing various webpages during a Web journey, a decision is made at every page of the user's visit whether some form of interactive assistance, such as chat, should be offered to the user. This decision is made by a model that is built and/or trained offline based on data collected up to the present point in the Web journey, i.e. the data of various users and their visits.
Examples of the data collected include the geographic region from which the user visits the webpage, the browser that the user is using, the user's IP address, the time of day of the user's visit, the URLs of pages that the user visits, the page types of visited pages, etc. All of this data is collected by monitoring the user's Web journeys.
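One possible way to hold such data is sketched below, together with an encoding of the visited page types as a symbol sequence of the kind used for journeys such as 'ab' later in this description. The field names, the `SYMBOLS` alphabet, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Visit:
    """One monitored visit; field names are illustrative assumptions."""
    region: str
    browser: str
    ip_address: str
    hour_of_day: int
    page_types: List[str] = field(default_factory=list)  # in visit order


# Hypothetical page-type alphabet (assumed for illustration).
SYMBOLS: Dict[str, str] = {"home": "a", "product": "b", "cart": "c"}


def journey_string(visit: Visit, symbols: Dict[str, str] = SYMBOLS) -> str:
    """Encode visited page types as a symbol sequence, one symbol per page."""
    return "".join(symbols[p] for p in visit.page_types)
```

Because each visit may contain a different number of pages, the resulting strings naturally vary in length.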
As discussed above, embodiments of the invention use a support vector machine (SVM) with rational kernels as a model. Rational kernels can represent sequences of varying lengths, i.e. Web journeys are sequences of differing lengths because different users may visit a different number of pages. These sequences can be visualized using weighted transducers.
Both of these attributes are desirable because Web journeys are of varying lengths, e.g. one user may browse five pages, and another user may browse ten pages. The innate capability of a model to handle sequences of differing lengths is valuable. The ability to visualize the kernels provides an intuitive understanding of some aspects of the decision making process that the model uses.
Finally, SVMs promise good and robust off-the-shelf performance. The use of SVMs with rational kernels helps, for example, to resolve both the need for a good classifier (SVMs) and the need for certain domain specific flexibility (rational kernels).
In
Embodiments of the invention use the transducer to traverse a pair of sequences which represent journeys on a website simultaneously. A path in the transducer corresponds to a pair of journeys if the first journey can be obtained by concatenating the first character in the labels of the edges in the path, known as the input label of the path; and the second journey can be obtained by concatenating the second character in the labels of the edges in the path, known as the output label of the path. Of interest is finding paths in the transducers that begin at a starting state and end at a final state. Such paths are known as accepting paths.
Consider the pair of journeys ‘ab’ and ‘ba.’ The path in the transducer with edges from state 0 to state 1 followed by the edge from state 1 to state 3 forms an accepting path for this pair because the input label of this path is ‘ab’ and the output label is ‘ba.’
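The search for accepting paths can be sketched as follows. The transducer referred to above is not reproduced in this text, so the edge list here is a small hypothetical transducer chosen only so that the path 0, 1, 3 accepts the pair ('ab', 'ba'); it has no epsilon edges, which is a simplifying assumption.

```python
from typing import List, Set, Tuple

# (source state, input symbol, output symbol, destination state).
# Hypothetical edges, assumed for illustration.
Edge = Tuple[int, str, str, int]
EDGES: List[Edge] = [
    (0, "a", "b", 1),
    (0, "b", "a", 2),
    (1, "b", "a", 3),
    (2, "a", "b", 3),
    (3, "a", "a", 3),
    (3, "b", "b", 3),
]
START: int = 0
FINALS: Set[int] = {3}


def accepting_paths(x: str, y: str) -> List[List[int]]:
    """Return every state sequence from START to a final state whose input
    label is x and whose output label is y."""
    if len(x) != len(y):          # without epsilon edges, lengths must match
        return []
    paths: List[List[int]] = []

    def walk(state: int, i: int, path: List[int]) -> None:
        if i == len(x):
            if state in FINALS:
                paths.append(path)
            return
        for src, in_sym, out_sym, dst in EDGES:
            if src == state and in_sym == x[i] and out_sym == y[i]:
                walk(dst, i + 1, path + [dst])

    walk(START, 0, [START])
    return paths
```

For this assumed transducer, the pair ('ab', 'ba') has the single accepting path through states 0, 1, and 3, while the pair ('ab', 'ab') has none.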
The utility of this transducer for an SVM is that the transducer assigns a weight to every pair of journeys. For a pair of journeys, a weight is calculated from the transducer in the following manner:
For a pair of journeys, denoted by x and y, and a transducer, denoted by T, the weight assigned to this pair is denoted by T(x,y).
For example, in calculating T(‘aab’, ‘baa’) using
The final weight is interpreted as a notion of similarity between the journeys. The SVM may use this as its kernel function. Typically, this kernel value is further transformed to make the learning of the SVM optimal.
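A weight of this kind can be sketched in code under a commonly used convention, which is assumed here rather than taken from this text: the weight of an accepting path is the product of its edge weights and the weight of its final state, and T(x,y) is the sum over all accepting paths. The transducer, its weights, and the absence of epsilon edges are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (source, input symbol, output symbol, destination, edge weight).
# Hypothetical weighted edges, assumed for illustration.
WEIGHTED_EDGES: List[Tuple[int, str, str, int, float]] = [
    (0, "a", "b", 1, 0.5),
    (1, "b", "a", 3, 0.4),
    (0, "a", "b", 2, 0.25),
    (2, "b", "a", 3, 1.0),
]
START: int = 0
FINAL_WEIGHTS: Dict[int, float] = {3: 2.0}  # final state -> final weight


def transducer_weight(x: str, y: str) -> float:
    """Sum, over all accepting paths for (x, y), of the product of the edge
    weights and the final-state weight (assumed sum-of-products convention)."""
    if len(x) != len(y):          # no epsilon edges in this toy transducer
        return 0.0
    forward: Dict[int, float] = {START: 1.0}   # state -> accumulated weight
    for in_sym, out_sym in zip(x, y):
        nxt: Dict[int, float] = defaultdict(float)
        for src, a, b, dst, w in WEIGHTED_EDGES:
            if src in forward and a == in_sym and b == out_sym:
                nxt[dst] += forward[src] * w
        forward = nxt
    return sum((w * FINAL_WEIGHTS[s] for s, w in forward.items()
                if s in FINAL_WEIGHTS), 0.0)
```

Here the pair ('ab', 'ba') has two accepting paths, so its weight is (0.5 × 0.4 + 0.25 × 1.0) × 2.0; a pair with no accepting path receives weight 0.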
To train the SVM, specify a rational kernel, and feed it data of journeys along with the responses. The SVM uses the rational kernel iteratively to calculate kernel values for every pair of journeys, and uses this to train itself.
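The pairwise kernel computation can be sketched as the construction of a Gram matrix. The bigram-count kernel below is a simple stand-in for T(x,y), not the transducer of the embodiments, and the cosine normalization is one assumed example of a further transformation of the kernel value.

```python
import math
from collections import Counter
from typing import Callable, List


def bigram_kernel(x: str, y: str) -> float:
    """Stand-in kernel: counts bigrams shared by x and y, with multiplicity."""
    cx = Counter(x[i:i + 2] for i in range(len(x) - 1))
    cy = Counter(y[i:i + 2] for i in range(len(y) - 1))
    return float(sum(cx[g] * cy[g] for g in cx))


def gram_matrix(
    journeys: List[str],
    kernel: Callable[[str, str], float],
    normalize: bool = True,
) -> List[List[float]]:
    """Kernel value for every pair of journeys, optionally cosine-normalized."""
    n = len(journeys)
    raw = [[kernel(journeys[i], journeys[j]) for j in range(n)] for i in range(n)]
    if not normalize:
        return raw
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = math.sqrt(raw[i][i] * raw[j][j])
            out[i][j] = raw[i][j] / denom if denom > 0 else 0.0
    return out
```

A matrix of this form, together with the accept or reject response for each journey, is what an SVM solver that accepts precomputed kernels would take as training input.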
This also makes adjustments, based on domain knowledge, easy and convenient. The similarities calculated by a transducer depend on the weights of the edges and the final states. To reflect domain understanding, we can modify a transducer by adjusting either its structure or its weights so that certain journeys are preferentially treated.
Look closely at what modifying a transducer achieves. T(x,y), which is equivalently denoted as a kernel function, K(x,y), is in some sense, a measure of similarity between the inputs x and y. Any modification effectively only changes how this similarity is computed.
This is important to note. Domain knowledge may be incorporated in different ways such as feature selection, adding rules, assigning labels, using a specific distance function, unequal loss functions, etc. In embodiments of the invention, it is done by modifying the notion of similarity used.
How is a domain knowledge input, such as “all sequences starting with a, followed by at least one b, should get a positive label” used, given that only the similarity function is controlled?
Assume that there are already some positively labeled instances in the dataset that conform to this pattern: “start with a, followed by at least one b.” Now modify the kernel to return a high value of similarity for sequences that follow the pattern. This groups together such instances in the projected high-dimensional space of the SVM. This, in turn, helps the soft-margin training process, using the modified kernel, to identify a hyperplane that keeps all, or most of, these instances on the same side. Because it is assumed that there already are some positive instances on this side to begin with, all the other instances on this side are classified as positive.
Before continuing the discussion, consider a good way to represent domain knowledge. Earlier, reference was made to an input of the form: “all sequences starting with a, followed by at least one b, should get a positive label.” There should be a standard way to express such domain knowledge so that one can modify the transducers algorithmically.
For purposes of the discussion herein, use regular expressions (regexps for short) for the following reasons:
The following lists some of the notations/terminology used:
The regular expression associated with the pattern “all sequences starting with a, followed by at least one b” is R=a·b·(b)*.
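As a brief illustration, the same pattern can be checked against journey strings with Python's standard `re` module. The function name `matches_pattern` is hypothetical; `abb*` is the direct transcription of a·b·(b)*.

```python
import re

# a·b·(b)* written in Python regular-expression syntax.
R = re.compile(r"abb*")


def matches_pattern(journey: str) -> bool:
    """True when the whole journey string belongs to the language of R."""
    return R.fullmatch(journey) is not None
```

`fullmatch` is used so that the entire journey, and not merely a prefix of it, must belong to the language of R.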
Consider modifying a weighted transducer T given a regular expression. Embodiments of the invention provide a very simple construction to achieve this.
Begin with converting a regular expression into a weighted transducer. Given the finite state accepter for R, follow these steps to generate a transducer TR:
If x∉L(R), TR(x,y) does not have any accepting path, and by definition TR(x,y)=0.
Also construct the transducer TR−1, the inverse of TR. As shown in
Define the modified transducer Tm as,
Tm(x,y)=TR(x,y)+T(x,y)+TR−1(x,y)
where, T is the original transducer.
The following shows how Tm(x,y) is computed:
This is the desired behavior, i.e. sequences that match regexp R now receive a higher kernel value relative to T(x,y).
Thus, a convenient way is shown for including domain knowledge in a natural and coherent manner into the model.
wf can be changed to reflect how much Tm(x,y) should differ from T(x,y).
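The effect of the sum Tm(x,y)=TR(x,y)+T(x,y)+TR−1(x,y) can be sketched with kernels represented as plain functions. The construction steps for TR are not reproduced in this text, so the sketch assumes that TR(x,y) equals the weight wf whenever x matches R and 0 otherwise; the inverse swaps the two arguments, i.e. TR−1(x,y)=TR(y,x). The base kernel `toy_t` is a hypothetical stand-in for T.

```python
import re
from typing import Callable

Kernel = Callable[[str, str], float]


def make_tr(pattern: str, wf: float) -> Kernel:
    """Assumed TR: weight wf when x matches the regexp, 0 otherwise."""
    regex = re.compile(pattern)

    def tr(x: str, y: str) -> float:
        return wf if regex.fullmatch(x) else 0.0

    return tr


def inverse(t: Kernel) -> Kernel:
    """Inverse transducer: input and output labels are swapped."""
    return lambda x, y: t(y, x)


def modified(t: Kernel, tr: Kernel) -> Kernel:
    """Tm(x, y) = TR(x, y) + T(x, y) + TR^-1(x, y)."""
    tr_inv = inverse(tr)
    return lambda x, y: tr(x, y) + t(x, y) + tr_inv(x, y)


def toy_t(x: str, y: str) -> float:
    """Hypothetical base kernel: 1 for identical journeys, 0 otherwise."""
    return 1.0 if x == y else 0.0
```

Under these assumptions, with wf=2.0 and the pattern `abb*`, a pair in which both journeys match R gains 2·wf over T(x,y), a pair in which one journey matches gains wf, and a pair in which neither matches is unchanged.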
Consider the question of ensuring enough journeys that match regexp have a positive label. This can be done in the following ways:
The previous section and this section, taken together, provide a comprehensive way to use rational kernels with domain knowledge inputs.
Embodiments of the invention use the weighted transducer to represent paths that can be taken by users in the website and, more importantly, how the similarity between such paths may be calculated. Because the similarity calculation can be influenced in the weighted transducer representation, one may pick a transducer, and its weights, to be conducive to the particular data, i.e. the particular website, user behavior on that website, etc. Because an SVM relies heavily on the rational kernel, this enables the SVM to make optimal use of the data for learning. In many cases, this also means that the SVM can learn with relatively less data.
The computing system 40 may include one or more central processing units (“processors”) 45, memory 41, input/output devices 44, e.g. keyboard and pointing devices, touch devices, display devices, storage devices 42, e.g. disk drives, and network adapters 43, e.g. network interfaces, that are connected to an interconnect 46.
In
The memory 41 and storage devices 42 are computer-readable storage media that may store instructions that implement at least portions of the various embodiments of the invention. In addition, the data structures and message structures may be stored or transmitted via a data transmission medium, e.g. a signal on a communications link. Various communications links may be used, e.g. the Internet, a local area network, a wide area network, or a point-to-point dial-up connection. Thus, computer readable media can include computer-readable storage media, e.g. non-transitory media, and computer-readable transmission media.
The instructions stored in memory 41 can be implemented as software and/or firmware to program one or more processors to carry out the actions described above. In some embodiments of the invention, such software or firmware may be initially provided to the computing system 40 by downloading it from a remote system, e.g. via the network adapter 43.
The various embodiments of the invention introduced herein can be implemented by, for example, programmable circuitry, e.g. one or more microprocessors, programmed with software and/or firmware, entirely in special-purpose hardwired, i.e. non-programmable, circuitry, or in a combination of such forms. Special-purpose hardwired circuitry may be in the form of, for example, one or more ASICs, PLDs, FPGAs, etc.
Although the invention is described herein with reference to the preferred embodiment, one skilled in the art will readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit and scope of the present invention. Accordingly, the invention should only be limited by the Claims included below.
This application is a continuation of U.S. patent application Ser. No. 14/247,100, filed Apr. 7, 2014, which claims priority to U.S. provisional patent application Ser. No. 61/813,984, filed Apr. 19, 2013, each of which is incorporated herein in its entirety by this reference thereto.
Number | Date | Country
---|---|---
61813984 | Apr 2013 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 14247100 | Apr 2014 | US
Child | 15810851 | | US