The invention relates generally to computer systems, and more particularly to an improved system and method for predicting user navigation within sponsored search advertisements.
Implicit user feedback, including click-through and subsequent browsing behavior, is crucial for evaluating and improving the quality of results returned by search engines. Several studies have used post-result browsing behavior including the sites visited, the number of clicks, and the dwell time on site in order to improve the ranking of organic search results. In particular, some studies have focused on how implicit measures can be utilized to improve Web search. For instance, see E. Agichtein, E. Brill, and S. Dumais, Improving Web Search Ranking by Incorporating User Behavior Information, In Proceedings of SIGIR 2006, and T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay, Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search, ACM Transactions on Information Systems, 25(2):7, 2007. Notably, E. Agichtein, E. Brill, and S. Dumais found that implicit feedback can improve the accuracy of an organic search ranking algorithm by almost 31%. As a result, various methods have been proposed for how to incorporate implicit measures into ranking organic search results. For instance, there is work on how to interpret click-through data accurately for organic search results (see T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay, Accurately Interpreting Clickthrough Data as Implicit Feedback, In Proceedings of SIGIR 2005), identify relevant websites using past user activity for organic search results (see E. Agichtein and Z. Zheng, Identifying “Best Bet” Web Search Results by Mining Past User Behavior, In Proceedings of KDD 2006), and rank pages based on user feedback for organic search results (see F. Radlinski and T. Joachims, Query Chains: Learning to Rank from Implicit Feedback, In Proceedings of SIGKDD 2005).
However, there have been no similar studies of user behavior on sponsored search results (which are the advertisements displayed by search engines next to the organic search results) that exploit post-result user behavior for better ranking of sponsored advertisements. Revenue from sponsored search results provides much of the economic foundation of modern web search engines, yet sponsored search advertising differs significantly from organic search in several important ways. First, while organic search is focused on satisfying users by addressing search queries, sponsored search advertising has to optimize for advertising revenue while accounting for user satisfaction and the constraints and objectives of advertisers. Second, the ranking of advertisements in sponsored search advertising differs from that of organic search. Specifically, advertisements in sponsored search advertising are ranked in accordance with their expected revenue, which is a function of advertisement bid amounts and predicted click-through rates. The latter is estimated mainly using the relevance between the short creative of the advertisement and the query terms, and broad query-wide features. More detailed features proven to be useful for ranking in organic search, such as hyperlink structure and anchor text information, are frequently unavailable for short-lived advertisements. Third, click-through rates in sponsored search advertising tend to be lower than for organic results, suggesting users might interact differently with these results than with organic results.
What is needed is a way to improve the user experience in sponsored search advertising through the incorporation of implicit user feedback. Such a system and method should provide a prediction mechanism for forecasting user behavior in previously unseen scenarios.
Briefly, the present invention may provide a system and method for predicting user navigation within sponsored search advertisements. In various embodiments, a web browser executing on a client device may be operably coupled to a server for receiving a list of sponsored advertisements from the server for display by the web browser on a search results page. The server may include an operably coupled prediction engine that predicts user navigation originating from a sponsored advertisement for display on a web page of search results by predicting a probability of a click on the sponsored advertisement and by predicting a probability of a dwell time on a plurality of web pages of a website of the sponsored advertisement. The prediction engine may include a click prediction classifier that predicts the probability of a click on a sponsored advertisement and may include a dwell time prediction classifier that predicts the probability of a dwell time on the web pages of a website of a sponsored advertisement. The server may also include an operably coupled navigation information ranking engine that ranks the list of sponsored advertisements by the probability of a click on each of the sponsored advertisements and by the probability of the dwell time on the web pages of each of the websites of the sponsored advertisements.
To predict user navigation within sponsored search advertisements, a click prediction classifier may be trained to predict a probability of a click on a sponsored advertisement using sets of features from sets of training data. Each set of features may include features of a user entity, features of a query, and features of a list of sponsored search results. A dwell time prediction classifier may also be trained using the sets of features from the sets of training data to predict a probability of a dwell time on web pages of a website of a sponsored advertisement. In an embodiment, a binary classifier may be trained by logistic regression using the sets of features from the sets of training data. The performance of a trained binary classifier may be determined by measuring an area under a validation receiver operating characteristic curve. In an embodiment, the classifiers may be output when the measured performance of each trained binary classifier is within a defined threshold.
A list of sponsored advertisements may be ranked at least in part by the prediction of user navigation within sponsored search advertisements and served for display on a search results page. To do so, a list of sponsored advertisements for display on a web page of a plurality of search results may be received. A probability of user navigation may be predicted for each of the sponsored advertisements using a probability of a click on each of the sponsored advertisements and a probability of a dwell time on web pages of a website of each of the sponsored advertisements. In an embodiment, features of a user entity, features of a search query and features of the list of sponsored advertisements may be input to a click prediction classifier applied to predict a probability of a click on each of the sponsored advertisements in the list and may also be input to a dwell time prediction classifier to predict a probability of a dwell time on web pages of a website of each of the sponsored advertisements in the list. The list of the sponsored advertisements may be ranked in order using, at least in part, the probability of user navigation from each of the sponsored advertisements and output, for example, by serving the list of the sponsored advertisements in rank order to a web browser executing on a client device for display on a web page of search results.
Advantageously, the present invention may accurately predict a probability of user navigation within sponsored search advertisements. Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.
With reference to
The computer system 100 may include a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer system 100 and includes both volatile and nonvolatile media. For example, computer-readable media may include volatile and nonvolatile computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer system 100. Communication media may include computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For instance, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
The system memory 104 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 106 and random access memory (RAM) 110. A basic input/output system 108 (BIOS), containing the basic routines that help to transfer information between elements within computer system 100, such as during start-up, is typically stored in ROM 106. Additionally, RAM 110 may contain operating system 112, application programs 114, other executable code 116 and program data 118. RAM 110 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by CPU 102.
The computer system 100 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media, discussed above and illustrated in
The computer system 100 may operate in a networked environment using a network 136 to one or more remote computers, such as a remote computer 146. The remote computer 146 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer system 100. The network 136 depicted in
Predicting User Behavior within Sponsored Search Advertisements
The present invention is generally directed towards a system and method for predicting user behavior within sponsored search advertisements. To predict user navigation within sponsored search advertisements, a click prediction classifier may be trained to predict a probability of a click on a sponsored advertisement using sets of features from sets of training data. Each set of features may include features of a user entity, features of a query, and features of a list of sponsored search results. A dwell time prediction classifier may also be trained using the sets of features from the sets of training data to predict a probability of a dwell time on web pages of a website of a sponsored advertisement. A list of sponsored advertisements to be ranked at least in part by the prediction of user navigation within sponsored search advertisements may be received. A probability of user navigation may be predicted for each of the sponsored advertisements using a probability of a click on each of the sponsored advertisements and a probability of a dwell time on web pages of a website of each of the sponsored advertisements. The list of the sponsored advertisements may be ranked in order at least in part by the probability of user navigation from each of the sponsored advertisements and may be served in rank order to a web browser executing on a client device for display on a web page of search results.
As used herein, sponsored search advertisement means an advertisement that includes a URL of a website and that is paid for or promoted by a sponsor, such as an advertiser, for display with organic search results. As used herein, sponsored search results means a list of sponsored search advertisements that are displayed next to the organic search results on a web page. A sponsored advertisement means a sponsored search advertisement.
As will be seen, the present invention may accurately predict a probability of user navigation within sponsored search advertisements and may improve the user experience in sponsored search advertising through the incorporation of implicit user feedback. As will be understood, the various block diagrams, flow charts and scenarios described herein are only examples, and there are many other scenarios to which the present invention will apply.
Turning to
In various embodiments, one or more client computers 202 may be operably coupled to a search server 208 and to an advertisement server 218 by a network 206. The client computer 202 may be a computer such as computer system 100 of
The search server 208 may be any type of computer system or computing device such as computer system 100 of
The advertisement server 218 may be any type of computer system or computing device such as computer system 100 of
Sponsored advertisement applications may use the present invention to provide a list of sponsored search advertisements for display on the search results page of a client browser in online advertising. When a user submits a search query request, the present invention may be used to predict user navigation within sponsored search advertisements and to rank a list of sponsored search advertisements for display on a search results page. In various embodiments, the list of sponsored search advertisements may appear in the sponsored search results area of the search results page. To predict user navigation from sponsored search advertisements, the user navigation trail of the sequence of web pages viewed on a website of a sponsored search advertisement may be analyzed to determine (a) the number of clicks the user makes on the trail, which represents the trail length, and (b) the total time spent in the trail, which represents the trail duration. In an embodiment, these two numbers may provide a useful synopsis of user navigation behavior originating from sponsored search advertisements and may be used to predict user navigation originating from sponsored search advertisements.
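By way of illustration only, the trail length and trail duration described above might be computed from logged page views as sketched below. The PageView record, its fields, the exit_time argument and the function name trail_synopsis are hypothetical and are not part of the described system. The sketch also assumes that the first page view is the landing page reached by clicking the advertisement, so that only subsequent page views are counted as clicks on the trail.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PageView:
    """One page view on the advertiser's website (hypothetical record layout)."""
    url: str
    timestamp: float  # seconds, assumed to be available from navigation logs


def trail_synopsis(trail: List[PageView], exit_time: float) -> Tuple[int, float]:
    """Return (trail_length, trail_duration) for one navigation trail.

    Assumption: the earliest page view is the landing page reached from the
    advertisement, so each later page view counts as one click on the trail.
    exit_time is the assumed moment at which the user left the website.
    """
    if not trail:
        return 0, 0.0
    ordered = sorted(trail, key=lambda view: view.timestamp)
    trail_length = len(ordered) - 1                    # clicks after landing
    trail_duration = exit_time - ordered[0].timestamp  # total time on the trail
    return trail_length, trail_duration


if __name__ == "__main__":
    views = [PageView("/landing", 0.0), PageView("/product", 20.0), PageView("/cart", 45.0)]
    print(trail_synopsis(views, exit_time=75.0))
```

Other conventions, such as counting the click on the advertisement itself as part of the trail length, would change the first value by one.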
At step 306, a click prediction classifier may be trained using the training sets and the features. In an embodiment, a cost-sensitive binary classifier may be trained using the training sets and the features which may be described in more detail in conjunction with
At step 310, a model with the click prediction classifier and the dwell time prediction classifier may be output to predict user navigation originating from a sponsored search result such as a sponsored search advertisement. In an embodiment, the model output may be prediction engine 220 in
At step 404, an initial value for each cost parameter may be selected. In an embodiment, an initial value may be randomly selected within the range of values received for each cost parameter. After selecting initial values for each cost parameter, a cost-sensitive binary classifier may be trained at step 406 using logistic regression with an input vector x and a weight vector w. In an embodiment, a ridge estimator may be used to generalize the classifier for the training data set and to avoid overfitting the classifier for the training data set. In an embodiment, the weight vector w may be optimized using Newton's method to allow convergence on a solution with fewer input training cases.
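By way of illustration only, training at step 406 might be sketched as follows: logistic regression over an input vector x and a weight vector w, with a ridge penalty and Newton's method updates. The text does not define the cost parameters, so this sketch assumes they are per-class weights (cost_pos, cost_neg) applied to the log-loss. The function names, the default values and the penalization of the bias term are likewise assumptions made to keep the example short.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of the positive class for input vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(xs: Sequence[Sequence[float]], ys: Sequence[int],
                   cost_pos: float = 1.0, cost_neg: float = 1.0,
                   ridge: float = 1.0, iters: int = 25,
                   tol: float = 1e-8) -> List[float]:
    """Cost-weighted, ridge-penalized logistic regression via Newton's method.

    Include a constant 1.0 feature in each x to model a bias term; for
    simplicity the ridge penalty is applied to every weight, bias included.
    """
    d = len(xs[0])
    w = [0.0] * d
    for _ in range(iters):
        # Gradient and Hessian of the weighted log-loss plus (ridge/2)*||w||^2.
        grad = [ridge * wj for wj in w]
        hess = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        for x, y in zip(xs, ys):
            p = predict(w, x)
            c = cost_pos if y == 1 else cost_neg
            for i in range(d):
                grad[i] += c * (p - y) * x[i]
                for j in range(d):
                    hess[i][j] += c * p * (1.0 - p) * x[i] * x[j]
        step = solve(hess, grad)  # Newton step: H^-1 g
        w = [wj - sj for wj, sj in zip(w, step)]
        if max(abs(s) for s in step) < tol:
            break
    return w


if __name__ == "__main__":
    xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    ys = [0, 0, 1, 1]
    w = train_logistic(xs, ys)
    print(w, predict(w, [1.0, 3.0]))
```

Because the ridge term keeps the Hessian positive definite, each Newton step is well defined even when the training data are linearly separable.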
At step 408, the performance of the trained binary classifier may be determined by measuring the area under a validation receiver operating characteristic (ROC) curve. As is well-known in the art, the ROC curve may graph the true positive rate for training data classified by a binary classifier against the false positive rate for training data classified by a binary classifier. Thus, the area under the ROC curve represents a measure of binary classifier performance. The difference between the measure of the area under the ROC curve in the current iteration of training the binary classifier and the measure of the area under the ROC curve in the previous iteration of training the binary classifier may then be calculated. If the difference is less than a threshold, then the trained classifier may be output at step 414. Otherwise, a new value for each cost parameter may be selected at step 412 and processing may continue at step 406 to train the binary classifier again with new values for the cost parameters.
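By way of illustration only, the performance measurement and stopping test described above might be sketched as follows. The area under the ROC curve is computed here as the fraction of positive and negative pairs that the classifier orders correctly, which is equal to that area. The names roc_auc and tune_until_stable, the train_and_score callback, the candidate cost values and the threshold value of 0.001 are hypothetical. The comparison against the previous iteration is an interpretation of the convergence test.

```python
from typing import Callable, Iterable, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tune_until_stable(train_and_score: Callable[[float], float],
                      candidate_costs: Iterable[float],
                      threshold: float = 0.001) -> Tuple[float, float]:
    """Retrain with new cost values until the validation AUC stops changing.

    train_and_score(cost) is a hypothetical callback that trains a classifier
    with the given cost parameter and returns its validation AUC.
    Returns the last (cost, auc) pair evaluated.
    """
    previous_auc = None
    result = None
    for cost in candidate_costs:
        auc = train_and_score(cost)
        result = (cost, auc)
        if previous_auc is not None and abs(auc - previous_auc) < threshold:
            break  # change in AUC is below the threshold: output the classifier
        previous_auc = auc
    if result is None:
        raise ValueError("candidate_costs must not be empty")
    return result


if __name__ == "__main__":
    print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

The pairwise computation is quadratic in the number of examples; a rank-based computation would be preferable for large validation sets.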
At step 506, the probability of an additional click may be predicted for the sponsored search advertisement. In an embodiment, a click prediction classifier may predict the probability of an additional click by a user for the sponsored search advertisement. At step 508, a probability of a dwell time of at least one minute may be predicted for the sponsored search advertisement. In an embodiment, a dwell time prediction classifier may predict the probability of a dwell time of at least one minute by a user for the sponsored search advertisement. Those skilled in the art will appreciate that a click prediction classifier may be trained to predict various navigation trail lengths by a user, including two, three or more clicks by a user on web pages of a website of a sponsored advertisement. Similarly, those skilled in the art will appreciate that a dwell time prediction classifier may be trained to predict various navigation trail durations by a user including dwell times of tens of seconds, a half minute or two or more minutes on web pages of a website of a sponsored advertisement.
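By way of illustration only, the binary targets that such classifiers predict might be derived from a navigation trail as sketched below. The function name navigation_labels and its parameters are hypothetical. The defaults correspond to an additional click and a dwell time of at least one minute, and other thresholds may be substituted to train classifiers for other trail lengths or trail durations.

```python
from typing import Tuple


def navigation_labels(trail_length: int, trail_duration: float,
                      min_clicks: int = 1,
                      min_dwell_seconds: float = 60.0) -> Tuple[int, int]:
    """Return (click_label, dwell_label) for one navigation trail.

    trail_length is assumed to count clicks made after the landing page, and
    trail_duration is assumed to be the time in seconds spent on the website.
    """
    click_label = 1 if trail_length >= min_clicks else 0
    dwell_label = 1 if trail_duration >= min_dwell_seconds else 0
    return click_label, dwell_label


if __name__ == "__main__":
    print(navigation_labels(2, 45.0))
```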
At step 510, it may be determined whether the last sponsored search result in the sponsored search results list has been processed. If not, then processing may continue at step 504 where the next sponsored search result, such as a sponsored search advertisement, may be obtained from the list of sponsored search results. Otherwise, if it may be determined that the last sponsored search result in the sponsored search results list has been processed, then the sponsored search advertisements in the search results list may be ranked at step 512 at least in part by the predicted probabilities of an additional click and a dwell time of at least one minute for each of the sponsored search advertisements. And at step 514, the ranked list of sponsored search advertisements may be output. In an embodiment, the ranked list of sponsored search advertisements may be stored in a computer-readable storage medium and sent to a client browser for display on the search results page of a client browser.
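By way of illustration only, the loop over the list of sponsored search results and the ranking at step 512 might be sketched as follows. The text states only that the ranking uses the two predicted probabilities at least in part, so the weighted sum used here, the click_weight parameter, the ScoredAd record and the predictor callbacks are all assumptions. Other combinations, including ones that account for bid amounts, are equally possible.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class ScoredAd:
    ad_id: str
    p_click: float  # predicted probability of an additional click
    p_dwell: float  # predicted probability of a dwell time of at least one minute
    score: float    # assumed combination of the two probabilities


def rank_ads(ads: Sequence[Tuple[str, Sequence[float]]],
             predict_click: Callable[[Sequence[float]], float],
             predict_dwell: Callable[[Sequence[float]], float],
             click_weight: float = 0.5) -> List[ScoredAd]:
    """Score every advertisement in the list, then sort by descending score."""
    scored = []
    for ad_id, features in ads:
        p_click = predict_click(features)
        p_dwell = predict_dwell(features)
        # Assumed combination: a weighted sum of the two probabilities.
        score = click_weight * p_click + (1.0 - click_weight) * p_dwell
        scored.append(ScoredAd(ad_id, p_click, p_dwell, score))
    return sorted(scored, key=lambda ad: ad.score, reverse=True)


if __name__ == "__main__":
    # Stub predictors that read the probabilities straight from the features.
    ads = [("a", [0.2, 0.9]), ("b", [0.8, 0.7]), ("c", [0.5, 0.1])]
    ranked = rank_ads(ads, lambda f: f[0], lambda f: f[1])
    print([ad.ad_id for ad in ranked])
```

In practice the two callbacks would wrap the trained click prediction classifier and dwell time prediction classifier, each applied to the features of the user entity, the query and the list of sponsored advertisements.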
Thus, the present invention may accurately forecast user behavior and improve the user experience in sponsored search advertising through the incorporation of implicit user feedback. Advantageously, the present invention may accurately predict user navigation within sponsored search results in previously unseen scenarios. Importantly, even slight increases in accuracy in predicting user navigation within sponsored search results can result in substantially increased revenue and in a better user experience, given the large scale of search engine traffic. Those skilled in the art will appreciate that there may be other implementations of incorporating implicit user feedback to predict user navigation within sponsored search advertising. In addition to a click prediction classifier and a dwell time prediction classifier, other prediction classifiers may be trained using features of a user entity, features of a query, and features of a list of sponsored search results. For example, a prediction classifier may be trained using features of web pages of a website of a sponsored search advertisement in the sponsored search results.
As can be seen from the foregoing detailed description, the present invention provides an improved system and method for predicting user navigation within sponsored search advertisements. A list of sponsored advertisements for display on a web page of search results may be received, and a click prediction classifier may be applied to predict a click probability of each sponsored advertisement and a dwell time prediction classifier may be applied to predict a dwell time probability on web pages of a website of each sponsored advertisement. A probability of user navigation may be predicted for each sponsored advertisement using a probability of a click on each sponsored advertisement and a probability of a dwell time on web pages of a website of each sponsored advertisement. The list of the sponsored advertisements may be ranked in order at least in part by the probability of user navigation and served to a web browser executing on a client device for display on a web page of search results. Advantageously, the probability of user navigation may be accurately predicted by the present invention for serving sponsored advertisements in online advertising. As a result, the system and method provide significant advantages and benefits needed in contemporary computing and in online advertising applications.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.