A system and method of ranking search results based on relevance information extracted from user click data, and in particular exploiting sequential supervised learning in search result ranking.
One determinant of the effectiveness of a search engine is the quality of the ranking function(s) used by the search engine. The ranking can be used to order items in the search results and/or to determine whether or not to cull items from the set of search results, for example. A key contributor to effective ranking is a set of features or descriptors to represent a query-document pair that are accurate indicators of the degree of relevance of the document with respect to the query. Different data sources are explored in building the ranking functions. Conventional information retrieval systems relied heavily on exploring textual data. For example, feature-oriented probabilistic indexing methods use textual features such as the number of query terms, the length of the document text, and term frequencies for the terms in the query to represent a query-document pair; and vector space models use the raw term and document statistics to compute the similarity between a document and a query. Other conventional methods use the hyperlink structures of web documents, among them those based on PageRank and anchor text, which substantially contributed to the popularity of the Google search engine.
Several machine learning based ranking methods have been proposed, including RankSVM, RankNet and GBrank. Although these ranking methods are quite different in terms of ranking models and optimization techniques, all of them can be regarded as “local ranking”, in the sense that the ranking model is defined on a single document. More particularly, in “local ranking” the ranking score of a current document is largely based on the feature vector for the document, without considering the possible relationships that the document may have with other documents to be ranked. For many applications, the local ranking of a document is only a loose approximation, since relational information among documents typically exists; e.g., in some cases two similar documents are preferred to have similar relevance scores, and in other cases a parent document should potentially be ranked higher than its child documents.
A ranking model uses both local information, as defined on a single document, and global information, as defined on more than one document, and provides an improved ranking of the documents, or other search items, as a function of all the documents to be ranked. In accordance with one or more embodiments, the ranking model uses user click data, i.e., users' click decisions among the different documents displayed in a search session, which tend to rely both on the relevance judgment of a single document and on the relative relevance among the documents displayed; and the ranking model uses user click sequences as an indicator of the relevance of the documents with regard to the query.
In accordance with one or more embodiments, relevance information is extracted from user click data via global ranking. A global ranking framework of modeling user click sequences using one or more sequential supervised methods, or frameworks, such as, without limitation, conditional random field (CRF), sliding window and recurrent sliding window methods, is described. In accordance with one or more embodiments, the sliding and/or recurrent sliding window method can be implemented using the GBrank training method.
In accordance with one or more embodiments, a method is provided, the method comprising training a relevance prediction model using data for a plurality of queries, the data for a query comprising information identifying the query and documents of a result set retrieved using the query, the data further comprising user click information identifying each user click and corresponding document in the result set and a time of the user click, the training comprising determining a plurality of feature vector sets corresponding to the plurality of queries, a feature vector set for a query comprising a feature vector for each document in the result set of the query, the feature vector identifying a plurality of features and a corresponding plurality of feature values, the plurality of features for a document comprising at least one feature that relates the document to at least one other document in the result set of the query using the user click information to determine whether or not a user click sequence involving the document and the at least one other document exists, determining a plurality of label sets corresponding to the plurality of queries, a label set for a query comprising a label for each document in the result set of the query, the label comprising an assessment of the document's relevance to the query, and generating the relevance prediction model using the feature vector and label sets. Ranking predictions are obtained for the documents in a result set of a query using the relevance prediction model.
In accordance with one or more embodiments, a system comprising at least one server is provided, the at least one server comprising a training data generator, a relevance predictor model generator, and a relevance predictor. The training data generator uses data for a plurality of queries to determine a plurality of feature vector sets and a plurality of label sets corresponding to the plurality of queries, the data for a query comprising information identifying the query and documents of a result set retrieved using the query, the data further comprising user click information identifying each user click and corresponding document in the result set and a time of the user click, a feature vector set for a query comprising a feature vector for each document in the result set of the query, the feature vector identifying a plurality of features and a corresponding plurality of feature values, the plurality of features for a document comprising at least one feature that relates the document to at least one other document in the result set of the query using the user click information to determine whether or not a user click sequence involving the document and the at least one other document exists, and a label set for a query comprising a label for each document in the result set of the query, the label comprising an assessment of the document's relevance to the query. The relevance predictor model generator generates a relevance prediction model using the plurality of feature vector and label sets, and the relevance predictor obtains, using the generated relevance prediction model, ranking predictions for documents in a result set of a query.
In accordance with one or more embodiments, a computer-readable medium is provided, which medium tangibly stores thereon computer-executable process steps. The process steps comprise training a relevance prediction model using data for a plurality of queries, and obtaining ranking predictions for documents in a result set of a query using the generated relevance prediction model. The data for a query comprises information identifying the query and documents of a result set retrieved using the query, and further comprises user click information identifying each user click and corresponding document in the result set and a time of the user click. Training a relevance prediction model using the data for a plurality of queries comprises determining a plurality of feature vector sets corresponding to the plurality of queries, a feature vector set for a query comprising a feature vector for each document in the result set of the query, the feature vector identifying a plurality of features and a corresponding plurality of feature values, the plurality of features for a document comprising at least one feature that relates the document to at least one other document in the result set of the query using the user click information to determine whether or not a user click sequence involving the document and the at least one other document exists, determining a plurality of label sets corresponding to the plurality of queries, a label set for a query comprising a label for each document in the result set of the query, the label comprising an assessment of the document's relevance to the query, and generating the relevance prediction model using the feature vector and label sets.
In accordance with one or more embodiments, a system is provided that comprises one or more computing devices configured to provide functionality in accordance with such embodiments. In accordance with one or more embodiments, functionality is embodied in steps of a method performed by at least one computing device. In accordance with one or more embodiments, program code to implement functionality in accordance with one or more such embodiments is embodied in, by and/or on a computer-readable medium.
The above-mentioned features and objects of the present disclosure will become more apparent with reference to the following description taken in conjunction with the accompanying drawings wherein like reference numerals denote like elements and in which:
In general, the present disclosure includes a global and topical ranking using user click data system, method and architecture.
Certain embodiments of the present disclosure will now be discussed with reference to the aforementioned figures, wherein like reference numerals refer to like components.
In accordance with one or more embodiments disclosed herein, relevance information is extracted from user click data via a global ranking framework; relational information among the documents as manifested by an aggregation of user clicks is used. Experiments on the click data collected from a commercial search engine demonstrate the effectiveness of this approach, and its superior performance over a set of widely used unsupervised methods, such as the cascade model and the heuristic rule based methods. Since user click data is inherently noisy, a supervised approach, which uses human judgment information as part of the training data used to generate a relevance predictor model, provides a degree of reliability over an unsupervised approach. Advantageously, by exploring supervised learning in click data modeling, a click model such as that disclosed in accordance with one or more embodiments can reliably extract relevance information by calibrating with human relevance judgments.
In accordance with one or more embodiments, user sequential click information is exploited, as a reliable relevance indicator for the documents displayed in a search result, and a global ranking function is trained using click information within a supervised learning framework, which uses judgments, such as human judgments, together with the click information, to train the global ranking function.
In accordance with one or more embodiments, click data from a plurality of query sessions is used to train one or more relevance predictor models, and a trained relevance predictor model is used to rank items in a search query according to relevance. In accordance with one or more embodiments, global feature vectors extracted from the training data, which take into account click data sequences between items in a query session, are used. In accordance with one or more embodiments, a feature vector includes values extracted from training data, and the training data comprises click data corresponding to search result items.
Internet 100 is used by search engine 102 to crawl network stores 116 and as a mechanism to communicate with user device(s) 114, for example. It should be apparent that Internet 100 can be any network, including without limitation one or more of the World Wide Web, wide area network, local area network, etc.
As is discussed in more detail below, user click log 106 comprises information identifying a plurality of query, or click, sessions, each session containing information identifying the query submitted to search engine 102, the documents included in the search result set, and the click information indicating whether a document is clicked or not, and a time stamp of each click identifying the timing of each click. In accordance with one or more embodiments, training data generator 128 generates training data using data from user click log 106, such as and without limitation user click data, and human judge input received via human judge interface 118. Training data generator 128 can comprise a training data aggregator, which aggregates data from multiple sessions for a given query in accordance with one or more embodiments. In accordance with one or more embodiments, training data generator 128 can comprise a vector generator, which extracts features from the training data and generates a feature vector corresponding to a document in a search result set. In accordance with one or more embodiments, the vector generator generates a label vector identifying a relevance measure for each document in the search result set, which relevance measure is identified using human judgment input. In accordance with one or more embodiments, training data generator comprises a topical training data generator for generating training data for a given topic, or query, category.
Model generator 108 generates one or more relevance predictor models 110 using training data generated by training data generator 128. In accordance with one or more embodiments, model generator 108 uses a model generation method, such as and without limitation a conditional random fields (CRF), sliding window, or recurrent sliding window method. In accordance with one or more embodiments, the sliding and/or recurrent sliding window method can be implemented using the GBrank training method. In accordance with one or more such embodiments, model generator 108 provides training data, which comprises local and global feature data corresponding to the training data, to the model generation method to generate a relevance predictor model 110. Local and global feature vectors corresponding to a set of search result items to be ranked can then be provided, by search engine 102, for example, to the relevance predictor model 110 to obtain ranking information, which is used to rank the items in the search result. In accordance with one or more embodiments, a feature vector includes values extracted from click data corresponding to the set of search result items.
A set of search results, x^(q), for a query, q, that retrieves a number, n, of documents, x_1, x_2, . . . , x_n, can be expressed as follows:
x^(q) = {x_1^(q), x_2^(q), . . . , x_n^(q)}    Exp. (1)
In accordance with one or more embodiments, a training data set includes a plurality of queries, a plurality of feature vectors associated with each query and a label associated with each feature vector. By way of a non-limiting example, each query has a set of search results containing at least one item, or document. As is discussed below, all or a portion, e.g., the first ten, of the documents in a search result set can be considered, and each item considered has an associated feature vector and a label. Each label used in the training data set is provided by a human judge; each label comprises information identifying a human judge's assessment of the relevance of an item, or document, to a query. Each feature vector comprises a plurality of features and a value for each of the plurality of features. In accordance with one or more embodiments, the feature vector comprises both global and local features. In accordance with one or more embodiments, features for a query session comprise features extracted using click data for the query session. In accordance with one or more alternate embodiments, the feature vector comprises global features. In accordance with one or more embodiments, various types of click features can be used in the model and aggregated click features can be extracted from user click, or query, sessions.
Examples of features used in a model in accordance with one or more embodiments are listed in a table shown in
In accordance with one or more embodiments, a feature's value is based on a single query session, e.g., one user's interaction with a search result set returned for a given query. In such a case and by way of a non-limiting example, the Position feature identifies the position, or rank, of the document in the search result set, e.g., a location as the first, second, third, etc. for display by the user's device 114. A query can be associated with multiple sessions, e.g., more than one user enters the same query, the same user enters the same query multiple times, etc. Each session has associated click data, which can be used to determine feature values. In accordance with one or more embodiments, multiple sessions for the same query are aggregated to determine the query's feature vector values. By way of a non-limiting example, the aggregate is determined to be the average of the feature values determined for each query session used to generate the aggregate. By way of a non-limiting example, an aggregate value of the Position feature identifies the average position of the document in the multiple sessions considered for the same query. In accordance with one or more embodiments, feature data is extracted from training data aggregated for a query, i.e., an aggregated query session. In accordance with one or more such embodiments, the aggregated query session data can be expressed as, for example:
<q, 10-document list, an aggregation of user clicks> Exp. (2)
With reference to Exp. (1) above, where aggregate session data is used in accordance with at least one embodiment, Exp. (1) denotes a sequence of feature vectors extracted from the aggregated sessions, with x_i^(q) representing the feature vector extracted for the document i. More particularly, in accordance with one or more embodiments, to form vector x_i^(q), a feature vector x_{i,j}^(q) is extracted from click data for each user, j, where j ∈ {1, 2, . . .}; x_i^(q) is formed by averaging over {x_{i,j}^(q), ∀j}, i.e., x_i^(q) is an aggregated feature vector for document i.
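By way of illustration only, the per-document averaging described above can be sketched as follows; the session layout, document identifiers, and function name are hypothetical, and this is a minimal sketch under those assumptions rather than an implementation of any particular embodiment.

```python
from collections import defaultdict
from typing import Dict, List

def aggregate_feature_vectors(
    session_vectors: List[Dict[str, List[float]]],
) -> Dict[str, List[float]]:
    """Average per-session feature vectors into one aggregated vector per document.

    session_vectors: one dict per session j, mapping a document id to the
    feature vector x_{i,j}(q) extracted from that session's click data.
    Returns a dict mapping each document id to its aggregated vector x_i(q).
    """
    sums: Dict[str, List[float]] = defaultdict(list)
    counts: Dict[str, int] = defaultdict(int)
    for session in session_vectors:
        for doc_id, vector in session.items():
            if not sums[doc_id]:
                sums[doc_id] = [0.0] * len(vector)
            sums[doc_id] = [s + v for s, v in zip(sums[doc_id], vector)]
            counts[doc_id] += 1
    return {doc: [s / counts[doc] for s in total] for doc, total in sums.items()}

# Example: two sessions, two documents, two hypothetical features per document.
sessions = [
    {"doc_a": [1.0, 1.0], "doc_b": [2.0, 0.0]},
    {"doc_a": [1.0, 0.0], "doc_b": [2.0, 1.0]},
]
print(aggregate_feature_vectors(sessions))
# {'doc_a': [1.0, 0.5], 'doc_b': [2.0, 0.5]}
```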
In the example shown in
Session data such as that shown in
For purposes of training the model, in accordance with one or more embodiments, each query-document pair is assigned a label, y_i^(q), i = 1, . . . , n, by human judges, with Exp. (3) below representing the sequence of assigned relevance labels. One or more human judges can be used to identify a relevance label for each of the documents, x. The relevance labels assigned by human judge(s) for the documents retrieved in query, q, as identified in Exp. (1), can be expressed as follows:
y^(q) = {y_1^(q), y_2^(q), . . . , y_n^(q)},    Exp. (3)
where y1 represents a human judge's relevance label for document x1, y2 represents a human judge's relevance label for document x2, etc. In accordance with one or more embodiments, each query-document pair is assigned a relevance label from an ordinal set. By way of a non-limiting example, a set of relevance labels can be as follows:
{Perfect, Excellent, Good, Fair, Bad}, Exp. (4)
each of which indicates a degree to which a document is relevant to a query, with Perfect being used to indicate the greatest degree of relevance and Bad being used to indicate the least degree of relevance, for example. In accordance with one or more embodiments, the relevance labels can be given a numeric value, such as, without limitation, from 0 to 4, with Bad having a value of 0 and Perfect having a value of 4.
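For illustration, the ordinal-to-numeric mapping described above might be coded along the following lines; the dictionary and function names are hypothetical.

```python
# Hypothetical mapping of the ordinal relevance labels of Exp. (4) to numeric grades.
RELEVANCE_GRADES = {"Bad": 0, "Fair": 1, "Good": 2, "Excellent": 3, "Perfect": 4}

def label_to_grade(label: str) -> int:
    return RELEVANCE_GRADES[label]

print(label_to_grade("Excellent"))  # 3
```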
Each feature vector in the training set corresponds to a document in a set of search results for a query, and comprises a value for each feature in a set of features. By way of a non-limiting example, a feature vector, x_doc^(q), can be expressed as follows:

x_doc^(q) = (v_1^(q,doc), v_2^(q,doc), . . . , v_n^(q,doc)),    Exp. (5)
where n is the number of features in the feature vector. By way of an example, if the feature vector contains values for the features shown in
Data store 104 stores resources retrieved by the crawler component of search engine 102. In addition, data store 104 can store one or more sets of training data. One or more of the relevance predictor models 110 generated by the model generator 108 are used by relevance predictor 112 to generate a relevance prediction for a document and query pair. A relevance prediction generated by relevance predictor 112 can be used by search engine 102 in one or more of its functions, e.g., crawling, searching, and ranking. In accordance with one or more embodiments, data store 104 stores human judgment data.
Local and Global Ranking
A local ranking model defines relevance for a single document, and relevance prediction using a local ranking model, f, can be expressed, without limitation, as follows:
y_i^(q) = f(x_i^(q)), ∀i = 1, . . . , n,    Exp. (6)
where y_i^(q) represents a predicted, or estimated, relevance label for a document, x_i, in the set of documents x_1 to x_n retrieved for query, q, the relevance label being determined using a local ranking model, f.
In contrast to a local ranking model, a global ranking model takes into account all of the documents x1 to xn for a query, q, as its inputs and uses both local and global information for the documents. By way of a non-limiting example, relevance prediction using a global ranking model, F, can be expressed as follows, for example:
y_i^(q) = F(x^(q)),    Exp. (7)
In accordance with one or more embodiments disclosed, a global relevance prediction model, which uses local and global information among the documents to produce a document rank, is provided. In accordance with one or more embodiments, the function, F, in Exp. (7) can be learned from the training data, as discussed herein, using a training method, such as and without limitation, a CRF, sliding window method or recurrent sliding window training method adapted to use global ranking.
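The difference between the local model of Exp. (6) and the global model of Exp. (7) can be illustrated with the following sketch; the scoring rules used here are placeholders standing in for learned models, not the trained ranking functions described herein.

```python
from typing import List, Sequence

FeatureVector = Sequence[float]

def local_rank(x_i: FeatureVector) -> float:
    """Local model f: scores one document from its own feature vector (Exp. (6))."""
    return sum(x_i)  # placeholder scoring rule

def global_rank(x_q: List[FeatureVector]) -> List[float]:
    """Global model F: scores every document from the whole result set (Exp. (7)).

    As a stand-in for a learned model, each document's score here is its own
    feature sum minus the average feature sum of the other documents, so each
    score depends on all documents in x(q).
    """
    totals = [sum(x) for x in x_q]
    scores = []
    for i, t in enumerate(totals):
        others = [s for j, s in enumerate(totals) if j != i]
        scores.append(t - (sum(others) / len(others) if others else 0.0))
    return scores

result_set = [[1.0, 0.5], [0.5, 0.5], [0.25, 0.25]]
print([local_rank(x) for x in result_set])  # [1.5, 1.0, 0.5]
print(global_rank(result_set))              # [0.75, 0.0, -0.75]
```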
A local model is defined on a single document, and is therefore incapable of modeling user interactions with the documents in search results. In contrast, a global model advantageously can take into account sequential click data for all the documents in a search result, or an aggregate search result, and can predict relevance labels of all the documents jointly. By way of a non-limiting example, sequential click patterns embedded in an aggregation of user clicks can provide substantial relevance information for the documents displayed in the search results. The average number of sessions for a query in which a document at a certain position is skipped (not clicked), out of all the sessions for the query, is referred to herein as a skip rate. Empirically, in considering the skip rates for three relevance grades, Perfect, Good and Bad, observation shows that the skip rates are substantially higher for documents at the bottom of the result set regardless of the relevance grades of the documents. Documents with a Perfect relevance label generate more clicks at the top positions, but documents with a Bad relevance label also garner substantial clicks, on par with documents having a Good relevance label. This demonstrates that users tend to click the top documents even when the relevance grades of the documents are low, and that raw click frequencies alone are not a reliable indicator of relevance. Advantageously, information identifying the sequential nature of user clicks can be used in accordance with one or more embodiments.

By way of a non-limiting example, with regard to a query, pregnant man, data identifying the sequence of clicks in a query session can be examined in connection with positions of documents in the result set. Two documents, referred to based on their respective positions in the result set as the second and third documents, have relevance labels Good and Excellent, respectively. The click logs from query, or click, sessions indicate that there are 521 sessions with at least one click on the second document and 340 sessions with at least one click on the third one. Relying on click frequency, even after discounting the click frequency difference caused by ranking positions 2 and 3, one could be misled to the incorrect conclusion that the second document is more relevant than the third one. However, from examination of the data, there are 266 sessions where the second document, the document labeled Good, is clicked before the third document, labeled Excellent, while there are only 12 sessions in which a reversed click order is observed. This sequential click pattern explains the “relevance disorder”, i.e., most of the time, the users who clicked the second document labeled Good were dissatisfied with the information they acquired, and proceeded to click the third one labeled Excellent; however, if the users clicked the third document labeled Excellent, they seldom needed to click the second one labeled Good, indicating the higher relevance of the third document relative to the second document. Similar scenarios and sequential click patterns can be observed using other aggregated sessions. The example illustrates that sequential click patterns embedded in an aggregation of user clicks can provide substantial relevance information for the documents displayed in the search results.
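The "clicked before" session counts discussed above (e.g., 266 sessions versus 12 sessions) can be accumulated from per-session click sequences along the following lines; the session format and function name are assumptions made for illustration.

```python
from collections import Counter
from typing import List

def clicked_before_counts(sessions: List[List[str]]) -> Counter:
    """Count, over all sessions, how often document a is clicked before document b.

    Each session is the time-ordered list of clicked document ids. Returns a
    Counter keyed by (a, b) with the number of sessions in which a's first
    click precedes b's first click.
    """
    counts: Counter = Counter()
    for clicks in sessions:
        first_click = {}
        for t, doc in enumerate(clicks):
            first_click.setdefault(doc, t)
        docs = list(first_click)
        for a in docs:
            for b in docs:
                if a != b and first_click[a] < first_click[b]:
                    counts[(a, b)] += 1
    return counts

sessions = [["doc2", "doc3"], ["doc2", "doc3"], ["doc3"]]
print(clicked_before_counts(sessions))
# Counter({('doc2', 'doc3'): 2})
```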
In accordance with one or more embodiments, global ranking comprises ranking-targeted sequential learning. In accordance with at least one embodiment, click modeling uses a sequence of aggregated click features (statistics), rather than a single user's click sequence, as an input to the global ranking. For a given query, generally, different users, or even the same user at different times, may have different click sequences, and some are actually quite different from others; but over many user sessions, certain consistent patterns may emerge, and can form the basis for the click model used to infer the relevance labels of the documents.
Training Data
In accordance with one or more embodiments, data collected from a commercial search engine for a period of time is obtained and used to generate training data. The collected data comprises information identifying a plurality of query, or click, sessions, where each session contains information identifying the query submitted to the search engine, the documents displayed in the result set, and the click information indicating whether a document is clicked or not, and the click time stamps. In accordance with one or more embodiments, a subset of the documents is used, e.g., the top ten documents in each user click session, such as the documents displayed in the first page of the result set. In some cases, in response to query input, search engines may return the top ten documents in varying orders, or some new documents may appear in the top ten documents due to search infrastructure changes and/or ranking feature updating. In accordance with at least one of the embodiments, all of the user sessions in the collection involving the same query are aggregated, and the user sessions that have the most frequent top ten documents are selected for the collection. The aggregate data for a query can be expressed using Exp. (2) above. Advantageously, a unique aggregated session can be used for each query in the dataset.
In accordance with one or more embodiments, each query-document pairing is assigned a label from an ordinal set identified in Exp. (4) to indicate the degree of relevance of the document with respect to the query in question, and to calculate click statistics and analyze user click behaviors. In accordance with one or more embodiments, the label is assigned using human judge input.
In accordance with one or more embodiments, user click data is collected from a commercial search engine over a certain period of time, and a number of queries, such as and without limitation 9677 queries (and corresponding aggregated sessions, such as and without limitation 9677 aggregated sessions), are selected from the user click logs 106; the selected queries are both frequently queried by the users and have click rates over 1.0, where the click rate is defined as follows:
where i is an index into the sessions of a query.
Such a selection of queries ensures that each aggregated session will have enough user clicks to accumulate statistically significant click features. Input from human judges is obtained to label the top ten documents of each of the 9677 queries, labeling each document as Perfect, Excellent, Good, Fair, or Bad according to the document's degree of relevance with respect to the query. The obtained dataset can then be used to examine the performance of the proposed click modeling methods.
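A sketch of the query selection step is given below; because the click rate expression is not reproduced above, the sketch assumes the click rate is the average number of clicks per session for the query, and the thresholds and names are illustrative assumptions only.

```python
from typing import Dict, List

def select_queries(
    query_sessions: Dict[str, List[int]],
    min_sessions: int = 100,
    min_click_rate: float = 1.0,
) -> List[str]:
    """Keep queries that are frequent and have a click rate above a threshold.

    query_sessions maps a query to the per-session click counts c_i taken from
    the click logs; the click rate is taken here to be the mean of c_i over the
    sessions i of the query (an assumption; the document defines the click
    rate with its own expression).
    """
    selected = []
    for query, clicks in query_sessions.items():
        if len(clicks) < min_sessions:
            continue
        click_rate = sum(clicks) / len(clicks)
        if click_rate > min_click_rate:
            selected.append(query)
    return selected

logs = {"pregnant man": [2, 1, 0, 3] * 50, "rare query": [1, 2]}
print(select_queries(logs))  # ['pregnant man']
```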
Conditional Random Fields (CRF) Model
A conditional random field (CRF) is a probabilistic model that can be used for sequential labeling in accordance with at least one embodiment of the present disclosure. Compared to hidden Markov models (HMMs), which define a joint probability distribution p(x, y) over an observation sequence x and a label sequence y, the CRF model defines a conditional probability distribution p(y|x) directly, which is used to label a sequence of observations x by selecting the label sequence y that maximizes the conditional probability. Because the CRF model is conditional, dependencies among the observations x do not need to be explicitly represented, affording the use of rich, global features of the input. Therefore, no effort is wasted on modeling the observations, and one is free from having to make the unwarranted independence assumptions required by HMMs.
A CRF is a conditional distribution p(y|x) with an associated graphical structure, defining the dependencies among the components y_i of y globally conditioned on the observations x. One structure that can be used for modeling sequences is a linear chain, and the corresponding conditional distribution is defined as follows:

p(y|x) = (1/Z(x)) exp( Σ_t ( Σ_j λ_j f_j(y_t, y_{t−1}, x) + Σ_k μ_k g_k(y_t, x) ) ),    Exp. (9)

where Z(x) is a normalization factor, f_j(y_t, y_{t−1}, x) is a transition feature function, g_k(y_t, x) is an observation feature function, and

Λ = {λ_1, λ_2, . . . , μ_1, μ_2, . . .}    Exp. (10)
are the parameters to be estimated. In general, the feature functions in Exp. (9) are defined on the entire observation sequence x. To minimize computational issues and to avoid overfitting, it is possible to use a subset of x in each feature function, and j and k in Exp. (9) iterate over arbitrary subsets of x, either in the time dimension or in the feature dimension.
Given independent and identically-distributed (i.i.d.) training data D = {x_i, y_i}, i = 1, . . . , N, where N is a number of queries, a maximum likelihood estimate can be used to compute the parameters Λ from

L(Λ) = Σ_{i=1}^{N} log p(y_i | x_i; Λ),    Exp. (11)
which is a concave function and can be optimized efficiently by using a quasi-Newton method, such as BFGS. Once the parameters Λ are determined, given a new observation sequence x*, the most probable label sequence y* can be computed by using the Viterbi algorithm.
The following approximation can be used to produce continuous ranking scores. Besides generating the most probable label sequence y*, the Viterbi algorithm also yields the class probabilities for each label y_i in y, i.e., p(y_i = g | x*), ∀i ∈ {1, 2, . . . , T} and g ∈ {0, 1, 2, 3, 4}, where g denotes a relevance grade, with g = 4 corresponding to Perfect and g = 0 to Bad, and so on. The expected relevance can be used to convert class probabilities into ranking scores:

ŷ_i = Σ_{g=0}^{4} g · p(y_i = g | x*),    Exp. (12)
The approximation provided by Exp. (12) improves performance over using the most probable label sequence produced by the Viterbi algorithm directly. In addition, the expected relevance generated using Exp. (12) can be used to convert classification categories into soft ranking scores.
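The conversion of Exp. (12) from per-grade class probabilities to expected-relevance scores can be sketched as follows; the data layout is an assumption made for illustration.

```python
from typing import Dict, List

def expected_relevance(marginals: List[Dict[int, float]]) -> List[float]:
    """Convert per-document class probabilities p(y_i = g | x*) into ranking scores.

    marginals[i] maps each relevance grade g in {0, ..., 4} (0 = Bad, 4 = Perfect)
    to its probability for document i; the score is the expected grade, per Exp. (12).
    """
    return [sum(g * p for g, p in m.items()) for m in marginals]

# Two documents: the first is probably Excellent/Perfect, the second probably Fair.
probs = [
    {0: 0.0, 1: 0.0, 2: 0.1, 3: 0.5, 4: 0.4},
    {0: 0.2, 1: 0.5, 2: 0.3, 3: 0.0, 4: 0.0},
]
print(expected_relevance(probs))  # approximately [3.3, 1.1]
```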
Note that the CRF discussed herein in connection with embodiments of the present disclosure approaches the ranking problem as a classification/regression problem, and optimizes the CRF parameters in a maximum likelihood estimate without considering score ranks.
(Recurrent) Sliding Window Model(s)
In accordance with one or more embodiments, a simplified sequential learning method, such as and without limitation a sliding window method or a recurrent sliding window method, is adapted to global ranking. A sliding window method used in accordance with one or more embodiments converts the sequential supervised learning problem into an ordinary supervised learning problem. In accordance with one or more embodiments, in a ranking context, the scoring function ƒ maps a set of consecutive observations in a window of width w into a ranking score. In particular, let d = (w−1)/2 be the half-width of the window. The scoring function uses
x̂_i = (x_{i−d}, x_{i−d+1}, . . . , x_i, . . . , x_{i+d−1}, x_{i+d})    Exp. (13)
as an extended feature to predict the ranking score ŷ_i = ƒ(x̂_i), ∀i ∈ {1, 2, . . . , T}. The sliding window method provides an approximation of the CRF; its advantage is its simplicity, and it allows classical ranking methods to be applied to the global ranking problem.
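A minimal sketch of building the extended features of Exp. (13) is shown below; the zero-padding at the sequence boundaries is an assumption, since the treatment of boundary positions is not specified above.

```python
from typing import List, Sequence

def sliding_window_features(
    x: List[Sequence[float]], w: int = 3
) -> List[List[float]]:
    """Build the extended feature of Exp. (13) for each position i.

    x is the sequence x_1..x_T of per-document feature vectors; w is the window
    width and d = (w - 1) // 2 its half-width. Positions outside the sequence
    are zero-padded (an assumption; the document does not specify the padding).
    """
    d = (w - 1) // 2
    dim = len(x[0])
    padded = [[0.0] * dim] * d + [list(v) for v in x] + [[0.0] * dim] * d
    extended = []
    for i in range(len(x)):
        window = padded[i : i + w]          # x_{i-d}, ..., x_i, ..., x_{i+d}
        extended.append([f for vec in window for f in vec])
    return extended

x = [[1.0, 0.0], [0.5, 1.0], [0.0, 0.0]]
for row in sliding_window_features(x, w=3):
    print(row)
```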
Similarly, in a recurrent sliding window method, the predicted scores of the old observations are combined with the extended feature to predict the score of the current observation. Particularly, when predicting the score for x_i, the available predicted scores, e.g., ŷ_{i−d}, . . . , ŷ_{i−1}, can be used in addition to the sliding window to form the extended feature when predicting ŷ_i, i.e., the extended feature for x_i becomes

x̂_i = (ŷ_{i−d}, . . . , ŷ_{i−1}, x_{i−d}, x_{i−d+1}, . . . , x_i, . . . , x_{i+d})    Exp. (14)
In contrast to the sliding window method, the recurrent sliding window method is able to capture predictive information not captured by the simple sliding window method. By way of a non-limiting example, if x_i is clicked and x_{i−1} is not, the recurrent sliding window method will likely predict the relevance, ŷ_i, of document x_i to be greater than the relevance, ŷ_{i−1}, of document x_{i−1}.
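A corresponding sketch of recurrent sliding window prediction per Exp. (14) follows; the placeholder scorer f stands in for a trained model, such as one produced by the GBrank training method, and the zero-padding is again an assumption.

```python
from typing import Callable, List, Sequence

def recurrent_sliding_window_predict(
    x: List[Sequence[float]],
    f: Callable[[List[float]], float],
    w: int = 3,
) -> List[float]:
    """Predict scores left to right, feeding earlier predictions back in (Exp. (14)).

    For position i, the extended feature is the d previous predictions
    (zero-padded at the start) followed by the window x_{i-d}..x_{i+d}.
    """
    d = (w - 1) // 2
    dim = len(x[0])
    padded = [[0.0] * dim] * d + [list(v) for v in x] + [[0.0] * dim] * d
    scores: List[float] = []
    for i in range(len(x)):
        prev = [0.0] * max(0, d - i) + scores[max(0, i - d) : i]
        window = [f_val for vec in padded[i : i + w] for f_val in vec]
        scores.append(f(prev + window))
    return scores

# Placeholder scorer: sum of the extended feature (a stand-in for a trained model).
x = [[1.0, 0.0], [0.5, 1.0], [0.0, 0.0]]
print(recurrent_sliding_window_predict(x, f=sum, w=3))
```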
GBrank Model
Generally, GBrank is a learning-to-rank method trained on preference data, which is generated using absolute and/or relative relevance judgments, or labels. In accordance with one or more embodiments, human judgments are also referred to herein as absolute relevance judgments, with each judgment corresponding to a query-document pair and indicating a degree of relevance of the document to the query; relevance judgments extracted from clickthrough data, such as and without limitation user clickthroughs of search results, or converted from the absolute relevance judgments, are referred to as relative relevance judgments. By way of a non-limiting example, a user's click on a document in a set of search results can be considered an implicit preference over another document in the set. As is discussed in more detail below, further analysis can be done to determine preferences using the clickthrough data. Absolute and/or relative judgments can be used to generate the preference data. In accordance with one or more embodiments, preference data is in the form of pair-wise comparisons, i.e., one document is more relevant than another with respect to a query. By way of a non-limiting example, given a query q and two documents u and v, if u has a higher human relevance label than v, e.g., Perfect versus Good, the preference u ≻ v, where ≻ indicates that the element to the left of the symbol is preferred over the element to the right of the symbol, is included in the extracted preference set, and vice versa. The relevance assigned to the documents by human judges can be considered for all pairs of documents within a search session that have unequal relevance labels. By considering all the queries in the dataset, a set of preference data can be extracted, which can be denoted as:
S = {⟨u_i, v_i⟩ | u_i ≻ v_i, i = 1, 2, . . . , M}    Exp. (15)
The learning-to-rank problem is cast as computing a ranking function h, such that h matches the given set of preferences as closely as possible, e.g., h(u_i) ≥ h(v_i) if u_i ≻ v_i, i = 1, 2, . . . , M. A squared hinge loss function can be used as a smooth surrogate of the total number of contradicting pairs in the given preference data with respect to the function h; u ≻ v is a contradicting pair with respect to h if h(u) < h(v). The following objective function, a squared hinge loss, can be used, in accordance with one or more embodiments, to measure the risk, R, of a given ranking function h:

R(h) = (1/2) Σ_{i=1}^{M} (max{0, h(v_i) − h(u_i) + τ})²,    Exp. (16)
and the following minimization can be solved:

min_{h ∈ H} R(h),    Exp. (17)
where H is a function class, chosen to be linear combinations of regression trees, in accordance with one or more embodiments. The minimization problem can be solved by using functional gradient descent. The following provides a GBrank method for use in learning ranking function h using gradient boosting in accordance with one or more embodiments.
Start with an initial guess of h, h_0. For k = 1, 2, . . . , K, where K is a number of iterations:
1. Using h_{k−1} as the current approximation of h, S is separated into two disjoint sets, as follows:

S+ = {(u_i, v_i) ∈ S | h_{k−1}(u_i) ≥ h_{k−1}(v_i) + τ},

where τ is a fixed constant value, such as and without limitation 0 < τ ≤ 1, and

S− = {(u_i, v_i) ∈ S | h_{k−1}(u_i) < h_{k−1}(v_i) + τ}.

2. Fit a regression function (decision tree) g_k(x) on the following training data:

(u_i, [h_{k−1}(v_i) − h_{k−1}(u_i) + τ]),
(v_i, −[h_{k−1}(v_i) − h_{k−1}(u_i) + τ]), ∀(u_i, v_i) ∈ S−.

3. Form the new ranking function as h_k(x) = h_{k−1}(x) + η g_k(x), where η is a shrinkage factor.
In accordance with one or more embodiments, the shrinkage factor, η, and the number of iterations K, can be determined using cross-validation.
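The boosting loop of steps 1 through 3 can be sketched as follows, using a generic regression tree from scikit-learn as the fitted function g_k; the hyperparameter values, names, and toy data are illustrative assumptions, not the parameters of any particular embodiment.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbrank(prefs, X, n_rounds=50, tau=0.5, eta=0.1, max_depth=3):
    """Sketch of the GBrank boosting loop for pairwise preference data.

    prefs: list of (u, v) index pairs meaning "document u preferred over v".
    X: (n_docs, n_features) feature matrix. Returns a scoring function h.
    """
    trees = []

    def h(features):
        scores = np.zeros(len(features))
        for tree in trees:
            scores += eta * tree.predict(features)
        return scores

    for _ in range(n_rounds):
        scores = h(X)
        # Step 1: collect contradicting pairs (the set S- of the text).
        rows, targets = [], []
        for u, v in prefs:
            if scores[u] < scores[v] + tau:
                residual = scores[v] - scores[u] + tau
                rows += [u, v]
                targets += [residual, -residual]   # push u up, v down
        if not rows:
            break
        # Step 2: fit a regression tree to the residuals.
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X[rows], targets)
        # Step 3: h_k = h_{k-1} + eta * g_k (folded into h via the trees list).
        trees.append(tree)
    return h

# Toy example: three documents; doc 0 preferred over doc 1, and doc 1 over doc 2.
X = np.array([[1.0, 0.2], [0.5, 0.5], [0.1, 0.9]])
h = gbrank([(0, 1), (1, 2)], X)
print(h(X))  # scores should roughly order doc 0 > doc 1 > doc 2
```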
In accordance with one or more embodiments, step 504 is an optional step, at which multiple sessions for the same query are aggregated, as discussed herein. At step 506, feature data is extracted using the training data obtained at step 502, and optionally at step 504. As discussed herein, in accordance with one or more embodiments, one or more features are used to represent relationships between documents determined using the presence and/or absence of document click sequences identified using the training data. It should be apparent that additional features, such as and without limitation features of the documents and/or query, can be used in combination with the document click sequence features to train a model in accordance with one or more embodiments.
In accordance with one or more embodiments, a supervised approach is used to train a model using relevance labels obtained at step 508; a relevance label is associated with a query-document pair and identifies a relevance of the document to the query. In accordance with one or more embodiments, the relevance labels are obtained from human judges that assess the relevance of the document to the query and assign a score based on the assessment. In accordance with one or more embodiments disclosed herein, a relevance label for a document, or document pair, can be determined using click data. At step 510, one or more relevance predictor models 110 are generated using the feature and label vectors from steps 506 and 508.
Referring again to
Topical Ranking
In accordance with one or more embodiments, relevance predictor model(s) 110 comprises a general relevance predictor model and/or a plurality of topical relevance predictor models, each topical model corresponding to a topic, or a query category. By way of some non-limiting examples, query categories can include a category of navigation queries, a category of news queries, a category of product queries, etc. In accordance with one or more embodiments, an analyzer, e.g., a query linguistic analyzer, can be used to segment a query into one or more tags and identify a type, e.g., a semantic concept, meaning, etc., for each identified tag. In accordance with one or more such embodiments, the topical training data generator of the training data generator 128 can comprise the linguistic analyzer. The output of the query linguistic analyzer, e.g., tag and tag type, is used to determine whether a query-document pair belongs to a topic or topic class. By way of some non-limiting examples, a tag having a product-related type, such as product brand, manufacturer name, model number, etc., can be considered to belong to a product topic class; and person-related tags, e.g., tags having a person name tag type, can be considered to belong to a person class. More than one tag type can be used to identify a topic or topic class. By way of another non-limiting example, a query that contains tags of type business name and a location-related tag type, such as street name, city name, state name, etc., can be considered to belong to a local query topic class.
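By way of illustration, the mapping from tag types to topic classes described above might be sketched as follows; the tag type strings and class names are hypothetical examples following the categories mentioned in the text.

```python
from typing import List, Optional, Tuple

# Hypothetical tag types and topic classes, following the examples in the text.
PRODUCT_TAGS = {"product_brand", "manufacturer_name", "model_number"}
PERSON_TAGS = {"person_name"}
LOCATION_TAGS = {"street_name", "city_name", "state_name"}

def classify_query(tags: List[Tuple[str, str]]) -> Optional[str]:
    """Map the (tag, tag_type) output of a query linguistic analyzer to a topic class."""
    types = {tag_type for _, tag_type in tags}
    if types & PRODUCT_TAGS:
        return "product"
    if types & PERSON_TAGS:
        return "person"
    if "business_name" in types and types & LOCATION_TAGS:
        return "local"
    return None  # fall back to the generic relevance predictor model

print(classify_query([("acme", "product_brand"), ("x200", "model_number")]))          # product
print(classify_query([("joe's diner", "business_name"), ("springfield", "city_name")]))  # local
```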
In accordance with one or more embodiments, relevance predictor model generator 108 uses the output of the query linguistic analyzer to identify queries to obtain training data to train a topical relevance predictor model 110, which is then used by relevance predictor module 112 to rank documents in a set of search results retrieved using a query determined to fall in the topic or category for which the topical relevance predictor model 110 was generated. In accordance with one or more embodiments, the query linguistic analyzer can be used by relevance predictor module 112 to identify a category or topic for a query, and then select a topical relevance predictor model 110 corresponding to the identified category or topic of the query. In accordance with one or more embodiments, the relevance predictor module 112 can use the selected topical relevance predictor model 110 alone or in combination with a generic relevance predictor model 110, both of which can be generated by the relevance predictor model generator 108 in accordance with one or more embodiments of the present disclosure.
In accordance with one or more embodiments, a topical ranking uses a dedicated model for the queries belonging to the category (topic). Such a dedicated model can be trained based on the labeled data belonging to this topic, which is referred to herein as dedicated training data. However, the amount of dedicated training data for one topic is usually insufficient, primarily due to the cost and time involved in obtaining the relevance labeling from human judges for training data needed to generate a topical relevance predictor model 110 for the topic.
In accordance with one or more embodiments, clickthrough data is extracted and incorporated with dedicated training data to generate a topical relevance predictor model 110 for a topic. By way of a non-limiting example, the clickthrough data is extracted by a topical training data generator of training data generator 128. Advantageously, the clickthrough data is used to address an insufficiency, i.e., an absence or paucity, of human judgment relevance labels for training data used in topical ranking. In accordance with one or more embodiments, clickthrough data is used to generate a relevance predictor model 110 for a given query topic, or category. In accordance with one or more such embodiments, pair-wise preference data is generated and is input to relevance predictor model generator 108, which uses a GBrank method, to train a topical relevance predictor model 110 for a given topic, or query, category.
Embodiments of the present disclosure can use various methods, or strategies, to extract relative relevance, or pair-wise, judgments from clickthrough data. Advantageously, to minimize biases and other potential errors in interpreting individual click behavior, click information from different query sessions is aggregated before applying heuristic rules. In accordance with one or more embodiments, heuristic rules are used to extract skip-above pairs and skip-next pairs, using the skip above strategy, which is also referred to as the click>skip above strategy, and the skip next strategy, which is also referred to as the click>no-click next strategy. The skip above strategy proposes that given a clicked-on document, any document in a higher position in the result set displayed to the user that was not clicked on can be considered to be less relevant. The skip next strategy proposes that for two adjacent documents in the search result set, if the first document, i.e., the document immediately above the second document in the result set displayed to the user, is clicked on, but the second is not, the first document can be considered to be more relevant than the second document. In accordance with one or more embodiments, the skip above strategy can be used to identify pair-wise preferences, or judgments, between two documents in an order that is the reverse of the order used to position the documents in the result set, and the skip next strategy can be used to confirm the result set order. Alternatively, the skip above strategy can indicate that the result set order is appropriate, and/or that pair-wise preferences, or judgments, between documents indicated by the result set order are appropriate, if the conditions associated with the skip above strategy are not found in the user click data; and the skip next strategy can indicate that the result set order is not accurate in a case that the conditions associated with the skip next strategy are not found in the user click data.
In accordance with one or more embodiments, for a tuple (q; url1; url2; pos1; pos2), q represents a query, url1 and url2 are universal resource locators that represent two documents, and pos1 and pos2 represent the respective ranking positions of the two documents in one or more sets of search results, with pos1<pos2 indicating that url1 is ranked higher than (i.e., positioned above) url2. In accordance with one or more embodiments, metrics, such as and without limitation, those shown in
In accordance with one or more embodiments, a skip-above pair-wise judgment is found between url1 and url2: if ncc is much larger than cnc, in accordance with a first threshold, and
are both much smaller than 1, in accordance with a second threshold. If these conditions exist and url1 is ranked higher than url2 in query q, most users clicked on url2 but did not click url1. In this case, a skip-above pairing is identified for url1 and url2, i.e., url2 is more relevant than url1. In accordance with one or more embodiments, in order to have highly accurate skip-above pairs, a set of thresholds are applied to only extract the pairs that have a high impression and ncc exceeds cnc by a large enough margin. In accordance with one or more such embodiments, the first threshold is used in connection with the “much larger” determination between ncc and cnc; such that a difference between ncc and cnc satisfies the first threshold indicating an acceptable degree, or margin, of difference between ncc and cnc. Furthermore and in accordance with one or more such embodiments, the second threshold is used in connection with the “much smaller” determination, such that the differences between
and 1, and
and 1 satisfy a second threshold indicating an acceptable degree, or margin, of difference. In accordance with one or more embodiments, the second threshold can be a single threshold, or two separate thresholds, each of which corresponds to one of the “much smaller” determinations.
In accordance with one or more embodiments, a skip-next pair-wise judgment is found: if pos1=pos2−1, indicating that url1 is positioned immediately above url2 in the search results, cnc is much larger than ncc, in accordance with a first threshold, and
are both much smaller than 1, in accordance with a second threshold. If these conditions exist and url2 is ranked, or positioned, immediately below url1 in query q, most users click url1 but do not click url2. In this case, this tuple is regarded as a skip-next pairing. In accordance with one or more such embodiments, the first threshold is used in connection with the “much larger” determination between cnc and ncc; such that a difference between cnc and ncc satisfies the first threshold indicating an acceptable degree, or margin, of difference between cnc and ncc. Furthermore and in accordance with one or more such embodiments, the second threshold is used in connection with the “much smaller” determination, such that the differences between
and 1, and
and 1 satisfy a second threshold indicating an acceptable degree, or margin, of difference. In accordance with one or more embodiments, the second threshold can be a single threshold, or two separate thresholds, each of which corresponds to one of the “much smaller” determinations.
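A sketch of extracting skip-above and skip-next pairs from aggregated counts is given below; because the exact metrics and threshold values are not reproduced above, the counts used here (cnc, ncc, cc), the impression count, and the threshold values are assumptions made for illustration.

```python
from typing import Optional, Tuple

def pairwise_preference(
    url1: str, url2: str, pos1: int, pos2: int,
    cnc: int, ncc: int, cc: int, impressions: int,
    margin: float = 5.0, ratio_cap: float = 0.1, min_impressions: int = 100,
) -> Optional[Tuple[str, str]]:
    """Return a (preferred, less_preferred) pair, or None, from aggregated clicks.

    cnc: sessions where url1 was clicked and url2 was not; ncc: the reverse;
    cc: sessions where both were clicked. The margin and ratio thresholds stand
    in for the first and second thresholds described in the text.
    """
    if impressions < min_impressions:
        return None
    # Skip-above: url1 ranked above url2, yet users skip url1 and click url2.
    if pos1 < pos2 and ncc > margin * max(cnc, 1) and cc / impressions < ratio_cap:
        return (url2, url1)
    # Skip-next: url1 immediately above url2, clicked while url2 is skipped.
    if pos1 == pos2 - 1 and cnc > margin * max(ncc, 1) and cc / impressions < ratio_cap:
        return (url1, url2)
    return None

print(pairwise_preference("u1", "u2", 1, 2, cnc=4, ncc=120, cc=6, impressions=400))
# ('u2', 'u1')  -- a skip-above pair: u2 judged more relevant than u1
```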
In accordance with one or more embodiments, other pair-wise strategies can be used to identify pair-wise relevance judgments, and preferences, using clickthrough data. In accordance with one or more embodiments, with the GBrank method: for each pair-wise preference, if a pair-wise ordering of a current ranking function contradicts the pair-wise preference, the current ranking function, h, is modified to optimize its agreement with the pair-wise preference, as closely as possible without impacting its overall agreement with the preferences as a whole, i.e., to minimize the error or differences between the estimated ranking(s) generated by the ranking function, h, and the ranking(s) suggested by the preference data.
Data store 808, which can include data store 104, can be used to store training and/or evaluation data sets, click logs, resources associated with URLs, relevance predictor models, absolute and/or relative judgments and/or preference data; and/or program code to configure a server 802 to execute the search engine 102, relevance predictor model generator 108 and/or relevance predictor 112, training data generator 128, human judgment interface 118, configuration information, etc.
The user computer 804, and/or user device 114, can be any computing device, including without limitation a personal computer, personal digital assistant (PDA), wireless device, cell phone, internet appliance, media player, home theater system, and media center, or the like. For the purposes of this disclosure a computing device includes a processor and memory for storing and executing program code, data and software, and may be provided with an operating system that allows the execution of software applications in order to manipulate data. A computing device such as server 802 and the user computer 804 can include one or more processors, memory, a removable media reader, network interface, display and interface, and one or more input devices, e.g., keyboard, keypad, mouse, etc. and input device interface, for example. One skilled in the art will recognize that server 802 and user computer 804 may be configured in many different ways and implemented using many different combinations of hardware, software, or firmware.
In accordance with one or more embodiments, a computing device 802 can make a user interface available to a user computer 804 via the network 806. The user interface made available to the user computer 804 can include content items, or identifiers (e.g., URLs) selected for the user interface based on relevance, or ranking, prediction(s) generated in accordance with one or more embodiments of the present invention. In accordance with one or more embodiments, computing device 802 makes a user interface available to a user computer 804 by communicating a definition of the user interface to the user computer 804 via the network 806. The user interface definition can be specified using any of a number of languages, including without limitation a markup language such as Hypertext Markup Language, scripts, applets and the like. The user interface definition can be processed by an application executing on the user computer 804, such as a browser application, to output the user interface on a display coupled, e.g., a display directly or indirectly connected, to the user computer 804.
In accordance with one or more embodiments, computing device 802 can serve content to a user computer 804 executing a browser application via a network 806. In accordance with one or more embodiments, computing device 802 can serve search results to a user computer 804 in response to receiving a query received from user computer 804, and receive click data in the form of URL selections, for example. In accordance with one or more embodiments, human judge interface 118 can comprise one or more web pages identifying a query and documents in a result set generated using the query, and at least one computing device 802 configured to transmit the one or more web pages for display at the user computer 804 for the judge, and to receive the judge's input, which includes the judge's assessment of a document's relevance to a query.
In an embodiment, the network 806 may be the Internet, an intranet (a private version of the Internet), or any other type of network. An intranet is a computer network allowing data transfer between computing devices on the network. Such a network may comprise personal computers, mainframes, servers, network-enabled hard drives, and any other computing device capable of connecting to other computing devices via an intranet. An intranet uses the same Internet protocol suite as the Internet. Two of the most important elements in the suite are the transmission control protocol (TCP) and the Internet protocol (IP).
It should be apparent that embodiments of the present disclosure can be implemented in a client-server environment such as that shown in
Memory 904 interfaces with computer bus 902 so as to provide information stored in memory 904 to CPU 912 during execution of software programs such as an operating system, application programs, device drivers, and software modules that comprise program code, and/or computer-executable process steps, incorporating functionality described herein, e.g., one or more of process flows described herein. CPU 912 first loads computer-executable process steps from storage, e.g., memory 904, fixed disk 906, removable media drive, and/or other storage device. CPU 912 can then execute the stored process steps in order to execute the loaded computer-executable process steps. Stored data, e.g., data stored by a storage device, can be accessed by CPU 912 during the execution of computer-executable process steps.
Persistent storage, e.g., fixed disk 906, can be used to store an operating system and one or more application programs. Persistent storage can also be used to store device drivers, such as one or more of a digital camera driver, monitor driver, printer driver, scanner driver, or other device drivers, web pages, content files, playlists and other files. Persistent storage can further include program modules and data files used to implement one or more embodiments of the present disclosure, e.g., listing selection module(s), targeting information collection module(s), and listing notification module(s), the functionality and use of which in the implementation of the present disclosure are discussed in detail herein.
For the purposes of this disclosure a computer readable medium stores computer data, which data can include computer program code executable by a computer, in machine readable form. By way of example, and not limitation, a computer readable medium may comprise computer storage media and communication media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
Those skilled in the art will recognize that the methods and systems of the present disclosure may be implemented in many manners and as such are not to be limited by the foregoing exemplary embodiments and examples. In other words, the functional elements may be performed by single or multiple components, in various combinations of hardware and software or firmware, and individual functions may be distributed among software applications at either the client or server or both. In this regard, any number of the features of the different embodiments described herein may be combined into single or multiple embodiments, and alternate embodiments having fewer than, or more than, all of the features described herein are possible. Functionality may also be, in whole or in part, distributed among multiple components, in manners now known or to become known. Thus, myriad software/hardware/firmware combinations are possible in achieving the functions, features, interfaces and preferences described herein. Moreover, the scope of the present disclosure covers conventionally known manners for carrying out the described features and functions and interfaces, as well as those variations and modifications that may be made to the hardware or software or firmware components described herein as would be understood by those skilled in the art now and hereafter.
While the system and method have been described in terms of one or more embodiments, it is to be understood that the disclosure need not be limited to the disclosed embodiments. It is intended to cover various modifications and similar arrangements included within the spirit and scope of the claims, the scope of which should be accorded the broadest interpretation so as to encompass all such modifications and similar structures. The present disclosure includes any and all embodiments of the following claims.