Training a ranking function using propagated document relevance

Information

  • Patent Application
  • Publication Number
    20070203908
  • Date Filed
    February 27, 2006
  • Date Published
    August 30, 2007
Abstract
A method and system are provided for propagating the query relevance of labeled documents to unlabeled documents. The propagation system provides training data that includes queries, documents labeled with their relevance to the queries, and unlabeled documents. It then calculates the similarity between pairs of documents in the training data and propagates the relevance of the labeled documents to similar, but unlabeled, documents. The propagation may be repeated iteratively until the labels converge on a solution. The training data with the propagated relevances can then be used to train a ranking function.
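The pipeline in the abstract (pairwise similarity, iterative propagation until convergence, then training a ranking function on the propagated relevances) can be illustrated with a minimal sketch. It is not the patented implementation: cosine similarity over feature vectors, a clamped label-propagation update (each unlabeled document takes the similarity-weighted average of the other documents' current scores while labeled documents keep their labels), and a linear least-squares scorer fitted by gradient descent are assumptions standing in for choices the abstract leaves open. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors, clipped at 0 so weights stay non-negative (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return max(0.0, dot / (nu * nv))


def propagate(features, labels, tol=1e-6, max_iter=1000):
    """Propagate one query's relevance labels to the unlabeled documents.

    features: one feature vector per document.
    labels:   {doc_index: relevance} for the labeled documents (kept fixed).
    Assumed update: each unlabeled document becomes the similarity-weighted
    average of the other documents' current scores; stop when the largest
    change falls below tol (or after max_iter rounds).
    """
    n = len(features)
    sim = [[cosine(features[i], features[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    scores = [labels.get(i, 0.0) for i in range(n)]
    for _ in range(max_iter):
        new = list(scores)
        for i in range(n):
            total = sum(sim[i])
            if i in labels or total == 0.0:
                continue  # labeled docs are clamped; isolated docs keep 0.0
            new[i] = sum(sim[i][j] * scores[j] for j in range(n)) / total
        delta = max(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def score(w, b, x):
    """Linear ranking score; a higher value means more relevant."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def train_linear_ranker(features, targets, lr=0.1, epochs=2000):
    """Fit score(x) = w.x + b to the propagated relevances by batch gradient descent on squared error."""
    d, n = len(features[0]), len(features)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, targets):
            err = score(w, b, x) - y
            for k in range(d):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

For example, with four two-dimensional documents `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]` where only document 0 (relevance 1.0) and document 2 (relevance 0.0) are labeled, `propagate` gives document 1 a score of about 0.79 and document 3 about 0.21, and a linear scorer trained on those four scores ranks document 1 above document 3.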
Description

BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a diagram that illustrates a portion of a graph of documents.

FIG. 2 is a block diagram that illustrates components of the propagation system in one embodiment.

FIG. 3 is a flow diagram that illustrates the processing of the create ranking function component of the propagation system in one embodiment.

FIG. 4 is a flow diagram that illustrates the processing of the propagate relevance component of the propagation system in one embodiment.

FIG. 5 is a flow diagram that illustrates the processing of the build graph component of the propagation system in one embodiment.

FIG. 6 is a flow diagram that illustrates the processing of the generate weights for graph component of the propagation system in one embodiment.

FIG. 7 is a flow diagram that illustrates the processing of the normalize weights of graph component of the propagation system in one embodiment.

FIG. 8 is a flow diagram that illustrates the processing of the propagate relevance based on graph component of the propagation system in one embodiment.
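The build graph, generate weights, and normalize weights components (FIGS. 5 to 7) can be sketched as follows. Connecting each document to its k nearest neighbors follows the nearest-neighbor approach named in claim 8; the Euclidean distance, the Gaussian-kernel edge weights with bandwidth `sigma`, and the symmetric normalization S = D^-1/2 W D^-1/2 are assumptions borrowed from common graph-based ranking practice, not details stated in the figures. All function names are hypothetical.

```python
import math


def build_knn_graph(features, k):
    """Build graph: connect each document to its k nearest neighbors (Euclidean distance, assumed).

    Returns a set of undirected edges (i, j) with i < j.
    """
    n = len(features)
    edges = set()
    for i in range(n):
        dists = sorted((math.dist(features[i], features[j]), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def generate_weights(features, edges, sigma=1.0):
    """Generate weights: assumed Gaussian kernel exp(-||xi - xj||^2 / (2 sigma^2)) on each edge."""
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w = math.exp(-math.dist(features[i], features[j]) ** 2 / (2 * sigma ** 2))
        W[i][j] = W[j][i] = w
    return W


def normalize_weights(W):
    """Normalize weights: assumed symmetric form S = D^-1/2 W D^-1/2, with D the row sums of W."""
    n = len(W)
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in W]
    return [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
```

With one-dimensional documents at 0.0, 0.1, 1.0 and 1.1 and k = 1, the graph has exactly two edges, (0, 1) and (2, 3), and each normalized edge weight equals 1.0 because every node has a single neighbor. Symmetric normalization keeps S symmetric, which is convenient for the iterative propagation step.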


Claims
  • I/We claim:
  • 1. A system for training a document ranking component, comprising: a training data store that contains training data including representations of documents and, for each query of a plurality of queries, a labeling of some of the documents with relevance of the documents to the query; a propagate relevance component that propagates relevance of the labeled documents to the unlabeled documents based on similarity between documents; and a training component that trains a document ranking component to rank relevance of documents to queries based on the propagated relevance of the documents of the training data.
  • 2. The system of claim 1 wherein the document ranking component implements a classification algorithm selected from a group consisting of a neural network algorithm, an adaptive boosting algorithm, and a support vector machine algorithm.
  • 3. The system of claim 1 wherein the document ranking component implements a regression based algorithm.
  • 4. The system of claim 1 wherein the propagate relevance component propagates relevance separately for each query and the training component trains the document ranking component using the separately propagated relevances.
  • 5. The system of claim 1 wherein the propagate relevance component propagates relevance simultaneously for multiple queries and the training component trains the document ranking component using the simultaneously propagated relevances.
  • 6. The system of claim 1 including a graph component that creates a graph with the documents represented as nodes connected by edges representing similarity between documents.
  • 7. The system of claim 6 wherein the graph component includes: a build graph component that builds a graph in which nodes representing similar documents are connected via edges; and a generate weights component that generates weights for the edges based on similarity of the documents represented by the connected nodes.
  • 8. The system of claim 7 wherein the build graph component establishes edges between nodes using a nearest neighbor algorithm.
  • 9. The system of claim 1 wherein the propagate relevance component propagates relevance using a manifold ranking based algorithm.
  • 10. A computer-readable medium containing instructions for controlling a computer system to train a document ranking component, by a method comprising: providing representations of documents along with a labeling of some of the documents that indicates relevance of a document to a query; creating a graph with the documents represented as nodes being connected by edges representing similarity between documents represented by the connected nodes; propagating relevance of the labeled documents to the unlabeled documents based on similarity between documents as indicated by the created graph and based on a manifold ranking based algorithm; and training a document ranking component to rank relevance of documents to queries based on the propagated relevance of the documents.
  • 11. The computer-readable medium of claim 10 wherein the document ranking component implements a classification algorithm selected from a group consisting of a Bayes net algorithm, an adaptive boosting algorithm, and a support vector machine algorithm.
  • 12. The computer-readable medium of claim 10 wherein the document ranking component implements a regression based ranking algorithm.
  • 13. The computer-readable medium of claim 10 wherein the propagating of the relevance propagates relevance separately for each query and the training of the document ranking component trains using the separately propagated relevance.
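  • 14. The computer-readable medium of claim 10 wherein the propagating of the relevance propagates relevance simultaneously for multiple queries and the training of the document ranking component trains using the simultaneously propagated relevance.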
  • 15. The computer-readable medium of claim 10 wherein the creating of a graph includes: building a graph in which nodes representing similar documents are connected via edges; and generating weights for the edges based on similarity of the documents represented by the connected nodes.
  • 16. A system for training a document ranking component, comprising: a component that provides representations of documents along with a labeling of some of the documents that indicates relevance of the documents to queries; a component that creates a graph with the documents represented as nodes being connected by edges representing similarity between documents represented by the connected nodes; a component that propagates relevance of the labeled documents to the unlabeled documents based on similarity between documents as indicated by the created graph; and a component that generates a document ranking component to rank relevance of documents to queries based on the propagated relevance of the documents.
  • 17. The system of claim 16 wherein the component that propagates relevance propagates relevance based on a manifold ranking based algorithm.
  • 18. The system of claim 17 wherein the component that propagates relevance propagates relevance simultaneously for multiple queries and the component that generates the document ranking component generates the document ranking component using the simultaneously propagated relevance.
  • 19. The system of claim 16 wherein the component that creates a graph builds a graph and generates weights for the edges based on similarity of the documents represented by the connected nodes.
  • 20. The system of claim 16 wherein the document ranking component implements a regression based ranking algorithm.
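Claims 9, 10 and 17 name a manifold ranking based propagation, and claims 5 and 18 propagate relevance for multiple queries at once. The sketch below assumes the standard manifold ranking iteration of Zhou et al., F ← αSF + (1 − α)Y, whose fixed point is F* = (1 − α)(I − αS)^-1 Y, with S a symmetrically normalized similarity matrix and one column of Y per query. The value of α, the stopping rule, and the function name are illustrative assumptions rather than details taken from the claims.

```python
def manifold_rank(S, Y, alpha=0.9, tol=1e-9, max_iter=10000):
    """Assumed manifold-ranking propagation for several queries at once.

    S: n x n normalized similarity matrix (e.g. D^-1/2 W D^-1/2), list of lists.
    Y: n x q matrix; Y[i][c] is document i's relevance label for query c,
       0.0 where document i is unlabeled for that query.
    Iterates F <- alpha * S F + (1 - alpha) * Y until the largest entry-wise
    change is below tol. For 0 < alpha < 1 and a spectral radius of S of at
    most 1, this converges to (1 - alpha) * (I - alpha S)^-1 Y.
    """
    n, q = len(Y), len(Y[0])
    F = [row[:] for row in Y]  # start from the labels themselves
    for _ in range(max_iter):
        new = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
                for c in range(q)]
               for i in range(n)]
        delta = max(abs(new[i][c] - F[i][c]) for i in range(n) for c in range(q))
        F = new
        if delta < tol:
            break
    return F
```

Because this update is linear and S does not depend on the query, each column of F matches what a separate single-query run would produce (the per-query variant of claims 4 and 13); stacking the queries as columns simply shares one pass over S. With symmetric normalization, a well-connected unlabeled document can outscore the labeled one: on a three-document chain with only the first document labeled and α = 0.9, the middle document ends with the highest score.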