Claims
- 1. A method of fulfilling an information need employing an index stored on a computer-readable medium and comprised of preanalyzed contexts of terms appearing within a plurality of documents, comprising the steps of:
receiving a query comprised of one or more fully specified terms and an information need, wherein the information need is represented by one or more at least partially unspecified terms; identifying contexts in the index that contain the one or more fully specified terms and zero or more at least partially unspecified terms; and locating one or more matches for the information need within the identified contexts.
- 2. The method of claim 1, further comprising the step of:
identifying documents in the index that contain the one or more at least partially unspecified terms.
- 3. The method of claim 1, wherein the locating step further comprises:
converting the query into a finite state machine; and matching the finite state machine against the identified contexts.
- 4. The method of claim 3, wherein the finite state machine is a finite state transducer.
- 5. The method of claim 3, wherein the finite state machine allows for the appearance of fully specified and at least partially unspecified terms in any order in a potential matching context.
- 6. The method of claim 3, wherein the finite state machine allows for one or more intervening words between the fully specified and at least partially unspecified terms in a potential matching context.
- 7. A method of fulfilling an information need employing an index stored on a computer-readable medium and comprised of preanalyzed contexts of terms appearing within a plurality of documents, information indicating category restrictions that the terms and contexts satisfy, and identifiers of the documents and contexts containing the terms, comprising the steps of:
receiving a query comprised of one or more fully specified terms and an information need and at least a partial restriction on the order that the one or more fully specified terms and the information need may appear in a potential matching context, wherein the information need is represented by one or more at least partially unspecified terms reflecting a category restriction; identifying contexts in the index that contain the one or more fully specified terms and the one or more at least partially unspecified terms in the specified order; and locating one or more matches for the information need within the identified contexts.
- 8. The method of claim 7, further comprising the step of:
identifying documents in the index that contain the one or more fully specified terms and the one or more at least partially unspecified terms in the specified order.
- 9. The method of claim 7, wherein the locating step further comprises:
converting the query into a finite state machine; and matching the finite state machine against the identified contexts.
- 10. The method of claim 9, wherein the finite state machine is a finite state transducer.
- 11. The method of claim 9, wherein the finite state machine assigns a dissimilarity weight to appearances of fully specified and at least partially unspecified terms in potential matching contexts in orders other than that specified in the query.
- 12. The method of claim 9, wherein the finite state machine allows for one or more intervening words between the fully specified and at least partially unspecified terms in a potential matching context.
- 13. A method of fulfilling an information need employing an index stored on a computer-readable medium and comprised of preanalyzed contexts of terms appearing within a plurality of documents, comprising the steps of:
receiving a query comprised of one or more fully specified terms and an information need, wherein the information need is represented by one or more at least partially unspecified terms; converting the query into a Boolean expression; identifying context identifiers satisfying the Boolean expression; and locating one or more matches for the information need within the identified contexts.
- 14. The method of claim 13, wherein the identifying step further comprises the steps of:
identifying document identifiers satisfying the Boolean expression.
- 15. The method of claim 1, 7 or 13, further comprising the steps of:
converting the query into a finite state machine; and matching the finite state machine against the contents of the contexts associated with the identified context identifiers.
- 16. The method of claim 1, 7 or 13, wherein the contexts are stored as finite state machines.
- 17. The method of claim 1, 7 or 13, wherein the documents are accessible over the Internet.
- 18. The method of claim 1, 7 or 13, wherein the documents comprise World Wide Web pages.
- 19. The method of claim 1, 7 or 13, further comprising the step of:
accumulating information about the one or more matches as they are located.
- 20. The method of claim 1, further comprising the step of:
assigning a score to a match.
- 21. The method of claim 20, wherein the score reflects the number of times an instance of the match is located among the plurality of documents.
- 22. The method of claim 1, further comprising the step of:
outputting one or more of the matches, or a portion thereof, thereby providing a result for the query.
- 23. The method of claim 22, further comprising the step of:
outputting identifiers or locations of one or more of the documents that contain a match or portion thereof that was output in the outputting step.
- 24. The method of claim 23, wherein a location of a document comprises a uniform resource locator.
- 25. The method of claim 23, further comprising the step of:
ranking the documents that contain a match, and wherein the second outputting step comprises outputting the document identifiers or locations of the documents that contain a match in an order based on the ranking.
- 26. The method of claim 25, wherein the ranking step comprises ranking a document based on the number of times a match is located within the document.
- 27. The method of claim 1, wherein the category restriction comprises a morphological feature.
- 28. The method of claim 1, wherein the category restriction comprises a syntactic feature.
- 29. The method of claim 1, wherein the category restriction comprises a computer program.
- 30. The method of claim 1, further comprising the step of:
storing a match or a portion thereof.
- 31. The method of claim 30, further comprising the step of:
storing a score for the match or portion thereof.
- 32. The method of claim 1, further comprising the step of:
storing a plurality of matches or portions thereof.
- 33. The method of claim 1, further comprising the step of:
storing a score for a plurality of matches or portions thereof.
- 34. The method of claim 1, wherein the index comprises locations of terms within documents.
- 35. The method of claim 34, wherein the locating step further comprises:
determining the location of a term in the query within a document using the index; and locating a match for the query based on the location of the term within the document.
- 36. The method of claim 1, further comprising the step of:
ranking a plurality of the located matches or portions thereof.
- 37. The method of claim 36, wherein the ranking step comprises:
ranking a located match or a portion thereof based on the content of a plurality of documents identified in the identifying step.
- 38. The method of claim 36, wherein the ranking step comprises:
ranking a located match or a portion thereof based on the content of a majority of documents identified in the identifying step.
- 39. The method of claim 36, wherein the ranking is based on one or more features selected from the list consisting of: the location of a match within a document, a weight assigned to a document that contains a match, the age of a document that contains a match, the source of a document that contains a match, and a format feature of a match within a document.
- 40. The method of claim 36, wherein the ranking step comprises:
ranking a located match or a portion thereof based on the number of times an instance of the match is located within a plurality of documents identified in the identifying step.
- 41. The method of claim 36, wherein the ranking step comprises:
ranking a located match or a portion thereof based on the number of times an instance of the match is located within a majority of documents identified in the identifying step.
- 42. The method of claim 36, 38 or 39, further comprising the step of:
outputting one or more of the located matches, or one or more portions thereof, in an order based on the ranking, thereby providing a result for the query.
- 43. The method of claim 42, further comprising the step of:
outputting an indication of the ranking of a located match or portion thereof.
- 44. A method of fulfilling an information need based on documents stored on a computer-readable medium comprising the steps of:
storing an index identifying documents containing terms; storing contexts for terms, wherein a context occurs in a document; receiving a query containing an unspecified portion; and identifying one or more matches for the query within the contexts.
- 45. A method of fulfilling an information need based on documents stored on a computer-readable medium comprising the steps of:
storing an index identifying documents containing terms; storing contexts, wherein a context occurs in a document; storing information to retrieve a list of contexts for terms; receiving a query containing an at least partially unspecified portion; and identifying one or more matches for the query within the contexts.
- 46. The method of claim 45, wherein contexts are preanalyzed.
- 47. A method of fulfilling an information need based on documents and an index stored on a computer-readable medium comprising the steps of:
storing contexts for terms, wherein the context occurs in a document; storing information identifying a document in which a context occurs; receiving a query containing an unspecified portion; and identifying a plurality of matches for the query within the contexts.
- 48. A method of generating an index for satisfying an information need from a plurality of documents stored on a computer-readable medium, comprising the steps of:
receiving a document; identifying a context in the document; linguistically analyzing the context; selecting a term from the document; determining if there are more terms in the context to select, and if so, selecting another term from the context until there are no more terms to select; determining if there are more contexts to identify in the document, and if so, identifying another context in the document and repeating the term selecting step for each term in the context until there are no more contexts to identify; and determining if there are more documents to receive, and if so, receiving the next document and repeating the context identifying and term selecting steps until there are no more documents to receive.
- 49. The method of claim 48, wherein linguistically analyzing the one or more contexts further comprises identifying category restrictions satisfied by the context.
- 50. The method of claim 48, further comprising the step of:
storing information indicating the number of contexts stored for a term in a document, a document identifier associated with the document, and context identifiers indicating the location of the one or more contexts stored in the context array.
- 51. The method of claim 48, wherein the analyzed contexts are stored as finite state machines.
- 52. The method of claim 48, wherein the analyzed contexts are stored as graphs.
- 53. The method of claim 48, wherein the analyzed contexts are stored as trees.
- 54. The method of claim 48, wherein the category restrictions comprise morphological features.
- 55. The method of claim 48, wherein the category restrictions comprise syntactic features.
- 56. The method of claim 48, wherein the category restrictions comprise computer programs.
- 57. An apparatus for fulfilling an information need employing an index stored on a computer-readable medium and comprised of preanalyzed contexts of terms appearing within a plurality of documents and information indicating category restrictions that the terms and contexts satisfy, comprising:
memory means that stores computer-executable process steps; and a processor that executes the process steps so as to (i) receive a query comprised of one or more fully specified terms and an information need, wherein the information need is represented by one or more at least partially unspecified terms reflecting a category restriction; (ii) identify preanalyzed contexts in the index that contain the one or more fully specified terms and the one or more at least partially unspecified terms; and (iii) locate one or more matches for the information need within the identified preanalyzed contexts.
- 58. An apparatus for fulfilling an information need employing an index stored on a computer-readable medium and comprised of preanalyzed contexts of terms appearing within a plurality of documents, information indicating category restrictions that the terms and contexts satisfy, and identifiers of the documents and contexts containing the terms, comprising:
memory means that stores computer-executable process steps; and a processor that executes the process steps so as to (i) receive a query comprised of one or more fully specified terms and an information need and at least a partial restriction on the order that the one or more fully specified terms and the information need may appear in a potential matching preanalyzed context, wherein the information need is represented by one or more at least partially unspecified terms reflecting a category restriction; (ii) identify preanalyzed contexts in the index that contain the one or more fully specified terms and the one or more at least partially unspecified terms in the specified order; and (iii) locate one or more matches for the information need within the identified preanalyzed contexts.
- 59. An apparatus for fulfilling an information need employing an index stored on a computer-readable medium and comprised of preanalyzed contexts of terms appearing within a plurality of documents, information indicating category restrictions that the terms and contexts satisfy, and identifiers of the documents and contexts containing the terms, comprising:
memory means that stores computer-executable process steps; and a processor that executes the process steps so as to (i) receive a query comprised of one or more fully specified terms and an information need and at least a partial restriction on the order that the one or more fully specified terms and the information need may appear in a potential matching preanalyzed context, wherein the information need is represented by one or more at least partially unspecified terms reflecting a category restriction; (ii) identify preanalyzed contexts and documents in the index that contain the one or more fully specified terms and the one or more at least partially unspecified terms in the specified order by converting the query into a Boolean expression and identifying context identifiers satisfying the Boolean expression; and (iii) locate one or more matches for the information need within the identified preanalyzed contexts by converting the query into a finite state machine and matching the finite state machine against the identified preanalyzed contexts.
- 60. Computer-executable process steps stored on a computer-readable medium, the computer-executable process steps to fulfill an information need, the computer-executable process steps comprising:
code to receive a query containing an unspecified portion, the unspecified portion including an unspecified term; and code to identify one or more matches for the query within a body of information stored on a computer-readable medium.
- 61. Computer-executable process steps stored on a computer-readable medium, the computer-executable process steps to fulfill an information need, the computer-executable process steps comprising:
code to store contexts for terms, wherein a context occurs in a document; code to store information identifying a document in which a context occurs; code to receive a query containing an unspecified portion; and code to identify one or more matches for the query within the contexts.
- 62. Computer-executable process steps stored on a computer-readable medium, the computer-executable process steps to fulfill an information need, the computer-executable process steps comprising:
code to identify a plurality of matches for a partially unspecified query; and code to rank a plurality of the matches or portions thereof.
- 63. Computer-executable process steps stored on a computer-readable medium, the computer-executable process steps to fulfill an information need, the computer-executable process steps comprising:
code to identify a plurality of results for a query, the results occurring within documents; and code to rank the plurality of results based on the content of a plurality of documents in which a result is identified.
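By way of illustration only, and without limiting any claim, the indexing steps of claim 48 and the query steps of claims 1, 13, 21 and 23 may be sketched as follows. The sketch rests on several assumptions that do not come from the claims: each sentence is treated as one context, linguistic analysis is reduced to lower-casing and tokenizing, the Boolean expression is a plain conjunction of the fully specified terms, and a regular expression stands in for the finite state machine. The names `build_index`, `fulfill` and the `?` marker for an unspecified term are hypothetical.

```python
import re
from collections import Counter, defaultdict

UNSPECIFIED = "?"  # hypothetical marker for an unspecified query term


def build_index(documents):
    """Store contexts and map each term to the identifiers of contexts containing it.

    Assumption: a context is a sentence; analysis is lower-casing plus tokenizing.
    """
    contexts = []                # context identifier -> (document identifier, tokens)
    postings = defaultdict(set)  # term -> set of context identifiers
    for doc_id, text in documents.items():
        for sentence in re.split(r"[.!?]+", text):
            tokens = re.findall(r"\w+", sentence.lower())
            if not tokens:
                continue
            context_id = len(contexts)
            contexts.append((doc_id, tokens))
            for token in tokens:
                postings[token].add(context_id)
    return contexts, postings


def fulfill(query, contexts, postings):
    """Return (match, count, document identifiers), most frequent match first.

    `query` is a list of lower-case tokens in which UNSPECIFIED marks the
    information need.
    """
    # Boolean step (assumed to be a conjunction): keep only the contexts
    # that contain every fully specified term.
    specified = [t for t in query if t != UNSPECIFIED]
    if specified:
        candidates = set.intersection(*(postings.get(t, set()) for t in specified))
    else:
        candidates = set(range(len(contexts)))

    # Matching step: a regular expression stands in for the finite state
    # machine; each unspecified term becomes a one-word capture group.
    pattern = re.compile(
        r"\b"
        + " ".join(r"(\w+)" if t == UNSPECIFIED else re.escape(t) for t in query)
        + r"\b"
    )

    counts = Counter()           # match -> number of located instances
    sources = defaultdict(set)   # match -> documents containing it
    for context_id in candidates:
        doc_id, tokens = contexts[context_id]
        for found in pattern.finditer(" ".join(tokens)):
            match = " ".join(found.groups())
            counts[match] += 1
            sources[match].add(doc_id)

    # Score each match by how often it was located; break ties alphabetically.
    ranked = sorted(counts, key=lambda m: (-counts[m], m))
    return [(m, counts[m], sorted(sources[m])) for m in ranked]


# Invented sample documents, used only to exercise the sketch.
documents = {
    "d1": "Paris is the capital of France. It is large.",
    "d2": "The capital of France is Paris.",
    "d3": "Many say the capital of France is Paris! Others say the capital of France is Lyon.",
}
contexts, postings = build_index(documents)
result = fulfill(["the", "capital", "of", "france", "is", UNSPECIFIED], contexts, postings)
print(result)  # [('paris', 2, ['d2', 'd3']), ('lyon', 1, ['d3'])]
```

In this sketch the first sentence of `d1` passes the Boolean step, since it contains every specified term, but is rejected by the pattern because the terms do not appear in the queried order. This is why the identifying and locating steps are kept separate here.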
Parent Case Info
[0001] The present application is a continuation-in-part of co-pending U.S. patent application Ser. No. 09/559,223 entitled “System for Fulfilling an Information Need”, filed Apr. 26, 2000. The present application claims the benefit of U.S. Provisional Application No. 60/251,608 entitled “System for Fulfilling an Information Need Using Extended Matching Technique”, filed Dec. 5, 2000.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60251608 | Dec 2000 | US |
Continuation in Parts (1)
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 09559223 | Apr 2000 | US |
| Child | 10004952 | Dec 2001 | US |