Claims
- 1. A method for searching a corpus of documents, comprising:
defining a knowledge domain;
identifying a set of reference documents in the corpus pertinent to the domain;
inputting a first query;
searching the corpus using the set of reference documents to find one or more of the documents in the corpus that contain information in the domain relevant to the first query; and
adding at least one of the found documents to the set of reference documents for use in searching the corpus for information in the domain relevant to a second, subsequent query.
- 2. A method according to claim 1, wherein inputting the first query comprises inputting one or more search terms.
- 3. A method according to claim 2, wherein searching the corpus comprises finding lexical characteristics of terms in the reference documents and refining the search terms using the lexical characteristics.
- 4. A method according to claim 1, wherein inputting the first query comprises specifying one or more documents representative of the information to be found in the corpus.
- 5. A method according to claim 1, wherein searching the corpus comprises searching the corpus to find the documents that contain the information relevant to the query and ranking the found documents by comparing them to the set of reference documents.
- 6. A method according to claim 5, wherein ranking the found documents comprises evaluating a textual resemblance between the found documents and the reference documents.
- 7. A method according to claim 5, wherein ranking the found documents comprises assessing links between the found documents and the reference documents.
- 8. A method according to claim 5, wherein adding the at least one of the found documents comprises adding at least the document having the highest ranking.
- 9. A method according to claim 1, wherein adding the at least one of the found documents comprises removing one of the documents from the set responsive to adding the at least one of the found documents.
- 10. A method according to claim 9, and comprising tracking a level of relevance of the reference documents to the queries, and wherein removing the one of the documents comprises removing one of the reference documents whose tracked level of relevance is low.
- 11. A method according to claim 1, wherein the corpus comprises at least a part of the World Wide Web, and the documents comprise Web pages, and wherein searching the corpus comprises conveying the query to one or more Web search engines.
- 12. A method according to claim 11, wherein inputting the first query comprises receiving the query from a user of a pervasive device, and wherein searching the corpus comprises searching while the device is disconnected from the Web.
- 13. A method according to claim 1, wherein identifying the set of reference documents comprises opening one or more files of a knowledge base on a computer in which data regarding the reference documents are saved.
- 14. A method according to claim 13, wherein identifying the set of reference documents comprises identifying the set of documents used by a first user in searching the corpus, and wherein opening the one or more files comprises copying the files for use by a second user in searching the corpus for information in the domain.
- 15. A method for searching a corpus of documents containing terms, comprising:
defining a knowledge domain;
identifying a set of reference documents in the corpus pertinent to the domain;
finding lexical characteristics of the terms in the reference documents;
inputting a search query;
refining the search query using the lexical characteristics; and
searching the corpus to find information in the domain responsive to the refined query.
- 16. A method according to claim 15, wherein finding the lexical characteristics comprises finding lexical affinities among the terms.
- 17. A method according to claim 16, wherein the search query comprises search terms, and wherein refining the search query comprises adding to the search terms further terms found to have lexical affinity to the search terms.
- 18. A method for searching a corpus of linked documents containing terms, comprising:
defining a knowledge domain;
identifying a set of reference documents in the corpus pertinent to the domain;
inputting a search query;
searching the corpus to find one or more of the documents in the corpus that contain information relevant to the query;
evaluating a textual resemblance between the found documents and the reference documents so as to assign respective textual scores to the found documents;
assessing links between the found documents and the reference documents so as to assign respective topological scores to the found documents; and
ranking the found documents with respect to their relevance to the domain responsive to the textual scores and the topological scores.
- 19. A method according to claim 18, wherein evaluating the textual resemblance comprises assessing, for each of a plurality of the terms in the found documents, a respective frequency of occurrence in the reference documents.
- 20. A method according to claim 18, wherein the documents comprise World Wide Web pages, and wherein assessing the links comprises generating a graph of the links between the pages and calculating authority weights of the nodes of the graph.
- 21. Apparatus for searching a corpus of documents, comprising:
a memory, adapted to store an identification of a set of reference documents in the corpus pertinent to a predefined knowledge domain; and
a search processor, which responsive to receiving a first query as input, is adapted to search the corpus using the set of reference documents to find one or more of the documents in the corpus that contain information in the domain relevant to the first query, and to add at least one of the found documents to the set of reference documents stored in the memory for use in searching the corpus for information in the domain relevant to a second, subsequent query.
- 22. Apparatus according to claim 21, wherein the processor is adapted to find lexical characteristics of the terms in the reference documents and to refine the search query using the lexical characteristics.
- 23. Apparatus according to claim 21, wherein the processor is adapted to receive the documents found to contain the information relevant to the query and to rank the found documents by comparing them to the set of reference documents.
- 24. Apparatus according to claim 23, wherein the processor is adapted to add to the corpus at least the document having the highest ranking.
- 25. Apparatus according to claim 21, wherein the processor is adapted to remove one of the documents from the set responsive to adding the at least one of the found documents.
- 26. Apparatus according to claim 21, wherein the corpus comprises at least a part of the World Wide Web, and the documents comprise Web pages, and wherein the processor is adapted to search the corpus by conveying the query to one or more Web search engines.
- 27. Apparatus according to claim 21, wherein the processor is adapted to receive the query over a communication link from a user of a pervasive device, and to search the corpus while the communication link is disconnected.
- 28. Apparatus for searching a corpus of documents containing terms, comprising:
a memory, adapted to store an identification of a set of reference documents in the corpus pertinent to a predefined knowledge domain; and a search processor, which is adapted to find lexical characteristics of the terms in the reference documents, and responsive to receiving a query as input, is adapted to refine the search query using the lexical characteristics and to search the corpus to find information in the domain responsive to the refined query.
- 29. Apparatus for searching a corpus of linked documents containing terms, comprising:
a memory, adapted to store an identification of a set of reference documents in the corpus pertinent to a predefined knowledge domain; and a search processor, which responsive to receiving a query as input, is adapted to search the corpus to find one or more of the documents in the corpus that contain information relevant to the query, to evaluate a textual resemblance between the found documents and the reference documents so as to assign respective textual scores to the found documents, to assess links between the found documents and the reference documents so as to assign respective topological scores to the found documents, and to rank the found documents with respect to their relevance to the domain responsive to the textual scores and the topological scores.
- 30. A computer software product for searching a corpus of documents, the product comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive a definition of a knowledge domain and an identification of a set of reference documents in the corpus pertinent to the domain, and further cause the computer, responsive to a first query, to search the corpus using the set of reference documents to find one or more of the documents in the corpus that contain information in the domain relevant to the first query, and to add at least one of the found documents to the set of reference documents for use in searching the corpus for information in the domain relevant to a second, subsequent query.
- 31. A product according to claim 30, wherein the corpus comprises the World Wide Web, and the documents comprise Web pages, and wherein the instructions cause the computer to search the Web by conveying the query to one or more Web search engines.
- 32. A product according to claim 31, wherein the instructions cause the computer to receive the first query from a pervasive device, and to search the Web while the pervasive device is disconnected from the Web.
- 33. A computer software product for searching a corpus of documents, the product comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive a definition of a knowledge domain and an identification of a set of reference documents in the corpus pertinent to the domain and to find lexical characteristics of the terms in the reference documents, and further cause the computer, responsive to a query, to refine the search query using the lexical characteristics and to search the corpus to find information in the domain responsive to the refined query.
- 34. A computer software product for searching a corpus of documents, the product comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive a definition of a knowledge domain and an identification of a set of reference documents in the corpus pertinent to the domain, and further cause the computer, responsive to a query, to search the corpus to find one or more of the documents in the corpus that contain information relevant to the query, to evaluate a textual resemblance between the found documents and the reference documents to assign respective textual scores to the found documents, to assess links between the found documents and the reference documents to assign respective topological scores to the found documents, and to rank the found documents with respect to their relevance to the domain responsive to the textual scores and the topological scores.
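By way of illustration only, the ranking recited in claims 18, 29 and 34, followed by the feedback step of claims 1 and 8, may be sketched as below. The sketch is a minimal example under stated assumptions and does not limit or define the claims: the function names, the whitespace tokenizer, the link representation as (source, target) pairs, and the mixing weight `alpha` are all hypothetical and are not taken from the specification.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption); a real system would normalize terms.
    return text.lower().split()


def textual_score(doc_text, ref_texts):
    # Average relative frequency, in the reference documents, of the terms
    # of the found document (one possible reading of claim 19).
    ref_counts = Counter(t for text in ref_texts for t in tokenize(text))
    total = sum(ref_counts.values())
    terms = tokenize(doc_text)
    if not terms or total == 0:
        return 0.0
    return sum(ref_counts[t] for t in terms) / (total * len(terms))


def topological_score(doc_id, links, ref_ids):
    # Fraction of reference documents joined to the found document by a link
    # in either direction; `links` is a set of (source, target) pairs.
    if not ref_ids:
        return 0.0
    linked = sum(
        1 for r in ref_ids if (doc_id, r) in links or (r, doc_id) in links
    )
    return linked / len(ref_ids)


def rank_found(found, corpus, links, ref_ids, alpha=0.5):
    # `alpha` is a hypothetical mixing weight; the claims do not say how the
    # textual and topological scores are combined.
    ref_texts = [corpus[r] for r in ref_ids]

    def score(doc_id):
        return (alpha * textual_score(corpus[doc_id], ref_texts)
                + (1 - alpha) * topological_score(doc_id, links, ref_ids))

    return sorted(found, key=score, reverse=True)


def add_top_ranked(ref_ids, ranked):
    # Feedback step: the highest-ranked found document joins the reference
    # set, which is then available for a subsequent query.
    return set(ref_ids) | set(ranked[:1])
```

In use, a caller would pass the identifiers of the found documents to `rank_found` together with the current reference set, and then pass the resulting ordering to `add_top_ranked` before the next query. A simple link-overlap measure is used here in place of the authority weights of claim 20, purely to keep the example short.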
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 60/208,226 filed May 31, 2000, which is incorporated herein by reference.
Provisional Applications (1)
| Number | Date | Country |
| 60208226 | May 2000 | US |
Continuations (1)
| | Number | Date | Country |
| Parent | 09610705 | Jul 2000 | US |
| Child | 10634319 | Aug 2003 | US |