Claims
- 1. A method for clustering electronic documents in response to a search query, comprising: collecting a set of electronic documents, each containing at least one occurrence of a keyword from the search query; analyzing each electronic document in said set to determine a content characteristic in a predefined neighborhood adjacent to at least one of said keywords from the search query; comparing the content characteristics of each document in said set of electronic documents to content characteristics of other documents in said set; creating a plurality of clusters of electronic documents, at least one cluster including at least two of said electronic documents in said set, wherein in a given cluster the electronic documents have overlapping content beyond a commonality of keywords from the search query; and presenting an identification of each said cluster in response to the search query.
- 2. The method of claim 1 wherein said identification of each said cluster comprises information regarding the content characteristics which formed the basis for the clustering of the documents.
- 3. The method of claim 1 wherein said set of electronic documents is collected by a search engine.
- 4. A document clustering engine for finding responses to a search query where responses are split across documents, said clustering engine comprising computer software operated on a processor and when operated performing the steps of: collecting a set of documents, each document containing at least one of a plurality of keywords from the search query; analyzing each document in said set to determine a content characteristic in a predefined neighborhood adjacent to at least one of said plurality of keywords in that document; comparing a content characteristic associated with each document in the set against the content characteristic of other documents in the set and determining a level of similarity of content characteristic for each pair of documents compared; creating a plurality of clusters of documents, wherein in a given cluster the documents have a level of similarity of content characteristic greater than a predetermined threshold; and providing an identification of each said cluster in response to the search query.
- 5. The method of claim 4 wherein said identification of each said cluster comprises information regarding the level of similarity of content characteristics which formed the basis for clustering the documents.
- 6. The method of claim 4 wherein said set of documents is collected by a search engine.
- 7. A system for clustering documents in response to a search query, comprising: means for collecting a set of documents, each document containing at least one of a plurality of keywords from the search query; means for analyzing each document in said set of documents to determine a content characteristic in a predefined neighborhood adjacent to at least one of said plurality of keywords in that document; means for comparing a content characteristic associated with each document in the set of documents against the content characteristic of other documents in the set of documents, and determining a level of similarity of content characteristic for each pair of documents compared; means for creating a plurality of clusters of documents, wherein in a given cluster the documents have a level of similarity of content characteristic greater than a predetermined threshold; and means for providing an identification of each said cluster in response to the search query.
- 8. The system of claim 7 wherein said identification of each said cluster comprises information regarding the level of similarity of content characteristics which formed the basis for clustering the documents.
- 9. The system of claim 7 wherein said set of documents is collected by a search engine.
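The steps recited in the independent claims (collect, analyze a keyword neighborhood, compare pairwise, cluster above a threshold, identify each cluster) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the claims: the "content characteristic" is modeled as the set of words within a fixed token window of a keyword, the "level of similarity" as Jaccard overlap, and the grouping as single-link merging (so two documents may share a cluster through a chain of similar pairs). The names `neighborhood_terms`, `jaccard`, `cluster_documents`, `window`, and `threshold` are hypothetical.

```python
import re
from itertools import combinations


def neighborhood_terms(text, keywords, window=2):
    """Return the words within `window` tokens of any query keyword.

    Assumption: this word set stands in for the claimed "content
    characteristic in a predefined neighborhood" of a keyword.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    keys = {k.lower() for k in keywords}
    terms = set()
    for i, tok in enumerate(tokens):
        if tok in keys:
            lo, hi = max(0, i - window), i + window + 1
            terms.update(t for t in tokens[lo:hi] if t not in keys)
    return terms


def jaccard(a, b):
    """Illustrative similarity measure: |a & b| / |a | b| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs, keywords, window=2, threshold=0.25):
    """Cluster documents by similarity of their keyword neighborhoods.

    `docs` maps a document id to its text. Returns a sorted list of
    (member ids, shared neighborhood terms) pairs, one per cluster.
    """
    keys = {k.lower() for k in keywords}

    # Collect: keep only documents containing at least one keyword,
    # and analyze each one to get its neighborhood term set.
    selected = {}
    for doc_id, text in docs.items():
        if set(re.findall(r"[a-z0-9]+", text.lower())) & keys:
            selected[doc_id] = neighborhood_terms(text, keywords, window)

    # Union-find structure used for single-link merging (an assumption).
    parent = {d: d for d in selected}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    # Compare every pair; merge pairs whose similarity exceeds the threshold.
    for a, b in combinations(selected, 2):
        if jaccard(selected[a], selected[b]) > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for d in selected:
        clusters.setdefault(find(d), []).append(d)

    # Identify each cluster by the neighborhood terms all members share.
    result = []
    for members in clusters.values():
        shared = set.intersection(*(selected[m] for m in members))
        result.append((sorted(members), sorted(shared)))
    return sorted(result)
```

Called with a handful of short texts that all mention the same keyword, this groups documents whose surrounding words overlap and labels each group with the shared terms, which loosely corresponds to the identification described in claims 2, 5, and 8. A stricter reading of the threshold limitation would require every pair within a cluster to exceed the threshold (complete-link grouping) rather than the single-link merging assumed here.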
Parent Case Info
This application is a continuation of U.S. patent application Ser. No. 08/935,827, filed on Sept. 23, 1997, now issued as U.S. Pat. No. 6,167,397.
US Referenced Citations (12)
| Number | Name | Date | Kind |
| --- | --- | --- | --- |
| 5542090 | Henderson et al. | Jul 1996 | A |
| 5598557 | Doner et al. | Jan 1997 | A |
| 5659766 | Saund et al. | Aug 1997 | A |
| 5675819 | Schuetze | Oct 1997 | A |
| 5787420 | Tukey et al. | Jul 1998 | A |
| 5787421 | Nomiyama | Jul 1998 | A |
| 5787422 | Tukey et al. | Jul 1998 | A |
| 5819258 | Vaithyanathan et al. | Oct 1998 | A |
| 5845278 | Kirsch et al. | Dec 1998 | A |
| 5857179 | Vaithyanathan et al. | Jan 1999 | A |
| 5864855 | Ruocco et al. | Jan 1999 | A |
| 5926812 | Hilsenrath et al. | Jul 1999 | A |
Continuations (1)
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 08/935827 | Sep 1997 | US |
| Child | 09/671705 | | US |