Claims
- 1. A method for associating words and word strings in a language comprising:
providing a collection of documents, wherein said collection includes at least one document;
receiving from a user a word or word string query to be analyzed;
searching said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
determining a user-defined amount of words or word strings or both to the left of said query to be analyzed in said returned documents based on their frequency and creating a Left Signature List comprising said word or word strings to the left of said query to be analyzed in said returned documents;
determining a user-defined amount of words or word strings or both to the right of said words or word strings comprising said Left Signature List and creating Left Anchor Lists comprising said word or word strings to the right of said Left Signature Lists based on their frequency in a collection of documents;
determining a user-defined number of words or word strings or both to the right of said query to be analyzed in said returned documents and creating a Right Signature List comprising said word or word strings to the right of said query to be analyzed in said returned documents based on their frequency;
determining a user-defined number of words or word strings or both to the left of said word or word strings comprising said Right Signature List and creating Right Anchor Lists comprising said word or word strings to the left of said Right Signature List based on their frequency; and
ranking results based on the frequency of each word or word string occurring in said Left Anchor Lists and the frequency of said word or word string occurring in said Right Anchor Lists.
- 2. The method of claim 1, wherein ranking results includes multiplying a total frequency of each word or word string occurring in said Left Anchor Lists by a total frequency of said word or word string occurring in said Right Anchor Lists.
- 3. The method of claim 1, wherein ranking results includes adding a total frequency of each word or word string occurring in said Left Anchor Lists to a total frequency of said word or word string occurring in said Right Anchor Lists, for each word or word string occurring in both the Left Anchor List and the Right Anchor List.
- 4. A method for associating words and word strings in a language comprising:
providing a collection of documents, wherein said collection includes at least one document;
receiving from a user a word or word string query to be analyzed;
searching said collection of documents for the query to be analyzed and returning documents containing the query to be analyzed;
determining a user-defined amount and size of words or word strings or both to the left and right of the query in said returned documents containing the query to be analyzed;
returning a list with an entry or plurality of entries, wherein said entry or said plurality of entries contain said determined amount of words to the left and right of the query in said returned documents;
searching said collection of documents for said entry or plurality of entries in said returned list; and
returning a list of words or word strings or both that occur most frequently between said determined amount of words to the left and right of said query in said returned documents.
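
The steps recited in claims 1 to 4 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the applicant's implementation. The function names (`tokenize`, `contains`, `neighbors`, `associate`, `fill_between`) and parameters (`top_k`, `sig_size`, `result_size`, `mode`, `max_gap`) are hypothetical. The sketch also assumes lowercase whitespace tokenization, fixed-size word strings, and that the query itself is dropped from the ranked results; none of these choices is specified by the claims.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase, whitespace-delimited tokens.
    return text.lower().split()


def contains(tokens, phrase):
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1))


def neighbors(docs, phrase, side, size):
    """Count the `size`-word strings directly left or right of `phrase`."""
    counts, n = Counter(), len(phrase)
    for tokens in docs:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) != phrase:
                continue
            start = i - size if side == "left" else i + n
            if start < 0:
                continue
            span = tuple(tokens[start:start + size])
            if len(span) == size:
                counts[span] += 1
    return counts


def associate(corpus, query, top_k=3, sig_size=1, result_size=1, mode="multiply"):
    """Claims 1 to 3: signature lists, anchor lists, then frequency ranking."""
    docs = [tokenize(d) for d in corpus]
    q = tuple(tokenize(query))
    returned = [t for t in docs if contains(t, q)]
    # Signature lists: most frequent strings beside the query.
    left_sigs = [s for s, _ in neighbors(returned, q, "left", sig_size).most_common(top_k)]
    right_sigs = [s for s, _ in neighbors(returned, q, "right", sig_size).most_common(top_k)]
    # Anchor lists, summed into one total frequency per candidate and side.
    left_total, right_total = Counter(), Counter()
    for sig in left_sigs:
        left_total.update(neighbors(docs, sig, "right", result_size))
    for sig in right_sigs:
        right_total.update(neighbors(docs, sig, "left", result_size))
    scores = {}
    for cand in left_total.keys() & right_total.keys():
        if cand == q:  # assumption: the query is not reported as its own associate
            continue
        l, r = left_total[cand], right_total[cand]
        scores[" ".join(cand)] = l * r if mode == "multiply" else l + r
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def fill_between(corpus, query, top_k=3, sig_size=1, max_gap=2):
    """Claim 4: collect left/right contexts, then count what occurs between them."""
    docs = [tokenize(d) for d in corpus]
    q = tuple(tokenize(query))
    n = len(q)
    contexts = Counter()
    for tokens in docs:
        for i in range(sig_size, len(tokens) - n - sig_size + 1):
            if tuple(tokens[i:i + n]) == q:
                left = tuple(tokens[i - sig_size:i])
                right = tuple(tokens[i + n:i + n + sig_size])
                contexts[(left, right)] += 1
    fills = Counter()
    for (left, right), _ in contexts.most_common(top_k):
        for tokens in docs:
            for i in range(len(tokens) - sig_size + 1):
                if tuple(tokens[i:i + sig_size]) != left:
                    continue
                for gap in range(1, max_gap + 1):
                    j = i + sig_size + gap
                    if tuple(tokens[j:j + sig_size]) == right:
                        fills[" ".join(tokens[i + sig_size:j])] += 1
    fills.pop(" ".join(q), None)  # assumption: drop the query itself
    return sorted(fills.items(), key=lambda kv: (-kv[1], kv[0]))


corpus = [
    "i drank hot coffee today",
    "i drank hot tea today",
    "she drank hot tea yesterday",
    "he drank cold milk today",
]
print(associate(corpus, "coffee"))              # [('tea', 2)]
print(associate(corpus, "coffee", mode="add"))  # [('tea', 3)]
print(fill_between(corpus, "coffee"))           # [('tea', 1)]
```

In this toy collection, "hot" is the only left signature and "today" the only right signature of "coffee". The word "tea" follows "hot" twice and precedes "today" once, so the multiplicative ranking of claim 2 scores it 2 and the additive ranking of claim 3 scores it 3. Restricting candidates to those present on both sides matches claim 3 directly and is equivalent under claim 2 for any nonzero product.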
RELATED APPLICATIONS
[0001] This application is a continuation-in-part of U.S. application Ser. No. 10/157,894, filed on May 31, 2002, which in turn is a continuation-in-part of U.S. application Ser. No. 10/024,473, filed on Dec. 21, 2001, which claims the benefit of U.S. Provisional Application No. 60/276,107 filed Mar. 16, 2001, and U.S. Provisional Application No. 60/299,472 filed Jun. 21, 2001, all of which are hereby incorporated by reference.
Provisional Applications (2)
| Number | Date | Country |
|---|---|---|
| 60276107 | Mar 2001 | US |
| 60299472 | Jun 2001 | US |
Continuation in Parts (2)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 10157894 | May 2002 | US |
| Child | 10281997 | Oct 2002 | US |
| Parent | 10024473 | Dec 2001 | US |
| Child | 10157894 | May 2002 | US |