Claims
- 1. A method for associating words in a language comprising:
providing a collection of documents, wherein said collection includes at least one document;
selecting a first word or word string, and a second word or word string;
locating a plurality of occurrences of the first word or word string and the second word or word string in said collection;
defining in said collection first ranges and second ranges, wherein the first ranges include the first word or word string and the second ranges include the second word or word string;
searching said first ranges and second ranges for common word or word strings, wherein said common word or word strings occur in a plurality of ranges; and
associating first word or word strings and second word or word strings with common word or word strings based on frequency of occurrence of the common word or word strings within the first ranges and second ranges respectively.
- 2. The method of claim 1, wherein said associating first word or word strings and second word or word strings is enhanced by a greater frequency of occurrence of the common word or word strings.
- 3. The method of claim 1, wherein said associating first word or word strings and second word or word strings is enhanced by a lesser frequency of occurrence of the common word or word strings.
- 4. The method of claim 1, further comprising replacing the first word and/or word string with a substantially semantically equivalent word or word string.
- 5. A method for associating words in a language comprising:
providing a collection of documents, wherein said collection includes at least one document;
selecting a first word or word string, and a second word or word string;
locating all documents having a plurality of occurrences of the first word or word string within a defined proximity range of the second word and/or word string, with said defined proximity range having an upper limit and a lower limit;
defining in the located documents a range, wherein the range includes the first word or word string and the second word or word string;
searching said ranges for common word or word strings; and
associating the first word or word string and the second word or word string with common word or word strings based on frequency of occurrence of the common word or word strings within the ranges.
- 6. The method of claim 5, wherein said associating first word or word strings and second word or word strings is enhanced by a greater frequency of occurrence of the common word or word strings.
- 7. The method of claim 5, wherein said associating first word or word strings and second word or word strings is enhanced by a lesser frequency of occurrence of the common word or word strings.
- 8. The method of claim 5, wherein said upper and said lower limit of said defined proximity range are equal.
- 9. A method for creating an association database in a single language comprising the steps of:
providing a collection of documents, wherein said collection includes at least one document;
selecting a first word or word string;
locating a plurality of occurrences of the first word or word string;
defining in said collection ranges, wherein said ranges occur in relation to each of said plurality of occurrences of the first word or word string;
searching said ranges for common word or word strings, wherein said common word or word strings occur in a plurality of ranges; and
associating first word or word strings with common word or word strings based on frequency of occurrence of the common word or word strings within the ranges.
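The steps recited in claims 1, 5, and 9 can be illustrated with a minimal sketch. It is not the implementation of the claimed method, only one possible reading under stated assumptions: documents are plain strings, a "range" is a fixed window of `width` tokens on each side of an occurrence (claims 1 and 9) or the span covering both word strings (claim 5), proximity is measured as the number of tokens between the two word strings, and a "common" word string is one found in at least two ranges. All function and parameter names (`tokenize`, `ranges_around`, `ranges_spanning`, `associate`, `width`, `max_len`, and so on) are hypothetical and do not appear in the application.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into word tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def find(tokens, query):
    """Return the start index of every occurrence of the token list `query`."""
    n = len(query)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == query]

def ranges_around(tokens, query, width):
    """One range per occurrence: the query plus up to `width` tokens on each side."""
    return [tokens[max(0, i - width):i + len(query) + width]
            for i in find(tokens, query)]

def ranges_spanning(tokens, first, second, lower, upper):
    """Ranges covering both queries when lower <= tokens between them <= upper."""
    spans = []
    for i in find(tokens, first):
        for j in find(tokens, second):
            if i <= j:
                gap = j - (i + len(first))
            else:
                gap = i - (j + len(second))
            # A negative gap means the two occurrences overlap; skip those.
            if gap >= 0 and lower <= gap <= upper:
                end = max(i + len(first), j + len(second))
                spans.append(tokens[min(i, j):end])
    return spans

def common_strings(ranges, max_len=2, exclude=()):
    """Count how many ranges contain each word string; keep those in >= 2 ranges."""
    counts = Counter()
    for r in ranges:
        strings = {" ".join(r[j:j + k])
                   for k in range(1, max_len + 1)
                   for j in range(len(r) - k + 1)}
        counts.update(strings)  # each string counted once per range
    return Counter({s: c for s, c in counts.items()
                    if c >= 2 and s not in exclude})

def associate(documents, query, width=3, max_len=2):
    """Claim 9 style: frequency-ranked associations for a single word string."""
    q = tokenize(query)
    ranges = [r for d in documents for r in ranges_around(tokenize(d), q, width)]
    return common_strings(ranges, max_len, exclude={" ".join(q)})

def associate_pair(documents, first, second, width=3, max_len=2):
    """Claim 1 style: associations for the first and second ranges respectively."""
    return (associate(documents, first, width, max_len),
            associate(documents, second, width, max_len))

def associate_cooccurring(documents, first, second, lower, upper, max_len=2):
    """Claim 5 style: associations from ranges that contain both word strings."""
    f, s = tokenize(first), tokenize(second)
    ranges = [r for d in documents
              for r in ranges_spanning(tokenize(d), f, s, lower, upper)]
    return common_strings(ranges, max_len, exclude={" ".join(f), " ".join(s)})
```

Each function returns a `Counter`, so `most_common()` ranks associations by greater frequency (one reading of claims 2 and 6), and reversing that list favors lesser frequency (one reading of claims 3 and 7). Passing equal `lower` and `upper` values to `associate_cooccurring` corresponds to the fixed-distance case of claim 8.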
RELATED APPLICATIONS
[0001] This application is a continuation-in-part of U.S. application Ser. No. 10/024,473, filed on Dec. 21, 2001, and claims the benefit of U.S. Provisional Application No. 60/276,107 filed Mar. 16, 2001, and U.S. Provisional Application No. 60/299,472 filed June 21, 2001, all of which are hereby incorporated by reference.
Provisional Applications (2)
| Number | Date | Country |
| --- | --- | --- |
| 60276107 | Mar 2001 | US |
| 60299472 | Jun 2001 | US |
Continuation in Parts (1)
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 10024473 | Dec 2001 | US |
| Child | 10157894 | May 2002 | US |