Claims
- 1. Computer-readable code that is operable, when read by an electronic computer, to compare a target concept, invention, or event with each of a plurality of texts, by the steps of:
(a) for each of a plurality of terms composed of non-generic words and, optionally, proximately arranged word groups characterizing the target concept, invention, or event, selecting that term as a descriptive term if the term has an above-threshold selectivity value in at least one library of texts in a field, where the selectivity value of a term in a library of texts in a field is related to the frequency of occurrence of that term in said library, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively, (b) determining, for each of the plurality of texts, a match score related to the number of descriptive terms present in or derived from that text that match those in the target concept, invention, or event, and (c) using the match score to compare the texts with the target concept, invention, or event.
- 2. The code of claim 1, which is operable, in carrying out the step of selecting descriptive words in a target text, of (i) accessing a database containing (a) non-generic words contained in the texts in said libraries, and (b) for each database word, an associated selectivity value, and (ii) selecting a word as a descriptive word if its associated selectivity value is above a threshold value.
- 3. The code of claim 2, wherein the selectivity value associated with a word in said database is related to the greatest selectivity value determined with respect to each of a plurality N≧2 of libraries of texts in different fields.
- 4. The code of claim 2, wherein the threshold selectivity value is greater than 1.25.
- 5. The code of claim 2, wherein said database includes, for each word, text identifiers, and said code is operable, in carrying out the step of determining match scores, of (i) accessing said database to record the texts associated with each non-generic word, and (ii) from the identified texts recorded in step (i), determining a text match score based on the number of texts associated with each non-generic word.
- 6. The code of claim 2, wherein said database includes, for each word, text and library identifiers, and text-specific word identifiers, and said code is operable, in carrying out the step of selecting a word group with an above-threshold selectivity value, of (i) accessing said database to identify the texts, and the text-specific word identifiers, associated with that word group, and (ii) from the identified texts and text-specific word identifiers recorded in step (i), determining the selectivity value of that word group.
- 7. The code of claim 6, wherein said code is further operable, in carrying out the step of determining match scores, of (i) recording the texts associated with each descriptive word pair, and (ii) determining a text match score based on the number of texts associated with each non-generic word pair.
- 8. The code of claim 2, for use in classifying the concept, invention, or event into one or more recognized classes, wherein said database includes, for each word, classification identifiers corresponding to the one or more classes, and said code is operable, in carrying out said determining step, of (i) accessing said database to identify the classification identifiers associated with each descriptive word or word pair, and (ii) from the identified classification identifiers, determining the classification of the concept, invention, or event.
- 9. The code of claim 1, which is operable to assign to each of the terms in the target concept, invention, or event, a match value related to a fractional exponential of the corresponding selectivity value.
- 10. The code of claim 1, wherein the target concept, invention, or event is a natural language text, and said code is operable to process said text to generate a list of non-generic words and optionally, word groups.
- 11. The code of claim 10, wherein said text processing includes removing generic words from the input text, to generate a list of non-generic words.
- 12. The code of claim 10, wherein said text processing includes classifying non-generic words into those having a verb root and remaining non-generic words.
- 13. The code of claim 1, which is further operable, following said using step, of identifying one or more terms characterizing the target concept, invention, or event that are underrepresented in the top-matching texts, and repeating said determining step for the underrepresented terms, to find additional texts that match the target concept, invention, or event.
- 14. An automated system for comparing a target concept, invention, or event in a given field with each of a plurality of natural-language texts, comprising:
(a) a computer, (b) a database accessible by said computer, which provides a plurality of words and associated selectivity values, where the selectivity value associated with a word is related to the frequency of occurrence of that word in at least one library of texts in a field, relative to the frequency of occurrence of the same word in one or more libraries of texts in one or more other fields, respectively, and (c) the computer-readable code of claim 1, which is operable, in carrying out said selecting step, of (i) accessing said database and (ii) recording, from the database, the selectivity value associated with that word.
- 15. The system of claim 14, wherein the selectivity value is calculated for each of N≧2 fields, and said other fields may include, with respect to any of the selected N fields, one or more other of the N fields.
- 16. The system of claim 15, wherein the selectivity value assigned to a word group is the highest selectivity value calculated for all of the N fields.
- 17. The system of claim 14, wherein said database includes, for each word, text identifiers, and said code is operable, in carrying out the step of determining match scores, of (i) accessing said database to record the texts associated with each non-generic word, and (ii) from the identified texts recorded in step (i), determining a text match score based on the number of texts associated with each non-generic word.
- 18. The system of claim 14, wherein said database includes, for each word, text and library identifiers, and text-specific word identifiers, and said code is operable, in carrying out the step of selecting a word group with an above-threshold selectivity value, of (i) accessing said database to identify the texts, and the text-specific word identifiers, associated with that word group, and (ii) from the identified texts and text-specific word identifiers recorded in step (i), determining the selectivity value of that word group.
- 19. An automated method of comparing a target concept, invention, or event in a given field with each of a plurality of natural-language texts, by the steps of:
(a) for each of a plurality of terms composed of non-generic words and, optionally, proximately arranged word groups characterizing the target concept, invention, or event, selecting that term as a descriptive term if the term has an above-threshold selectivity value in at least one library of texts in a field, where the selectivity value of a term in a library of texts in a field is related to the frequency of occurrence of that term in said library, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively, (b) determining, for each of the plurality of texts, a match score related to the number of descriptive terms present in or derived from that text that match those in the target concept, invention, or event, and (c) identifying, from among the plurality of texts, one or more texts which have the highest match score or scores.
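The selectivity test of claims 1 to 4 can be illustrated with a minimal sketch. The claims only say the selectivity value is "related to" a term's frequency in one library relative to its frequency in other libraries. This sketch therefore makes several assumptions: the formula is a ratio of normalized frequencies, the "other" libraries are pooled together, and a term that never appears outside a field gets an infinite value. The maximum over libraries follows claim 3, and the 1.25 threshold follows claim 4. All function names and the toy libraries are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Assumed layout: a library is a list of texts, each already reduced
# to its non-generic terms.
Library = List[List[str]]


def term_frequencies(texts: Library) -> Counter:
    """Count occurrences of every term across all texts of one library."""
    counts: Counter = Counter()
    for tokens in texts:
        counts.update(tokens)
    return counts


def selectivity(term: str, field: str, freqs: Dict[str, Counter]) -> float:
    """Assumed formula: the term's relative frequency in `field` divided by
    its relative frequency in all other libraries pooled together."""
    field_counts = freqs[field]
    field_rate = field_counts[term] / max(sum(field_counts.values()), 1)
    other_hits = sum(c[term] for f, c in freqs.items() if f != field)
    other_total = sum(sum(c.values()) for f, c in freqs.items() if f != field)
    if other_hits == 0:
        # Never seen outside this field: maximally selective if seen here.
        return float("inf") if field_rate > 0 else 0.0
    return field_rate / (other_hits / other_total)


def max_selectivity(term: str, freqs: Dict[str, Counter]) -> float:
    """Claim 3: keep the greatest selectivity over all N >= 2 libraries."""
    return max(selectivity(term, f, freqs) for f in freqs)


def descriptive_terms(terms: List[str], freqs: Dict[str, Counter],
                      threshold: float = 1.25) -> List[str]:
    """Claims 1 and 4: keep terms whose selectivity exceeds the threshold."""
    return [t for t in terms if max_selectivity(t, freqs) > threshold]


libraries = {
    "biotech": [["antibody", "sample", "assay", "method"],
                ["antibody", "sample", "signal"]],
    "electronics": [["circuit", "sample", "signal", "method"],
                    ["circuit", "signal", "voltage"]],
}
freqs = {field: term_frequencies(texts) for field, texts in libraries.items()}
print(descriptive_terms(["antibody", "sample", "method", "voltage"], freqs))
# → ['antibody', 'sample', 'voltage']
```

In this example, "method" occurs at the same rate in both libraries (selectivity 1.0), so it is treated as non-descriptive. "sample" is twice as frequent in the biotech library, so it passes the 1.25 threshold.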
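The match-score step of claims 1(b), 5 and 19(c) can be read as a lookup in an inverted index, where the database stores text identifiers for each word. Under that reading, a text's score is the number of distinct descriptive terms it contains, and the top-scoring texts are reported. The sketch below follows this interpretation. The index layout, the tie-breaking rule and all names are illustrative assumptions rather than the claimed implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_index(texts: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each non-generic word to the identifiers of the texts containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for text_id, words in texts.items():
        for word in words:
            index[word].add(text_id)
    return index


def match_scores(descriptive: Iterable[str], index: Dict[str, Set[str]]) -> Dict[str, int]:
    """Score each text by how many distinct descriptive terms it contains."""
    scores: Dict[str, int] = defaultdict(int)
    for term in set(descriptive):
        for text_id in index.get(term, set()):
            scores[text_id] += 1
    return dict(scores)


def top_matches(scores: Dict[str, int], k: int = 3) -> List[Tuple[str, int]]:
    """Highest scores first; ties broken by text id (an assumed rule)."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


texts = {
    "T1": ["antibody", "assay", "sample"],
    "T2": ["circuit", "voltage"],
    "T3": ["antibody", "sample"],
}
index = build_index(texts)
print(top_matches(match_scores(["antibody", "sample", "assay"], index), k=2))
# → [('T1', 3), ('T3', 2)]
```

Texts that share no descriptive term with the target, such as T2, never receive a score. This means only the postings lists of the descriptive terms are read, not every stored text.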
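Claims 6, 7 and 18 compute a word group's selectivity from text identifiers and "text-specific word identifiers". One plausible reading treats the latter as word positions, so that proximity can be checked. The sketch below is built on that reading. It assumes (library, text, position) postings, a fixed proximity window, and a selectivity defined as the share of texts containing the pair in one field versus the share in all other fields. None of these details come from the claims. The per-library text sets returned by `pair_texts` are also the kind of input claim 7 would use to score texts by word pair.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumed database row for each word: (library_id, text_id, word_position).
Posting = Tuple[str, str, int]


def build_positional_index(libraries: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[Posting]]:
    index: Dict[str, List[Posting]] = defaultdict(list)
    for lib, texts in libraries.items():
        for text_id, words in texts.items():
            for pos, word in enumerate(words):
                index[word].append((lib, text_id, pos))
    return index


def pair_texts(w1: str, w2: str, index: Dict[str, List[Posting]],
               window: int) -> Dict[str, Set[str]]:
    """Per library, the texts in which w1 and w2 occur within `window` positions."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for lib1, t1, p1 in index.get(w1, []):
        for lib2, t2, p2 in index.get(w2, []):
            if (lib1, t1) == (lib2, t2) and 0 < abs(p1 - p2) <= window:
                hits[lib1].add(t1)
    return hits


def pair_selectivity(w1: str, w2: str, field: str, index: Dict[str, List[Posting]],
                     text_counts: Dict[str, int], window: int = 3) -> float:
    """Assumed formula: share of texts in `field` containing the pair, divided
    by the share of texts in all other libraries containing it."""
    hits = pair_texts(w1, w2, index, window)
    field_rate = len(hits.get(field, set())) / text_counts[field]
    other_hits = sum(len(ids) for lib, ids in hits.items() if lib != field)
    other_texts = sum(n for lib, n in text_counts.items() if lib != field)
    if other_hits == 0:
        return float("inf") if field_rate > 0 else 0.0
    return field_rate / (other_hits / other_texts)


libraries = {
    "bio": {"B1": ["monoclonal", "antibody", "binds", "antigen"],
            "B2": ["antibody", "assay", "kit"]},
    "elec": {"E1": ["antibody", "shield", "coating", "monoclonal"],
             "E2": ["circuit", "board"]},
}
index = build_positional_index(libraries)
text_counts = {lib: len(texts) for lib, texts in libraries.items()}
print(pair_selectivity("monoclonal", "antibody", "bio", index, text_counts, window=2))  # inf
print(pair_selectivity("monoclonal", "antibody", "bio", index, text_counts, window=3))  # 1.0
```

The two calls differ only in the window. With a tight window, the distant co-occurrence in E1 is ignored and the pair looks field-specific. With a wider window, it counts, and the pair is no more selective for "bio" than for "elec".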
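Claim 9 assigns each term a match value "related to a fractional exponential" of its selectivity value. A minimal sketch of one such weighting raises the selectivity to a power between 0 and 1 and sums the resulting values over the descriptive terms a text shares with the target. The exponent of 0.5 and the cap that keeps infinite selectivities finite are hypothetical parameters. The claims specify neither.

```python
from typing import Dict, Iterable


def match_value(selectivity: float, exponent: float = 0.5, cap: float = 100.0) -> float:
    """Raise the (capped) selectivity to a fractional power.
    exponent=0.5 and cap=100.0 are assumed values, not taken from the claims."""
    return min(selectivity, cap) ** exponent


def weighted_match_score(text_terms: Iterable[str], term_selectivity: Dict[str, float],
                         exponent: float = 0.5) -> float:
    """Sum the match values of the target's descriptive terms found in a text."""
    shared = set(text_terms) & set(term_selectivity)
    return sum(match_value(term_selectivity[t], exponent) for t in shared)


target = {"antibody": 16.0, "sample": 4.0}
print(weighted_match_score(["antibody", "sample", "method"], target))  # 6.0
```

A fractional exponent compresses the range of the weights. In this example, a term four times as selective as another contributes only twice as much to the score, so a single highly selective term cannot dominate the ranking.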
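Claims 10 to 12 describe preprocessing a natural-language target. Generic words are removed, and the remaining words are split into those having a verb root and the rest. The sketch below shows the shape of that step, with heavy simplifications. The generic-word list and verb-root list are tiny hypothetical stand-ins, and the suffix stripping is a crude substitute for real morphological analysis.

```python
import re
from typing import List, Tuple

# Hypothetical word lists; a real system would use a full generic-word
# dictionary and a proper stemmer or lemmatizer instead.
GENERIC_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
                 "are", "for", "with", "by", "on", "which", "that"}
VERB_ROOTS = {"bind", "detect", "measure"}


def non_generic_words(text: str) -> List[str]:
    """Claim 11 sketch: lowercase, tokenize, and drop generic words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in GENERIC_WORDS]


def naive_root(word: str) -> str:
    """Crude suffix stripping, standing in for real stemming."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_by_verb_root(words: List[str]) -> Tuple[List[str], List[str]]:
    """Claim 12 sketch: separate verb-root words from the remaining words."""
    verbs = [w for w in words if naive_root(w) in VERB_ROOTS]
    others = [w for w in words if naive_root(w) not in VERB_ROOTS]
    return verbs, others


words = non_generic_words("The antibody binds to the antigen and is detected in the sample.")
print(words)                      # ['antibody', 'binds', 'antigen', 'detected', 'sample']
print(split_by_verb_root(words))  # (['binds', 'detected'], ['antibody', 'antigen', 'sample'])
```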
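Claim 13 adds a follow-up pass. Target terms that rarely appear in the top-matching texts are collected, and the search is repeated on those terms alone to surface additional texts. A minimal sketch of that loop follows. It assumes that "underrepresented" means present in fewer than half of the top texts, and that the follow-up search excludes texts already returned. Both the 0.5 cutoff and the exclusion rule are illustrative choices, not details from the claims.

```python
from typing import Dict, List, Set, Tuple


def score(terms: List[str], texts: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count the distinct target terms each text contains (zero scores omitted)."""
    wanted = set(terms)
    return {tid: len(wanted & words) for tid, words in texts.items() if wanted & words}


def top_ids(scores: Dict[str, int], k: int) -> List[str]:
    return [tid for tid, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def underrepresented(terms: List[str], top: List[str], texts: Dict[str, Set[str]],
                     min_fraction: float = 0.5) -> List[str]:
    """Terms found in fewer than `min_fraction` of the top-matching texts."""
    if not top:
        return list(terms)
    return [t for t in terms if sum(t in texts[tid] for tid in top) / len(top) < min_fraction]


def follow_up_search(terms: List[str], texts: Dict[str, Set[str]], k: int = 2,
                     min_fraction: float = 0.5) -> Tuple[List[str], List[str], List[str]]:
    first = top_ids(score(terms, texts), k)
    missing = underrepresented(terms, first, texts, min_fraction)
    rest = {tid: words for tid, words in texts.items() if tid not in first}
    return first, missing, top_ids(score(missing, rest), k)


texts = {
    "T1": {"antibody", "assay", "sample"},
    "T2": {"antibody", "sample"},
    "T3": {"kit", "reagent"},
    "T4": {"reagent", "antibody"},
}
print(follow_up_search(["antibody", "sample", "reagent"], texts))
# → (['T1', 'T2'], ['reagent'], ['T3', 'T4'])
```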
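For the classification use in claim 8, the database stores classification identifiers alongside each word. A simple way to turn those identifiers into a class assignment is a vote: each descriptive term contributes its stored identifiers, and the most frequent identifiers are reported. The voting rule and the class labels below are hypothetical. The labels are not real classification codes.

```python
from collections import Counter
from typing import Dict, Iterable, List


def classify(descriptive: Iterable[str], class_index: Dict[str, List[str]],
             top_n: int = 1) -> List[str]:
    """Each descriptive term votes for the classification identifiers stored
    with it; the most frequent identifiers win (assumed voting rule)."""
    votes: Counter = Counter()
    for term in descriptive:
        votes.update(class_index.get(term, []))
    return [cls for cls, _ in votes.most_common(top_n)]


# Hypothetical class labels, not real classification codes.
class_index = {
    "antibody": ["immunology", "diagnostics"],
    "assay": ["diagnostics"],
    "sample": ["diagnostics"],
    "circuit": ["electronics"],
}
print(classify(["antibody", "assay", "sample"], class_index, top_n=2))
# → ['diagnostics', 'immunology']
```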
Parent Case Info
[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 60/394,204 filed on Jul. 5, 2002, PCT Patent Application No. PCT/US02/21198 filed on Jul. 3, 2002 and PCT Patent Application No. PCT/US02/21200 filed on Jul. 3, 2002, all of which are incorporated in their entirety herein by reference.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60394204 | Jul 2002 | US |