Claims
- 1. Computer-readable code which is operable, when used to control an electronic computer, to identify descriptive words contained in a digitally encoded input text, by the steps of
(i) processing the input text to generate a list of text words, (ii) selecting a text word from (i) as a descriptive word if that word has an above-threshold selectivity value in at least one library of texts in a field, where
the selectivity value of a word in a library of texts in a field is related to the frequency of occurrence of that word in said library, relative to the frequency of occurrence of the same word in one or more other libraries of texts in one or more other fields, respectively, and (iii) storing or displaying the words selected in (ii) as descriptive words.
- 2. The code of claim 1, wherein said processing includes removing generic words from the input text, to generate a list of non-generic words.
- 3. The code of claim 2, wherein said processing includes classifying non-generic words into those having a verb root and remaining non-generic words.
- 4. The code of claim 1, which is operable, in carrying out said selecting step, of (i) accessing a database containing (a) words from texts in said libraries, and (b) for each database word, an associated selectivity value, and (ii) selecting a word as a descriptive word if its associated selectivity value is above a threshold value.
- 5. The code of claim 4, wherein the selectivity value associated with a word in said database is related to the greatest selectivity value determined with respect to each of a plurality N≧2 of libraries of texts in different fields.
- 6. The code of claim 4, wherein the threshold selectivity value is greater than 1.25.
- 7. The code of claim 4, which is further operable to identify descriptive word groups formed of proximately arranged words in said input text, wherein
said processing step further includes constructing from non-generic words in the input text, a plurality of proximately arranged word groups, and said selecting step further includes selecting each word group from (i) as a descriptive word group if that word group has an above-threshold selectivity value in at least one library of texts in a field, where
the selectivity value of a word group in a library of texts in a field is related to the frequency of occurrence of that word group in said library, relative to the frequency of occurrence of the same word group in one or more other libraries of texts in one or more other fields, respectively.
- 8. The code of claim 7, wherein said database includes, for each word, text and library identifiers, and text-specific word identifiers, and said code is operable in carrying out the step of selecting a word group with an above-threshold selectivity value of (i) accessing said database to identify texts and text-specific identifiers associated with that word group, and (ii) from the identified texts and text-specific identifiers recorded in step (i), determining the selectivity value of that word group.
- 9. An automated system for generating descriptive words contained in a digitally encoded input text, comprising
(a) a computer, (b) a database accessible by said computer, and which provides a plurality of words and associated selectivity values, where the selectivity value associated with a word is related to the frequency of occurrence of that word in at least one library of texts in a field, relative to the frequency of occurrence of the same word in one or more libraries of texts in one or more other fields, respectively, and (c) the computer-readable code of claim 1, which is operable, in carrying out said selecting step, of (i) accessing said database and (ii) recording from the database the selectivity value associated with that word.
- 10. The system of claim 9, wherein the selectivity value associated with a word in said database is related to the greatest selectivity value determined with respect to each of a plurality of N≧2 libraries of texts in different fields.
- 11. The system of claim 10, which is further operable to identify descriptive word groups formed of proximately arranged words in said input text, wherein said processing step further includes constructing from non-generic words in the input text, a plurality of proximately arranged word groups, said database further includes text and library identifiers, and text-specific word identifiers, and said code is operable, in selecting descriptive word groups, in (i) accessing said database to identify texts and text-specific identifiers associated with that word group, and (ii) from the identified texts and text-specific identifiers recorded in step (i), determining the selectivity value of that word group.
- 12. An automated method for generating descriptive terms contained in a digitally encoded input text by the steps performed by the code of claim 1, when used to control the operation of an electronic computer.
- 13. The method of claim 12, wherein carrying out said selecting step includes (i) accessing a database containing (a) words from texts in said libraries, and (b) for each database word, an associated selectivity value, and carrying out said selecting step includes (ii) selecting a word as a descriptive word if its associated selectivity value is above a threshold value.
- 14. The method of claim 13, wherein the selectivity value associated with a word in said database is related to the greatest selectivity value determined with respect to each of a plurality N≧2 of libraries of texts in different fields.
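The claims leave the exact selectivity formula open (the value is only "related to" relative frequency). As a minimal sketch of claims 1 to 6, assuming selectivity is a word's relative frequency in one library divided by its mean relative frequency in the other libraries, the word-level method might look like this. The generic-word list, the pseudo-frequency smoothing, the sample libraries, and every function name are hypothetical illustrations, not taken from the claims.

```python
import re
from collections import Counter

# Hypothetical generic-word (stop-word) list; the claims do not define one.
GENERIC_WORDS = {"a", "an", "the", "of", "in", "on", "its", "and", "or",
                 "to", "is", "for", "with", "by"}


def text_words(text):
    """Step (i): list the text words, dropping generic words (claims 1-2)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in GENERIC_WORDS]


def relative_frequencies(texts):
    """Relative frequency of each non-generic word over one library of texts."""
    counts = Counter(w for text in texts for w in text_words(text))
    total = sum(counts.values()) or 1
    return {w: c / total for w, c in counts.items()}


def selectivity_table(libraries, pseudo=1e-6):
    """Build a word -> selectivity database in the spirit of claim 4.

    Assumed formula: frequency in library L divided by the mean frequency in
    the other libraries, with a small pseudo-frequency so that words absent
    from the other libraries do not divide by zero. Each word keeps its
    greatest value over the N >= 2 libraries (claim 5).
    """
    if len(libraries) < 2:
        raise ValueError("need texts from at least two fields")
    freqs = {name: relative_frequencies(texts) for name, texts in libraries.items()}
    table = {}
    for word in set().union(*freqs.values()):
        best = 0.0
        for name, f in freqs.items():
            others = [g.get(word, 0.0) for other, g in freqs.items() if other != name]
            other_freq = sum(others) / len(others)
            best = max(best, (f.get(word, 0.0) + pseudo) / (other_freq + pseudo))
        table[word] = best
    return table


def descriptive_words(text, table, threshold=1.25):
    """Steps (ii)-(iii): keep words whose stored selectivity exceeds the threshold."""
    selected = []
    for word in text_words(text):
        if table.get(word, 0.0) > threshold and word not in selected:
            selected.append(word)
    return selected


libraries = {
    "chemistry": ["The catalyst lowers the activation energy of the reaction.",
                  "A new catalyst is recovered after the reaction."],
    "law": ["The court held that the contract was void.",
            "A new contract for the sale of land is enforceable."],
}
table = selectivity_table(libraries)
print(descriptive_words("A new catalyst for the contract reaction", table))
# -> ['catalyst', 'contract', 'reaction']
```

Here "new" occurs at nearly the same rate in both libraries (selectivity about 1.1), so it falls below the 1.25 threshold of claim 6 and is not selected, while "contract" qualifies because it is selective in at least one field (law). Words missing from the table are treated as non-descriptive.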
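Claims 7, 8 and 11 extend the idea to proximately arranged word groups, computing a group's selectivity from a database that stores, for each word, library and text identifiers plus text-specific word identifiers. A minimal sketch, assuming the word identifier is the word's position among the non-generic words of its text, that "proximate" means at most `window` positions apart (in either order), and the same ratio-of-frequencies formula as for single words; `build_index`, `word_groups`, `group_selectivity`, the window size, and the sample data are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical generic-word list; the claims do not define one.
GENERIC_WORDS = {"a", "an", "the", "of", "in", "on", "its", "and", "or",
                 "to", "is", "for", "with", "by"}


def text_words(text):
    """Non-generic words of a text, in order."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in GENERIC_WORDS]


def build_index(libraries):
    """Word -> [(library id, text id, word position)], plus words per library."""
    index = defaultdict(list)
    sizes = {}
    for lib, texts in libraries.items():
        sizes[lib] = 0
        for text_id, text in enumerate(texts):
            words = text_words(text)
            sizes[lib] += len(words)
            for pos, word in enumerate(words):
                index[word].append((lib, text_id, pos))
    return index, sizes


def word_groups(text, window=2):
    """Ordered pairs of distinct words at most `window` non-generic positions apart."""
    words = text_words(text)
    groups = []
    for i, first in enumerate(words):
        for second in words[i + 1:i + 1 + window]:
            if first != second and (first, second) not in groups:
                groups.append((first, second))
    return groups


def pair_counts(pair, index, window=2):
    """Per library, count co-occurrences of the pair within `window` positions."""
    first, second = pair
    positions = defaultdict(list)
    for lib, text_id, pos in index.get(second, []):
        positions[(lib, text_id)].append(pos)
    counts = defaultdict(int)
    for lib, text_id, pos in index.get(first, []):
        for other in positions.get((lib, text_id), []):
            if 0 < abs(other - pos) <= window:
                counts[lib] += 1
    return counts


def group_selectivity(pair, index, sizes, window=2, pseudo=1e-6):
    """Greatest selectivity of a word pair over the libraries (assumed ratio)."""
    if len(sizes) < 2:
        raise ValueError("need texts from at least two fields")
    counts = pair_counts(pair, index, window)
    freqs = {lib: counts[lib] / max(n, 1) for lib, n in sizes.items()}
    best = 0.0
    for lib, f in freqs.items():
        others = [g for other, g in freqs.items() if other != lib]
        best = max(best, (f + pseudo) / (sum(others) / len(others) + pseudo))
    return best


def descriptive_groups(text, index, sizes, threshold=1.25, window=2):
    """Keep the word groups whose selectivity exceeds the threshold."""
    return [g for g in word_groups(text, window)
            if group_selectivity(g, index, sizes, window) > threshold]


libraries = {
    "chemistry": ["The catalyst lowers the activation energy of the reaction.",
                  "Activation energy depends on temperature."],
    "physics": ["The energy of the photon depends on its frequency.",
                "Kinetic energy grows with velocity."],
}
index, sizes = build_index(libraries)
print(descriptive_groups("Activation energy depends on the reaction", index, sizes))
# -> [('activation', 'energy'), ('activation', 'depends'), ('energy', 'reaction')]
```

The pair ("energy", "depends") appears at similar rates in both libraries (selectivity about 1.13), so it is rejected. Looking up only the stored positions of the two words, rather than rescanning the library texts, mirrors the two-step lookup recited in claim 8.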
Parent Case Info
[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 60/394,204 filed on Jul. 5, 2002, PCT Patent Application No. PCT/US02/21198 filed on Jul. 3, 2002 and PCT Patent Application No. PCT/US02/21200 filed on Jul. 3, 2002, all of which are incorporated in their entirety herein by reference.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60394204 | Jul 2002 | US |