Claims
- 1. A computer-accessible database comprising a list of words and associated selectivity values, where the selectivity value(s) associated with a word are related to the frequency of occurrence of that word in at least one library of texts in a field, relative to the frequency of occurrence of the same word in one or more libraries of texts in one or more other fields, respectively, and the words in the database are non-generic words in the texts in said libraries of texts.
- 2. The database of claim 1, wherein the words in the list that have verb roots are expressed in a common verb form.
- 3. The database of claim 1, wherein the selectivity value associated with a word in said database is related to at least one of the selectivity values determined with respect to each of a plurality N≧2 of libraries of texts in different fields.
- 4. The database of claim 3, wherein the selectivity value assigned to a word is the highest selectivity value calculated for all of the N fields.
- 5. The database of claim 1, which further includes, associated with each word in the database, a list of one or more text identifiers that identify the texts containing that word.
- 6. The database of claim 5, which further includes, associated with each text identifier, an associated library identifier that identifies the library containing that text.
- 7. The database of claim 5, which further includes, associated with each text identifier, sentence identifiers that identify the sentence number(s) within a given text that contain that word, and, for each text or sentence identifier, a word-number identifier that identifies the number(s) of the non-generic word in the identified text or sentence, respectively.
- 8. The database of claim 5, which further includes, associated with each text identifier, a classification identifier that identifies a recognized class to which that text belongs.
- 9. The database of claim 5, wherein the lists of words are non-generic words contained in digitally encoded patent texts, the selected fields are different patent classes or superclasses, and the text identifiers associated with each text are patent or patent-application numbers.
- 10. The database of claim 5, wherein the lists of words are non-generic words contained in digitally encoded scientific or technical journal articles, the selected fields include different scientific or technical specialities, and the text identifiers include journal-article source information.
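The structure recited in claims 1 and 3 through 7 can be illustrated with a minimal Python sketch. It is not the applicants' implementation. The stop list `GENERIC_WORDS`, the function names, the sample libraries, and the `floor` parameter (used only to avoid division by zero) are all assumptions made for illustration. Frequency of occurrence is taken here to mean the fraction of texts in a library that contain the word, which is one possible reading of claim 1, and word numbers are counted over the non-generic words of each sentence.

```python
from collections import defaultdict

# Hypothetical stop list standing in for "generic words"; the claims do not enumerate one.
GENERIC_WORDS = {"a", "an", "the", "of", "and", "in", "is", "to", "with", "for"}


def non_generic_words(sentence):
    """Lower-case a sentence, strip simple punctuation, and drop generic words."""
    words = (w.strip(".,;:()").lower() for w in sentence.split())
    return [w for w in words if w and w not in GENERIC_WORDS]


def build_database(libraries, floor=0.5):
    """Build a word database from {library_id: {text_id: [sentence, ...]}}.

    Each entry holds a selectivity value (claim 1), the per-field values it was
    chosen from (claims 3 and 4), and postings of the form
    (library_id, text_id, sentence_number, word_number) (claims 5 to 7).
    `floor` is an assumed smoothing count for words absent from the other fields.
    """
    postings = defaultdict(list)
    texts_with_word = defaultdict(lambda: defaultdict(set))

    for lib_id, texts in libraries.items():
        for text_id, sentences in texts.items():
            for s_no, sentence in enumerate(sentences, start=1):
                for w_no, word in enumerate(non_generic_words(sentence), start=1):
                    postings[word].append((lib_id, text_id, s_no, w_no))
                    texts_with_word[word][lib_id].add(text_id)

    database = {}
    for word, by_lib in texts_with_word.items():
        per_field = {}
        for lib_id, texts in libraries.items():
            others = [other for other in libraries if other != lib_id]
            # Frequency in this field: fraction of its texts containing the word.
            own = len(by_lib.get(lib_id, ())) / len(texts)
            # Frequency in all other fields combined, floored to avoid dividing by zero.
            other_hits = sum(len(by_lib.get(o, ())) for o in others)
            other_total = sum(len(libraries[o]) for o in others)
            per_field[lib_id] = own / (max(other_hits, floor) / other_total)
        database[word] = {
            "selectivity": max(per_field.values()),  # highest value over the N fields
            "per_field": per_field,
            "postings": postings[word],
        }
    return database


# Invented sample data: two libraries (fields) with two short texts each.
SAMPLE_LIBRARIES = {
    "surgery": {
        "S1": ["The catheter is inserted with a balloon."],
        "S2": ["A balloon catheter for the artery."],
    },
    "computing": {
        "C1": ["The processor is coupled to a memory."],
        "C2": ["A memory stores the catheter record."],
    },
}

db = build_database(SAMPLE_LIBRARIES)
print(db["catheter"]["selectivity"], db["catheter"]["postings"])
```

In this sketch a word that appears in every text of one field and in no text of the others receives the largest selectivity value, while a word spread evenly across fields stays near 1. Reducing verb-root words to a common verb form (claim 2) and the classification identifier of claim 8 are not modeled.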
Priority Claims (2)
Number | Date | Country | Kind
PCT/US02/21198 | Jul 2002 | WO |
PCT/US02/21200 | Jul 2002 | WO |
Parent Case Info
[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 60/394,204, filed on Jul. 5, 2002, PCT Patent Application No. PCT/US02/21198, filed on Jul. 3, 2002, and PCT Patent Application No. PCT/US02/21200, filed on Jul. 3, 2002, all of which are incorporated in their entirety herein by reference.
Provisional Applications (1)
Number | Date | Country
60394204 | Jul 2002 | US