Claims
- 1. A method comprising:
identifying a name in a document; determining a rarity indicator for the name; and defining a hyperlink for the name based on the rarity indicator.
- 2. The method of claim 1, wherein the name is a person name.
- 3. The method of claim 1, wherein the rarity indicator is a quantity based on a probability of drawing at least a portion of the name at random from a set of sample names representative of a relevant human population.
- 4. The method of claim 1, wherein the rarity indicator is a quantity based on a size of a human population, a probability of drawing a first portion of the name at random, and a probability of drawing a second portion of the name at random.
- 5. The method of claim 1, wherein the first portion is a first name portion of the name and the second portion is a last name portion of the name.
- 6. The method of claim 1, wherein defining a hyperlink for the name based on the rarity indicator comprises:
identifying one or more non-person-name terms from the document; identifying one or more candidate records in a database based on at least a portion of the name; comparing the non-person-name terms for each of the candidate records to the non-person-name terms from the document; calculating one or more quantities, each based on the rarity indicator for the person name and the comparison of the non-person-name terms for one of the candidate records; and defining the hyperlink based on the one or more calculated quantities.
- 7. The method of claim 6, wherein calculating one or more quantities, each based on the rarity indicator for the person name and the comparison of the non-person-name terms for one of the candidate records, includes using a Bayesian inference engine.
- 8. The method of claim 6, wherein defining the hyperlink based on the one or more calculated quantities comprises:
comparing the quantities to a threshold; and defining the hyperlink based on a greatest one of the quantities exceeding the threshold.
- 9. The method of claim 8, wherein defining the hyperlink based on the greatest one of the quantities exceeding the threshold comprises defining a hyperlink to designate the candidate record corresponding to the greatest one of the quantities.
- 10. A machine-readable medium comprising machine executable instructions for performing the method of claim 1.
- 11. A machine-readable medium comprising machine executable instructions for performing the method of claim 9.
- 12. A system for adding a hyperlink to a document including a person name, the system comprising:
at least one processor; a memory coupled to the processor, the memory including instructions for:
identifying a name in a document; determining a rarity indicator for the name; and defining a hyperlink for the name based on the rarity indicator.
- 13. The system of claim 12, wherein the name is a person name.
- 14. The system of claim 12, wherein the rarity indicator is a quantity based on a probability of drawing at least a portion of the name at random from a set of sample names representative of a relevant human population.
- 15. The system of claim 12, wherein the rarity indicator is a quantity based on a size of a human population, a probability of drawing a first portion of the name at random, and a probability of drawing a second portion of the name at random from a set of sample names representative of a relevant human population.
- 16. The system of claim 12, wherein defining a hyperlink for the name based on the rarity indicator comprises:
identifying one or more non-person-name terms from the document; identifying one or more candidate records in a database based on at least a portion of the name; comparing the non-person-name terms for each of the candidate records to the non-person-name terms from the document; calculating one or more quantities, each based on the rarity indicator for the person name and the comparison of the non-person-name terms for one of the candidate records; and defining the hyperlink based on the one or more calculated quantities.
- 17. A method comprising:
identifying one or more person names in a set of one or more documents, with each identified person name more likely to refer to a single person in a profession than other person names in the document; identifying descriptive language from one or more documents, based on the identified names; and identifying within one or more documents other person names that refer to persons in the profession, based on one or more portions of the identified descriptive language.
- 18. The method of claim 17, wherein identifying person names in a set of documents comprises:
identifying a plurality of person names in the set of documents, with each name including at least a last name; and calculating, for each of the plurality of person names, a quantity based on a probability of drawing its last name at random from a set of last names in a search universe.
- 19. The method of claim 17, wherein calculating for each of the plurality of person names a quantity based on the probability of drawing its last name at random from a set of last names, includes:
calculating a quantity based on a size of a human population, a probability of drawing a first portion of the name at random from all the first names in a relevant search universe, and a probability of drawing a second portion of the name at random from all the last names in the search universe.
- 20. The method of claim 17, wherein identifying descriptive language from one or more documents, based on the identified names, comprises identifying appositives related to the identified names.
- 21. The method of claim 17, wherein identifying descriptive language from one or more documents, based on the identified names in the set of documents comprises:
identifying a set of terms, including one or more first terms preceding one or more of the identified names and one or more second terms succeeding one or more of the identified names.
- 22. The method of claim 17, wherein the one or more first terms includes one or more parts of speech and one or more of the second terms includes one or more parts of speech.
- 23. A machine-readable medium comprising machine executable instructions for performing the method of claim 16.
- 24. A system comprising:
at least one processor; a memory coupled to the processor, the memory including instructions for:
identifying one or more person names in a set of one or more documents, with each identified person name more likely to refer to a single person in a profession than other person names in the document; identifying descriptive language from one or more documents, based on the identified names; and identifying within one or more documents other person names that refer to persons in the profession regardless of their name uniqueness, based on one or more portions of the identified descriptive language.
- 25. A data structure comprising:
a name; and a name-rarity indicator which indicates how likely the name is to refer to more than one entity in a population.
- 26. The data structure of claim 25, further comprising:
one or more organizations co-existent in a document with the name; and one or more locations co-existent in a document with the name.
- 27. The data structure of claim 26, further including positional information indicating a position of each organization and each location relative to the name.
- 28. A method comprising:
receiving a search query including a name of an entity; determining a measure of how rare the name is in a population; and obtaining additional information to assist in answering the query, in response to the determined measure.
- 29. The method of claim 28, wherein obtaining additional information to assist in answering the query in response to the determined measure comprises:
comparing the measure to a threshold; and requesting additional information if the measure is less than the threshold.
- 30. The method of claim 28, further comprising:
updating the search query based on the additional information.
- 31. The method of claim 28, wherein requesting additional information comprises requesting information related to a profession, a location, and/or an organization.
- 32. The method of claim 28, wherein obtaining additional information to assist in answering the query in response to the determined measure comprises:
comparing the measure to a threshold; searching one or more databases based on the name; and updating or supplementing the query based on results of searching the one or more databases.
- 33. The method of claim 28, wherein updating or supplementing the query comprises:
defining one or more sub-queries, with each sub-query including information about a professional title, organization, or location associated with the name.
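For illustration only, the rarity indicator of claim 4 and the threshold-based selection of claims 8 and 9 can be sketched in Python. The claims do not fix a formula, so the expression below, 1 / (1 + N * P(first) * P(last)), is an assumption that treats first and last names as independent draws; `name_rarity`, `Candidate`, `choose_link_target`, and all numeric values are hypothetical names and figures introduced for this sketch, not part of the claimed subject matter.

```python
from dataclasses import dataclass
from typing import List, Optional


def name_rarity(population: int, p_first: float, p_last: float) -> float:
    """Return a score in (0, 1]; values near 1 suggest the name is likely unique.

    Assumed formula (not stated in the claims): 1 / (1 + expected number of
    people in the population who share both name portions).
    """
    expected_bearers = population * p_first * p_last
    return 1.0 / (1.0 + expected_bearers)


@dataclass
class Candidate:
    record_id: str  # hypothetical identifier of a candidate database record
    score: float    # hypothetical quantity calculated for this record


def choose_link_target(candidates: List[Candidate], threshold: float) -> Optional[str]:
    """Return the record with the greatest score, but only if it exceeds the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.score)
    return best.record_id if best.score > threshold else None


# Illustrative values only: a rare name versus a common one.
rare = name_rarity(300_000_000, 1e-5, 1e-6)
common = name_rarity(300_000_000, 0.03, 0.01)
target = choose_link_target([Candidate("rec-a", 0.4), Candidate("rec-b", 0.9)], 0.5)
```

In this sketch no hyperlink target is returned when the best candidate fails to clear the threshold, which mirrors the condition in claim 8 that the greatest quantity must exceed the threshold.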
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims priority to U.S. provisional application 60/342,952 filed on Dec. 21, 2001. The provisional application is incorporated herein by reference.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60342956 | Dec 2001 | US |