Claims
- 1. A method for generating business e-mail address of a person comprising the steps of:
providing a database storing information regarding people, the database including for each person at least name of the person and the name of respective employer for which the person is currently employed; and using digital processor means coupled to the database, automatically generating e-mail address of a subject person named in the database, the e-mail address being with respect to a respective organization named in the database for the subject person.
- 2. A method as claimed in claim 1 wherein the step of using digital processor means and automatically generating e-mail address includes:
obtaining a working e-mail address to the respective organization, the working e-mail address not being the e-mail address of the subject person; deducing from the working e-mail address, format of e-mail addresses to the respective organization; using the deduced information, constructing potential e-mail addresses for the subject person; and verifying each constructed potential e-mail address by testing each, such that at least one verified constructed potential e-mail address provides a business e-mail address of the subject person.
- 3. A method as claimed in claim 2 further comprising the step of using predefined common email address formats, constructing potential email addresses for the subject person.
- 4. A method as claimed in claim 1 wherein the step of providing a database includes using crawler means, automatically extracting information regarding people and/or organizations from sites of a global computer network and storing the extracted information in the database, such that the database is formed by automated means.
- 5. A method as claimed in claim 4 wherein the step of using crawler means includes employing a multiplicity of crawlers under control of a distributor.
- 6. A system for generating business e-mail address of a person comprising:
a database storing information regarding people, the database including for each person at least name of the person and the name of respective employer for which the person is currently employed; and digital processor means coupled to the database for automatically generating an e-mail address of a subject person named in the database, the e-mail address being with respect to a respective organization named in the database for the subject person.
- 7. A system as claimed in claim 6 wherein the digital processor means automatically generates the e-mail address by:
obtaining a working e-mail address to the respective organization, the working e-mail address not being the e-mail address of the subject person; deducing from the working e-mail address, format of e-mail addresses to the respective organization; using the deduced information, constructing potential e-mail addresses for the subject person; and verifying each constructed potential e-mail address by testing each, such that at least one verified constructed potential e-mail address provides a business e-mail address of the subject person.
- 8. A system as claimed in claim 7 wherein the digital processor means utilizes predefined common email address formats to further construct potential email addresses for the subject person.
- 9. A system as claimed in claim 6 wherein the database is computer generated from crawler means automatically extracting information regarding people and/or organizations from sites of a global computer network and storing the extracted information in the database, such that the database is formed by automated means.
- 10. A system as claimed in claim 9 wherein the crawler means includes plural crawlers under control of a distributor.
- 11. A computer automated system for mining from a global computer network information on people and organizations comprising:
a plurality of automated crawlers for traversing sites of a global computer network and retrieving pages that contain information of interest; a distributor coupled to the crawlers for controlling crawler processing; an extractor responsive to the crawler retrieved pages and extracting information about people and organizations therefrom; the extracted information being stored in a database; an integrator coupled to the database for resolving duplicate information and combining related information in the database; and a post-processor coupled to the database for analyzing contents of the database and generating missing information therefrom.
- 12. A system as claimed in claim 11 wherein the database stores information about a person in a respective record, different records storing different persons' information; and
given two records of potentially a same person, the integrator combines the records if the person's name is the same in the two records and one of organization name and title is the same in the two records.
- 13. A system as claimed in claim 12 wherein the integrator further considers statistical rarity of title and person's name in determining whether to combine the two records.
- 14. A system as claimed in claim 11 wherein the post-processor generates an email address of a subject person named in the database, the email address being with respect to respective organization named in the database for the subject person.
- 15. A system as claimed in claim 14 wherein the post-processor generates the email address for the subject person by:
obtaining a working e-mail address to the respective organization, the working e-mail address not being the e-mail address of the subject person; deducing from the working e-mail address, format of e-mail addresses to the respective organization; using the deduced information, constructing potential e-mail addresses for the subject person; and verifying each constructed potential e-mail address by testing each, such that at least one verified constructed potential e-mail address provides a business e-mail address of the subject person.
- 16. A system as claimed in claim 15 wherein the post-processor further utilizes predefined common email address formats to construct potential email addresses for the subject person.
- 17. A method for mining, from a global computer network, information on people and organizations, comprising the computer implemented steps of:
using a plurality of crawlers, traversing sites of a global computer network and retrieving pages that contain information of interest; controlling the crawlers with a distributor; extracting from the retrieved pages information about people and/or organizations; storing the extracted information in a database; resolving duplicate information stored in the database; and analyzing contents of the database and generating missing information for storage in the database.
- 18. A method as claimed in claim 17 wherein the step of resolving duplicates includes combining related information in the database.
- 19. A method as claimed in claim 17 wherein:
the step of storing includes storing information about different people in different records of the database; and the step of resolving includes: (a) comparing name of a person indicated in one record with name of person indicated in a second record; (b) if the person name comparing results in a match, then determining whether one of organization name and title is the same in the one and second records, and (c) combining the one and second records when the determining of step (b) finds one of organization name and title to be the same in the one and second records.
- 20. A method as claimed in claim 19 further comprising the step of considering statistical rarity of title and person's name.
- 21. A method as claimed in claim 17 wherein the step of generating missing information includes generating an email address of a subject person with respect to a respective organization named in the database for the subject person.
- 22. A method as claimed in claim 21 wherein the step of generating an email address includes:
obtaining a working e-mail address to the respective organization, the working e-mail address not being the e-mail address of the subject person; deducing from the working e-mail address, format of e-mail addresses to the respective organization; using the deduced information, constructing potential e-mail addresses for the subject person; and verifying each constructed potential e-mail address by testing each, such that at least one verified constructed potential e-mail address provides a business e-mail address of the subject person.
- 23. A method as claimed in claim 22, further comprising the step of using predefined common email address formats, constructing potential email addresses for the subject person.
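By way of illustration only, the address-generation steps recited in claims 2, 7, 15 and 22 (deducing a format from a working address, constructing potential addresses, and verifying each) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the template list `COMMON_FORMATS`, the function names, and the assumption that the name of the working address's owner is known are all hypothetical and do not appear in the claims. The claims do not specify how an address is tested, so verification is left to a caller-supplied function.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical "predefined common formats" (claims 3, 8, 16, 23).
# The actual set of formats is not given in the source.
COMMON_FORMATS = [
    "{first}.{last}",
    "{f}{last}",
    "{first}{last}",
    "{first}_{last}",
    "{first}",
    "{last}",
    "{last}{f}",
]


def _fields(first: str, last: str) -> dict:
    """Lower-cased name parts used to fill a format template."""
    first, last = first.lower(), last.lower()
    return {"first": first, "last": last, "f": first[:1], "l": last[:1]}


def deduce_format(working_address: str, first: str, last: str) -> Optional[str]:
    """Deduce the organization's format from a working address.

    Assumes (hypothetically) that the owner's first and last name are known.
    Returns the first template that reproduces the local part, else None.
    """
    local = working_address.lower().partition("@")[0]
    fields = _fields(first, last)
    for template in COMMON_FORMATS:
        if template.format(**fields) == local:
            return template
    return None


def candidate_addresses(first: str, last: str, domain: str,
                        deduced: Optional[str] = None) -> List[str]:
    """Construct potential addresses: deduced format first, then common ones."""
    fields = _fields(first, last)
    templates = ([deduced] if deduced else []) + [
        t for t in COMMON_FORMATS if t != deduced
    ]
    return [f"{t.format(**fields)}@{domain}" for t in templates]


def first_verified(candidates: Iterable[str],
                   verify: Callable[[str], bool]) -> Optional[str]:
    """Test each candidate with the supplied check; return the first that passes."""
    for address in candidates:
        if verify(address):
            return address
    return None
```

For example, given the working address `jsmith@example.com` known to belong to a John Smith, `deduce_format` returns the template `{f}{last}`, and `candidate_addresses("Mary", "Jones", "example.com", "{f}{last}")` lists `mjones@example.com` first, followed by the remaining common formats.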
RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional application Ser. No. 60/221,750, filed on Jul. 31, 2000, the entire teachings of which are incorporated herein by reference. This application also relates to U.S. patent application Ser. No. 09/704,080, filed Nov. 1, 2000; U.S. patent application Ser. No. 09/703,907, filed Nov. 1, 2000; U.S. patent application Ser. No. 09/768,869 filed Jan. 24, 2001; U.S. patent application Ser. No. 09/821,908 filed Mar. 30, 2001; and U.S. patent application Ser. No. _____, filed Jul. 20, 2001, entitled “Computer Method and Apparatus for Extracting Data from Web Pages”, Attorney Docket No. 2937.1000-005, all by the Assignee of the present invention and herein incorporated by reference.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60221750 | Jul 2000 | US |