The present invention relates generally to a method of and system for determining connections between parties and, more particularly, to a connection searching method and system in which a user is capable of entering a source party and a target party and searching a host database to obtain lists of people or entities through which the source and target parties are connected. The system also is capable of determining a number of connections that are associated with one party.
It is well known that personal contacts are advantageous when conducting transactions between parties. However, determining the contacts of one party of a transaction, the contacts of the other party of the transaction, and what contacts those contacts have in common can be very difficult and time consuming. Currently, there is no efficient method or system for determining such contacts between parties of a transaction.
The present invention is directed to a method of and system for determining connections between people which is efficient and effective. The system includes a host database which includes records of parties, including identification information, which is available from non-restricted sources. The identification information is arranged in a series of searchable data fields. A user connects to a website associated with the system and inputs a source party and a target party, for the purpose of finding a number of connections between the parties. The parties may be people or entities, such as companies, organizations, etc.
The system searches the database for intermediate party records having at least one data field which includes identification information which is common to the identification information in at least one of the data fields of the source party record. The located party records are compared to the target party record to determine if any of the identification information in the intermediate party record is common to any of the identification information in the target party record. If there is a commonality, a list of the source party, intermediate party and target party is generated, including the records for each party, to show the connection path between the source party and the target party. If there are no commonalities between the intermediate party and the target party, further intermediate parties are located which have commonalities with the first intermediate party.
The located party records are then compared to the target party record to determine if any of the identification information in the further intermediate party records is common to any of the identification information in the target party record. If there is a commonality, a list of the source party, intermediate parties and target party is generated, including the records for each party, to show the connection path between the source party and the target party. This process is repeated until no further connections are found or until a preset limit of connections is reached.
According to one aspect of the invention, a method of determining a connection between a source party and a target party includes:
Step G may further include searching the data fields in the records of at least one of the client database and the host database to locate identification information commonalities between the at least one intermediate party records and further intermediate party records; and searching the data fields in the records of at least one of the client database and the host database to locate identification information commonalities between the further intermediate party records and the target party record. The source party and the target party may be one of a person and an entity. The identification information may include personal and affiliation information of the party.
The identification information may include at least one of a person's name, the person's dates of employment with a company, the person's title within the company, the person's company name, the person's company address, the person's company SIC code, and the person's company ticker symbol. The identification information may include at least one of a company name, the company's address, the company's SIC code and the company's ticker symbol. The records stored on the client database may be a subset of the records stored on the host database.
According to another aspect of the invention, a method of determining a connection between a source party and a target party includes:
According to another aspect of the invention, a system for determining a connection between a source party and a target party includes a host system having a computer processor and associated memory. The host system includes a host database including a plurality of records, each record including a number of data fields, each of the data fields including identification information of a party, the identification information being extracted from non-restricted sources.
The system also includes a client system having a computer processor and associated memory, the client system including a client database including a plurality of records, each record including a number of data fields, each of the fields including identification information of a party, the identification information being extracted from a client's private sources. The client system establishes a connection to the host system over a communication network and inputs identification information of a source party and a target party.
The host system identifies a record in at least one of the client database and the host database including identification information of the source party and identifies a record in at least one of the client database and the host database including identification information of the target party. The host system searches the data fields in the records to locate identification information commonalities between the source party record and at least one intermediate party record, and searches the data fields in the records to locate identification information commonalities between the at least one intermediate party record and the target party record. Upon locating an identification information commonality between the at least one intermediate party record and the target party record, the host system generates a list of the at least one intermediate party record.
According to yet another aspect of the invention, a system for determining a connection between a source party and a target party includes a host system including a computer processor and associated memory and a user system including a computer processor and associated memory. The host system includes a database having a plurality of records, each record including a number of data fields, each of the data fields including identification information of a party, the identification information being extracted from non-restricted sources. The user system is adapted for establishing a connection to the host system over a communication network and inputting identification information of a source party and a target party to the host system.
The host system identifies records in the database including identification information of the source party and identification information of the target party, searches the data fields in the records to locate identification information commonalities between the source party record and at least one intermediate party record, and searches the data fields in the records to locate identification information commonalities between the at least one intermediate party record and the target party record. Upon locating an identification information commonality between the at least one intermediate party record and the target party record, the host system generates a list of the at least one intermediate party record.
According to yet another aspect of the invention, a method of determining a connection between a source party and a target party includes:
According to yet another aspect of the invention, a method of determining a connection between a source party and a target party includes:
According to yet another aspect of the invention, a system for determining a connection between a source party and a target party includes a host system including a computer processor and associated memory and a user system including a computer processor and associated memory. The host system includes a database having a plurality of records, each record including a number of data fields, each of the data fields including identification information of a party. The user system is adapted for establishing a connection to the host system over a communication network, the user system inputting identification information of a source party and a target party to the host system.
The host system identifies records in the database including identification information of the source party and identification information of the target party, searches the data fields in the records to locate identification information commonalities between the source party record and at least one intermediate party record, and searches the data fields in the records to locate identification information commonalities between the at least one intermediate party record and the target party record. Upon locating an identification information commonality between the at least one intermediate party record and the target party record, the host system generates a list of the at least one intermediate party record.
The system described above may also include the various features and capabilities described below, which enable a client (i.e., a user of host system) to generate a list of persons or entities (including groups of persons or groups of lists) that can function as a starting point for a connections query or request. This functionality can be referred to as “ClientLink™” (a trademark of Orion's Belt, Inc.) and made integral with or a separate module that works in concert with host operation system. A user's personal or private list created using ClientLink can be referred to as the user's “PrivateLink™” (a trademark of Orion's Belt, Inc.) or “PrivateLink list”. For purposes of this description we assume that ClientLink is integral with the host operation system.
As a general overview of a host operation system having aspects of PrivateLink, when the connections server and DB (or host system) receives a query including a PrivateLink list and an endpoint, the host operation system generates information representing the connections to the endpoint for each member of the PrivateLink list, and returns this to the user. In other forms, rather than a single endpoint, a list of endpoints could be used (i.e., an endpoint list). In such a case, the host operation system generates connections between each member in the PrivateLink list and each member in the endpoint list, to the extent such connections exist. In yet another form, a user may enter a single starting point and an endpoint list. In such a case the system generates connections from the starting point to each endpoint in the endpoint list, to the extent such connections exist. The following text describes these features more fully.
The foregoing and other objects of this invention, the various features thereof, as well as the invention itself may be more fully understood from the following description when read together with the accompanying drawings in which:
FIG. is a flow diagram showing another embodiment of a method for determining connections between parties in accordance with the present invention;
In one preferred embodiment of the invention, the user system 14 is an IBM PC compatible system operating an operating system such as the Microsoft Windows® operating system, and host system 12 is configured as a web server providing access to information such as web pages in HTML format via a protocol such as the Hypertext Transfer Protocol (HTTP). The user system 14 and client systems 16a-16c include software to allow viewing of web pages, commonly referred to as a web browser, thus being capable of accessing web pages located on host system 12. Alternatively, user system 14 and client systems 16a-16c can be any wired or wireless device that can be connected to a communications network, such as an interactive television system, including WEBTV, a personal digital assistant (PDA) or a cellular telephone.
The method of and system for determining connections between parties will now be described with reference to
In step 24, the client database 114 is constructed. First, the contact data included in the company database 110 is exported to the company list 112, and irrelevant contacts, such as personal contacts and non-business contacts, are eliminated. Redundant contacts are also eliminated. The company list 112 is input to record matching engine 104 where it is compared to the records included on host operation system and database 102. All contacts in the company list 112 that are also included in the host database 102 are stored in the same record form as the host database contacts and these records are saved in client database 114. This step may be repeated as often as necessary to keep the database updated. Accordingly, the data stored in the client database 114 is a subset of the data stored in host database 102. Known relationships between records in the client database 114 can be determined at this point and links between the related records implemented into the records. The information stored in the client database is proprietary to the client and is not accessible by outside parties. Contacts in the company list 112 which are not already on the host database 102 are not saved in the client database 114, since these contacts will not lead to further contacts on the host database 102.
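The construction of the client database in step 24 can be illustrated with a minimal sketch. It assumes that contacts are matched to host records by a normalized name only; the `Record` fields, the `normalize` helper and the `build_client_database` function are hypothetical names chosen for illustration and are not part of the described system, whose record matching engine 104 may use other criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One party record; the field names are illustrative assumptions."""
    name: str
    company: str
    title: str = ""


def normalize(name: str) -> str:
    """Crude matching key (an assumption): lower-case and collapse whitespace."""
    return " ".join(name.lower().split())


def build_client_database(company_list, host_records, personal_contacts=frozenset()):
    """Return host-form records for company contacts that also exist in the host database."""
    host_by_name = {normalize(r.name): r for r in host_records}
    excluded = {normalize(n) for n in personal_contacts}
    client_db = {}
    for contact_name in company_list:
        key = normalize(contact_name)
        if key in excluded:
            continue  # irrelevant (personal or non-business) contact
        host_record = host_by_name.get(key)
        if host_record is None:
            continue  # not in the host database, so it is not saved
        client_db[key] = host_record  # keying by name also drops redundant contacts
    return list(client_db.values())
```

Because only contacts already present in the host records are kept, the result is a subset of the host data stored in the host's own record form, as the paragraph above describes.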
Once the party records have been constructed and stored in the client database 114 and the host database 102, the process of determining connections between parties (people and/or entities) can be executed. In step 26, the host operation system 102 receives identification information of the source party and the target party, which typically are the names of the person or entity, from the client interface 116 of the client system 16 through a connection with the host system 12 via the internet 18. The record associated with the source party is then located in the client database 114 if it is stored there. If it is not, it is located in the host database 102, step 28. The record associated with the target party is also located in either the client database 114 or the host database 102. In step 30, the records in the client database 114 and host database 102 are searched by the host operation system to locate commonalities between the identification information in the data fields in the source party record and identification information in the data fields of the records stored in the databases. All intermediate party records which include commonalities with the source party record are identified as first stage intermediate party records. If relationship links between parties within the client database have been previously established, these links are used to locate the connections between the source party record and the first stage intermediate party record. The identification information in the data fields of the first stage intermediate party records is then compared to the identification information in the data fields of the target party record to locate first stage intermediate party records having commonalities with the target party record, step 32.
If none of the first stage intermediate party records has any identification information commonalities with the target party record, step 34, the records in the databases are searched to locate further stage intermediate party records having identification information commonalities with the first stage intermediate party records, step 36. The identification information in the further stage intermediate party records is searched to determine if there are any commonalities between any of the data fields in the further stage intermediate party records and the target party record, step 32. Steps 32 through 36 are repeated until an intermediate party record is located which has identification information commonalities with the target party record. When this occurs, the host operation system 102 generates a list of the parties connecting the source party to the target party, step 38, and transmits the list to the client interface 116 via the internet 18. A preset limit caps the number of unique connections found at a predetermined number; the limit may be set by the client when entering the source and target party information or by the host operation system. If the preset limit is met, step 40, the process ends. If the preset limit is not met, steps 32 through 36 are repeated until the preset limit number of unique connections is met.
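The staged search of steps 30 through 40 resembles a breadth-first search over records linked by shared identification information. The following is a minimal sketch under stated assumptions: each record is modeled as a mapping from a data field name to a set of values, two records are treated as linked when any same-named field shares a value, and a direct source-to-target commonality is also reported as a connection. The names `has_commonality`, `find_connections`, `limit` and `max_intermediates` are hypothetical.

```python
from collections import deque


def has_commonality(record_a, record_b):
    """Two records are linked when any same-named data field shares a value."""
    return any(
        field in record_b and record_a[field] & record_b[field]
        for field in record_a
    )


def find_connections(records, source, target, limit=3, max_intermediates=3):
    """Breadth-first search for paths source -> intermediates -> target.

    Stops once `limit` unique paths are found (the preset limit) or no
    further connections exist.
    """
    paths = []
    queue = deque([[source]])
    while queue and len(paths) < limit:
        path = queue.popleft()
        last = path[-1]
        for name, record in records.items():
            if name in path or not has_commonality(records[last], record):
                continue
            if name == target:
                paths.append(path + [name])  # connection path found
                if len(paths) == limit:
                    break
            elif len(path) - 1 < max_intermediates:
                queue.append(path + [name])  # next stage of intermediates
    return paths
```

For example, with a source sharing a company with one intermediate, and that intermediate sharing a school with the target, the function returns the single path source, intermediate, target. Breadth-first order means shorter connection paths are reported before longer ones.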
An example connections list is schematically shown in
A more detailed view of the source party record 202, the target party record 204 and the intermediate party record 206 is shown in
In
While the example described above shows how connections between two people are generated, the system also determines connections between a person and an entity, such as a company or association; between an entity and a person; and between two entities. Upon constructing the client database 114, a record of the client entity is generated and stored in the client database 114. The host database 102, when being constructed, generates records of entities found in its search of the non-restricted sources in the same manner as the records for people described above. An example entity record 230 is shown in
In an alternative embodiment, the host operation system and database 102 and the record matching engine 104 are replicated on the client database 114. In this embodiment, all of the operations described above are executed on the client system 16, thus allowing all execution to be local to the client system 16. Furthermore, the system 10 can be utilized to construct a list of connections that are associated with a single party. By inputting a single party to the host operation system and database 102, the searching function described above is executed and, in a first iteration, all records including identification information having commonalities with the source party are located and displayed. Depending on the scope of connections desired, numerous iterations of the search function can be executed in order to locate records of parties connected to the parties located in previous iterations.
While, as described above, the system 10 may be utilized by clients having a proprietary client database, it can also be utilized by a party which does not construct its own database. This process is shown in the flow diagram 240 of
Accordingly, the present invention enables connections between people and entities to be determined using a convenient and efficient database construction and search tool. The invention is able to provide information about connections between parties based on commonalities in the identification information associated with each of the people and entities. The system can also be used simply for browsing through connections between parties and for obtaining the identification information associated with the record for a particular party. While the application has been described in connection with an example using businesses and business people as the parties, it will be understood that any party could utilize the connection-determining feature of the present invention and be the subject matter, including schools, civic groups, churches, organizations, associations, families, agencies, neighborhoods, etc., and the people who populate such groups.
ClientLink and PrivateLink
The system described above may also include the various features and capabilities described below, which enable a client (i.e., a user of host system 12) to generate a list of persons or entities (including groups of persons or groups of lists) that can function as a starting point for a connections query or request. This functionality can be referred to as “ClientLink™” (a trademark of Orion's Belt, Inc.) and made integral with or a separate module that works in concert with host operation system 102. A user's personal or private list created using ClientLink can be referred to as the user's “PrivateLink™” (a trademark of Orion's Belt, Inc.) or “PrivateLink list”. For purposes of this description we assume that ClientLink is integral with the host operation system 102 of
As a general overview of a host operation system 102 having aspects of PrivateLink, when the connections server and DB (or host system 12) receives a query including a PrivateLink list and an endpoint, the host operation system 102 generates information representing the connections to the endpoint for each member of the PrivateLink list, and returns this to the user. In other forms, rather than a single endpoint, a list of endpoints could be used (i.e., an endpoint list). In such a case, the host operation system 102 generates connections between each member in the PrivateLink list and each member in the endpoint list, to the extent such connections exist. In yet another form, a user may enter a single starting point and an endpoint list. In such a case the system generates connections from the starting point to each endpoint in the endpoint list, to the extent such connections exist. The following text describes these features more fully.
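The three query forms described above (list to endpoint, list to endpoint list, and single starting point to endpoint list) can all be treated as a cross product of starting points and endpoints. The sketch below assumes the relationships have already been reduced to an adjacency mapping from each party to its directly connected parties; `shortest_path` and `connect_lists` are hypothetical names, and returning only the shortest path per pair is a simplification.

```python
from collections import deque
from itertools import product


def shortest_path(links, start, end):
    """Breadth-first search over an adjacency mapping; None when unconnected."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for neighbour in links.get(path[-1], ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def connect_lists(links, starting_points, endpoints):
    """Map each (start, endpoint) pair to a path, omitting unconnected pairs."""
    results = {}
    for start, end in product(starting_points, endpoints):
        path = shortest_path(links, start, end)
        if path is not None:
            results[(start, end)] = path
    return results
```

A single endpoint or a single starting point is simply passed as a one-element list, so one function covers every form of the query.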
In this embodiment, host operation system 102 comprises several components:
The host operation system 102 including ClientLink includes a function called Connect that allows clients (or users) to specify both the desired endpoints of a connection—people, entities or PrivateLink list—and the degrees of separation. It may also provide for an enhanced graphical display and allow filtering according to the presence of specific people or entities in the connection paths (e.g., only show links with Michael Jordan in the path).
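Filtering of connection paths can be sketched as a simple post-processing step over the paths that a search returns. The example assumes each path is a list of party names from starting point to endpoint, and that the degrees of separation equal the number of links in the path; `filter_paths`, `must_include` and `max_degrees` are hypothetical names.

```python
def filter_paths(paths, must_include=None, max_degrees=None):
    """Keep paths that pass through a given party and stay within a degree limit."""
    kept = []
    for path in paths:
        intermediates = path[1:-1]  # parties strictly between the two endpoints
        if must_include is not None and must_include not in intermediates:
            continue
        if max_degrees is not None and len(path) - 1 > max_degrees:
            continue
        kept.append(path)
    return kept
```

Applying the filter after the search keeps the connection-finding logic unchanged; an implementation could equally apply the same conditions during the search.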
Other optional features include functions to:
ClientLink allows clients to integrate knowledge about their own connections and networks of relationships with the host database 102. For a multi-user subscriber, ClientLink can incorporate sophisticated permission protocols for controlling access to information by individual users. Users can indicate the existing people and entities in the host database 102 with which they have relationships. Additionally, the host operation system 102 can enable users to “fill in the blanks” with ClientLink, i.e., add additional information about relationships between people and entities. All of the ClientLink information is preferably kept proprietary to the specific subscriber.
Browse is a function that displays first-order relationships for a specified person, entity or PrivateLink list. An optional feature, “Explore”, allows the user to easily determine concentric, expanding relationships radiating out from a central ending point, whether a person or an entity. Extended Browse capabilities allow searching along a number of parameters such as functional position (e.g., CEO) or education (e.g., MIT alumni).
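The concentric relationships produced by Explore can be pictured as successive rings of a level-by-level traversal, with the first ring corresponding to the first-order relationships shown by Browse. The sketch below assumes an adjacency mapping from each party to its directly related parties; `explore`, `centre` and `max_degree` are hypothetical names.

```python
def explore(links, centre, max_degree=2):
    """Return {degree: sorted parties first reached at that degree from centre}."""
    seen = {centre}
    frontier = [centre]
    rings = {}
    for degree in range(1, max_degree + 1):
        next_frontier = []
        for party in frontier:
            for neighbour in links.get(party, ()):
                if neighbour not in seen:
                    seen.add(neighbour)  # each party belongs to its nearest ring only
                    next_frontier.append(neighbour)
        if not next_frontier:
            break  # nothing further to expand
        rings[degree] = sorted(next_frontier)
        frontier = next_frontier
    return rings
```

Calling `explore` with `max_degree=1` yields only the first-order relationships; larger values add one ring per additional degree.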
ClientLink Integration
Synchronizing each customer's PrivateLink list or data with host operation system 102 is the process whereby names in a user's contact list are matched to names in the host operation system and database 102. Client subscribers can then connect from their personal or corporate contacts to the decision-makers in host database 102.
The host operation system 102 can accommodate this synchronization through a variety of mechanisms, ranging from plug-ins for popular Customer Relationship Management (CRM) and contact management systems to customized extraction.
ClientLink
As mentioned above, ClientLink is the feature that links a client's own contacts (e.g., customers, referral sources, vendors, etc.) 850 with the host database 102 (or connections) in order to produce the most effective links for each client. This feature allows a user to specify in a database 856, in advance, the people 852 or entities 854 in the host database 102 which are to be used as sources for a connection, thus eliminating the need to specify a unique starting point for each connection request.
An individual user's list 860 can be part of a group, and connections can be requested using groups as a starting point. This feature allows client users to request connections from their own or from their colleagues' contacts, depending on the flexibility of each client's protocols regarding access to lists. In the host operation system 102, a user's ClientLink list is called a PrivateLink list. Client administrators have wide latitude in setting up groups, so that connections can be requested from an office, a region, a practice, or an entire organization. Security protocols prevent any client from accessing another client's ClientLink data.
ClientLink can be customized for each client, e.g., during its installation. This includes, for example, determining the most effective way to make existing contact lists (e.g., from common contact management or CRM products) accessible by the host operation system 102, identifying client protocols regarding users' lists, and working with the client administrator to establish the group/list structure.
Users can populate their PrivateLink list, e.g., at the time of installation, by extracting data from their current contact lists, or they can manually enter data into their PrivateLink list as they use host operation system 102.
Technology
One embodiment of the technology in ClientLink includes two overall components, as discussed in detail above:
Its components are linked in an overall information architecture 800, shown schematically in
Data-Collection Technology
The host database 102 contains information about entities, people, and the relationships among them:
This information is derived from publicly available sources 802 (offered either free or by subscription) by a combination of automated methods with minimal manual intervention. The host database 102 is populated via a four-step process:
Web Crawlers
Web crawlers 804 are generally known in the art, and are used here to find and collect data about entities and the individuals associated with them. This data can be found at company web sites, SEC filings, executive biographies 808, structured person-entity relationship data sources 810, and a variety of other sources, such as press releases. This data gathering process uses a combination of readily available tools (e.g., Wget) and ad-hoc host operation system software. The Web crawler can identify some kinds of data relevant to host operation system 102 by its relationship to headings and tables on the HTML page.
Parser
For public corporations, the most useful sources of information—such as SEC filings 806 or company web sites—generally contain “Executive Biographies” 808, biographical paragraphs that provide background and supplementary data about each person associated with a particular corporation. These paragraphs are analyzed by a collection of computer programs called the “parser” 812 to identify entities, people, and relationships among them. An example of a paragraph from an SEC filing for the TALX Corporation is shown below:
First, the parser 812 partitions the paragraph into separate sentences. Then, the parser 812 identifies entity names, people names, positions, and dates using a set of recognizer programs. Some of these elements are recognized heuristically (e.g., dates) while others are recognized by a combination of heuristics and by looking them up in a pre-defined list (e.g., entity names). The parser 812 can have a list of more than 64,000 entity names, entity name variants, and aliases (e.g., GE for General Electric Corporation).
Finally, the parser 812 matches sentences containing recognized elements against a list of content patterns. If it finds a match, it uses the entity and position or title specified in the sentence to generate a corresponding relationship between an entity and a person. This relationship may also have start and end dates, if these were present in the sentence. If the parser 812 cannot find a match between a sentence and its list of patterns, it creates a candidate pattern based on the sentence structure, but does not create a relationship. Instead, it records both sentence and candidate pattern to a log file for human review and, where appropriate, for manual input.
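The parsing stages just described (sentence partitioning, element recognition, content-pattern matching, and logging of unmatched sentences) can be sketched as follows. This is a heavily simplified illustration, not the parser 812 itself: it uses a single hypothetical content pattern, a two-entry alias list, a naive sentence splitter that would break on abbreviations, and a crude stand-in for candidate-pattern creation that merely replaces years with a placeholder.

```python
import re

# Hypothetical alias list; the text above mentions more than 64,000 names and variants.
KNOWN_ENTITIES = {
    "General Electric Corporation": "General Electric Corporation",
    "GE": "General Electric Corporation",
}

# One hypothetical content pattern:
# "<First Last> has served as <Title> of <Entity> [since <Year>]."
PATTERN = re.compile(
    r"^(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) has served as "
    r"(?P<title>[A-Z][A-Za-z ]+?) of (?P<entity>[A-Z][A-Za-z ]*?)"
    r"(?: since (?P<start>\d{4}))?\.$"
)


def parse_paragraph(paragraph):
    """Return (relationships, review_log) for one biographical paragraph."""
    relationships, review_log = [], []
    # Naive sentence partitioning: split on whitespace that follows a period.
    sentences = re.split(r"(?<=\.)\s+", paragraph.strip())
    for sentence in sentences:
        match = PATTERN.match(sentence)
        # The entity must also be recognized via the pre-defined list.
        entity = KNOWN_ENTITIES.get(match.group("entity")) if match else None
        if entity is None:
            # No relationship is created; log the sentence with a candidate pattern.
            candidate = re.sub(r"\d{4}", "<YEAR>", sentence)
            review_log.append((sentence, candidate))
            continue
        relationships.append({
            "person": match.group("person"),
            "title": match.group("title"),
            "entity": entity,
            "start": match.group("start"),  # None when no date was present
        })
    return relationships, review_log
```

Keeping unmatched sentences in a review log rather than guessing at a relationship reflects the emphasis on high specificity: a sentence either fits a known pattern or is deferred to human review.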
The parser 812 used in this embodiment can analyze about 90 sentences per second and takes about two hours to process all public companies listed on the NYSE, NASDAQ and AMEX exchanges. Currently, the parser 812 accepts about 30% to 40% of the information it encounters in free-text format. The acceptance rate will rise as the number of content patterns is increased, but it is unlikely to ever reach 100% with the technology presently available; perhaps 60% to 75% is a realistic goal for well-written biographical paragraphs. The accuracy of the parsed data is very high, around 95%. Because of the high specificity of the parser 812, it will be able to identify and extract correct relationships when they are mentioned in bodies of text where much of the content is on another topic (e.g., from press releases).
Some sources of data may be of such syntactic complexity or poor grammatical quality that the acceptance rate may be much lower. Even for well-written sources, however, parsing could eventually reach a point of diminishing returns, where the effort required to analyze sentences programmatically will exceed the effort required to do so manually. Improvements in processing technology could nevertheless result in significant increases in the acceptance rate. Where sentences remain unparsed, they can be analyzed manually. Experience to date suggests that larger, public companies tend to have better-written biographical paragraphs. These companies were the first priority for loading into the host database 102.
Data Load
When the parser 812 has completed its work, the resulting output undergoes a modest amount of mostly automated follow-up processing to:
The results from parser 812 and any structured person-entity-relationship data 810 are passed to an assembly and merge database 814, which brings the data together, along with any data from licensed data sources 816 and any “data curator tools” 818 provided for accessing data stored within the system or other known repositories. The assembly and merge database ultimately provides a production database 820, which is the host database 102.
As host database 102, database 820 is used by the ClientLink functionality 822 and web site and connect functionality of host operation system 102. The ClientLink functionality 822 can use client (or customer) contact and CRM data, input by the customer 830 to help build the production database 820.
Database 820 (i.e., host database 102) can be generated using computer software to extract information from electronically available data sources, as discussed. Human input can also be used, if needed, to:
Update Process
The host database 102 can be kept current in several ways:
Database updates are preferably done daily, and only allowed from a single system with a secure connection to the database 102. All database changes (corrections, additions, and deletions) can be logged to create an audit trail.
Connection-finding Technology
The connection-related technology includes a user interface for access to the host database 102, and the algorithms required to find and to display connections between people and entities as requested by a user.
Access to Host Operation System
Users access the host operation system 102 via a graphical, browser-based interface by customer 130 (e.g., user 14 from
Referring to
Connections
Users can ask the host operation system 102 (i.e., DB 820) to find connecting paths between a starting point (either a person or an entity) and an end point (which can also be either a person or an entity). Hence there are four connection possibilities:
For example, suppose a user wanted to know if there was a path between John Phelan (a former chairman of the New York Stock Exchange) and Exxon Mobil Company. After requesting a Person-to-Entity connection, the user is asked to specify the person and the entity, as shown in the screen shot 900 of
After selecting the particular one or more person and entity desired in
Screen 1100 also includes three buttons 1150: View Table, View Graphic, and Filter Results. The Filter Results button allows the user to filter the results, which is valuable when a large number of connections are returned. The View Graphic button generates a screen that depicts the connections graphically, as demonstrated in
Within the host operation system 102, a user's ClientLink list is called a PrivateLink. Users can request connections from their PrivateLink to either a person or an entity. An example is shown in the screen shot 1300 of
Selection of the Connect button 1350 of
In sub-table 1410, JLD's List and MMacksoud's List each had 1 result. These were each selected for viewing in the View column. This time, selection of the View Graphic button 1430 produces screen shot 1500 of
Connection Technology Extensions
Beyond that described above, extensions to the connection technology could be selectively implemented. The connection algorithms look for overlaps between the time periods during which two or more people were associated with an entity. But the connection algorithms themselves have no intrinsic knowledge of people and entities—they actually look for overlaps between entries in a general-purpose relational database. These entries could be, for example:
More generally, entries in the database can represent containers or contents-of-containers, where a content entry is associated with a container entry over some (perhaps indefinite) period of time. Containers can themselves be the contents of other containers.
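The overlap test at the heart of the connection algorithms can be illustrated with the container model just described. The sketch assumes each association is a tuple of content, container, start date and end date, with an end of `None` standing for an indefinite (still ongoing) period; `periods_overlap` and `overlapping_contents` are hypothetical names, and indefinite start dates are not handled.

```python
from datetime import date
from itertools import combinations


def periods_overlap(start_a, end_a, start_b, end_b):
    """True when two closed date ranges intersect; an end of None means ongoing."""
    end_a = end_a or date.max
    end_b = end_b or date.max
    return start_a <= end_b and start_b <= end_a


def overlapping_contents(memberships):
    """Find pairs of contents that were in the same container at the same time.

    memberships: iterable of (content, container, start, end) tuples.
    Returns a set of ((content_x, content_y), container) entries.
    """
    pairs = set()
    for a, b in combinations(memberships, 2):
        same_container = a[1] == b[1] and a[0] != b[0]
        if same_container and periods_overlap(a[2], a[3], b[2], b[3]):
            pairs.add((tuple(sorted((a[0], b[0]))), a[1]))
    return pairs
```

Nothing in the sketch refers to people or companies as such. The same check applies to any content and container entries, and because a container can itself appear as the content of another container, nested structures can be expressed with the same tuples.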
The connection technology and associated user interface can also be applied to clients' private databases (e.g., a recruiting firm's inventory of potential candidates). Third-party databases can be integrated into the service providing the host operation system, permitting revenue sharing arrangements with established content providers.
Browse
The browse function (shown as a selectable function in
ClientLink
ClientLink may also be further appreciated with respect to
The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of the equivalency of the claims are therefore intended to be embraced therein. As used herein, the terms “includes” and “including” mean without limitation. It is intended by the following claims to claim any and all modifications and variations that fall within the true scope of the inventive concepts.
This application claims the benefit of priority under 35 U.S.C. §119(e) from co-pending, commonly owned U.S. provisional patent application Ser. No. 60/483,463, filed Jun. 27, 2003. This application also claims the benefit of priority under 35 U.S.C. §120 from co-pending, commonly owned U.S. non-provisional patent application Ser. No. 10/747,550, filed Dec. 29, 2003 (which is a continuation application of commonly owned U.S. non-provisional patent application Ser. No. 09/882,170, filed Jun. 15, 2001, now U.S. Pat. No. 6,697,807).
Number | Date | Country
---|---|---
60483463 | Jun 2003 | US