Modern organizations store electronic information associated with one or more entities, such as people, organizations, or the like. For example, different systems within an enterprise may store data representing personal contact information, management structures, electronic mail communications, project assignments, etc. This data may suggest different types of relationships (e.g., manager-managee, same project, same location, etc.) between people within the enterprise.
Commonly-assigned co-pending U.S. patent application Ser. No. 12/253,562 describes systems to populate a social network based on disparate enterprise source data in order to expose various relationships between entities. Commonly-assigned co-pending U.S. patent application Ser. No. 12/253,518 describes a system to search for entities within such a social network.
Conventional search engines employ scoring algorithms to determine “best” results. For example, a person may submit the search query “Paris developer” and receive a list of results (e.g., entities) ordered by relevance. The list is unaffected by the relationships of the person who submitted the search query. Accordingly, the order of the list is the same regardless of who submitted the search query.
Systems are desired to produce search results which are influenced by relationships of the searching entity. Systems are also desired in which different types of relationships provide different degrees of influence over the search results.
The following description is provided to enable any person in the art to make and use the described embodiments and sets forth the best mode contemplated for carrying out some embodiments. Various modifications, however, will remain readily apparent to those in the art.
Some embodiments may be implemented using a hardware architecture such as that shown in
Application layer 120 may provide access to data stored in database 110. This access may be provided through a reporting, dashboarding, and/or analysis framework supported by application layer 120. Application layer 120 may provide search engine functionality for searching data 115 according to some embodiments. Application layer 120 may also provide security and data distribution functions.
Application layer 120 includes business logic 125 for providing business functions based at least in part on data 115 of database 110. For example, business logic 125 may comprise program code executable by application platform 100 to perform any of the processes described herein. A portion of business logic 125 may run periodically in batch mode to associate cluster identifiers with entities and index the cluster identifiers as described below.
Presentation layer 130 includes program code to provide a user interface for accessing data 115 of database 110 and/or functions provided by business logic 125 via application layer 120. Client system(s) 140 may comprise any suitable device(s), such as a desktop computer, a laptop computer, a personal digital assistant, a tablet PC, and a smartphone. Client system(s) 140 may host the program code of presentation layer 130 (i.e., in a rich client architecture) or may access the code remotely, such as through a Web-based portal.
Generally, process 300 may be executed to associate cluster identifiers with entities, and to index these cluster identifiers to facilitate subsequent searching. As described with respect to business logic 125, this association and indexing may be executed as a batch process. The batch process may be run during periods of low usage (e.g., overnight) so as not to consume system resources during periods of high usage. Moreover, process 300 may be distributed among several processing nodes using known distributed computing techniques.
Initially, at S310, data representing a particular type of relation between each of a plurality of entities is determined. As described above, the determined data may represent email relations between the plurality of entities.
Next, at S320, a set of entity clusters associated with the relation type is determined based on the data. Each entity cluster is associated with at least one of the plurality of entities.
Each entity is associated with a cluster identifier at S330. The associated cluster identifier identifies the entity cluster with which the entity is associated. Continuing the present example, Entities 2, 4 and 5 are associated with the cluster identifier “A1” and Entities 1 and 3 are associated with the cluster identifier “A2”. This association may be persisted in a data structure using any suitable data schema.
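By way of a non-limiting sketch, S310 through S330 might be expressed as follows for a single relation type. The description does not prescribe a clustering technique; this sketch assumes that an entity cluster is a connected component of the pairwise relation data. The function name, the e-mail pairs, and the rule that names the largest cluster first are hypothetical and were chosen only to reproduce the "A1"/"A2" example above.

```python
from collections import defaultdict

def cluster_entities(entity_ids, relation_pairs, type_label):
    """Group entities into clusters for one relation type and name each cluster."""
    # S310: build an undirected adjacency map from the pairwise relation data.
    neighbors = defaultdict(set)
    for a, b in relation_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # S320: find connected components with an iterative depth-first search.
    # (Assumption: a cluster is a connected component; other techniques may be used.)
    seen, clusters = set(), []
    for start in sorted(entity_ids):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        clusters.append(component)

    # Assumed naming rule: largest cluster first, ties broken by smallest member.
    clusters.sort(key=lambda c: (-len(c), min(c)))

    # S330: associate each entity with the identifier of its cluster.
    return {
        entity: f"{type_label}{index}"
        for index, members in enumerate(clusters, start=1)
        for entity in members
    }

# Hypothetical e-mail relation pairs (not taken from the description).
email_pairs = [(2, 4), (4, 5), (1, 3)]
assignments = cluster_entities([1, 2, 3, 4, 5], email_pairs, "A")
print(assignments)
```

Under these assumptions, Entities 2, 4 and 5 map to "A1" and Entities 1 and 3 map to "A2". Calling the same function once per relation type, with a different label each time, would correspond to the loop described next.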
At S340, it is determined whether any more relation types are of interest. S340 may comprise attempting to determine whether a set of data represents an additional relation type. As described above, the relation types represented in the data may include relation types based on business hierarchies (e.g., common manager, connected to a same third person within an organizational chart, common department), relation types based on communications (e.g., e-mail exchanges, telephone calls), and relation types based on activities (e.g., common project).
If so (e.g., the set of data specifies whether entities have worked on a same project), flow returns to S310 to determine data representing a next type of relation between each of the plurality of entities.
Flow continues through S320 and S330 as described above. With reference to
A single entity cluster is determined at S320 based on the data of Relation Type B. Accordingly, at S330, each of Entities 1 through 5 is associated with the cluster identifier “B1”. It will be assumed that yet another relation type is identified at S340, causing flow to return to S310 to determine data representing the next type of relation (e.g., Relation Type C of
A cluster field is added to a search engine index at S350. S350 may be omitted if the search engine index already includes the cluster field. For illustration,
Returning to process 300, the cluster identifiers associated with the entities are indexed at S360. Any system for indexing that is or becomes known may be employed at S360.
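For illustration only, S350 and S360 might be sketched with a small in-memory inverted index in which cluster identifiers are stored under a dedicated field. The field names "text" and "cluster", the function name, and the descriptive text attached to each entity are assumptions. Only the Type A and Type B cluster identifiers given in the example above are used.

```python
from collections import defaultdict

def build_index(entity_text, entity_clusters):
    """Return an inverted index mapping (field, token) to a set of entity ids."""
    index = defaultdict(set)
    # Conventional indexing of the descriptive text of each entity.
    for entity_id, text in entity_text.items():
        for token in text.lower().split():
            index[("text", token)].add(entity_id)
    # S350/S360: cluster identifiers are indexed under their own "cluster" field.
    for entity_id, cluster_ids in entity_clusters.items():
        for cluster_id in cluster_ids:
            index[("cluster", cluster_id)].add(entity_id)
    return index

# Hypothetical descriptive text; cluster identifiers follow the example above.
entity_text = {1: "Paris developer", 2: "Berlin analyst", 3: "Paris manager",
               4: "Paris developer", 5: "London developer"}
entity_clusters = {1: ["A2", "B1"], 2: ["A1", "B1"], 3: ["A2", "B1"],
                   4: ["A1", "B1"], 5: ["A1", "B1"]}
index = build_index(entity_text, entity_clusters)
print(sorted(index[("cluster", "A2")]))
```

Keeping the cluster identifiers in a separate field allows a later query to target them without colliding with ordinary search terms.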
As will be described below, subsequent processes may use the results of process 300 to present search query results based at least in part on the entity cluster to which the entities belong. Process 300 may be performed periodically to account for new data which may represent new relation types and/or new cluster-entity associations for existing relation types.
Process 800 of
A search query is received from an entity at S810. The search query is intended to identify particular entities. More specifically, the search query includes search terms and is intended to return “result entities” to which those search terms are relevant.
User interface 900 includes input field 910 and Search button 920. Search terms are entered into input field 910 in order to identify one or more entities associated with the search terms. Selection of Search button 920 causes reception of a search query including the entered search terms. In one example, the search query is received by application layer 120 of application platform 100.
Next, at S820, one or more entity clusters associated with the searching entity are determined. Entity-entity cluster associations may be determined and stored prior to process 800 as described above with respect to process 300. Accordingly, S820 may include accessing these pre-stored associations. According to the present example, it will be assumed that the search query is received from an entity associated with Entity Id “3”. Therefore, with reference to Entities data structure 200, the entity clusters A2, B1 and C3 are determined at S820.
A search result is determined at S830 based on the search query. The search result includes two or more result entities. Any query-responsive searching system may be employed at S830. In some embodiments, the searching system includes a search engine and search engine index such as search engine 620 and search engine index 610. More specifically, search engine index 610 may index data associated with various entities and search engine 620 may identify two or more result entities based on the search terms, the indexed data, and its searching algorithms. Its searching algorithms may also associate a relevance score with each result entity as is known in the art.
The two or more result entities are presented at S840.
Beginning with process 1100 of
According to some embodiments, the query is modified by adding one or more entity clusters to the search terms at S1130. For example, the search terms “Paris” and “developer” may be received from the entity associated with Entity Id 3 at S1110. The entity clusters A2, B1 and C3 may then be identified at S1120. At S1130, a modified search query may be generated to indicate that “Paris” and “developer” are required search terms and that the determined entity clusters are optional search terms. More specifically, the received search query “Paris developer” may be translated at S1130 to “must (Paris, developer) may (cluster:A2) may (cluster:B1) may (cluster:C3)”. Embodiments are not limited to this example, and may modify the search query in any other manner based on the determined entity clusters.
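The translation at S1130 might be sketched as follows. The function name is hypothetical, and the "must"/"may" syntax simply mirrors the example string above rather than the query language of any particular search engine.

```python
def modify_query(search_terms, cluster_ids):
    """Mark the received terms as required and the searcher's clusters as optional."""
    required = "must (" + ", ".join(search_terms) + ")"
    optional = [f"may (cluster:{cluster_id})" for cluster_id in cluster_ids]
    return " ".join([required, *optional])

# Search terms from S1110 and entity clusters from S1120 for Entity Id 3.
modified = modify_query(["Paris", "developer"], ["A2", "B1", "C3"])
print(modified)
```

Because the cluster terms are optional, they do not exclude any entity that matches the required terms; they can only contribute to its relevance score.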
A search result is determined at S1140 based on the modified search query. The search result includes two or more result entities, each of which is associated with a respective relevance score. The search may be performed using any suitable searching technology. In the case of the search terms of the present specific example, the search may be performed using indices 612, 614 and 616 of system 600. Accordingly, some embodiments of process 1100 require pre-processing steps S350 and S360 of process 300.
At S1150, the two or more result entities are presented in an order according to their respective relevance scores. Presentation of search results based on relevance scores is known in the art. However, by virtue of the modified search query, the relevance score associated with each result entity will depend in part on the entity clusters which the result entity shares with the searching entity. For example, a relevance score associated with Entity Id 1 may be greater than a relevance score associated with Entity Id 4 because Entity Id 1 shares two entity clusters with Entity Id 3. Therefore,
Process 1100 may be used in conjunction with conventional systems for searching and for presenting search results. That is, process 1100 requires some non-conventional pre-search processing (e.g., S1120 and S1130) but advantageously allows the subsequent search and result presentation to proceed according to conventional techniques.
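The effect of the optional cluster terms on ordering may be illustrated with a toy scorer. This is a sketch only: the flat base score, the `cluster_boost` value, and the descriptive text of each entity are assumptions, and an actual search engine would apply its own relevance algorithm. Type C identifiers of the result entities are omitted because they are not stated in the example above.

```python
def search(documents, must_terms, may_clusters, cluster_boost=0.1):
    """Rank entities matching every required term; shared clusters raise the score."""
    results = []
    for entity_id, doc in documents.items():
        tokens = set(doc["text"].lower().split())
        if not all(term.lower() in tokens for term in must_terms):
            continue  # a required term is missing, so the entity is excluded
        shared = len(set(doc["clusters"]) & set(may_clusters))
        # Assumed scoring: flat base score plus a fixed boost per shared cluster.
        results.append((entity_id, 1.0 + cluster_boost * shared))
    # Highest score first; entity id breaks any remaining ties deterministically.
    return sorted(results, key=lambda item: (-item[1], item[0]))

# Hypothetical documents; cluster identifiers follow the running example.
documents = {
    1: {"text": "Paris developer", "clusters": ["A2", "B1"]},
    4: {"text": "Paris developer", "clusters": ["A1", "B1"]},
    5: {"text": "Paris manager", "clusters": ["A1", "B1"]},
}
# Query issued by Entity Id 3, whose clusters are A2, B1 and C3.
ranked = search(documents, ["Paris", "developer"], ["A2", "B1", "C3"])
print(ranked)
```

In this sketch Entity Id 1 ranks above Entity Id 4 because it shares two clusters (A2 and B1) with the searching entity, whereas Entity Id 4 shares only B1.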
Turning to process 1200, S1210 and S1220 may also proceed as described with respect to S810 and S820. At S1230, a search result is determined based on the received search query. The search result includes two or more result entities, each of which is associated with a respective relevance score. S1230 may proceed based on the search query and using any searching system/algorithm that is or becomes known. Next, at S1240, it is determined whether two or more of the result entities are associated with a same relevance score.
For purposes of example, it is assumed that the search query “Paris developer” is received from the entity associated with Entity Id 3 at S1210. The entity clusters A2, B1 and C3 are identified at S1220 as described above. The following search result is then obtained at S1230 based on the search query “Paris developer”: Entity Id 1 (relevance score 100%); Entity Id 4 (relevance score 100%); and Entity Id 5 (relevance score 50%). Accordingly, flow proceeds from S1240 to S1250 due to the identical relevance scores associated with Entity Id 1 and Entity Id 4.
At S1250, each of the identical relevance scores is modified based on cluster associations shared with the searching entity. That is, for each result entity having the identical relevance score, its relevance score is modified based on the one or more entity clusters determined at S1220 and on one or more entity clusters associated with the result entity.
According to some embodiments, the relevance scores are modified at S1250 based on the entity clusters which are shared between the searching entity and the respective result entities, and also on the relation types associated with the shared entity clusters. For example, and returning to
To further illustrate the calculation of weightings according to some embodiments, data structure 1300 of
The determined weightings may modify the relevance scores in S1250 using any combination of mathematical operands and/or other values. In one example, the weightings are added to the relevance scores as percentages. Accordingly, in the present example, the relevance score associated with Entity Id 1 is modified to 130% and the relevance score associated with Entity Id 4 is modified to 105%.
Flow returns to S1240 to determine if any other result entities are associated with identical relevance scores. If not, the two or more result entities are presented at S1260 in an order according to their respective relevance scores, which may or may not have been modified at S1250. Process 1200 therefore allows the presentation of the search results to proceed according to conventional techniques (i.e., according to relevance score), and does not require pre-search modification of the search query. Moreover, some embodiments of process 1200 do not require prior indexing of the Cluster Ids as described above.
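S1240 through S1260 might be sketched as follows. The per-relation-type weightings are hypothetical values chosen only so that the sketch reproduces the 130% and 105% figures of the present example; they are not taken from the description. The sketch further assumes that the leading letter of a cluster identifier names its relation type, omits the unstated Type C identifiers of the result entities, and performs a single pass rather than returning repeatedly to S1240.

```python
from collections import Counter

# Hypothetical weightings per relation type, in percentage points.
WEIGHTS = {"A": 25, "B": 5, "C": 10}

def break_ties(results, searcher_clusters, entity_clusters, weights=WEIGHTS):
    """Boost only those scores that are shared by two or more result entities."""
    counts = Counter(score for _, score in results)
    adjusted = []
    for entity_id, score in results:
        if counts[score] > 1:  # S1240: this score is identical to another
            shared = set(searcher_clusters) & set(entity_clusters[entity_id])
            # S1250: add the weighting of each shared cluster's relation type.
            score += sum(weights[cluster_id[0]] for cluster_id in shared)
        adjusted.append((entity_id, score))
    # S1260: present in descending order of (possibly modified) relevance score.
    return sorted(adjusted, key=lambda item: -item[1])

# Search result obtained at S1230 for the query from Entity Id 3.
results = [(1, 100), (4, 100), (5, 50)]
entity_clusters = {1: ["A2", "B1"], 4: ["A1", "B1"], 5: ["A1", "B1"]}
final = break_ties(results, ["A2", "B1", "C3"], entity_clusters)
print(final)
```

Entity Id 5 keeps its original score because no other result entity shares it, which is consistent with modifying only identical relevance scores.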
Apparatus 1400 includes processor 1410 operatively coupled to communication device 1420, data storage device 1430, one or more input devices 1440, one or more output devices 1450 and memory 1460. Communication device 1420 may facilitate communication with external devices, such as an external design tool. Input device(s) 1440 may comprise, for example, a keyboard, a keypad, a mouse or other pointing device, a microphone, a knob or a switch, an infra-red (IR) port, a docking station, and/or a touch screen. Input device(s) 1440 may be used, for example, to enter information into apparatus 1400. Output device(s) 1450 may comprise, for example, a display (e.g., a display screen), a speaker, and/or a printer.
Data storage device 1430 may comprise any appropriate persistent storage device, including combinations of magnetic storage devices (e.g., magnetic tape and hard disk drives), flash memory, optical storage devices, Read Only Memory (ROM) devices, etc., while memory 1460 may comprise Random Access Memory (RAM).
Program code 1432 may be executed by processor 1410 to cause apparatus 1400 to perform any one or more of the processes described herein. Embodiments are not limited to execution of these processes by a single apparatus. Entity data 1434 may include any type of data structure storing information associated with entities, from which relations therebetween and associated entity clusters may be determined. Data storage device 1430 may also store data and other program code for providing additional functionality and/or which are necessary for operation thereof, such as device drivers, operating system files, etc.
The foregoing diagrams represent logical architectures for describing processes according to some embodiments, and actual implementations may include more or different components arranged in other manners. Moreover, each system described herein may be implemented by any number of devices in communication via any number of other public and/or private networks. Two or more of such devices may be located remote from one another and may communicate with one another via any known manner of network(s) and/or a dedicated connection. Moreover, each device may comprise any number of hardware and/or software elements suitable to provide the functions described herein as well as any other functions. Other topologies may be used in conjunction with other embodiments.
All systems and processes discussed herein may be embodied in program code stored on one or more computer-readable media. Such media may include, for example, a floppy disk, a CD-ROM, a DVD-ROM, a Flash drive, magnetic tape, and solid state Random Access Memory (RAM) or Read Only Memory (ROM) storage units. Embodiments are therefore not limited to any specific combination of hardware and software.
The embodiments described herein are solely for the purpose of illustration. Those in the art will recognize that other embodiments may be practiced with modifications and alterations limited only by the claims.