Many initiatives are moving the Internet toward being a more social tool. Chief among them is the “open graph” offered by FACEBOOK, INC. of Palo Alto, Calif., which allows website administrators to place endorsement or “Like” buttons on their websites. By selecting a “Like” button, users share with their social network connections that they positively endorse the represented entity (for instance, a movie, a celebrity, etc.). Further, the “Like” event is subsequently surfaced on the selecting user's online profile or wall. Moreover, this signal may be broadcast publicly, as the “Like” counter on a page, representing the number of times the page (or entity) is positively endorsed. Such crowd-sourced entity ratings have the potential for dramatically changing the way users navigate and interact with the Web.
However, a significant drawback of this resource is that endorsement data tends to be both sparse (few “likes” per page, with high variance in the number of “likes”) and fragmented: the same entity may be represented by several pages within one website, or by pages across many sites, which dilutes the available endorsement data.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Embodiments of the present invention relate to systems, methods, and computer-readable storage media for, among other things, facilitating the aggregation of endorsement data derived from multiple sources that represent the same entity. Entity-endorsement data is received from a plurality of different sources. Entity resolution is then performed to identify like entities. Endorsement data pertaining to a resolved entity and derived from each appropriate source is then merged or aggregated such that endorsement data pertaining to a particular entity but derived from disparate sources may be accumulated in one place. The aggregated endorsement data may then be presented with or without an identification of the sources from which the data was aggregated. In this way, sparseness and fragmentation of endorsement data are mitigated and a more complete picture of an entity's endorsement status may be seen.
The present invention is illustrated by way of example and not limitation in the accompanying figures in which like reference numerals indicate similar elements and in which:
The subject matter of the present invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” and/or “block” may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.
Various aspects of the technology described herein are generally directed to systems, methods, and computer-readable storage media for, among other things, performing entity-based aggregation of endorsement data derived from multiple sources. An “entity,” in accordance with embodiments of the present invention, is a description of some sort of real-world object or item. That is, an entity is a representation of a real-world concept. Entities sharing common attributes may be grouped into entity types. “Endorsement data,” as utilized herein, may take a variety of forms including, without limitation, liking, sharing, tagging, commenting on, reading, viewing, selecting, bookmarking, saving, tweeting, etc. Endorsements may be favorable or unfavorable. Endorsement data may also encompass rating data wherein the strength of a favorable or unfavorable endorsement is indicated by a scale of some sort, or verbal/textual annotations indicating sentiment.
In accordance with embodiments hereof, upon receipt of endorsement data from multiple sources, entity resolution is performed to identify like entities. Sources may include, without limitation, websites, web pages, database records, files, data feeds, and networks. Once the entities are resolved, the relevant endorsement data from each appropriate source is aggregated. The aggregated endorsement data may then be presented with or without an identification of the sources from which the data was aggregated.
Accordingly, one embodiment of the present invention is directed to one or more computer-readable storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform a method for performing entity-based aggregation of endorsement data. The method includes receiving attribute data about an entity, the attribute data being derived from a plurality of sources at least a portion of which are associated with endorsement data pertaining to the entity; aggregating at least a portion of the attribute data into resolved entity data pertaining to the entity; aggregating at least a portion of the endorsement data into resolved endorsement data pertaining to the entity; and storing the resolved entity data and the resolved endorsement data in association with one another and in association with an entity identifier for the entity.
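The storing step of this embodiment can be illustrated with a minimal sketch. It assumes an in-memory dictionary keyed by entity identifier; the names `ResolvedEntity`, `EntityStore`, and `upsert`, and the identifier format `movie:avatar`, are hypothetical and do not come from the described system.

```python
from dataclasses import dataclass, field


@dataclass
class ResolvedEntity:
    """Resolved entity data and resolved endorsement data kept together."""
    entity_id: str
    attributes: dict = field(default_factory=dict)    # attribute name -> set of values
    endorsements: dict = field(default_factory=dict)  # endorsement kind -> count


class EntityStore:
    """Hypothetical in-memory store indexed by entity identifier."""

    def __init__(self):
        self._by_id = {}

    def upsert(self, entity_id, attributes, endorsement_counts):
        rec = self._by_id.setdefault(entity_id, ResolvedEntity(entity_id))
        # Attribute data: keep the union of values seen across sources.
        for name, value in attributes.items():
            rec.attributes.setdefault(name, set()).add(value)
        # Endorsement data: accumulate counts per endorsement kind.
        for kind, count in endorsement_counts.items():
            rec.endorsements[kind] = rec.endorsements.get(kind, 0) + count
        return rec

    def get(self, entity_id):
        return self._by_id[entity_id]


store = EntityStore()
store.upsert("movie:avatar", {"title": "Avatar"}, {"like": 120})
rec = store.upsert(
    "movie:avatar",
    {"title": "Avatar (film)", "year": 2009},
    {"like": 45, "share": 3},
)
```

Keeping both kinds of resolved data under one identifier means a later lookup by entity returns the merged attributes and the accumulated endorsements together.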
In another embodiment, the present invention is directed to a system for performing entity-based aggregation of endorsement data. The system includes a computing device associated with a server having one or more processors and one or more computer-readable storage media and a data store coupled with the server. The server is configured to receive a search query from a user, at least a portion of the search query pertaining to an entity; receive attribute data about the entity, the attribute data being derived from a plurality of sources at least a portion of which are associated with endorsement data pertaining to the entity; aggregate at least a portion of the attribute data into resolved entity data pertaining to the entity; aggregate at least a portion of the endorsement data into resolved endorsement data pertaining to the entity; and transmit at least a portion of the resolved entity data and at least a portion of the resolved endorsement data for presentation in association with a search engine results page.
In yet another embodiment, the present invention is directed to a method being performed by one or more computing devices including at least one processor, for performing entity-attribute-based aggregation of endorsement data. The method includes receiving a search query from a user, at least a portion of the search query pertaining to an entity-attribute that is common among a plurality of entities; receiving endorsement data pertaining to at least a portion of the plurality of entities, the endorsement data being derived from a plurality of sources; aggregating at least a portion of the endorsement data into resolved endorsement data pertaining to the at least a portion of the plurality of entities; and transmitting at least a portion of the resolved endorsement data and an identifier of one or more entities of the at least a portion of the plurality of entities for presentation in association with a search engine results page.
Having briefly described an overview of embodiments of the present invention, an exemplary operating environment in which embodiments of the present invention may be implemented is described below in order to provide a general context for various aspects of the present invention. Referring to the figures in general and initially to
Embodiments of the present invention may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program modules, including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types. Embodiments of the invention may be practiced in a variety of system configurations, including, but not limited to, hand-held devices, consumer electronics, general purpose computers, specialty computing devices, and the like. Embodiments of the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
In a distributed computing environment, program modules may be located in association with both local and remote computer storage media including memory storage devices. The computer useable instructions form an interface to allow a computer to react according to a source of input. The instructions cooperate with other code segments to initiate a variety of tasks in response to data received in conjunction with the source of the received data.
With continued reference to
The computing device 100 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computing device 100 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 100. Computer storage media does not comprise signals per se. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
The memory 112 includes computer storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical disc drives, and the like. The computing device 100 includes one or more processors that read data from various entities such as the memory 112 or the I/O components 120. The presentation component(s) 116 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, and the like.
The I/O ports 118 allow the computing device 100 to be logically coupled to other devices including the I/O components 120, some of which may be built in. Illustrative I/O components 120 include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.
As previously mentioned, embodiments of the present invention are generally directed to systems, methods, and computer-readable storage media for, among other things, facilitating the aggregation of endorsement data derived from multiple sources that represent the same entity. Entity-endorsement data is received from a plurality of different sources, for instance, websites, web pages, database records, files, data feeds, networks, and the like. Entity resolution is then performed to identify like entities. Once the entities are resolved, the relevant endorsement data from each appropriate source is aggregated. The aggregated endorsement data may then be presented in a single entity view with or without an identification of the sources from which the data was aggregated. In this way, sparseness and fragmentation of endorsement data are mitigated and a more complete picture of an entity's endorsement status may be seen.
Referring now to
It should be understood that any number of client computing devices and servers may be employed in the computing system 200 within the scope of embodiments of the present invention. Each may comprise a single device/interface or multiple devices/interfaces cooperating in a distributed environment. For instance, the server 212 may comprise multiple devices and/or modules arranged in a distributed environment that collectively provide the functionality of the server 212 described herein. Additionally, other components/modules not shown also may be included within the computing system 200.
In some embodiments, one or more of the illustrated components/modules may be implemented as stand-alone applications. In other embodiments, one or more of the illustrated components/modules may be implemented via the client computing device 210, as an Internet-based service, or as a module inside the server 212. It will be understood by those of ordinary skill in the art that the components/modules illustrated in
It should be understood that this and other arrangements described herein are set forth only as examples. Other arrangements and elements (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used in addition to or instead of those shown, and some elements may be omitted altogether. Further, many of the elements described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Various functions described herein as being performed by one or more entities may be carried out by hardware, firmware, and/or software. For instance, various functions may be carried out by a processor executing instructions stored in memory.
The client computing device 210 may include any type of computing device, such as the computing device 100 described with reference to
The server 212 is configured to receive and respond to requests that it receives from components associated with user computing devices, for instance, the browser 218 associated with the client computing device 210. Those skilled in the art of the present invention will recognize that the present invention may be implemented with any number of searching utilities. For example, an Internet search engine or a database search engine may utilize the present invention. These search engines are well known in the art, and commercially available engines share many similar processes not further described herein.
As illustrated, the server 212 includes a query receiving component 222, an entity resolution component 224, an attribute/endorsement data receiving component 226, an aggregating component 228, and a transmitting component 230. The illustrated server 212 also has access to a data store 214. The data store 214 is configured to store information pertaining to search queries, entities, and endorsement data. In various embodiments, such information may include, without limitation, search query logs, an index of entity types and corresponding entities, and an index or other listing of sources determined to be malicious. In embodiments, the data store 214 is configured to be searchable for one or more of the items stored in association therewith. It will be understood and appreciated by those of ordinary skill in the art that the information stored in association with the data store 214 may be configurable and may include any information relevant to search queries, entity types and corresponding entities, and endorsement data. The content and volume of such information are not intended to limit the scope of embodiments of the present invention in any way. Further, though illustrated as a single, independent component, the data store 214 may, in fact, be a plurality of storage devices, for instance a database cluster, portions of which may reside in association with the server 212, the client computing device 210, another external computing device (not shown), and/or any combination thereof.
The query receiving component 222 of the server 212 is configured to receive requests for presentation of search results that satisfy an input search query, at least a portion of the search query pertaining to an entity. Typically, such a request is received via a browser associated with a client computing device, for instance, the browser 218 associated with the client computing device 210. It should be noted, however, that embodiments of the present invention are not limited to users inputting a search query into a traditional query-input region of a screen display.
The entity resolution component 224 is configured to resolve entity data received from a plurality of sources. An exemplary entity resolution system 300 in accordance with embodiments of the present invention is shown in the block diagram of
The attributes for the represented entities are cleaned and normalized by pre-processing 312. Blocking 314 is then performed to determine which pairs of entity representations ought to be thoroughly compared to one another. Entity pairs for comparison may be drawn only from distinct data sources if such sources do not contain duplications. However, this is generally not the case (e.g., FACEBOOK fan pages contain duplicates) and thus, in general, blocking 314 outputs pairs of entities from both distinct sources and the same source. Scoring 316 then assesses the attributes of entity pairs, computes scores on a per-attribute basis (e.g., edit distance between movie titles, difference between release years, etc.), and combines the attribute scores into a total similarity score. In the matching step 318, the similarity score may be compared to a predefined threshold to produce matching pairs, or a more sophisticated graph matching problem may be solved. Finally, merging 320 is performed wherein matched entity representations are combined into a single integrated overall representation for an entity. Merging may involve taking the union over all attribute values or voting in some way so as to come to a consensus on the true attribute values of an entity. In accordance with embodiments hereof, the endorsement data pertaining to the underlying entity is also merged or aggregated.
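The five stages above (pre-processing, blocking, scoring, matching, merging) can be sketched end to end as follows. This is an illustrative sketch only: the record fields, the first-letter blocking key, the 0.7/0.3 attribute weights, and the 0.85 threshold are all assumptions, and `difflib.SequenceMatcher` is used as a stand-in for an edit-distance score. Matching is done by simple thresholding, with a union-find structure grouping matched pairs transitively.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical entity representations from different sources.
records = [
    {"source": "site_a", "title": "Avatar ", "year": 2009, "likes": 120},
    {"source": "site_b", "title": "AVATAR", "year": 2009, "likes": 45},
    {"source": "site_b", "title": "avatar (film)", "year": 2009, "likes": 7},
    {"source": "site_c", "title": "Titanic", "year": 1997, "likes": 300},
]


def preprocess(rec):
    """Clean and normalize: lowercase, drop a parenthetical suffix, trim."""
    title = rec["title"].lower().split("(")[0].strip()
    return {**rec, "title": title}


def block(recs):
    """Yield index pairs sharing a crude blocking key (first letter of title)."""
    buckets = defaultdict(list)
    for i, r in enumerate(recs):
        buckets[r["title"][:1]].append(i)
    for ids in buckets.values():
        yield from combinations(ids, 2)


def score(a, b):
    """Combine per-attribute scores into one similarity in [0, 1]."""
    title_sim = SequenceMatcher(None, a["title"], b["title"]).ratio()
    year_sim = 1.0 if a["year"] == b["year"] else 0.0
    return 0.7 * title_sim + 0.3 * year_sim


def resolve(recs, threshold=0.85):
    recs = [preprocess(r) for r in recs]
    parent = list(range(len(recs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Matching: threshold the similarity score of each blocked pair.
    for i, j in block(recs):
        if score(recs[i], recs[j]) >= threshold:
            parent[find(i)] = find(j)

    # Merging: union of attribute values, sum of endorsement counts.
    merged = defaultdict(lambda: {"titles": set(), "sources": set(), "likes": 0})
    for i, r in enumerate(recs):
        m = merged[find(i)]
        m["titles"].add(r["title"])
        m["sources"].add(r["source"])
        m["likes"] += r["likes"]
    return list(merged.values())


resolved = resolve(records)
```

Note that two of the matched representations come from the same source (`site_b`), which corresponds to the case where blocking must also pair entities within a single source.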
A large number of merging or aggregation schemes are possible and the following examples are described herein for illustrative purposes only and are not intended to limit the scope of the present invention in any way. At the simplest end of the spectrum, the aggregate number of favorable endorsements for an entity may be taken as the sum of the endorsement counts from the matched entity representations, and the aggregate rating may be taken as the mean of their ratings. More generally, any “estimator of location” could be used for ratings, such as the median or any other statistic that is similar to the mean but may be more robust to outliers.
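As a brief illustration of these simple schemes, the sketch below sums endorsement counts and computes both the mean and the median rating over the matched representations of one entity. The field names and sample values are hypothetical.

```python
from statistics import mean, median

# Hypothetical matched representations of one resolved entity.
matched = [
    {"source": "site_a", "likes": 120, "rating": 4.0},
    {"source": "site_b", "likes": 45, "rating": 4.5},
    {"source": "site_c", "likes": 7, "rating": 1.0},  # an outlying rating
]


def aggregate(representations):
    likes = [r["likes"] for r in representations]
    ratings = [r["rating"] for r in representations]
    return {
        "total_likes": sum(likes),         # sum of favorable endorsements
        "mean_rating": mean(ratings),      # sensitive to the outlier
        "median_rating": median(ratings),  # more robust estimator of location
    }


summary = aggregate(matched)
```

Here the single low rating pulls the mean down to roughly 3.17, while the median stays at 4.0.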
Indeed, outliers, and even malicious endorsements, motivate other kinds of aggregation methods as well. For instance, known “like” farms produce automated favorable endorsements for pay. Aggregation methods may be utilized to detect sets of such endorsements and remove them from aggregation, for instance, by noting IP addresses producing inordinate numbers of favorable endorsements or addresses that correlate with endorsements known to be spam. Unfavorable endorsement or rating data may be source-specific, either due to malicious or benign reasons, and simple machine learning methods may be utilized to learn which sources have poor quality so that aggregation of favorable endorsements and ratings can down-weight such sources.
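One possible reading of these two ideas is sketched below: endorsements from any IP address exceeding a per-address limit are dropped, and the remaining endorsements are summed with per-source weights. The limit of 3, the event fields, and the weight table are assumptions; in particular, the weights are simply given here rather than learned, and the addresses are taken from ranges reserved for documentation.

```python
from collections import Counter

# Hypothetical endorsement events; one address produces an inordinate number.
events = (
    [{"ip": "203.0.113.9", "source": "site_a"}] * 5
    + [
        {"ip": "198.51.100.1", "source": "site_a"},
        {"ip": "198.51.100.2", "source": "site_a"},
        {"ip": "198.51.100.3", "source": "site_b"},
        {"ip": "198.51.100.4", "source": "site_b"},
    ]
)


def filter_suspicious(events, max_per_ip=3):
    """Drop every endorsement from an address exceeding max_per_ip."""
    counts = Counter(e["ip"] for e in events)
    return [e for e in events if counts[e["ip"]] <= max_per_ip]


def weighted_total(events, source_weight, default=1.0):
    """Sum endorsements, down-weighting sources judged to be of poor quality."""
    return sum(source_weight.get(e["source"], default) for e in events)


kept = filter_suspicious(events)
# Assumed weights; a real system might learn these from labeled data.
total = weighted_total(kept, {"site_b": 0.5})
```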
Aggregation could also be done on the basis of the set of all people that favorably endorse a given entity at various sources. In this instance, if a user were to search for an entity, for instance, “James Cameron movies,” then in accordance with embodiments hereof, the user may be preferentially shown movies endorsed by social network connections of the user, regardless of the source in association with which such connections endorsed the movie.
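A minimal sketch of this person-based aggregation follows. It assumes that endorsers can be identified consistently across sources, which is a strong assumption, and the data layout and function name are hypothetical. Entities are ranked first by how many of the searching user's connections endorsed them at any source, then by total distinct endorsers.

```python
# Hypothetical data: entity -> {source -> set of endorsing user ids}.
endorsers_by_entity = {
    "Avatar": {"site_a": {"ann", "bob"}, "site_b": {"bob", "cy"}},
    "Titanic": {"site_a": {"dee", "eli", "fay", "gus"}},
    "Aliens": {"site_c": {"ann"}, "site_b": {"cy"}},
}


def rank_by_connections(endorsers_by_entity, connections):
    """Rank entities by endorsements from the user's connections, any source."""
    ranked = []
    for entity, by_source in endorsers_by_entity.items():
        # Union across sources so that each person counts once per entity.
        all_endorsers = set().union(*by_source.values())
        ranked.append(
            (entity, len(all_endorsers & connections), len(all_endorsers))
        )
    # Most connection endorsements first, then most endorsers, then name.
    ranked.sort(key=lambda t: (-t[1], -t[2], t[0]))
    return ranked


ranking = rank_by_connections(endorsers_by_entity, connections={"ann", "cy"})
```

In this example "Titanic" has the most endorsers overall but ranks last, because none of them are connections of the searching user.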
Many of the described schemes for aggregating favorable and unfavorable endorsement data may be utilized to aggregate ratings also. However, ratings by one person on multiple entities (e.g., movies) may not match up in scale to those ratings of another person. Thus, tools such as collaborative filtering and recommendation systems (e.g., those used by sites such as NETFLIX) may be employed to normalize ratings so that they can be aggregated in a sensible way.
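The scale mismatch can be illustrated with a much simpler technique than collaborative filtering: per-user standardization (z-scores), used here purely as a stand-in. Each user's ratings are centered on that user's own mean and divided by that user's own standard deviation before being averaged per entity. The sample users and ratings are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (user, entity, rating) triples; one rater is harsh, one generous.
ratings = [
    ("harsh", "movie_a", 1), ("harsh", "movie_b", 3),
    ("generous", "movie_a", 4), ("generous", "movie_b", 5),
]


def normalize_per_user(ratings):
    """Convert each rating to a z-score relative to its own user's ratings."""
    by_user = defaultdict(list)
    for user, _, r in ratings:
        by_user[user].append(r)
    stats = {u: (mean(v), pstdev(v)) for u, v in by_user.items()}
    normalized = []
    for user, entity, r in ratings:
        mu, sigma = stats[user]
        z = (r - mu) / sigma if sigma > 0 else 0.0
        normalized.append((user, entity, z))
    return normalized


def aggregate_by_entity(normalized):
    """Average the normalized ratings for each entity."""
    by_entity = defaultdict(list)
    for _, entity, z in normalized:
        by_entity[entity].append(z)
    return {entity: mean(zs) for entity, zs in by_entity.items()}


scores = aggregate_by_entity(normalize_per_user(ratings))
```

Although the raw ratings differ widely (1 and 4 for `movie_a`), both users rate `movie_a` below their own average and `movie_b` above it, so the normalized aggregates agree.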
Where endorsements are textual or verbal, aggregation may also be done on the basis of classifiers or extractors of sentiment from the endorsement, whether positive or negative, and such sentiments could then be aggregated. For example, one user may comment on a web page for a restaurant that it is good for a night out for couples, and another user might tweet about the same restaurant, mentioned on a different web page, as having good low lighting; these could be aggregated via entity resolution for that restaurant entity as positive recommendations for a romantic dinner outing.
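The following sketch shows the shape of such sentiment aggregation. The keyword-based `sentiment` function is a toy placeholder for a real classifier, the word lists are arbitrary, and the comments are assumed to have already been mapped to a resolved entity identifier (`restaurant-1` is hypothetical).

```python
import re
from collections import Counter, defaultdict

# Toy lexicons; a real system would use a trained sentiment classifier.
POSITIVE = {"good", "great", "romantic", "cozy"}
NEGATIVE = {"bad", "noisy", "rude", "slow"}


def sentiment(text):
    """Label text by counting distinct positive and negative keywords."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def aggregate_sentiment(comments):
    """Tally sentiment labels per resolved entity across all sources."""
    tally = defaultdict(Counter)
    for c in comments:
        tally[c["entity_id"]][sentiment(c["text"])] += 1
    return tally


# Textual endorsements from different sources, already resolved to one entity.
comments = [
    {"entity_id": "restaurant-1", "source": "review_page",
     "text": "Good for a night out for couples"},
    {"entity_id": "restaurant-1", "source": "tweet",
     "text": "Good low lighting, very romantic"},
    {"entity_id": "restaurant-1", "source": "forum",
     "text": "Service was slow and rude"},
]

tally = aggregate_sentiment(comments)
```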
Referring back to
The aggregating component 228 is configured to aggregate at least a portion of the received attribute data into resolved entity data pertaining to the associated entity. The aggregating component 228 is further configured to aggregate at least a portion of the endorsement data for a given entity into resolved endorsement data pertaining to the entity.
It should be noted that both components 226 and 228 may simply be included in the entity resolution component 224 rather than being separate components as illustrated herein, so long as the selected arrangement results in not only the entities themselves being resolved but also the associated endorsement data. Any and all such variations, and any combination thereof, are contemplated to be within the scope of embodiments of the present invention.
The transmitting component 230 is configured to transmit at least a portion of the resolved entity data and resolved endorsement data for presentation, for instance, in association with a single entity view on a search engine results page. Exemplary screen displays showing illustrative endorsement resolution presentations are more fully described below with respect to
The simplest case for surfacing aggregated endorsement information relates to the presentation of such information for the top-ranking entity associated with an input query by summing the favorable endorsements from known sources contributing to a resolved entity. An exemplary such presentation is shown with reference to the screen display 400 of
Also illustrated in the exemplary screen display 400 of
In embodiments, users may be provided with an option to filter search results such that only those URLs related to a resolved entity and having endorsement data associated with the resolved entity are presented. This is shown in the exemplary screen display 700 of
In embodiments, aggregate endorsements may be applied to groups or categories of entities as well. In accordance with such embodiments, aggregate endorsements for all entities that are part of the category or group, as well as a list of the entities that are part of the group may be presented. This is shown in the exemplary screen display 800 of
Turning now to
In accordance with embodiments of the present invention, similar steps to those described above may be performed to aggregate endorsements with respect to similar entities, for instance, multiple entities having a common attribute, such as movies starring the same actor (as described above with respect to the screen display 800 of
Initially, as indicated at block 1010, a search query is received from a user, at least a portion of the search query pertaining to an entity-attribute that is common among a plurality of entities. As indicated at block 1012, endorsement data pertaining to at least a portion of the plurality of entities is received, the endorsement data being derived from a plurality of sources. At least a portion of the endorsement data is aggregated into resolved endorsement data pertaining to the entities, as indicated at block 1014. At least a portion of the resolved endorsement data and an identifier of one or more entities of the plurality of entities is transmitted for presentation in association with a search engine results page, as indicated at block 1016.
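The four blocks of this method can be sketched as a single function. The sketch assumes the query has already been parsed into an attribute name and value, and that endorsement records carry a resolved entity identifier; the data layout, function name, and sample values are hypothetical.

```python
# Hypothetical resolved entities with their attributes.
entities = {
    "m1": {"title": "Avatar", "director": ["James Cameron"]},
    "m2": {"title": "Titanic", "director": ["James Cameron"]},
    "m3": {"title": "Jaws", "director": ["Steven Spielberg"]},
}

# Hypothetical endorsement data derived from several sources.
endorsements = [
    {"entity_id": "m1", "source": "site_a", "likes": 120},
    {"entity_id": "m1", "source": "site_b", "likes": 45},
    {"entity_id": "m2", "source": "site_c", "likes": 300},
    {"entity_id": "m3", "source": "site_a", "likes": 80},
]


def aggregate_by_attribute(entities, endorsements, attribute, value):
    """Aggregate endorsements over all entities sharing an attribute value."""
    # Entities having the queried attribute in common (cf. block 1010).
    matching = {
        eid for eid, attrs in entities.items()
        if value in attrs.get(attribute, ())
    }
    # Receive and aggregate endorsement data (cf. blocks 1012 and 1014).
    per_entity = {eid: 0 for eid in matching}
    for e in endorsements:
        if e["entity_id"] in matching:
            per_entity[e["entity_id"]] += e["likes"]
    # Payload with entity identifiers for presentation (cf. block 1016).
    return {
        "group_total": sum(per_entity.values()),
        "entities": sorted(per_entity.items(), key=lambda kv: (-kv[1], kv[0])),
    }


result = aggregate_by_attribute(
    entities, endorsements, attribute="director", value="James Cameron"
)
```

The returned payload carries both the aggregate for the group and the per-entity totals, so a results page could show the category-level endorsement count alongside a list of the entities in the group.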
As can be understood, embodiments of the present invention provide systems and methods for performing entity-attribute-based aggregation of endorsement data. The present invention has been described in relation to particular embodiments, which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present invention pertains without departing from its scope.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.
It will be understood by those of ordinary skill in the art that the order of steps shown in the methods 900 of