Search engines increasingly strive to personalize search results in response to search queries. Search engines personalize search results for a computer user by taking into account the user's current context (location, type of device, time of day, etc.), the user's prior searching and browsing behaviors, the user's preferences (whether implicitly or explicitly identified to the search engine), and the like.
While a search engine may receive explicit information regarding a computer user's preferences, and may also implicitly identify a user's preferences through the user's interaction with the search engine, a substantial amount of online information that the user generates is not considered. Indeed, users often subscribe to a variety of services and sites on the Internet. For example, a user may subscribe to (or otherwise interact with) one or more social networking sites, one or more news sites, college alumni sites, various special interest sites, and the like. Typically (though not exclusively), in interacting with a network site the user will provide information about himself/herself. While each site may take advantage of the information that a user provides to gain insight into the user, that insight is limited in scope by the subject matter of, and the nature of the interaction with, that particular site.
The following presents a simplified summary in order to provide a basic understanding of various embodiments described herein. This summary is not an extensive overview, and it is not intended to identify key and/or critical elements or to delineate the scope thereof. The sole purpose of this summary is to present some concepts in a simplified form as a prelude to the more detailed description that follows.
Systems, methods, and media for responding to search queries from a user with personalized search results are presented. A user vector is generated for a user. The user vector is generated by repeatedly accessing a plurality of network sites to obtain user-generated content and updating the user vector according to the user-generated content. Moreover, a plurality of search results is identified in response to a query. Each of the search results is associated with a score. The user vector of the requesting user is obtained, and the scores of the search results are weighted according to the user vector. A subset of the search results having favorable scores is selected, and a search results page is generated from the subset of search results. The generated search results page is returned in response to the search query.
The foregoing aspects and many of the attendant advantages of the disclosed subject matter will become more readily appreciated as they are better understood by reference to the following description when taken in conjunction with the following drawings, wherein:
For purposes of clarity, the use of the term “exemplary” throughout this document should be interpreted as serving as an illustration or example of something, and it should not be interpreted as an ideal and/or a leading illustration of that thing. Unless explicitly indicated to the contrary, the terms “computer user” and “user” are synonymous and should be interpreted as a user of computers, including a person or entity capable of providing user-generated content on various network sites.
As used herein, a “hyperlink” (also referred to as a “link”) is a reference to data/content at a target site. In some instances, when displayed in a Web browser on a user computer, a hyperlink is user actionable such that, upon activating (e.g., selecting) the hyperlink, the referenced content replaces the current content in the browser. Generally speaking, search results (the information returned from a search engine in response to a search query) are hyperlinks referencing corresponding content at target sites. Search results in search results pages are often presented as user-actionable links, commonly displayed in blue to indicate to the user the ability to select (or activate) the link, enabling the user to navigate to the referenced content at a target site.
As used in this document, “user-generated content” refers to data or information generated or provided by a user on one or more networked sites. The user-generated content may include information provided in the form of answers to questions, such as, by way of illustration and not limitation, where did you go to school, where have you worked, are you married, how many kids, etc. Similarly, user-generated content may comprise more free-form information such as (by way of illustration and not limitation) posts, comments, blogs, likes, etc.
The user-generated content can be analyzed to identify various interests and attributes of the user. Typically, but not exclusively, user-generated content is obtained from a variety of network sites, including social networking sites such as social networking site 116, and analyzed in the aggregate.
Turning to
These user computers typically, though not exclusively, are connected to a network 108, such as the Internet, a wide area network or WAN, and the like. As such, these user computers are connected (via the network 108) to other computers and/or devices on the network. For example, as shown in
As will be discussed below, a suitably configured search engine 110 responds to search queries with the requested information. In particular, according to aspects of the disclosed subject matter, in response to receiving a search query, the search engine 110 identifies relevant content according to query terms of the search query. After identifying relevant content responsive to the user query, the search engine 110 personalizes the search results according to user-generated content, generates one or more search results pages from the personalized search results, and returns at least one search results page to the requesting user.
The illustrative environment is also shown as including a social networking site 116, a blog site 112, and a shopping site 114. Those skilled in the art will appreciate that social networking sites, such as social networking site 116, enable a user to connect with others (including friends, peers, family members, organizations, and the like) to keep up to date with each other and share information. Examples of social networking sites include, by way of illustration and not limitation, Facebook, Google+, MySpace, Twitter, and the like. As those skilled in the art will appreciate, computer users typically generate substantial amounts of content (such as likes, posts, comments, and the like) with regard to other entities in their social networks. Similarly, as will be readily appreciated, blog sites, such as blog site 112, allow users to post content for others to view. By way of illustration, a user may post the daily events of a vacation so that others (interested parties) may view them and remain apprised of the activities of the posting user. These blogs/postings may certainly be viewed as user-generated content. In some instances, blog sites allow for content threads in which various parties can interact on a topic or topics. Still further, shopping sites, such as shopping site 114, allow a user to conduct transactions for various items and/or services. The fact that the user purchases an item may be viewed as user-generated content for that user. Additionally, shopping sites frequently allow a user to review and grade items that have been purchased. Of course, this information may also be viewed as user-generated content.
As can be seen, computer users generate a substantial amount of user-generated content across various devices and sites on the Internet. According to aspects of the disclosed subject matter, this user-generated content can be aggregated into user vectors to personalize a user's online experience, including personalizing search results to a search query. As will be described in greater detail below, a user vector corresponds to an array of data items. According to various embodiments, the user vector may be implemented as an unordered collection of labeled data items. Each data item represents a particular piece of information/data of the associated user. These data items include, by way of illustration and not limitation: facts, which may be static or dynamic in nature, such as age, gender, where and when the user went to school, where and when the user worked, where the user currently lives, and the like; and user preferences, e.g., enjoys role-playing computer games, likes contemporary fiction, prefers popular music, dislikes classical music, and the like. Typically, though not exclusively, the user vector is implemented as a sparse array of data items, and in any given user vector a data item corresponding to a particular piece of information may or may not be present. In other words, a user vector for a first user may include elements corresponding to a particular topic that are not present in the user vector of another user.
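As a minimal sketch only, a sparse user vector could be represented in Python as two dictionaries, one for facts and one for preferences, with each preference carrying a signed amplitude. The `UserVector` class, label strings such as "music:classical", and the [-1.0, 1.0] amplitude range are hypothetical choices made for illustration, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

# Hypothetical value type for factual data items (school, city, age, ...).
FactValue = Union[str, int, float]


@dataclass
class UserVector:
    """Sparse, unordered collection of labeled data items for one user.

    Assumption for illustration: preferences carry a signed amplitude in
    [-1.0, 1.0], where positive means "likes" and negative means "dislikes".
    """
    user_id: str
    facts: Dict[str, FactValue] = field(default_factory=dict)
    preferences: Dict[str, float] = field(default_factory=dict)

    def preference(self, label: str) -> float:
        # A label missing from this sparse vector is treated as neutral (0.0).
        return self.preferences.get(label, 0.0)


alice = UserVector(
    user_id="alice",
    facts={"school": "State University", "city": "Seattle"},
    preferences={"games:role-playing": 0.8, "music:classical": -0.6},
)
# Bob's vector holds a label that Alice's lacks, and vice versa.
bob = UserVector(user_id="bob", preferences={"fiction:contemporary": 0.7})
```

Here `alice.preference("fiction:contemporary")` returns 0.0 because that data item is simply absent from her vector, mirroring the point that different users' vectors need not share elements.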
Turning now to
After identifying the user-related data from the user-generated content, at block 208 a user vector corresponding to the computer user is updated (or created if it does not already exist) with the user-related data. As indicated above, a user vector is a “vector” of data items corresponding to pieces of information about and preferences of the associated user. Typically, though not necessarily, preferences are associated with a strength or amplitude of preference (or dislike). For example,
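The create-or-update step at block 208 can be sketched as follows, assuming preferences are stored as signed amplitudes. How new evidence is merged with an existing amplitude is not specified above; the exponential moving average used here (with a hypothetical blending rate `alpha`) is one plausible choice that lets recent content shift a preference gradually, while factual items are simply overwritten with the latest value.

```python
from typing import Dict, Tuple

Facts = Dict[str, str]
Preferences = Dict[str, float]
# Hypothetical in-memory store: user_id -> (facts, preferences).
Store = Dict[str, Tuple[Facts, Preferences]]


def update_user_vector(
    store: Store,
    user_id: str,
    facts: Facts,
    preference_signals: Preferences,
    alpha: float = 0.3,
) -> Tuple[Facts, Preferences]:
    """Create the user's vector if absent, then merge newly extracted data."""
    vec_facts, vec_prefs = store.setdefault(user_id, ({}, {}))
    # Facts are dynamic (new school, new residence): keep the latest value.
    vec_facts.update(facts)
    for label, observed in preference_signals.items():
        if label in vec_prefs:
            # Blend old and new evidence; alpha controls how fast it drifts.
            blended = (1.0 - alpha) * vec_prefs[label] + alpha * observed
        else:
            blended = observed
        # Keep amplitudes within the assumed [-1.0, 1.0] range.
        vec_prefs[label] = max(-1.0, min(1.0, blended))
    return store[user_id]
```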
As mentioned, the routine 200 may create or update the user vector according to the latest information. This “update” reflects the fact that much of the user-related data in the user vector is dynamic. Over time, a user's preferences may change, additional schools may be added, residency changed, etc.
After updating the user vector according to the identified user-related data, the routine 200 proceeds to the next network site that holds user-generated content, if there are any more sites from which user-generated content may be accessed. Assuming that there are more network sites, the routine 200 returns to block 202 where the above-described steps are repeated. Alternatively, if there are no more network sites, the routine 200 proceeds to block 212. At block 212, the routine delays for a predetermined amount of time (such as an hour, a day, or a week, or not at all) before returning to the process described above. In this manner, the process continually or periodically updates the user vector for the user based on the aggregated user-generated content.
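The loop of routine 200 (visit each network site, fold its content into the vector, then wait at block 212) might be sketched as below. Everything concrete here is assumed: sites are modeled as callables that return a user's posts, `keyword_extract` is a toy stand-in for real content analysis, updates are additive with clamping, and the loop runs a bounded number of passes so that the sketch terminates, whereas the routine described above repeats indefinitely.

```python
import time
from typing import Callable, Dict, List, Sequence

# Hypothetical model of a network site: given a user id, return that user's content.
Site = Callable[[str], List[str]]
Extractor = Callable[[List[str]], Dict[str, float]]


def keyword_extract(posts: List[str]) -> Dict[str, float]:
    """Toy extractor (hypothetical): each keyword mention nudges a preference."""
    table = {"drama": ("movies:drama", 0.25), "comedy": ("movies:comedy", 0.25)}
    signals: Dict[str, float] = {}
    for post in posts:
        for word in post.lower().split():
            if word in table:
                label, step = table[word]
                signals[label] = signals.get(label, 0.0) + step
    return signals


def refresh_user_vector(
    user_id: str,
    sites: Sequence[Site],
    extract: Extractor,
    vector: Dict[str, float],
    passes: int = 1,
    delay_seconds: float = 0.0,
) -> Dict[str, float]:
    for n in range(passes):
        for site in sites:
            # Obtain this site's user-generated content and extract user-related data.
            signals = extract(site(user_id))
            for label, amount in signals.items():
                # Update the vector (block 208), clamped to [-1.0, 1.0].
                vector[label] = max(-1.0, min(1.0, vector.get(label, 0.0) + amount))
        if n < passes - 1:
            time.sleep(delay_seconds)  # block 212: wait before the next pass
    return vector
```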
While routine 200 is described above in regard to generating and/or updating a user vector for a specific user, this is for illustration purposes and should not be construed as limiting upon the disclosed subject matter. In various embodiments, as each potential network site is accessed, information regarding user-generated content for multiple users is identified and the user vectors for the corresponding multiple users are generated and/or updated.
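When a single site visit yields content for many users, the update can be applied per user. A minimal sketch, assuming the extraction step has already produced hypothetical `(user_id, label, amplitude)` records and that vectors are plain dictionaries keyed by user id:

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record shape: (user_id, label, amplitude).
Record = Tuple[str, str, float]


def update_many(
    vectors: Dict[str, Dict[str, float]], records: Iterable[Record]
) -> Set[str]:
    """Apply one site's extracted records to every affected user's vector."""
    touched: Set[str] = set()
    for user_id, label, amplitude in records:
        # Create the user's vector the first time the user is seen.
        vec = vectors.setdefault(user_id, {})
        vec[label] = max(-1.0, min(1.0, vec.get(label, 0.0) + amplitude))
        touched.add(user_id)
    return touched
```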
Having generated a user vector based on user-generated content, a service may suitably modify the user's experience according to the user vector. To this end,
As a preliminary matter for routine 400, in order to permit a search engine 110 to personalize search results according to user generated content, according to aspects of the disclosed subject matter a user must be “logged in” (i.e., have established a “logged in” status with the search engine.) As will be readily appreciated, a user is “logged in” by establishing his/her identity with a site/service and authenticating the identity with the site/service, typically though not exclusively by way of a password. According to various embodiments of the disclosed subject matter, the user may be logged in directly with the search engine 110. Alternatively, the user may be logged in with a related networked site, such as social network site 116, such that the search engine 110 may be able to determine the identity and authenticity of the user from the related networked site. As will be appreciated, the status of being “logged in” may persist between active sessions with the search engine 110 (or related networked sites) such that the user does not need to establish his/her identity each time the search engine is accessed. As will be further appreciated, persisting a logged in state may be accomplished by way of various techniques including but not limited to temporary files that are maintained on the computer user's computer, sometimes referred to as “cookies.”
As will be appreciated, if a search engine does not know the identity of the requesting computer user, the search engine cannot personalize search results according to user generated content of the requesting user. Of course, there are various techniques in which the search engine may implicitly identify a requesting user (e.g., the IP address at which the requesting user is operating, “cookies” that include information about the user but do not include login information, and the like). However, relying on implicit identification of a user may pose personal security risks for the user and may prevent the search engine from accessing the user generated content from various network sites.
Generally speaking, when an unidentified user submits a search query, the search engine 110 is likely unable to generate and/or identify a user vector corresponding to the requesting user and identifies search results for the search query according to default parameters (e.g., the most common search results for the submitted search query, the general geographic area of the IP address of the requesting user, the IP domain of the requesting user, and the like). Thus, assuming that the user submits a search query for movies of a favorite actor, if the user has established a preference for dramatic movies over comedies in the user's user vector but has failed to establish a logged in status with the search engine (directly or via a related networked site), then that element of the user vector would not be used in personalizing the search results that are returned to the user. Alternatively, if the user has established a logged in status with the search engine, when the search query for movies featuring the user's favorite actor is received, the search engine can personalize the search results for the user according to the user vector, including the preference for dramatic movies over comedies. Accordingly, for purposes of the discussion of routine 400, it is assumed that the user has a “logged in” status.
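The gating described above, personalizing only when the requester has a logged-in status, might look like the following sketch. The session table, the cookie token, and the simple multiplicative weighting are hypothetical stand-ins; the example reuses the dramatic-versus-comedic movie scenario from the text.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical result shape: (url, base score, topic labels).
Result = Tuple[str, float, Set[str]]


def resolve_user_vector(
    session_token: Optional[str],
    sessions: Dict[str, str],
    user_vectors: Dict[str, Dict[str, float]],
) -> Optional[Dict[str, float]]:
    """Return the vector of a logged-in user, or None for an unidentified user."""
    if not session_token or session_token not in sessions:
        return None
    return user_vectors.get(sessions[session_token])


def rank(results: List[Result], vector: Optional[Dict[str, float]]) -> List[str]:
    def score(result: Result) -> float:
        _, base, labels = result
        if vector is None:
            return base  # unidentified requester: default scores only
        boost = sum(vector.get(label, 0.0) for label in labels)
        return base * max(0.0, 1.0 + boost)

    return [url for url, _, _ in sorted(results, key=score, reverse=True)]


sessions = {"tok-123": "alice"}
user_vectors = {"alice": {"movies:drama": 0.8, "movies:comedy": -0.5}}
results = [
    ("actor-x-comedy", 0.7, {"movies:comedy"}),
    ("actor-x-drama", 0.6, {"movies:drama"}),
]
anonymous = rank(results, resolve_user_vector(None, sessions, user_vectors))
logged_in = rank(results, resolve_user_vector("tok-123", sessions, user_vectors))
```

Without a session the comedy keeps its higher default score; with the session, the stored preference for dramas reorders the two results.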
Beginning at block 402, the search engine 110 receives a search query from a user. At block 404, the search engine identifies search results that are relevant to the query term (or terms) of the search query. As will be appreciated, the search results are associated with corresponding scores indicating the likelihood that the search result would be desired by the requesting user in response to the search query. According to various embodiments, the score may reflect the general popularity of the search result, the strength of the match between the search result and the search query, and the like.
At block 406, the search engine 110 accesses a user vector for the requesting user from a user vector data store. At block 408, the search engine 110 applies weighting to the scores of the search results according to the applicable data items of the requesting user's user vector. The result of this weighting is to favor or disfavor various search results according to the user's user vector. In other words, if an applicable data item (applicable to the search result) is found in the user vector, a weighting value (which may be based on the magnitude of the preference for or against the particular data item) is applied to the score of the search result. Moreover, weighting can be applied in the aggregate: if two or more data items of a user vector are applicable to the search query, then (in at least one embodiment) the weighting of the score for the search result may be an aggregate of the various data items.
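Block 404 leaves the scoring function open; the text mentions popularity and the strength of the match with the query. A minimal sketch that blends the two, assuming a toy corpus of `(text, popularity)` pairs and hypothetical weights `w_match` and `w_popularity`, could be:

```python
from typing import Dict, List, Tuple


def score_results(
    query: str,
    documents: Dict[str, Tuple[str, float]],
    w_match: float = 0.7,
    w_popularity: float = 0.3,
) -> List[Tuple[str, float]]:
    """Return (url, score) pairs for documents matching at least one query term."""
    terms = set(query.lower().split())
    scored: List[Tuple[str, float]] = []
    for url, (text, popularity) in documents.items():
        words = set(text.lower().split())
        # Fraction of query terms present in the document (0.0 .. 1.0).
        match = len(terms & words) / len(terms) if terms else 0.0
        if match == 0.0:
            continue  # not responsive to the query at all
        scored.append((url, w_match * match + w_popularity * popularity))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```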
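The weighting at block 408 could be sketched as follows. It assumes each search result carries hypothetical topic labels that can be matched against user-vector labels, and it aggregates all applicable data items by summing their amplitudes into a single multiplier scaled by a hypothetical `gain`; the disclosure does not fix a particular aggregation formula.

```python
from typing import Dict, Iterable, List, Set, Tuple


def weight_scores(
    results: Iterable[Tuple[str, float, Set[str]]],
    user_vector: Dict[str, float],
    gain: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-score (url, score, labels) results by the user's applicable data items."""
    weighted: List[Tuple[str, float]] = []
    for url, score, labels in results:
        # Only data items present in both the result and the sparse vector apply.
        applicable = [user_vector[label] for label in labels if label in user_vector]
        # Aggregate the amplitudes; a net dislike pushes the multiplier below 1.
        multiplier = max(0.0, 1.0 + gain * sum(applicable))
        weighted.append((url, score * multiplier))
    weighted.sort(key=lambda pair: pair[1], reverse=True)
    return weighted
```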
In addition to weighting the search results according to the user vector, at block 410 the search engine makes a further connection by weighting search results according to preferred authorship. In particular, the search engine 110 determines whether any data items in the user vector indicate a preference (either positive or negative) regarding a particular author, such as data item 304. When these exist in the user vector, the search engine weights search results that are authored by that particular author as a function of the amplitude of preference (for or against) to that author.
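The authorship weighting at block 410 can be sketched the same way, assuming author preferences are stored under hypothetical labels of the form "author:<name>" and that each search result records its author (or `None` when unknown):

```python
from typing import Dict, Iterable, List, Optional, Tuple


def weight_by_author(
    results: Iterable[Tuple[str, float, Optional[str]]],
    user_vector: Dict[str, float],
    gain: float = 0.5,
) -> List[Tuple[str, float]]:
    weighted: List[Tuple[str, float]] = []
    for url, score, author in results:
        # Positive amplitude favors the author's results; negative disfavors them.
        amplitude = user_vector.get(f"author:{author}", 0.0) if author else 0.0
        weighted.append((url, score * max(0.0, 1.0 + gain * amplitude)))
    weighted.sort(key=lambda pair: pair[1], reverse=True)
    return weighted
```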
At block 412, candidate advertisements are identified for inclusion with the search results to be returned to the user. As those familiar with search engines will appreciate, because the search services that a search engine provides are typically free to the user, the search engine, in order to defray its operating costs, includes advertisements for which advertisers pay the search engine. As with search results, advertisements may be scored according to various criteria (fulfillment goals, relevance to the search query, popularity of the advertised product, and the like) such that those scoring most favorably have the highest likelihood of being included with the search results that will be returned to the user.
After identifying candidate advertisements to be included with the search results, at block 414 the search engine applies weighting to the candidate advertisements based on the user vector. As with the search results, applying weighting to the candidate advertisements may alter, either favorably or unfavorably, the scores associated with one or more of the candidate advertisements.
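Blocks 412 and 414 might be sketched as below. The base advertisement score (a hypothetical `bid` multiplied by a hypothetical query `relevance`) is only one example of the criteria mentioned above; the user-vector adjustment then mirrors the weighting applied to ordinary search results.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Ad:
    ad_id: str
    bid: float              # hypothetical: amount the advertiser pays
    relevance: float        # hypothetical: 0..1 match with the search query
    labels: FrozenSet[str]  # hypothetical topic labels for the advertised item


def score_ads(
    ads: Iterable[Ad], user_vector: Dict[str, float], gain: float = 0.5
) -> List[Tuple[str, float]]:
    scored: List[Tuple[str, float]] = []
    for ad in ads:
        base = ad.bid * ad.relevance  # block 412: unpersonalized candidate score
        adjust = sum(user_vector.get(label, 0.0) for label in ad.labels)
        # Block 414: favor or disfavor the candidate according to the user vector.
        scored.append((ad.ad_id, base * max(0.0, 1.0 + gain * adjust)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```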
At block 416, one or more search results pages are generated from the identified search results (based on the weighted scores of the search results). Additionally, one or more candidate advertisements are included in the one or more search results pages, where the candidate advertisements are selected according to their weighted scores. After generating the one or more search results pages, at block 418 at least one search results page is returned to the requesting user in response to the search query. This at least one search results page includes those search results and advertisements that score most favorably after the weighting according to the user vector has been applied. Thereafter, the routine 400 terminates.
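Assembling a page at block 416 then amounts to taking the best-scoring items. A minimal sketch, assuming weighted `(id, score)` pairs for both search results and candidate advertisements, and hypothetical `page_size` and `ad_slots` parameters:

```python
from typing import Dict, List, Sequence, Tuple

Scored = Tuple[str, float]


def build_results_page(
    weighted_results: Sequence[Scored],
    weighted_ads: Sequence[Scored],
    page_size: int = 10,
    ad_slots: int = 2,
    page: int = 0,
) -> Dict[str, List[str]]:
    """Select the most favorably scored results and ads for one results page."""
    ranked = sorted(weighted_results, key=lambda pair: pair[1], reverse=True)
    ranked_ads = sorted(weighted_ads, key=lambda pair: pair[1], reverse=True)
    start = page * page_size
    return {
        "results": [url for url, _ in ranked[start:start + page_size]],
        "ads": [ad_id for ad_id, _ in ranked_ads[:ad_slots]],
    }
```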
Regarding the exemplary routines 200 and 400 described above, while these routines are expressed in regard to discrete steps, these steps should be viewed as being logical in nature and may or may not correspond to any actual and/or discrete steps of a particular implementation. Nor should the order in which these steps are presented in the various routines be construed as the only order in which the steps may be carried out. Moreover, while these routines include various novel features of the disclosed subject matter, other steps (not listed) may also be carried out in the execution of the routines. Further, those skilled in the art will appreciate that the described, logical steps of these routines may be combined together or be comprised of multiple steps. Steps of routines 200 and 400 may be carried out in parallel or in series. Often, but not exclusively, the functionality of the various routines is embodied in software (e.g., applications, system services, libraries, and the like) that is executed on computer hardware and/or systems as described below in regard to
While many novel aspects of the disclosed subject matter are expressed in routines embodied in applications, also referred to as computer programs, apps (small, generally single or narrow purposed, applications), and/or methods, these aspects may also be embodied as computer-executable instructions stored by computer-readable media, also referred to as computer-readable storage media. As those skilled in the art will recognize, computer-readable media can host computer-executable instructions for later retrieval and execution. When the computer-executable instructions stored on the computer-readable storage devices are executed, they carry out various steps, methods and/or functionality, including the steps described above in regard to routines 200 and 400. Examples of computer-readable media include, but are not limited to: optical storage media such as Blu-ray discs, digital video discs (DVDs), compact discs (CDs), optical disc cartridges, and the like; magnetic storage media including hard disk drives, floppy disks, magnetic tape, and the like; memory storage devices such as random access memory (RAM), read-only memory (ROM), memory cards, thumb drives, and the like; cloud storage (i.e., an online storage service); and the like. For purposes of this disclosure, however, computer-readable media expressly excludes carrier waves and propagated signals.
Turning now to
The processor 502 executes instructions retrieved from the memory 504 in carrying out various functions, particularly in regard to responding to search queries with personalized search results. The processor 502 may be comprised of any of various commercially available processors such as single-processor, multi-processor, single-core units, and multi-core units. Moreover, those skilled in the art will appreciate that the novel aspects of the disclosed subject matter may be practiced on various computers and/or computer system configurations, including but not limited to: mini-computers; mainframe computers; personal computers (e.g., desktop computers, laptop computers, tablet computers, etc.); handheld computing devices such as smartphones, personal digital assistants, and the like; microprocessor-based or programmable consumer electronics; game consoles; and the like.
The system bus 510 provides an interface for the various components to inter-communicate. The system bus 510 can be of any of several types of bus structures that can interconnect the various components (including both internal and external components). The exemplary computing system 500 also includes a network communication component 512 for interconnecting the computing system 500 with other computers, devices and services on a computer network, such as user computers 102-106, blog site 112, shopping site 114 and social networking site 116. The network communication component 512 may be configured to communicate with these other, external devices and services via a wired connection, a wireless connection, or both.
The exemplary computing system 500 includes a search results identifier 514 that determines the subject matter of the received search query and identifies one or more search results from a content store 526. Typically, though not exclusively, the one or more search results that are identified by the search results identifier 514 are associated with corresponding scores indicating the likelihood that the search result is relevant to the requesting user. Because the search results are scored, the identified search results may be thought of as an ordered list of search results, ordered according to their scores. The content store 526 stores references to content (e.g., documents, images, web pages, etc.) available throughout the network 108. Typically, though not exclusively, the content store 526 is indexed according to a plurality of keys based on a plurality of topics.
Another component of the exemplary computing system 500 is the user-generated content access component 516. The user-generated content access component 516 is configured to access, via the network communication component 512, the various network sites that may host user-generated content. When user-generated content is encountered, the content is provided to the user data extraction component 518, which identifies data within the user-generated content that pertains to a user. After the data pertaining to a user is identified, the information is passed to a user vector update component 520 that creates and/or updates the user vector corresponding to the user associated with the user data. The user vector is stored in a user vector data store 528 and is subsequently retrieved from the user vector data store when the computing system 500 responds to a search query.
The ad selector 522 selects one or more ads (advertisements) from an ad store 530 to be included with the search results that are returned to the computer user in response to a search query. As those skilled in the art will appreciate, online search engines typically offer their search services to users as a “free” service: i.e., the user does not have to pay for the search queries that are submitted. However, to offer this “free” service, search engines typically include advertisements from one or more advertisers with the search results of a search query that are returned to the user. Generally speaking, advertisers pay a search engine for including the advertisements in the search results. Selecting advertisements to be included with search results to a search query is known in the art. However, according to aspects of the disclosed subject matter, the ad selector 522 may select advertisements according to elements of the user vector associated with the requesting user. In this manner, the ad selector 522 personalizes the advertisement selection according to user-generated content.
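A content store indexed under topic keys, with the identifier returning an ordered list, might be sketched as an inverted index. The `ContentStore` class, the per-reference scores, and the assumption that the query's subject matter has already been resolved into topic keys are all illustrative, not drawn from the disclosure.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


class ContentStore:
    """Hypothetical store of content references, indexed by topic key."""

    def __init__(self) -> None:
        # topic -> {reference: score}
        self._index: DefaultDict[str, Dict[str, float]] = defaultdict(dict)

    def add(self, reference: str, topics: Iterable[str], score: float) -> None:
        for topic in topics:
            self._index[topic][reference] = score

    def lookup(self, query_topics: Iterable[str]) -> List[Tuple[str, float]]:
        """Return matching references as an ordered list, best score first."""
        best: Dict[str, float] = {}
        for topic in query_topics:
            # .get() avoids inserting empty entries for unknown topics.
            for reference, score in self._index.get(topic, {}).items():
                best[reference] = max(score, best.get(reference, float("-inf")))
        return sorted(best.items(), key=lambda pair: pair[1], reverse=True)
```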
The search query interface 526 fields search queries from requesting users and, in response to a search query, identifies search results by way of the search results identifier 514. In addition to identifying the search results, the search query interface 526 personalizes the identified search results according to the requesting user by way of the personalization component 532. The personalization component 532 obtains the user vector corresponding to the requesting user and updates the scores of the identified search results according to the information in the user vector (as described above in regard to routine 400 of
Those skilled in the art will appreciate that at least some of the various components of the exemplary computing system 500 of
While various novel aspects of the disclosed subject matter have been described, it should be appreciated that these aspects are exemplary and should not be construed as limiting. Variations and alterations to the various aspects may be made without departing from the scope of the disclosed subject matter.