Search engines allow users to locate relevant websites and other content. A search engine allows a user to submit a query and returns search results that are responsive to the query. Users may struggle to formulate queries that return search results that provide the information or services they are looking for. Some search engines suggest queries that a user can submit instead of writing their own query.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in isolation as an aid in determining the scope of the claimed subject matter.
Embodiments of the present invention suggest search queries to a user while the user is typing in characters of a search query. The suggested search queries are based, in part, on the characters entered. The suggested queries are presented before the user submits the query to the search engine. The characters entered by the user before submitting the query are called a search prefix within this disclosure. The search prefix may comprise at least one character and may comprise one or more words entered into a search query box.
As mentioned, embodiments of the present invention present search queries to the user. The suggested queries may be displayed in a dropdown box that allows the user to select one of the suggested queries. The suggested queries comprise either equivalent queries or auto-complete queries. An auto-complete query begins with the search prefix. Equivalent search queries do not begin with the search prefix entered by the user. The equivalent query may be generated using an auto-complete query as input. The equivalent query is displayed to the user for possible selection.
Embodiments of the invention are described in detail below with reference to the attached drawing figures, wherein:
The subject matter of embodiments of the invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” and/or “block” may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.
In one aspect, one or more computer-readable media having computer-executable instructions embodied thereon that when executed by a computing device perform a method of generating equivalent queries is described. The method comprises receiving a search prefix from a user. The search prefix is a group of characters entered by the user into a search interface. The search prefix is one or more characters in length and is one or more characters less than a complete search query. The method also comprises generating an auto-complete query that begins with the search prefix. The method also comprises generating an equivalent query that does not begin with the search prefix using the auto-complete query. The method further comprises displaying the equivalent query to the user before the user submits the complete search query.
In an additional aspect, a method of receiving a suggested equivalent query is described. The method comprises communicating a search prefix to a search interface. The method also comprises, before the search query is completed and submitted into the search interface, receiving an equivalent query that does not include the search prefix. The method also comprises displaying the equivalent query. The method also comprises receiving a selection of the equivalent query. The method also comprises receiving search results that are responsive to the equivalent query.
In another aspect, one or more computer-readable media having computer-executable instructions embodied thereon that when executed by a computing device perform a method of generating equivalent queries for autosuggestion is described. The method comprises receiving a search prefix and generating an auto-complete query based on the search prefix. The method also comprises generating a plurality of equivalent queries using the auto-complete query as a basis. The plurality of equivalent queries do not begin with the search prefix but share a subject matter with the auto-complete query. The method also comprises ranking individual equivalent queries within the plurality of equivalent queries.
Having briefly described an overview of embodiments of the invention, an exemplary operating environment suitable for use in implementing embodiments of the invention is described below.
Referring to the drawings in general, and initially to
The invention may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program components, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program components, including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks, or implements particular abstract data types. Embodiments of the invention may be practiced in a variety of system configurations, including handheld devices, consumer electronics, general-purpose computers, specialty computing devices, etc. Embodiments of the invention may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.
With continued reference to
Computing device 100 typically includes a variety of computer-storage media. By way of example, and not limitation, computer-storage media may comprise Random Access Memory (RAM); Read Only Memory (ROM); Electronically Erasable Programmable Read Only Memory (EEPROM); flash memory or other memory technologies; Compact Disk Read-Only Memory (CDROM), digital versatile disks (DVDs) or other optical or holographic media; magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. The computer-storage media may be nontransitory.
Memory 112 includes computer-storage media in the form of volatile and/or nonvolatile memory. The memory 112 may be removable, nonremovable, or a combination thereof. Exemplary memory includes solid-state memory, hard drives, optical-disc drives, etc. Computing device 100 includes one or more processors 114 that read data from various entities such as bus 110, memory 112 or I/O components 120. Presentation component(s) 116 present data indications to a user or other device. Exemplary presentation components 116 include a display device, speaker, printing component, vibrating component, etc. I/O ports 118 allow computing device 100 to be logically coupled to other devices including I/O components 120, some of which may be built in. Illustrative I/O components 120 include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.
An embodiment of the present invention provides auto-suggest queries to a user as the user is typing a query. The partially typed query may be described as a search prefix. A search prefix comprises one or more characters, but may have fewer characters than a completed query. For example, the search prefix may be one or more characters or words less than a complete query. In another embodiment, the search prefix comprises any combination of characters and words entered into a search field before the search is submitted. In this embodiment, the search prefix may be a full query just before it is submitted. Embodiments of the present invention generate two different types of auto-suggestion queries: auto-complete queries and equivalent queries. Auto-complete queries start with the letters and characters entered by the user (i.e., the search prefix) and add letters and words to build a full query. In contrast, the equivalent query does not begin with the search prefix and may not include the search prefix at all. Equivalent queries and auto-complete queries are illustrated in
Turning now to
The search interface 200 also includes a series of suggested queries 230. Each of the suggested queries 230 may be selected by a user and submitted to the search engine to return search results. The suggested queries 230 may be selected instead of completing a query.
The first suggested query is “movie” 232. Movie 232 is an auto-complete query since it begins with the search prefix “movi” 212. In one embodiment, the highest ranked suggested query is shown at the top of the suggested queries 230. The query rank may be determined using a combination of different factors. In one embodiment, the most frequently submitted query that starts with the search prefix, as determined by mining search session logs, is given the highest rank. In other embodiments, user characteristics including recent browsing history, demographic information, and other facts, may be used to rank a suggested query.
The next suggested queries are “moviefone” 234, “movies.com” 236, and “movie reviews” 238. As can be seen, suggested queries 232, 234, 236, and 238 all begin with the prefix “movi” 212. Each of these may be described as an auto-complete query suggestion.
The two query suggestions at the end of the suggested queries 230 do not begin with the prefix “movi” 212. Instead, they represent equivalent queries or equivalent query suggestions. The next query suggestion shown is “fandango” 240 followed by “new movies” 242. Both of these query suggestions are related to movies. While related to movies, they do not begin with the search prefix 212 entered by the user. In this example, the equivalent queries are listed at the end of the suggested queries 230.
Embodiments of the present invention are not limited to displaying the equivalent queries at the end of the list. Several different display arrangements are possible. For example, equivalent queries may be interspersed with auto-complete queries, based on their overall ranking. When the equivalent queries are displayed at the end of a list of auto-suggestions, they may be placed there based on different criteria. For example, the last two spaces could be reserved for the top two equivalent queries. In other cases, a backfill method may be used to insert equivalent queries when fewer than a threshold number of auto-complete queries are generated.
Turning now to
The search engine 302 receives search queries and presents search results to the user. The search engine may include crawlers that explore available content and create an index that may be used to identify relevant content in response to search queries. The search queries, results shown in response to the search queries, and user interactions with these results may be stored within the search data store 306. The search data store 306 may also include the previously mentioned search indices, as well as other datasets generated by components shown or not shown in
The auto-suggest component 304 receives a search prefix. The search prefix includes characters submitted by a user in a search interface prior to selecting or submitting the search. The prefix may be less than a full word or as little as a single letter. In other embodiments, the prefix may include multiple words. In another embodiment, the prefix may include a few words as well as an incomplete word. The auto-suggest component 304 generates suggested queries and presents these to the user for possible selection.
In one embodiment, the auto-suggest component 304 generates auto-complete-suggested queries. The auto-complete-suggested queries begin with the search prefix submitted. As additional characters are entered by a user, the prefix may change and the auto-suggest component 304 may change the suggested queries in accordance with the additional characters received. The auto-suggest component 304 may also display one or more equivalent queries as suggestions to the user. As mentioned, the equivalent queries do not begin with the prefix. While an equivalent query does not begin with the prefix, it could include the prefix elsewhere. The auto-complete component may generate a group of auto-complete queries and a group of equivalent queries in advance. The auto-complete query may be looked up based on the search prefix and the equivalent query may then be looked up based on the auto-complete query.
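The two-stage lookup described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the table names, the `suggest` function, and the sample entries are hypothetical, and a production system would presumably use an indexed store rather than in-memory dictionaries.

```python
# Hypothetical precomputed tables (names and contents are assumptions).
AUTOCOMPLETE_BY_PREFIX = {
    "movi": ["movie", "moviefone", "movies.com", "movie reviews"],
}
EQUIVALENTS_BY_QUERY = {
    "movie": ["fandango", "new movies"],
}


def suggest(prefix):
    """Look up auto-complete queries by prefix, then equivalents by auto-complete query."""
    completions = AUTOCOMPLETE_BY_PREFIX.get(prefix, [])
    equivalents = []
    for completion in completions:
        for candidate in EQUIVALENTS_BY_QUERY.get(completion, []):
            # An equivalent query must not begin with the search prefix.
            if not candidate.startswith(prefix) and candidate not in equivalents:
                equivalents.append(candidate)
    return completions, equivalents


completions, equivalents = suggest("movi")
print(completions)
print(equivalents)
```

Because both tables are built ahead of time, the work done per keystroke reduces to two dictionary lookups and a prefix check.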
The equivalent-query component 308 generates an equivalent query based on a prefix received by the search engine 302. The equivalent-query component 308 may generate the equivalent query using a number of different processes. These processes include a random walk process, session mining process, and a related suggestions stream. Each of these methods is described in more detail below.
The random walk process relies on a click log to identify related queries. The click log records the search results that are clicked on in response to a query. The click log may be taken from search data store 306. Two queries are determined to be related if they result in clicks on the same URL. For example, the queries “Barack Obama bio” and “President Obama bio” both result in clicks on the U.S. President's home page. Accordingly, both queries would be related to one another. The higher the ratio of co-clicks to all clicks from either query, the stronger the link between the query pair.
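The ratio of co-clicks to all clicks can be made concrete with a small sketch. The exact formula is not given in this description, so the definition below (clicks on shared URLs divided by all clicks from both queries) is an assumption, as are the function name and the sample URLs.

```python
def co_click_strength(clicks_a, clicks_b):
    """Return the share of clicks from two queries that landed on URLs both queries reached.

    clicks_a and clicks_b map a URL to the number of clicks it received
    for the respective query. The formula is an illustrative assumption.
    """
    shared_urls = clicks_a.keys() & clicks_b.keys()
    co_clicks = sum(clicks_a[url] + clicks_b[url] for url in shared_urls)
    total_clicks = sum(clicks_a.values()) + sum(clicks_b.values())
    return co_clicks / total_clicks if total_clicks else 0.0


# Hypothetical click counts for two queries.
first = {"example.gov/president": 8, "example.org/wiki": 2}
second = {"example.gov/president": 6, "example.com/news": 4}
print(co_click_strength(first, second))
```

Under this definition, 14 of the 20 clicks fall on the shared URL, so the pair scores 0.7; two queries with no URL in common score 0.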
In one embodiment, the random walk process starts with generating a click graph that maps relationships between queries. The click graph may require a certain number of clicks on common URLs before creating a link, or relationship, between queries. In other embodiments, a single click on a common URL is enough to create the relationship. In another embodiment, instead of using clicks in general, the click graph is built using satisfied clicks. A satisfied click occurs when a user clicks on a URL and remains on the resulting content for greater than a threshold period of time, for example 30 seconds. The satisfied click indicates that the user is satisfied with the result. By inference, the result is responsive to the query. If satisfied clicks are used, the click graph may be called a satisfied click graph.
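One way to build a satisfied click graph from a click log is sketched below. The record layout `(query, url, dwell_seconds)`, the function name, and the choice to weight a link by the smaller of the two queries' click counts on each shared URL are all assumptions made for illustration. The sketch covers graph construction only, not the walk itself.

```python
from collections import defaultdict
from itertools import combinations


def build_satisfied_click_graph(click_log, min_dwell=30.0, min_co_clicks=1):
    """Link queries whose satisfied clicks landed on a common URL."""
    # url -> query -> number of satisfied clicks
    per_url = defaultdict(lambda: defaultdict(int))
    for query, url, dwell_seconds in click_log:
        if dwell_seconds > min_dwell:  # keep satisfied clicks only
            per_url[url][query] += 1

    # Accumulate a weight for every pair of queries sharing a URL.
    weights = defaultdict(int)
    for queries in per_url.values():
        for a, b in combinations(sorted(queries), 2):
            weights[(a, b)] += min(queries[a], queries[b])

    # Create an undirected link only when the weight reaches the threshold.
    graph = defaultdict(dict)
    for (a, b), weight in weights.items():
        if weight >= min_co_clicks:
            graph[a][b] = weight
            graph[b][a] = weight
    return dict(graph)


log = [
    ("barack obama bio", "u1", 45),
    ("president obama bio", "u1", 60),
    ("president obama bio", "u1", 5),   # too short to count as satisfied
    ("obama age", "u2", 90),
]
print(build_satisfied_click_graph(log))
```

Setting `min_co_clicks=1` corresponds to the embodiment where a single click on a common URL is enough to create the relationship; a larger value requires more evidence before linking two queries.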
Embodiments of the present invention attempt to find an equivalent query that does not begin with the search prefix entered. Accordingly, when traversing a click graph to identify queries related to an input query, queries that begin with the search prefix may be excluded from a potential list of equivalent queries. In another embodiment, the filter is based on only the first letter of the search prefix. For example if the search prefix was MOVI as described previously in
Once a list of related queries that do not begin with the prefix (or some portion of the prefix) is generated, the strength of the relationship between the input query and the potential equivalent queries may be determined. In one embodiment, various potential equivalent queries are stack ranked based on the strength of the relationship. As mentioned, the more satisfied clicks in common, the stronger the relationship.
The random walk process thus finds related queries given an input query. In embodiments of the invention, the input query may be an auto-complete query generated based on the search prefix. In one embodiment, the auto-complete queries are those that start with the search prefix and are most frequently submitted to the search engine according to the data in the search logs. Thus, given the auto-complete query, related queries may be identified. The related queries are then narrowed by excluding queries that begin with the prefix or have weak relationships with the input query.
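The narrowing and stack ranking of related queries can be sketched as below. The graph layout (a query mapped to its neighbors and relationship strengths), the function name, the `min_strength` cutoff, and the sample strengths are hypothetical. The `first_letter_only` flag illustrates the embodiment in which the filter uses only the first letter of the prefix.

```python
def equivalent_candidates(graph, input_query, prefix,
                          min_strength=0.0, first_letter_only=False):
    """Return related queries, strongest first, excluding prefix matches and weak links.

    The prefix is assumed to be non-empty, as it is at least one character long.
    """
    blocked = (prefix[:1] if first_letter_only else prefix).lower()
    neighbors = graph.get(input_query, {})
    kept = [
        (query, strength)
        for query, strength in neighbors.items()
        if not query.lower().startswith(blocked) and strength >= min_strength
    ]
    kept.sort(key=lambda item: item[1], reverse=True)  # stack rank by strength
    return [query for query, _ in kept]


# Hypothetical relationship strengths for the input query "movie".
graph = {
    "movie": {
        "movies.com": 0.9,
        "fandango": 0.6,
        "new movies": 0.4,
        "mtv": 0.3,
        "cinema listings": 0.05,
    }
}
print(equivalent_candidates(graph, "movie", "movi", min_strength=0.1))
print(equivalent_candidates(graph, "movie", "movi", min_strength=0.1,
                            first_letter_only=True))
```

With the full-prefix filter, "mtv" survives because it does not begin with "movi"; with the first-letter filter, it is dropped along with every other query that starts with "m".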
The session mining process is a second method of generating an equivalent query. Like the random walk process, the session mining process also starts with an auto-complete query generated from the search prefix. The auto-complete query used in the session mining process may be the highest-ranked auto-complete query. A session may be defined as the search activity occurring over a continuous period of time. For example, a single search session for a user may be the search activities that occur within ten minutes. The search activities may include submitting search queries and clicking on search results.
Embodiments of the present invention aggregate large amounts of session data to identify refined query pairs. A query pair comprises two search queries that are consecutively submitted within a search session. Embodiments of the present invention may define a search session as a time period. For example, all queries entered within ten minutes may be part of the same search session. Thus, if a first query “airlines” is entered and the next query entered during that user's search session, one minute later, is “British Airlines,” then “airlines” and “British Airlines” would form a query pair. These qualify as a pair because they were consecutively entered within a search session.
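Extracting query pairs from session data might look like the following sketch. The event layout `(user, timestamp_seconds, query)`, the function name, and the rule that a session spans ten minutes from a user's first query are assumptions; other session definitions would work equally well with the same pairing logic.

```python
from collections import Counter

SESSION_SECONDS = 600  # ten-minute session window (illustrative)


def consecutive_query_pairs(events):
    """Count (first_query, second_query) pairs submitted back to back in one session."""
    pairs = Counter()
    state = {}  # user -> (session_start, previous_query)
    for user, timestamp, query in sorted(events, key=lambda e: (e[0], e[1])):
        if user in state and timestamp - state[user][0] <= SESSION_SECONDS:
            session_start, previous = state[user]
            if previous != query:  # ignore immediate resubmissions
                pairs[(previous, query)] += 1
            state[user] = (session_start, query)
        else:
            state[user] = (timestamp, query)  # start a new session
    return pairs


events = [
    ("u1", 0, "airlines"),
    ("u1", 60, "British Airlines"),
    ("u1", 1000, "hotels"),          # outside the ten-minute window
    ("u2", 10, "airlines"),
    ("u2", 30, "British Airlines"),
]
print(consecutive_query_pairs(events))
```

Aggregating the counts across users is what allows later filtering steps to ask how often a given pair occurs in the corpus.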
Embodiments of the present invention further filter these query pairs to identify refined query pairs. In general, refined query pairs suggest that the second query is an improvement, or refinement, of the first query in the pair. The refined query pairs are identified when over a threshold number of satisfied clicks occur on the second query in the pair. The number of satisfied clicks that form the threshold can be established in a number of different ways. For example, the threshold can be a percentage of times the second query in the pair produces satisfied clicks. For example, if a query pair occurs 100 times within the corpus of session log data, and the threshold is 10%, then a satisfied click must occur in response to the second query in at least ten of the pairs.
In another embodiment, the satisfied click threshold is a number, such as 50. In this case, 50 satisfied clicks would need to occur in response to the second query in the pair regardless of how many times the query pairs occur within the corpus of session data. In yet another embodiment, the difference between satisfied clicks occurring in response to the first query in the pair and in response to the second query in the pair is considered. Again, the difference could be defined as a percentage or as an absolute number. The general concept is that the second query should receive more satisfied clicks if it is actually an improvement over the first query in the pair.
Other rules may be included to further define a refined pair. For example, pairs may only qualify as refined when they both include at least one word in common. In another example, a pair is only considered as refined when the user clicked on a search result presented in response to the second query, but not the first query. In other words, pairs initially identified where clicks, or satisfied clicks, occurred on both queries in the pair may be excluded as refined pairs.
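The threshold variants and the common-word rule can be combined in a short sketch. The `PairStats` record, the parameter names, and the decision to treat each threshold as an optional, independently enabled check are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PairStats:
    occurrences: int             # times the pair appears in the session corpus
    satisfied_on_second: int     # satisfied clicks after the second query
    satisfied_on_first: int = 0  # satisfied clicks after the first query


def is_refined(stats, min_percent=None, min_count=None, min_margin=None):
    """Apply whichever satisfied-click thresholds are enabled (None disables a check)."""
    if min_percent is not None:
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if stats.satisfied_on_second * 100 < min_percent * stats.occurrences:
            return False
    if min_count is not None and stats.satisfied_on_second < min_count:
        return False
    if min_margin is not None:
        if stats.satisfied_on_second - stats.satisfied_on_first < min_margin:
            return False
    return True


def shares_word(first_query, second_query):
    """Return True when the two queries have at least one word in common."""
    return bool(set(first_query.lower().split()) & set(second_query.lower().split()))


print(is_refined(PairStats(occurrences=100, satisfied_on_second=10), min_percent=10))
print(is_refined(PairStats(occurrences=1000, satisfied_on_second=49), min_count=50))
print(shares_word("airlines", "British Airlines"))
```

The first call mirrors the 10% example above: a pair seen 100 times qualifies once ten of its occurrences end in a satisfied click on the second query.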
The related suggestions process is a third method of generating an equivalent query that utilizes a search engine's related suggestion stream. The related suggestion stream provides suggested or related queries. In some cases, existing search engines use a related suggestions stream to identify related searches after a search query is submitted. The related suggestions are provided along with the search results. An embodiment of the present invention first completes a query from the search prefix to identify a likely full query. An equivalent query is determined by retrieving queries that are related to the completed query from the related suggestion data stream. Those related queries that begin with the search prefix may be excluded as equivalent queries.
Turning now to
The search interface 400 includes a search input box 410. As can be seen, the prefix, “how to write a” 412, is included in the input box 410. A group 415 of suggested queries is displayed to the user. The first suggested query 420 states “how to write a resume.” The second suggestion is “resume writing” 422. As can be seen, the second suggestion 422 is an equivalent suggestion that does not begin with the prefix 412. In this case, the equivalent suggestion 422 could be based on the first auto-complete query 420 which deals with the subject of resumes.
The suggested queries go on to include “how to write a song” 426, “how to write a book” 428, “resume builder” 430, and “how to write a letter” 432. As can be seen, resume builder 430 is also an equivalent query that is interspersed with the auto-complete queries. This illustrates the interspersed method of displaying equivalent queries. In one embodiment, the auto-complete queries and the equivalent queries are ranked based on the satisfied clicks received in response to the query. A satisfied click is a click on a search result where the user stays on the search result for a period of time. This is different than a simple click, which does not have a time component. The satisfied click indicates that the user was satisfied with the search result and, by inference, that the search results are responsive to the query submitted. In one embodiment, the first query shown is always an auto-complete query and the remaining spaces allow auto-complete queries and equivalent queries to be interspersed according to rank.
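The interspersed arrangement, with the first slot held for an auto-complete query, can be sketched as follows. The function name and the satisfied-click scores are hypothetical; only the query strings come from the example above.

```python
def intersperse(auto_complete, equivalent, limit=6):
    """Merge scored suggestions by rank, keeping an auto-complete query in the first slot.

    Both arguments are lists of (query, score) tuples, where the score
    stands in for satisfied clicks.
    """
    def by_score(item):
        return item[1]

    completions = sorted(auto_complete, key=by_score, reverse=True)
    head, rest = completions[:1], completions[1:]
    pool = sorted(rest + list(equivalent), key=by_score, reverse=True)
    return [query for query, _ in (head + pool)[:limit]]


# Hypothetical satisfied-click counts for the example suggestions.
auto_complete = [
    ("how to write a resume", 90),
    ("how to write a song", 70),
    ("how to write a book", 60),
    ("how to write a letter", 40),
]
equivalent = [("resume writing", 80), ("resume builder", 50)]
for suggestion in intersperse(auto_complete, equivalent):
    print(suggestion)
```

With these scores the merged order matches the example list. Even if an equivalent query outscored every auto-complete query, the top auto-complete query would still be shown first.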
Turning now to
At step 510, a search prefix is received from a user. As mentioned, the search prefix may be inserted into an input or search box on a search website or a search utility. The search prefix is a group of characters entered by the user into the search interface. For example, the search prefix may be a few letters or a combination of words and letters. The search prefix is one or more characters less than a complete search query. Further, the search prefix is not submitted to the search engine and the user may continue to modify this search prefix by adding additional letters until a completed search query is created.
At step 520, an auto-complete query that begins with the search prefix is generated. The auto-complete query may be generated by analyzing session log data for previously submitted queries. In one embodiment, the auto-complete query is the most frequently occurring query in the query log that begins with the search prefix. The number of satisfied clicks received in response to a query may also be considered when determining the auto-complete query. In other embodiments, the auto-complete query may be selected based on user demographics or current search session data that allows an inference to be made about a user's intention. In this case, the auto-complete query may be selected based on an inferred intention rather than just the most commonly occurring query within the search session logs. In either case, the auto-complete query begins with the search prefix and is a complete query that could be submitted to the search engine.
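Selecting the most frequently submitted queries that begin with the prefix can be sketched with a simple counter. The function name and the sample log are hypothetical, and the sketch ignores the satisfied-click and demographic signals mentioned above.

```python
from collections import Counter


def top_autocomplete(prefix, query_log, limit=4):
    """Return the most frequently logged queries that begin with the prefix."""
    counts = Counter(query for query in query_log if query.startswith(prefix))
    # most_common orders by count, highest first.
    return [query for query, _ in counts.most_common(limit)]


# Hypothetical query log.
query_log = [
    "movie", "movie", "movie",
    "moviefone", "moviefone",
    "movies.com",
    "fandango",
    "new movies",
]
print(top_autocomplete("movi", query_log))
```

Queries such as "new movies" never appear in the result because they do not begin with the prefix, which is what leaves room for the separate equivalent-query step.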
At step 530, an equivalent query that does not begin with the search prefix is generated using the auto-complete query as input. In one embodiment, the equivalent query is not generated at runtime. A group of equivalent queries that correspond to one or more potential auto-complete queries may be generated before any search prefixes are received and stored for later use. Once the auto-complete query is generated, the equivalent queries may be obtained by performing a lookup against the group of auto-complete queries. The equivalent query is one that “matches” the auto-complete query. Various methods of determining an equivalent query have been described previously. These methods include the random walk process, the session mining process, and using a related suggestions data stream. These methods are applicable whether equivalent queries are calculated at runtime or in advance.
At step 540, the equivalent query is output for display to the user before the user submits the completed search query. There are several different ways for the equivalent query to be displayed to the user. In one embodiment, the equivalent query is displayed in a dropdown box beneath the search input box. The equivalent query could be displayed along with the auto-complete query and other suggested queries. The other suggested queries may be additional equivalent queries or additional auto-complete queries.
The equivalent query may be displayed in a number of different ways. The equivalent queries could be displayed at the beginning or at the end of a series of suggested search queries. For example, the first two or the last two queries in a series of suggested queries may be designated for equivalent queries. In another embodiment, the equivalent queries are interspersed with auto-complete queries, as in
In another embodiment, additional equivalent queries are presented when fewer than a viable number or threshold number of auto-complete queries are generated. This method may be described as the backfill method, as equivalent queries are added when not enough auto-complete queries exist. In one embodiment, the equivalent query does not include the search prefix at any point within the equivalent query.
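Two of the display arrangements, reserved trailing slots and backfill, can be sketched as below. The function names, the slot counts, and the extra "cinema listings" entry are hypothetical, and both inputs are assumed to be already ranked.

```python
def reserve_tail(auto_complete, equivalent, slots=6, reserved=2):
    """Hold the last `reserved` slots for the top equivalent queries."""
    tail = equivalent[:reserved]
    head = auto_complete[:slots - len(tail)]
    return head + tail


def backfill(auto_complete, equivalent, slots=6):
    """Add equivalent queries only when auto-complete queries leave slots empty."""
    shown = list(auto_complete[:slots])
    for query in equivalent:
        if len(shown) >= slots:
            break
        if query not in shown:
            shown.append(query)
    return shown


completions = ["movie", "moviefone", "movies.com", "movie reviews"]
equivalents = ["fandango", "new movies", "cinema listings"]

print(reserve_tail(completions, equivalents))
print(backfill(completions[:2], equivalents, slots=4))
```

The two differ in intent: reserved slots always surface equivalent queries, while backfill surfaces them only when too few auto-complete queries are available.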
Turning now to
At step 620, before a search query is submitted into the search interface, an equivalent query that does not include the search prefix is obtained. The equivalent query may be based on the search prefix and determined using methods described previously, such as the random walk and session mining procedures. In this case, the equivalent query does not include the search prefix anywhere within the equivalent query. For example, in
At step 630, the equivalent query is displayed to the user. Various methods of displaying the equivalent query to the user have been described previously. These methods include interspersing the equivalent query with auto-complete queries and displaying the equivalent query at the front or back of a group of other suggested queries.
At step 640, a selection of the equivalent query is obtained. The selection may occur by a user taking a pointing device and clicking on the equivalent query. In another embodiment, the selection occurs when a user touches a touch screen with a finger or stylus above where the equivalent query is displayed. Upon receiving the selection, the equivalent query may be communicated to a search engine associated with the search interface.
At step 650, search results that are responsive to the equivalent query are received. The responsive search results may then be displayed to the user. In one embodiment, as additional characters are received in the search box, and the search prefix is subsequently changed, new equivalent queries are received and displayed to the user.
Turning now to
At step 730, a plurality of equivalent queries are generated using the auto-complete query as an input. The equivalent queries do not begin with the search prefix and share a subject matter with the auto-complete query. The plurality of equivalent queries may be generated using the random walk methodology described previously, the related suggestions data stream, or the refined query pairs generated through a search session analysis.
At step 740, individual equivalent queries within the plurality of equivalent queries are ranked. The ranking may be based on the number of clicks or satisfied clicks received on search results presented in response to the query. These clicks may be determined based on an analysis of search log data. Other methods of ranking individual equivalent queries may also be used. In one embodiment, individual equivalent queries above a certain rank are made available to display to the user as a suggested query. In one embodiment, the rank must be higher than that of the lowest-ranked auto-complete query available or generated. In another embodiment, the rank of individual equivalent queries is compared to a rank calculated using a similar methodology for an auto-complete query, and those query suggestions with the highest rank are displayed to the user.
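The embodiment in which an equivalent query must outrank the lowest auto-complete query can be sketched as follows. The function name and the click counts are hypothetical, and using raw satisfied-click counts as the rank is an assumption.

```python
def eligible_equivalents(equivalent_scores, auto_complete_scores):
    """Rank equivalent queries and keep those scoring above the weakest auto-complete query.

    Both arguments map a query to its score (for example, satisfied clicks).
    """
    ranked = sorted(equivalent_scores.items(), key=lambda item: item[1], reverse=True)
    if not auto_complete_scores:
        return [query for query, _ in ranked]
    floor = min(auto_complete_scores.values())
    return [query for query, score in ranked if score > floor]


# Hypothetical satisfied-click counts.
equivalent_scores = {"new movies": 40, "fandango": 120, "cinema listings": 10}
auto_complete_scores = {"movie": 500, "movie reviews": 30}
print(eligible_equivalents(equivalent_scores, auto_complete_scores))
```

Scoring both suggestion types with the same measure is what makes the comparison against the auto-complete floor meaningful.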
Embodiments of the invention have been described to be illustrative rather than restrictive. It will be understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations. This is contemplated by and is within the scope of the claims.