The disclosed embodiments relate generally to search engines for locating documents in a computer network (e.g., a distributed system of computer systems), and in particular, to a system and method for accelerating a desired search by providing query suggestions in response to a partial query provided by a user.
Search engines provide a powerful tool for locating documents in a large database of documents, such as the documents on the World Wide Web (WWW) or the documents stored on the storage devices of an Intranet. The documents are located in response to a query submitted by a user. A query typically consists of one or more query terms. To reduce its latency in response to a search request by a user, a search engine may generate a list of predicted queries based on a partial query entered by the user. The user may select a desired one from the ordered list of predicted queries, or may complete the partial query if, e.g., none of the predicted queries corresponds to the query that the user intends to submit.
In accordance with some embodiments described below, a computer-implemented method is performed at a server system. The server system receives, respectively, a first character string from a first user and a second character string from a second user. There are one or more differences between the first and second character strings. The server system obtains from a plurality of previously submitted complete queries, respectively, a first set of predicted complete queries corresponding to the first character string and a second set of predicted complete queries corresponding to the second character string. There are one or more identical queries in both the first and second sets. The server system conveys at least a first subset of the first set to the first user and at least a second subset of the second set to the second user. Both the first subset and the second subset include a respective identical query.
In some embodiments, a computer system for processing query information includes one or more central processing units for executing programs, and memory to store data and programs to be executed by the one or more central processing units. The programs include instructions for receiving, respectively, a first character string from a first user and a second character string from a second user, wherein there are one or more differences between the first and second character strings; instructions for obtaining from a plurality of previously submitted complete queries, respectively, a first set of predicted complete queries corresponding to the first character string and a second set of predicted complete queries corresponding to the second character string, wherein there are one or more identical queries in both the first and second sets; and instructions for conveying at least a first subset of the first set to the first user and at least a second subset of the second set to the second user, wherein both the first subset and the second subset include a respective identical query.
In some embodiments, a computer-readable storage medium stores one or more programs for execution by one or more processors of a respective server system. The one or more programs include instructions for receiving, respectively, a first character string from a first user and a second character string from a second user, wherein there are one or more differences between the first and second character strings; instructions for obtaining from a plurality of previously submitted complete queries, respectively, a first set of predicted complete queries corresponding to the first character string and a second set of predicted complete queries corresponding to the second character string, wherein there are one or more identical queries in both the first and second sets; and instructions for conveying at least a first subset of the first set to the first user and at least a second subset of the second set to the second user, wherein both the first subset and the second subset include a respective identical query.
In accordance with some embodiments described below, a computer-implemented method is performed at a client device. The client device receives from one or more users of the client device, respectively, a first character string and a second character string. There are one or more differences between the first and second character strings. The client device obtains from a remote server system, respectively, a first set of predicted complete queries corresponding to the first character string and a second set of predicted complete queries corresponding to the second character string. There are one or more identical queries in both the first and second sets. The client device displays at least a first subset of the first set and at least a second subset of the second set to a respective user of the client device. Both the first subset and the second subset include a respective identical query.
In some embodiments, a client system includes one or more central processing units for executing programs, and memory to store data and programs to be executed by the one or more central processing units, the programs including instructions for receiving from a search requestor a partial query. The programs further include instructions for receiving, respectively, a first character string and a second character string, wherein there are one or more differences between the first and second character strings; obtaining from a remote server system, respectively, a first set of predicted complete queries corresponding to the first character string and a second set of predicted complete queries corresponding to the second character string, wherein there are one or more identical queries in both the first and second sets; and displaying at least a first subset of the first set and at least a second subset of the second set to a respective user of the client device, wherein both the first subset and the second subset include a respective identical query.
In some embodiments, a computer-readable storage medium stores one or more programs for execution by one or more processors of a client device. The one or more programs include instructions for receiving, respectively, a first character string and a second character string, wherein there are one or more differences between the first and second character strings; obtaining from a remote server system, respectively, a first set of predicted complete queries corresponding to the first character string and a second set of predicted complete queries corresponding to the second character string, wherein there are one or more identical queries in both the first and second sets; and displaying at least a first subset of the first set and at least a second subset of the second set to a respective user of the client device, wherein both the first subset and the second subset include a respective identical query.
The aforementioned embodiment of the invention as well as additional embodiments will be more clearly understood as a result of the following detailed description of the various aspects of the invention when taken in conjunction with the drawings. Like reference numerals refer to corresponding parts throughout the several views of the drawings.
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the embodiments, it will be understood that the invention is not limited to these particular embodiments. On the contrary, the invention includes alternatives, modifications and equivalents that are within the spirit and scope of the appended claims. Numerous specific details are set forth in order to provide a thorough understanding of the subject matter presented herein. But it will be apparent to one of ordinary skill in the art that the subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
The system 100 may include one or more client systems or devices 102 that are located remotely from a search engine 108. A respective client system 102, sometimes called a client or client device, may be a desktop computer, laptop computer, kiosk, mobile phone, personal digital assistant, or the like. A communication network 106 connects the client systems or devices 102 to the search engine 108. As a user (also called a search requestor herein) inputs a query at a client system 102, the client assistant 104 transmits at least a portion of the user-entered partial query to the search engine 108 before the user has completed the query. An embodiment of a process performed by the client assistant 104 is described below in connection with
As further described herein, the system 100 and its functional components have been adapted so as to handle partial queries in multiple languages in a unified manner. The system 100 has been adapted so as to provide predicted queries based on the user's actual input at the client system 102, regardless of the language coding of the partial query transmitted by the client assistant 104 to the search engine 108. This is particularly useful, e.g., where a user has input a partial query using an incorrect input method editor setting at the client system 102.
The search engine 108 includes a query server 110, which has a module 120 that receives and processes partial queries and forwards the partial queries to a prediction server 112. In some embodiments, the query server 110, in addition, receives complete search queries and forwards the complete search queries to a query processing module 114. The prediction server 112 is responsible for generating a list of predicted complete queries corresponding to a received partial query. An embodiment of the process performed by the prediction server 112 is described below in connection with
Some languages or dialects such as Mandarin Chinese and Korean have a well-accepted phonetic representation scheme among their users. For example, this scheme in Mandarin Chinese is called “Pinyin” and every Chinese character has an official phonetic representation (or romanization) in a particular context. When a user inputs Chinese characters using the Pinyin scheme, any typographical error would result in either a different set of characters than expected or nothing at all (or perhaps an error message). But a widely-adopted standard or official scheme may not exist in some other languages or dialects. For example, Cantonese is a Chinese dialect that uses the same Chinese characters in writing as Mandarin, but often has significantly different pronunciations for the same character. For historical reasons, there is no scheme like Pinyin that is universally accepted by Cantonese speakers. As a result, different persons may choose different phonetic representations for the same character in Cantonese.
For example, for the Chinese character “,” some Cantonese speakers prefer the phonetic representation of “tak” while some others prefer the phonetic representation of “dak.” In other words, the relationship between a Chinese character and its corresponding Cantonese phonetic representations is one-to-many even in the same context. The language-specific model file 128 for Cantonese is a data structure that defines one or more phonetic representations and their respective popularities among Cantonese speakers for a Chinese phrase or a single Chinese character. With this data structure, it is possible to predict what the corresponding Chinese character(s) should be in response to a user-entered phonetic representation in the form of a Latin character string and also to make query suggestions based on the predicted Chinese character(s).
Referring to
The user survey data 154-1 may be collected by setting up a software application such as a web-based application. Cantonese speakers are invited to visit the application and provide their preferred phonetic representations of Chinese phrases/characters. A backend application analyzes these user inputs and generates a statistical model for each phrase or character. Other ways of collecting the user survey data include regular email messages soliciting inputs from Cantonese speakers. A more detailed description of an embodiment of the user survey data analysis is provided below in connection with
Sometimes, the resulting statistical model might be affected by the population size and demographic distribution of those Cantonese speakers that contribute to the user survey data 154-1. Other data sources such as the custom data 154-3 and third-party data 154-5 can be used to improve the quality and completeness of the language-specific model file 128.
One type of custom data 154-3 is Hong Kong geographical data. For example, many locations in Hong Kong have both a Chinese name like “” and an English name like “Tsim Sha Tsui” that is a phonetic representation of the corresponding Chinese name. In this application, the phonetic representation of a Chinese phrase or character in Cantonese is also referred to as “Kongping.” Because combinations like this one have been used for decades and are widely used among Cantonese speakers in Hong Kong, both the individual Kongpings and the Kongping combinations in the Hong Kong geographical data 154-3 are given added weight when generating the language-specific model file 128. Stated another way, the Kongping custom data is generally considered to be highly accurate for multi-character Chinese phrases, and in most cases Cantonese speakers also prefer the individual Kongpings in the custom data 154-3 even when the corresponding Chinese characters are used in other combinations. In some embodiments, the language model builder 152 often gives added weight to the custom data 154-3 when it is inconsistent with the user survey data 154-1 with respect to a particular Chinese phrase or character.
The third-party data may be obtained from documents accessible via the Internet. In some embodiments, a software application such as a classifier is configured to analyze web pages and look for (Chinese phrase, Kongping) pairs in tables or listings having recognized formats, for example:
In some embodiments, the classifier first identifies a pattern of multiple (e.g., two to five) Chinese characters in proximity with multiple (e.g., two to five) Kongpings and then determines if there is a possible one-to-one mapping between a respective Chinese character and the corresponding Kongping by looking up the known Kongpings for the Chinese character in the language-specific (Cantonese) model file 128.
In other words, as shown in
Let
F(user, jp, kp), which is the user's frequency of using kp for jp, can be defined as:
F(user, jp, kp)=K(user, jp, kp)/T(user, jp).
Using the formula above, the user survey data 201 is converted into the frequency data 203 shown in
G(jp, kp)=[F(user1, jp, kp)+F(user2, jp, kp)+ . . . +F(userN, jp, kp)]/N.
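The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the described implementation: it assumes that K(user, jp, kp) is the number of times a user chose Kongping kp for Jyutping jp and that T(user, jp) is that user's total number of choices for jp. The function names and the sample counts are hypothetical.

```python
from collections import defaultdict

def user_frequencies(counts):
    """counts maps (user, jp, kp) -> K. Returns (user, jp, kp) -> F = K / T(user, jp)."""
    totals = defaultdict(int)
    for (user, jp, _kp), k in counts.items():
        totals[(user, jp)] += k  # assumed T(user, jp): all choices by this user for jp
    return {key: k / totals[(key[0], key[1])] for key, k in counts.items()}

def popularity(freqs, num_users):
    """G(jp, kp): F averaged over all N users; a user with no entry contributes 0."""
    sums = defaultdict(float)
    for (_user, jp, kp), f in freqs.items():
        sums[(jp, kp)] += f
    return {key: total / num_users for key, total in sums.items()}

# Hypothetical survey counts for one Jyutping and two users.
counts = {
    ("user1", "dak1", "tak"): 3,
    ("user1", "dak1", "dak"): 1,
    ("user2", "dak1", "dak"): 2,
}
F = user_frequencies(counts)
G = popularity(F, num_users=2)
```

With these sample counts, user1 contributes frequencies of 0.75 and 0.25, user2 contributes 1.0 for "dak", and the averaged popularities for a given Jyutping sum to 1.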
In other words, G(jp, kp) indicates the popularity of a particular Kongping kp when the corresponding Jyutping is jp. As shown in
Finally, H(C, kp), i.e., the popularity score of Kongping kp for a Chinese character C, is defined as follows:
H(C, kp) = w_1 G(jp_1, kp) + w_2 G(jp_2, kp) + . . . + w_M G(jp_M, kp),
wherein:
As shown in
In some embodiments, the language model builder 152 builds each entry in the data structure by merging different types of data from various sources. Each type of data i is given a respective weight r_i based on the authenticity of the corresponding data source. For example, the custom data 154-3 is generally given a higher weight than the user survey data 154-1 and the third-party data 154-5 if it is derived from a long-established data source such as Hong Kong map data.
Let
The overall popularity score of the Kongping kp associated with the Chinese phrase/character C is defined as follows:
P(C, kp) = (r_1 H_1(C, kp) + r_2 H_2(C, kp) + . . . + r_n H_n(C, kp)) / (r_1 H_1(C) + r_2 H_2(C) + . . . + r_n H_n(C)).
The language model builder 152 populates the data structure of the language-specific model file 128 with the overall popularity scores determined using the formula above. For each query identified in the query log 124, 126, the ordered set builder 142 generates a set of candidate Kongping prefixes by looking up entries in the model file 128.
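The weighted merge in the formula above can be illustrated with a short Python sketch. It rests on one assumption that the surviving text does not confirm: the per-source total H_i(C) in the denominator is taken to be the sum of H_i(C, kp) over all Kongpings that source i lists for C. The function name, the weights, and the scores are hypothetical.

```python
def overall_popularity(sources, weights):
    """Merge per-source scores {kp: H_i(C, kp)} for one character C into P(C, kp)."""
    numerators = {}
    denominator = 0.0
    for h_i, r_i in zip(sources, weights):
        denominator += r_i * sum(h_i.values())  # assumed definition of H_i(C)
        for kp, score in h_i.items():
            numerators[kp] = numerators.get(kp, 0.0) + r_i * score
    return {kp: value / denominator for kp, value in numerators.items()}

# Hypothetical scores: survey data (weight 1.0) and custom data (weight 2.0).
survey = {"tak": 0.4, "dak": 0.6}
custom = {"tak": 1.0}
P = overall_popularity([survey, custom], [1.0, 2.0])
```

In this example the higher weight on the custom data pulls "tak" from 0.4 in the survey alone to roughly 0.8 overall, and the resulting scores still sum to 1.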
In some embodiments, the model file 128 stores entries for individual Chinese characters like “,” “,” and “” as well as entries for Chinese phrases like “.” By doing so, the model file 128 can provide more context-dependent information with regard to the Kongping of a particular Chinese character. As noted above, one Chinese character may have different pronunciations in different phrases. Having an entry corresponding to a Chinese phrase and its Kongping popularity score distribution in the model file 128 makes it easier to associate a less popular Kongping with a character when the character is part of a special phrase. In some embodiments, the resulting model file 128 is stored in a compressed format to save storage space.
In some embodiments, using the model file 128 and the query logs 124, 126, the ordered set builder 142 constructs one or more query completion tables 130. As further illustrated below, the one or more query completion tables 130 are used by the prediction server 112 for generating predictions for a partial query. Each entry in the query completion tables 130 stores a query string and additional information. The additional information includes a ranking score, which may be based on the query's frequency in the query logs, date/time values of when the query was submitted by users in a community of users, and/or other factors. The additional information for the query optionally includes a value indicating the language of the complete query. Each entry in a respective query completion table 130 represents a predicted complete query associated with a partial query. Furthermore, in some embodiments a group of predicted complete queries associated with the same prefix are stored in a query completion table 130 sorted by frequency or ranking score. Optionally, the query completion tables 130 are indexed by the query fingerprints of corresponding partial search queries, where the query fingerprint of each partial query is generated by applying a hash function (or other fingerprint function) to the partial query. In some embodiments, the predicted complete queries are stored in the one or more query completion tables 130 in their original languages (e.g., Chinese and English).
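A query completion table of the kind described above might be sketched as a mapping from prefix fingerprints to score-ordered entries. This is only an illustration: the use of SHA-256 as the fingerprint function, the field names, and the sample entries (including the placeholder "zh-query-A" standing in for a Chinese query) are assumptions.

```python
import hashlib
from collections import defaultdict

def fingerprint(partial_query):
    """Hypothetical fingerprint function: a truncated SHA-256 digest of the prefix."""
    return hashlib.sha256(partial_query.encode("utf-8")).hexdigest()[:16]

def build_completion_table(entries):
    """entries: iterable of (prefix, complete_query, ranking_score, language)."""
    table = defaultdict(list)
    for prefix, query, score, language in entries:
        table[fingerprint(prefix)].append(
            {"query": query, "score": score, "lang": language}
        )
    for rows in table.values():
        rows.sort(key=lambda row: row["score"], reverse=True)  # best score first
    return dict(table)

table = build_completion_table([
    ("lau", "zh-query-A", 38, "zh"),
    ("lau", "laundry near me", 90, "en"),
    ("laut", "zh-query-A", 38, "zh"),
])
lau_rows = table[fingerprint("lau")]
```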
As noted above, for a given Chinese query the model file 128 may not have any corresponding Kongping. In this case, the ordered set builder 142 has to synthesize one or more Kongpings for the query.
In some embodiments, the ordered set builder 142 performs the synthesis by multiplying the popularity scores of the respective sub-queries that together form the complete query 352. Because each of the three sub-queries has two Kongpings, eight synthesized Kongpings are generated (378). Next, the ordered set builder 142 generates candidate Kongping prefixes for the query “” using the eight synthesized Kongpings and their associated popularity scores (380). For a particular language such as Cantonese, the ordered set builder 142 defines minimum and maximum length limits for the prefix. In some embodiments, these parameters are user-configurable. The minimum length limit is typically 2 or 3 characters, but may be set as low as 1 in some embodiments. The maximum length limit is typically 15 to 20 characters, but there is no reason other than cost that the maximum length limit cannot be significantly larger than 20 characters. In some embodiments, the ordered set builder 142 first concatenates the Kongpings into a single string by removing the delimiters, e.g., “lau tak wah din ying” into “lautakwahdinying.” Assuming that the minimum and maximum length limits are 3 and 5 characters, respectively, the ordered set builder 142 calculates the sum of the popularity scores of all eight Kongpings for the candidate prefix “lau” (i.e., 1) and then the sum of the popularity scores of the first four Kongpings for the candidate prefix “laut” (i.e., 0.7), etc. Next, the ordered set builder 142 filters out those candidate prefixes whose popularity scores are below a predefined limit, e.g., 0.5 (382). As a result, only three prefixes, “lau,” “laut,” and “lauta,” are kept. The ordered set builder 142 then inserts the three prefixes, the Chinese query “” and its associated ranking score 38 into the query completion table (386).
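The synthesis and prefix-filtering steps above can be sketched in Python as follows. The sub-query scores are hypothetical values chosen so that the prefix sums agree with the figures quoted in the text (1 for "lau" and 0.7 for "laut"); the function names are likewise illustrative.

```python
from itertools import product

def synthesize(sub_queries):
    """sub_queries: list of {kongping: score}. Multiplies scores across sub-queries."""
    synthesized = {}
    for combo in product(*(sq.items() for sq in sub_queries)):
        text = " ".join(kp for kp, _score in combo)
        score = 1.0
        for _kp, s in combo:
            score *= s
        synthesized[text] = synthesized.get(text, 0.0) + score
    return synthesized

def candidate_prefixes(synthesized, min_len, max_len, threshold):
    """Sum scores per prefix of the delimiter-free string, then drop weak prefixes."""
    sums = {}
    for text, score in synthesized.items():
        flat = text.replace(" ", "")
        for n in range(min_len, min(max_len, len(flat)) + 1):
            sums[flat[:n]] = sums.get(flat[:n], 0.0) + score
    return {prefix: s for prefix, s in sums.items() if s >= threshold}

# Hypothetical popularity scores for three sub-queries, two Kongpings each.
sub_queries = [
    {"lau tak wah": 0.7, "lau dak wah": 0.3},
    {"din": 0.6, "tin": 0.4},
    {"ying": 0.8, "jing": 0.2},
]
synth = synthesize(sub_queries)
kept = candidate_prefixes(synth, min_len=3, max_len=5, threshold=0.5)
```

Under these assumed scores, "laud" and "lauda" each total about 0.3 and fall below the 0.5 limit, so only "lau," "laut," and "lauta" remain.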
Note that each Chinese character has a specific pronunciation and therefore an associated phonetic representation (e.g., Pinyin in Mandarin and Kongping in Cantonese). A user who enters a query in Kongping may separate the Kongpings of different Chinese characters by a space “ ”, an underline “_”, a hyphen “-”, or other delimiter. So in some embodiments, besides the concatenated phonetic characters (e.g., Kongpings) shown in the table 382 of
Referring to
The search engine 108 receives the partial query for processing (405) and proceeds to make predictions as to the user's contemplated complete query (407). First, the search engine 108 applies a hash function (or other fingerprint function) (409) to create a fingerprint 411 of the partial query. The search engine 108 performs a lookup operation (413) using the fingerprint 411 to locate a query completion table 130 that corresponds to the partial query. The lookup operation includes searching in the query completion table 130 for a fingerprint that matches the fingerprint 411 of the partial query. The query completion table 130 may include a plurality of entries that match or correspond to the partial query, and the fingerprint 411 is used to locate the first (or last) of those entries. The lookup operation (413) produces a set of predicted complete queries that correspond to the received partial query.
Each entry in the query completion table includes a predicted complete query and other information such as the frequency or ranking score for the predicted complete query. The search engine 108 uses the information to construct an ordered set of complete query predictions (415). In some embodiments, the set is ordered by frequency or ranking score. The search engine 108 then returns at least a subset of the predicted complete queries (417) to the client which receives the ordered predicted complete queries (419). The client proceeds to display at least a subset of the ordered predicted complete queries (421).
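The lookup and ordering steps (409 through 417) might look like the following Python sketch. It assumes a table keyed by a truncated SHA-256 fingerprint and uses hypothetical entries; "zh-query-A" is a placeholder for a Chinese query. The example also shows how two different partial queries can each yield the same predicted complete query.

```python
import hashlib

def fingerprint(partial_query):
    """Hypothetical fingerprint function: a truncated SHA-256 digest."""
    return hashlib.sha256(partial_query.encode("utf-8")).hexdigest()[:16]

# Hypothetical table: fingerprint of a partial query -> (complete query, ranking score).
TABLE = {
    fingerprint("lauta"): [("lautaro", 12), ("zh-query-A", 38)],
    fingerprint("lauda"): [("zh-query-A", 38), ("lauda air", 50)],
}

def predict(table, partial_query, limit=10):
    """Look up the entries for a partial query and return them ordered by score."""
    rows = table.get(fingerprint(partial_query), [])
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [query for query, _score in ordered[:limit]]

first = predict(TABLE, "lauta")
second = predict(TABLE, "lauda")
shared = set(first) & set(second)
```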
Note that the ordered set of predicted complete queries may include queries in multiple languages, since the partial query received at 405 can potentially match query entries in different languages in the query completion table 130 corresponding to the fingerprint 411. The search engine 108 can be configured to return mixed language predicted complete queries or can be configured to select whichever language is more likely to predict the partial query.
In some embodiments, either prior to ordering the predicted complete queries (415) or prior to conveying the predicted complete queries to the client (417), the set of predicted complete queries is filtered to remove queries, if any, matching one or more terms in one or more predefined sets of terms. For example, the one or more predefined sets of terms may include English terms and Cantonese terms that are considered to be objectionable, or culturally sensitive, or the like. The system performing the method may include, stored in memory, one or more tables (or other data structures) that identify the one or more predefined sets of terms. In some other embodiments, the set of predicted complete queries conveyed to the client (417) is filtered at the client by the client assistant 104 to remove queries, if any, matching one or more terms in one or more predefined sets of terms. Optionally, a plurality of different filters may be used for a plurality of different groups of users. In some embodiments, run time filtering (performed in response to a partial query) is used in place of filtering during the building of the query completion tables.
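A run-time filter of this kind could be sketched as below. The matching rule is an assumption: the sketch uses case-insensitive substring matching, which also works for text without word delimiters but may over-block in English. The function name and the sample terms are hypothetical.

```python
def filter_predictions(predictions, blocked_terms):
    """Drop any predicted query that contains one of the blocked terms."""
    blocked = [term.lower() for term in blocked_terms]
    kept = []
    for query in predictions:
        lowered = query.lower()
        if not any(term in lowered for term in blocked):
            kept.append(query)
    return kept

# Hypothetical predictions and a hypothetical predefined set of terms.
predictions = ["bank holidays", "BadWord lyrics", "banana bread"]
filtered = filter_predictions(predictions, {"badword"})
```

Different groups of users could be served by passing a different set of blocked terms for each group.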
When a user input or selection is identified as a completed user input, the completed user input is transmitted to a server for processing (451). The server returns a set of search results, which is received by the client assistant 104 or by a client application, such as a browser application (453). In some embodiments, the browser application displays at least part of the search results in a web page. In some other embodiments, the client assistant 104 displays the search results. Alternately, the transmission of a completed user input (451) and the receipt (453) of search results may be performed by a mechanism other than the client assistant 104. For example, these operations may be performed by a browser application using standard request and response protocols (e.g., HTTP).
A user input may be identified by the client assistant 104 (or by a browser or other application) as a completed user input in a number of ways, such as when the user enters a carriage return or equivalent character, selects a “find” or “search” button in a graphical user interface (GUI) presented to the user during entry of the query, or selects one of a set of predicted queries presented to the user during entry of the query. One of ordinary skill in the art will recognize a number of ways to signal the final entry of the query.
Prior to the user signaling a completed user input, a partial query may be identified. For example, a partial query is identified by detecting entry or deletion of characters in a text entry box. Once a partial query is identified, the partial query is transmitted to the server (433). In response to the partial query, the server returns predictions, including predicted complete search queries. The client assistant 104 receives (435) and presents (e.g., displays, verbalizes, etc.) at least a subset of the predictions (437).
After the predicted complete queries are presented to the user (437), the user may select one of the predicted complete search queries if the user determines that one of the predicted complete queries matches the user-intended entry. In some instances, the predictions may provide the user with additional information that had not been considered. For example, a user may have one query in mind as part of a search strategy, but seeing the predicted complete queries causes the user to alter the input strategy. Once the set is presented (437), the user's input is again monitored (431). If the user selects one of the predictions, the user input is transmitted to the server (451) as a complete query (also herein called a completed user input). After the request is transmitted, the user's input activities are again monitored (431).
In some embodiments, the client assistant 104 may preload additional predicted results (each of which is a set of predicted complete queries) from the server (439). The preloaded predicted results may be used to improve the speed of response to user entries. For example, when the user enters <ban>, the client assistant 104 may preload the prediction results for <bana>, . . . , and <bank>, in addition to the prediction results for <ban>. If the user enters one more character, for example <k>, to make the (partial query) entry <bank>, the prediction results for <bank> can be displayed without transmitting (433) the partial query to the server and receiving (435) predictions.
In some embodiments, one or more sets of predicted results are cached locally at the client. When the user modifies the current query to reflect an earlier partial input (e.g., by backspacing to remove some characters), the set of predicted results associated with the earlier partial input is retrieved from the client cache and presented again to the user instead of the partial input being sent to the server.
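The preloading and client-side caching behavior can be illustrated with a small Python class. This is a sketch under stated assumptions: the `fetch` callable stands in for the request to the server (433, 435), preloading is modeled as one request per candidate next character, and the class and method names are hypothetical.

```python
import string

class PredictionCache:
    """Client-side cache of predicted results, keyed by partial query."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical callable: partial query -> predictions
        self._cache = {}
        self.server_calls = 0    # counts round trips, for illustration only

    def get(self, partial):
        """Return cached predictions, contacting the server only on a miss."""
        if partial not in self._cache:
            self.server_calls += 1
            self._cache[partial] = self._fetch(partial)
        return self._cache[partial]

    def preload(self, partial, alphabet=string.ascii_lowercase):
        """Fetch predictions for each one-character extension of the partial query."""
        for ch in alphabet:
            self.get(partial + ch)

cache = PredictionCache(lambda partial: [partial + " result"])
cache.get("ban")
cache.preload("ban")
calls_after_preload = cache.server_calls
bank = cache.get("bank")       # user types one more character: served from cache
ban_again = cache.get("ban")   # user backspaces: served from cache
```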
In some embodiments, after receiving the search results or document for a final input (453), or after displaying the predicted complete search queries (437), and optionally preloading predicted results (439), the client assistant 104 continues to monitor the user entry (431) until the user terminates the client assistant 104, for example, by closing a web page that contains the client assistant 104. In some other embodiments, the client assistant 104 continues to monitor the user entry (431) only when a text entry box (discussed below with reference to
Referring to
In some embodiments, a search engine 108 may receive queries in one language (e.g., English) at a much higher submission frequency than queries in other languages (e.g., Chinese). As a result, certain Chinese queries like “,” although very popular among a particular community of users (e.g., people in Hong Kong), have a far lower ranking score than many English queries that match the partial query “la.” Thus, in some embodiments, the ranking scores of the queries in different languages are adjusted by increasing the ranking scores of those queries written in a local language used by the community of users or decreasing the ranking scores of those queries written in other languages and rarely used by the community of users. By doing so, Chinese queries like “” may appear at or near the top of a list of predicted complete queries.
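One way to picture the score adjustment is a multiplicative boost for local-language queries or a penalty for the others. The multiplicative form, the factor values, the function name, and the sample entries ("zh-query-A" is a placeholder for a Chinese query) are all assumptions made for this sketch.

```python
def adjust_scores(entries, local_language, boost=1.0, penalty=1.0):
    """entries: list of (query, ranking_score, language). Returns re-ranked entries."""
    adjusted = []
    for query, score, language in entries:
        factor = boost if language == local_language else penalty
        adjusted.append((query, score * factor, language))
    return sorted(adjusted, key=lambda entry: entry[1], reverse=True)

# Hypothetical entries matching the partial query "la".
entries = [
    ("las vegas", 900.0, "en"),
    ("laptop", 700.0, "en"),
    ("zh-query-A", 38.0, "zh"),
]
ranked = adjust_scores(entries, local_language="zh", boost=50.0)
```

With a boost of 50, the Chinese query's score rises from 38 to 1900 and it moves to the top of the list.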
When the length of the partial query is at least the size of one chunk, C, the partial query (e.g., “lauta” or “lauda”) is decomposed into a prefix 484 and a suffix 486, whose lengths are governed by the chunk size. A fingerprint is generated for the prefix 484, for example by applying a hash function 409 to the prefix 484, and that fingerprint is then mapped to a respective “chunked” query completion table 130-2 or 130-3 by a fingerprint to table map 483-1 or 483-2. In some embodiments, each chunked query completion table 130-2 or 130-3 is a set of entries in a bigger query completion table, while in other embodiments each chunked query completion table is a separate data structure. Each entry 488-p or 490-q of a respective query completion table includes a query string 494, which is the text of a complete query in a corresponding language, and may optionally include a popularity score 498 as well, used for ordering the entries in the query completion table. Each entry of a chunked query completion table includes the suffix of a corresponding partial query. The suffix 496 in a respective entry has a length, S, which can be anywhere from zero to C−1, and comprises the zero or more characters of the partial query that are not included in the prefix 484. In some embodiments, when generating the query completion table entries for a historical query, only one entry is made in a respective chunked query completion table 130 that corresponds to the historical query. In particular, that one entry contains the longest possible suffix for the historical query, up to C−1 characters long. In other embodiments, up to C entries are made in each chunked query completion table 130 for a particular historical query, one for each distinct suffix.
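The prefix and suffix decomposition can be sketched as follows. Several points are assumptions rather than details confirmed by the text: the prefix is taken to be the longest leading part whose length is a multiple of the chunk size C, an entry is treated as a match when its stored suffix begins with the suffix of the partial query, and the tables are keyed by the prefix itself instead of its fingerprint to keep the example short. The names and sample data are hypothetical.

```python
def split_partial_query(partial, chunk_size):
    """Split into (prefix, suffix); the suffix has between 0 and chunk_size - 1 characters."""
    if len(partial) < chunk_size:
        raise ValueError("partial query is shorter than one chunk")
    cut = (len(partial) // chunk_size) * chunk_size
    return partial[:cut], partial[cut:]

def lookup_chunked(tables, partial, chunk_size):
    """Find entries under the prefix whose stored suffix starts with the query suffix."""
    prefix, suffix = split_partial_query(partial, chunk_size)
    rows = tables.get(prefix, [])
    hits = [row for row in rows if row["suffix"].startswith(suffix)]
    hits.sort(key=lambda row: row["score"], reverse=True)
    return [row["query"] for row in hits]

# Hypothetical chunked tables with C = 4; each entry stores its longest suffix.
tables = {
    "laut": [{"suffix": "akw", "query": "zh-query-A", "score": 38}],
    "laud": [{"suffix": "akw", "query": "zh-query-A", "score": 38}],
}
parts = split_partial_query("lauta", 4)
from_tak = lookup_chunked(tables, "lauta", 4)
from_dak = lookup_chunked(tables, "lauda", 4)
no_match = lookup_chunked(tables, "lautx", 4)
```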
Optionally, each entry in a respective query completion table 130 includes a language value or indicator 492, indicating the language associated with the complete query. However, a language value 492 may be omitted in embodiments in which all the query strings are stored in the query completion tables 130 in their original language.
As shown in
In some embodiments, the search engine 108 maintains multiple copies of a partial query in Kongping in the query completion tables, some without the space delimiter “ ” and others with the delimiter. In some embodiments, the different copies of the same partial query point to the same list of predicted complete queries (e.g., 470-5). In some other embodiments, the different copies are treated as different partial queries and each one has its own list of predicted complete queries.
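The first alternative above, in which delimiter variants share one list of predictions, might be sketched like this. The sketch indexes a whole Kongping string instead of every prefix, and the delimiter set, function names, and sample data are assumptions.

```python
def delimiter_variants(kongping, delimiters=(" ", "_", "-")):
    """Return the concatenated form plus one variant per delimiter."""
    syllables = kongping.split()
    variants = ["".join(syllables)]
    variants.extend(d.join(syllables) for d in delimiters)
    return variants

def index_with_variants(table, kongping, predictions):
    """Point every delimiter variant at the same list object."""
    for key in delimiter_variants(kongping):
        table[key] = predictions
    return table

shared_list = ["zh-query-A"]
index = index_with_variants({}, "lau tak", shared_list)
```

Because every key refers to the same list, updating the predictions once updates them for all variants; the second alternative would store an independent list under each key instead.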
Referring to
At a minimum, the client assistant 104 transmits partial query information to a server. The client assistant 104 may also enable the display of prediction data including the predicted complete queries, and user selection of a displayed predicted complete query. In some embodiments, the client assistant 104 includes the following elements, or a subset of such elements:
The transmission of final (i.e., completed) queries, receiving search results for completed queries, and displaying such results may be handled by the client application/browser 520, the client assistant 104, or a combination thereof. The client assistant 104 can be implemented in many ways.
In some embodiments, a web page (or web pages) 522 used for entry of a query and for presenting responses to the query also includes JavaScript or other embedded code, for example a Macromedia Flash object or a Microsoft Silverlight object (both of which work with respective browser plug-ins), or instructions to facilitate transmission of partial search queries to a server, for receiving and displaying predicted search queries, and for responding to user selection of any of the predicted search queries. In particular, in some embodiments the client assistant 104 is embedded in the web page 522, for example as an executable function, implemented using JavaScript (trademark of Sun Microsystems) or other instructions executable by the client 102. Alternately, the client assistant 104 is implemented as part of the client application 520, or as an extension, plug-in or toolbar of the client application 520 that is executed by the client 102 in conjunction with the client application 520. In yet other embodiments, the client assistant 104 is implemented as a program that is separate from the client application 520.
In some embodiments, a system for processing query information includes one or more central processing units for executing programs and memory to store data and to store programs to be executed by the one or more central processing units. The memory stores a set of complete queries previously submitted by a community of users, ordered in accordance with a ranking function, the set corresponding to a partial query and including both English language and Chinese language complete search queries as well as queries in other languages. The memory further stores a receiving module for receiving the partial query from a search requestor, a prediction module for associating the set of predicted complete queries with the partial query, and a transmission module for transmitting at least a portion of the set to the search requestor.
Memory 606 may include high speed random access memory and may also include non-volatile memory, such as one or more magnetic or optical storage disks, flash memory devices, or other non-volatile solid state storage devices. The high speed random access memory may include memory devices such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. Memory 606 may optionally include mass storage that is remotely located from CPUs 602. Memory 606, or alternately the non-volatile memory device(s) within memory 606, comprises a computer readable storage medium. Memory 606 or the computer readable storage medium of memory 606 stores the following elements, or a subset of these elements, and may also include additional elements:
The query server 110 may include the following elements, or a subset of these elements, and may also include additional elements:
The query processing module (or instructions) 114 receives, from the query server 110, complete search queries, and produces and conveys responses. In some embodiments, the query processing module (or instructions) includes a database that contains information including query results and optionally additional information, for example advertisements associated with the query results.
The prediction server 112 may include the following elements, or a subset of these elements, and may also include additional elements:
The ordered set builder 142 may optionally include one or more filters 640.
It should be understood that in some other embodiments the server system 600 may be implemented using multiple servers so as to improve its throughput and reliability. For instance, the query logs 124 and 126 could be implemented on a distinct server that communicates with and works in conjunction with other ones of the servers in the server system 600. As another example, the ordered set builder 208 could be implemented in separate servers or computing devices. Thus,
Although the discussion herein has been made with reference to a server designed for use with a prediction database remotely located from the search requestor, it should be understood that the concepts disclosed herein are equally applicable to other search environments. For example, the same techniques described herein could apply to queries against any type of information repository against which queries, or searches, are run. Accordingly, the term “server” should be broadly construed to encompass all such uses.
Although illustrated in
In another embodiment, the client assistant 104 may include a local version of the prediction server 112, for making complete query predictions based at least in part on prior queries by the user. Alternately, or in addition, the local prediction server may generate predictions based on data downloaded from a server or remote prediction server. Further, the client assistant 104 may merge locally generated and remotely generated prediction sets for presentation to the user. The results could be merged in any of a number of ways, for example, by interleaving the two sets or by merging the sets while biasing queries previously submitted by the user such that those queries would tend to be placed or inserted toward the top of the combined list of predicted queries. In some embodiments, the client assistant 104 inserts queries deemed important to the user into the set of predictions. For example, a query frequently submitted by the user, but not included in the set obtained from the server could be inserted into the predictions.
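One of the merging strategies mentioned above (interleaving the two sets, then biasing queries previously submitted by the user toward the top) might be sketched as follows. The function name, the `limit` parameter, and the choice to place the local prediction first within each interleaved pair are assumptions made for illustration, not details taken from the description.

```python
from itertools import zip_longest


def merge_predictions(remote, local, user_history=(), limit=10):
    """Interleave local and remote predictions, drop duplicates, and move
    queries the user has submitted before toward the top of the list."""
    merged = []
    seen = set()
    for pair in zip_longest(local, remote):
        for query in pair:
            if query is not None and query not in seen:
                seen.add(query)
                merged.append(query)
    history = set(user_history)
    # sorted() is stable, so the interleaved order is preserved within the
    # "previously submitted" group and within the remaining group.
    merged = sorted(merged, key=lambda q: q not in history)
    return merged[:limit]


print(merge_predictions(["a", "b", "c"], ["c", "d"], user_history=["b"]))
```

A query that the user submits frequently but that is absent from the server's set can be included simply by placing it in `local`, since the merge keeps every distinct query from either source.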
Operations shown in flow charts, such as in
As shown in
In some embodiments, the user who enters the partial query is identified as a Cantonese speaker. For example, the user can make this representation by specifying his or her preferred language to be Cantonese in the user profile submitted to the search engine. Alternatively, the search engine may infer the user's language preference based on the IP address of the client device that submits the partial query. In other words, a partial query from a client computer in Hong Kong indicates that the user who enters the query may be a Cantonese speaker. In yet other embodiments, the search engine may designate that the partial queries submitted to a particular website are from Cantonese speakers. For example, it is assumed that most of the users of the website (http://www.google.com.hk) are located in Hong Kong or at least related to Hong Kong in some way and they are more likely to enter Kongping since most of them are Cantonese speakers.
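The three signals described above (profile language, client location, and the website receiving the query) could be combined as in the sketch below. The order of precedence, the function name, and the lookup sets are assumptions. The description presents these signals as alternative embodiments and does not say how they would be combined. Resolving an IP address to a region is outside the scope of the sketch, so the region code is passed in directly.

```python
from typing import Optional

# Hypothetical lookup sets, for illustration only.
CANTONESE_REGIONS = {"HK"}
CANTONESE_SITES = {"www.google.com.hk"}


def is_likely_cantonese_speaker(
    profile_language: Optional[str] = None,
    client_region: Optional[str] = None,
    site_host: Optional[str] = None,
) -> bool:
    """Guess whether a partial query comes from a Cantonese speaker."""
    # An explicit profile preference is assumed to override inferred signals.
    if profile_language is not None:
        return profile_language.lower() == "cantonese"
    # Region code assumed to be derived from the client's IP address elsewhere.
    if client_region is not None and client_region.upper() in CANTONESE_REGIONS:
        return True
    # Fall back to the website to which the partial query was submitted.
    return site_host is not None and site_host.lower() in CANTONESE_SITES


print(is_likely_cantonese_speaker(client_region="hk"))
```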
In the example shown in
In the example shown in
In the example shown in
Moreover, an embodiment of the present invention puts no restriction on the location of the differences between different strings. For example, as shown in
In the example shown in
Although some of the various drawings illustrate a number of logical stages in a particular order, stages which are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art, so the alternatives mentioned herein do not constitute an exhaustive list. Moreover, it should be recognized that the stages could be implemented in hardware, firmware, software or any combination thereof.
The foregoing description, for purposes of explanation, has been provided with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.
Number | Date | Country | Kind |
---|---|---|---|
61183932 | Jun 2009 | US | national |
This application is related to co-pending, commonly-assigned U.S. Utility patent application Ser. No. 10/987,295, “Method and System for Autocompletion Using Ranked Results,” filed on Nov. 11, 2004, Ser. No. 10/987,769, “Method and System for Autocompletion for Languages Having Ideographs and Phonetic Characters,” filed on Nov. 12, 2004, and Ser. No. 12/188,163, “Autocompletion and Automatic Input Method Correction for Partially Entered Query,” filed on Aug. 7, 2008, the contents of which are incorporated by reference herein in their entireties.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CN2010/073498 | 6/3/2010 | WO | 00 | 3/14/2012 |