The present disclosure relates to search technology, and in particular, to methods and apparatuses for providing suggested terms.
With the rapid development of the Internet, electronic commerce has been widely integrated into the daily lives of people. In applications involving electronic commerce, searching by inputting search keywords is not only the main means by which users find and locate products of interest, but also the basic function that users use most frequently. In order to quickly find and locate a desired product, a user needs to select an appropriate search keyword to describe his/her search objective.
Generally, users are accustomed to searching from the general to the specific. For instance, the user first inputs relatively general search keywords, then gradually narrows down the search scope by using more specific search keywords, and ultimately locates specific products.
In some cases, specialty products tend to have complicated and obscure spellings. Users may remember only the beginning of a search keyword and forget the remainder, and must therefore issue multiple queries to locate the desired products. Furthermore, inputting search keywords repeatedly is a tedious process that not only reduces search efficiency but is also prone to input errors.
As shown in
As the number of products of various types in e-commerce websites continues to grow, it is increasingly time consuming to find a desired product using conventional search processes that rely on keyword entry. Accordingly, there is a need for improved techniques for providing suggested terms, which build upon existing technologies, to increase the search efficiency of an e-commerce site and enhance the service performance of the associated e-commerce system.
The embodiments of the present disclosure provide techniques for providing suggested terms in keyword search processes in a way that improves search efficiency while overcoming problems associated with conceptual vagueness of suggested terms in existing technologies.
In one aspect of the present disclosure, a method of providing suggested terms is disclosed. The method may include receiving an initial query input from a user, and obtaining a suggested query corresponding thereto based on the initial query. The method may determine at least two categories corresponding to the suggested query, and at least two clickable regions usable for querying the suggested query. In one embodiment, the method may separately determine a category weight associated with each determined category in each clickable region for the suggested query, and a click attribute weight associated with each clickable region. The method may further separately compute a degree of confidence of each category for the suggested query based on the category weight associated with each category, and the click attribute weight associated with each clickable region. The method may determine target categories of the suggested query based on the degree of confidence of each category for the suggested query. The method may then display the suggested query and the target categories.
In another aspect of the present disclosure, an apparatus of providing a suggested term is provided. The apparatus may include an acquisition unit to receive an initial query input from a user, and obtain a suggested query corresponding thereto based on the initial query. Furthermore, the apparatus may include a first determination unit to determine at least two categories corresponding to the suggested query, and at least two clickable regions usable for querying the suggested query. In one embodiment, the apparatus may further include a second determination unit. The second determination unit separately determines a category weight associated with each determined category in each clickable region for the suggested query, and a click attribute weight associated with each clickable region. Furthermore, the apparatus may include a computation unit to separately compute a degree of confidence of each category for the suggested query based on the category weight associated with each category, and the click attribute weight associated with each clickable region. A display unit may further be included and used for determining target categories of the suggested query based on the degree of confidence of each category for the suggested queries, and displaying the suggested query and the target categories.
In certain embodiments of the present disclosure, a dictionary of suggestions is established based on user query logs and category suggestions are based on a user click log. Therefore, in response to obtaining suggested queries based on an initial query (a query keyword) that is input from a user, a system may determine a target category for each suggested query based on the user's existing click behavior, and display the suggested queries and corresponding target categories at the same time. Accordingly, a guiding intention of each suggested query is displayed to the user based on the target categories, allowing the user to quickly determine his/her search intention based on the target categories of the suggested queries. This avoids interference from unrelated suggested queries, and thereby effectively improves the speed of information searching. Furthermore, the system takes advantage of performing a search under a target category corresponding to a suggested query selected by the user as opposed to performing searches under all categories. The amount of information to be searched is therefore greatly reduced, thus further improving the speed of information searching while reducing the processing workload of an associated server. The present disclosure may be applied in electronic products such as computers, wireless communications devices, etc.
The embodiments of the present disclosure will be described hereinafter in conjunction with the attached figures.
Dictionaries play an important role in completing query inputs. All suggested terms are generated using the dictionaries. For example, if a user enters “pho”, suggested terms prefixed with “pho”, such as “phone”, “photo”, “photo frame”, “photo album”, etc., may be obtained by looking up a dictionary.
One process that may be used to construct a dictionary is given as follows:
1. Input a query log of a user;
2. Pre-process the query log of the user, which includes elimination of illegible characters, standardization of punctuation, correction of spelling mistakes (a user may enter a wrong search keyword due to a typing error), conversion of plurals into singular forms, etc. Upon pre-processing, the resulting search keywords form a candidate term set;
3. Select a candidate term from the candidate term set generated in step 2;
4. Extract and remove the leftmost letter from the candidate term. For example, extract the letter “p” from a candidate term “phone” so that the candidate term becomes “hone” after the first letter is removed;
5. Add the candidate term “phone” to a set of suggested terms that have the first letter “p”;
6. Repeat steps 4 and 5 until all the letters of the candidate term are extracted;
7. Add the candidate term “phone” to a suggested term set corresponding to “phone”;
8. Repeat steps 3-7 until the candidate term set is empty;
9. Complete construction of a suggested term dictionary.
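The construction steps above may be sketched in Python as follows. This is a minimal illustration rather than the disclosed implementation. It assumes that steps 4 through 7 amount to registering each candidate term under every one of its prefixes. The function names `preprocess`, `build_suggestion_dictionary` and `suggest` are hypothetical, and the pre-processing shown is only a rough stand-in for step 2 (it does not correct spelling or convert plurals).

```python
from collections import defaultdict


def preprocess(raw_query):
    """Rough stand-in for step 2: lowercase and collapse whitespace."""
    return " ".join(raw_query.lower().split())


def build_suggestion_dictionary(query_log):
    """Map every prefix of every candidate term to the terms starting with it."""
    # Steps 1 and 2: read the log and form the candidate term set.
    candidates = {preprocess(q) for q in query_log if q.strip()}
    dictionary = defaultdict(set)
    for term in candidates:                  # steps 3 and 8: visit each candidate
        for end in range(1, len(term) + 1):  # steps 4 to 7: one prefix per letter
            dictionary[term[:end]].add(term)
    return dictionary


def suggest(dictionary, typed):
    """Look up the suggested terms for what the user has typed so far."""
    return sorted(dictionary.get(preprocess(typed), set()))


log = ["phone", "Photo", "photo frame", "photo album", "apple"]
suggestions = suggest(build_suggestion_dictionary(log), "pho")
```

Storing each term under all of its prefixes trades memory for lookup speed: a lookup is a single dictionary access, which may suit an interactive search box. A trie would be a more compact alternative.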
The space available for displaying suggested terms on an e-commerce site is limited, and may only display a limited number of suggested terms. However, the number of suggested terms that match a search keyword input by a user is generally far greater than that limit. Therefore, a certain number of suggested terms having the highest “quality” are to be selected for display.
In the present embodiments, a precedence level is used to measure the quality of a suggested term—the higher the precedence level is, the better the quality will be. Specifically, an ordering is first performed using degrees of matching between suggested terms and a search keyword. If the first word of a suggested term matches the search keyword, a match position is “0”. If the second word is matched, then the match position is “1”, and so forth. The precedence level is higher if the match position is nearer to the beginning. For example, if “phone” is entered, the suggested term “phone case” is better than “mobile phone”, because the match position of the former one is 0, while the match position of the latter one is 1.
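The ordering by match position may be sketched as follows. The sketch assumes that a word "matches" when it begins with the search keyword, which the text does not state explicitly. The names `match_position` and `rank_by_match_position` and the `limit` parameter are hypothetical.

```python
def match_position(suggested_term, keyword):
    """Index of the first word that begins with the keyword, or None if no word does."""
    keyword = keyword.lower()
    for position, word in enumerate(suggested_term.lower().split()):
        if word.startswith(keyword):
            return position
    return None


def rank_by_match_position(suggested_terms, keyword, limit=10):
    """Keep matching terms, earliest match position first, up to `limit` terms."""
    matched = []
    for term in suggested_terms:
        position = match_position(term, keyword)
        if position is not None:
            matched.append((position, term))
    # sorted() is stable, so terms with equal positions keep their input order.
    matched = sorted(matched, key=lambda pair: pair[0])
    return [term for _, term in matched[:limit]]


ranked = rank_by_match_position(["mobile phone", "phone case"], "phone")
```

With the keyword "phone", "phone case" has match position 0 and "mobile phone" has match position 1, so "phone case" is listed first.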
In the field of electronic commerce, each e-commerce product is classified into a particular category (or multiple categories). A category in the e-commerce field is a product classification corresponding to a product. For example, a category corresponding to mobile phones might be “communications equipment”, and a category corresponding to cameras might be “digital products”, and so forth. Query behavior of a user is usually related to a particular category. The embodiments of the present disclosure therefore relate the suggested terms with categories, and recommend them jointly to the user. As such, the user can select a category to filter away some interference factors. These interference factors correspond to suggested terms that are irrelevant to the search purpose of the user. The search efficiency of the system is therefore improved.
Under normal circumstances, upon entering a search keyword on an e-commerce website, a user may click and browse certain products in a non-navigational region of a web page, or click a category in a navigational region of the web page. Therefore, a relationship between the search keyword (i.e., a suggested term) and a category may be learned from a query log of the user. The techniques of the present disclosure define, as attributes, click behavior associated with an offer (i.e., click behavior associated with product information displayed in the non-navigational region of the web page) and click behavior associated with an e-commerce navigational region. The techniques employ linear models for fusion. The linear models include an offer click model and a navigational region click model respectively. A framework of the fusion is shown in
First, two functions are respectively defined as follows.
Based on the functions defined above, a click attribute model for a web page associated with an offer may be represented in Equation (1):
Equation (1) represents a characteristic function “f” for an attribute extracted for an offer. For an offer, given a query (a query term, represented by x in the function) and a category cat', the function takes one of two values, one or zero, which is the value of the attribute. The variable y in the characteristic function is defined by the click1 function. Given a query, the value of the function is one when click1(offer, query)=cat' for that query, and is zero otherwise. Using this function, an offer can be mapped into an attribute space. This attribute space indicates the categories of the product information that the user has clicked in the web page associated with the offer after entering a query (or multiple queries).
Based on the functions defined above, a navigational region click attribute model may be represented in Equation (2):
Equation (2) represents a characteristic function “f” for an attribute extracted for a navigational region. Given a query (a query term, represented by x in the function) and a category, the function takes one of two values, one or zero, which is the value range of the attribute. The variable y in the characteristic function is defined by the click2 function. Given a query, the attribute value for a category in a navigational region is one if click2(query)=cat″, and is zero otherwise. Using this function, an attribute space may be generated based on a query and a category of a navigational region. This attribute space indicates which categories the user has clicked within the navigational region after entering a query (or multiple queries).
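The formulas of Equations (1) and (2) are not reproduced in this text. Read against the two descriptions above, they may plausibly be written as the indicator functions below. The subscripts on f and the exact notation are assumptions made for illustration and may differ from the original equations.

```latex
% Assumed reconstruction of Equation (1): offer click attribute
f_1(x, y) =
\begin{cases}
1, & \text{if } y = \mathrm{click1}(\mathit{offer}, x) = cat' \\
0, & \text{otherwise}
\end{cases}

% Assumed reconstruction of Equation (2): navigational region click attribute
f_2(x, y) =
\begin{cases}
1, & \text{if } y = \mathrm{click2}(x) = cat'' \\
0, & \text{otherwise}
\end{cases}
```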
Click data associated with the offer and click data associated with the navigational region may be used as training data. Through this training, category weights of each category under click attributes of the offer and click attributes of the navigational region may be obtained. Alternatively, these may also be referred to as category weights of each category under clickable regions of the offer and clickable regions of the navigational region. Alternatively, these may be interpreted as, for a specific query, probabilities that a user clicks on each category within the clickable regions of the offer, and probabilities that the user clicks on each category within the clickable regions of the navigational region. Specifically, weights may be defined as:
1) As shown in Equation (3), category weights in a clickable region of an offer are:
where “offer_cnt” represents, for a specific query, a total number of clicks associated with an offer with a category being cat' among the click data associated with the offer. The element “catj” represents a certain predetermined category. In practical applications, a great number of products on an e-commerce site are classified into a particular category, for example, “fruits”. “j” is used to label different categories.
For example, if a given query is “apple”, and the user has clicked 75 offers under a category “fruits” and 25 offers under a category “electronics”, then g1 (“apple”, “fruits”)=0.75, and g1 (“apple”, “electronics”)=0.25;
2) As shown in Equation (4), category weights in a clickable region of a navigational region are:
where “sn_cnt” represents, for a specific query, a total number of clicks associated with category cat″ among the click data associated with the navigational region. The label “j” is used to label different categories. If the categories are category 1, category 2, category 3, . . . , category n, then j=1, 2, . . . , n, and summing over j gives the total number of clicks under all categories for a particular query.
For example, a given query is assumed to be “apple”, and two categories, category 1: “fruits” and category 2: “electronics”, are displayed in a navigational region. For the query “apple”, if the total number of clicks for category 1 in the navigational region is 75, and the total number of clicks for category 2 in the navigational region is 25, then g2 (“apple”, “fruits”)=0.75, and g2 (“apple”, “electronics”)=0.25.
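The ratio described for Equations (3) and (4) may be sketched as follows: for one query and one clickable region, each category weight is the number of clicks on that category divided by the number of clicks on all categories. The function name `category_weights` and the representation of click data as a list of clicked category names are assumptions made for this example.

```python
from collections import Counter


def category_weights(clicked_categories):
    """Share of clicks per category for one query within one clickable region."""
    counts = Counter(clicked_categories)
    total = sum(counts.values())
    if total == 0:
        return {}  # no click data for this query in this region
    return {category: n / total for category, n in counts.items()}


# Clicks recorded for the query "apple" in the clickable region of the offer.
offer_clicks = ["fruits"] * 75 + ["electronics"] * 25
g1 = category_weights(offer_clicks)
```

The same function applies unchanged to the click data of the navigational region, which yields g2.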
As shown in
Based on the foregoing embodiments, a final operation of determination combines click attributes corresponding to all clickable regions. Specifically, click weights w are needed to discriminate between the click attributes corresponding to the clickable regions. Therefore, a gating process is introduced to evaluate a degree of importance of each attribute, i.e., computing w. Specifically, as shown in
As can be seen from the settings of the above functions, g represents a degree of importance of a particular click attribute with respect to an outputted category. The variable w represents relative degrees of importance between click attributes.
In practical applications, if the training data is tagged, w may be obtained using maximum likelihood estimation (MLE) training. Indeed, the parameter g may not be needed in this situation (although g may be used as a click attribute value, which then is no longer restricted to zero or one), and the parameters of the attributes can be trained directly. If the training data is not tagged, w can be set using the degrees of confidence associated with the click attributes corresponding to the clickable regions (also referred to as degrees of confidence of the clickable regions). For example, in a clickable region of an offer, ω1 corresponding to a click attribute of the offer is set as: ω1=1−p_error, where p_error represents an error rate when determination is performed based on the click attribute of the offer. The value of ω of the center NP can be set to be a similarity value between itself and an original query.
Based on the functions defined above, according to the embodiments of the present disclosure as shown in
Block 500 receives an initial query input by a user, and obtains corresponding suggested queries based on the initial query. In this embodiment, because the initial query is incomplete, upon receiving the initial query input from the user, the search apparatus completes the initial query using a predetermined dictionary in order to obtain corresponding suggested queries, i.e., corresponding suggested terms. For example, if the user inputs “pho”, the search apparatus may obtain suggested terms (i.e., suggested queries) prefixed with “pho”, such as “phone”, “photo”, “photo frame”, “photo album”, etc., by looking up a dictionary. As another example, if the user enters “app”, the search apparatus may look up the dictionary to obtain a suggested query “apple”. As still another example, if the user enters “apple”, the search apparatus may obtain suggested queries “apple phone”, “apple MP3”, etc., by searching the dictionary. As an example, the following embodiments assume that the initial query entered by the user is “app” and that the suggested term obtained by the search apparatus after completing the initial query based on the dictionary is “apple”.
Block 510 separately determines at least two categories corresponding to the suggested queries, and at least two clickable regions usable for looking up the suggested queries. In this embodiment, assume that the two categories corresponding to “apple” are “fruits” and “electronics” respectively, and that two clickable regions are usable for looking up the suggested query, one being an offer web page and the other being a navigational region.
Block 520 determines a category weight g for each category in each clickable region and a click attribute weight w for each clickable region. In this embodiment, the category weight g for any category (referred to as category x) in any clickable region (referred to as region x) is computed as the ratio between the total number of clicks corresponding to the category x within the region x for the suggested query and the total number of clicks corresponding to all categories within the region x for the suggested query. Specific details of the computation are given in Equation (3) and Equation (4), and are not repeated herein.
Further, a click attribute weight w for any clickable region is determined as follows. If the training data is tagged, w is obtained using maximum likelihood estimation. If the training data is not tagged, w is set using the degree of confidence of the corresponding clickable region. The specific setting methods have been described in the foregoing embodiments, and are not repeated herein.
The values of the aforementioned parameters g and w may be determined and stored by the administrator in advance, and may be updated in real time based on a change in user data, or computed in real time based on current user data in response to obtaining a suggested query.
For example, for the suggested query “apple”, the system obtains statistics about user click behavior, finding that the number of user clicks under the category “fruits” within the region of the web page associated with the offer is seventy-five, and the number of user clicks under the category “electronics” within the same region is twenty-five. In this case, g1 (“apple”, “fruits”)=0.75, and g1 (“apple”, “electronics”)=0.25. In the navigational region, the number of user clicks is eighty under the category “fruits”, and twenty under the category “electronics”. As such, g2 (“apple”, “fruits”)=0.8, and g2 (“apple”, “electronics”)=0.2.
Further, if the accuracy of predicting categories of a query using the offer click model is 80%, the click attribute weight w1 for the web page associated with the offer is set to be 0.8. If the accuracy of predicting categories of a query using the navigational region click model is 60%, then the click attribute weight w2 for the navigational region is set to be 0.6.
Block 530 separately computes a degree of confidence h of each category for the suggested queries based on the category weight g for each category under each clickable region, and the click attribute weight w for each clickable region.
In this embodiment, Equation (5) is used for computing the degree of confidence of any category for the suggested query:
h(x,y) is used as a degree of confidence of y for x;
x represents the suggested query;
y represents a characteristic function for a category, e.g., click1(offer, query) or click2(query). For a certain category, if the suggested query is present, the value of y is one. If the suggested query is not present, the value of y is zero. As the present embodiment computes h(x,y) for categories that exist, y may be rendered as any category of an object to be computed.
ωi represents a click attribute weight of a clickable region i;
k represents the number of clickable regions;
gi represents a category weight of category y within a clickable region i for the suggested query;
fi (x,y) represents a click attribute corresponding to the clickable region i. With reference to Equation (1) and Equation (2), fi(x,y) takes a value of one if the suggested query is present under category y. Equation (5) is calculated specifically for a correspondence relationship between the suggested query and y. Therefore, the value of fi (x,y) is one. Evidently, the computation of fi (x,y) can be integrated into the computation of gi(x,y);
Z represents a normalization factor, $Z = \sum_{y} \sum_{i=1}^{k} \omega_i \, g_i(x,y) \, f_i(x,y)$.
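The formula of Equation (5) is not reproduced in this text. Based on the symbol definitions above and on the worked example that follows, it may be written as below. This is a reconstruction under those definitions, and the layout is an assumption.

```latex
% Assumed reconstruction of Equation (5)
h(x, y) = \frac{1}{Z} \sum_{i=1}^{k} \omega_i \, g_i(x, y) \, f_i(x, y),
\qquad
Z = \sum_{y'} \sum_{i=1}^{k} \omega_i \, g_i(x, y') \, f_i(x, y')
```

Because Z sums the same weighted terms over all categories, the degrees of confidence of all categories for one suggested query add up to one.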
In this embodiment, if k=2, the possible values for i are 1 and 2. For instance, in the example of Block 520, Z may be computed as:
Z = (0.8×0.75 + 0.6×0.8) + (0.8×0.25 + 0.6×0.2) = 1.08 + 0.32 = 1.4; then
h(“apple”, “fruits”) = (0.8×0.75 + 0.6×0.8)/Z = 1.08/1.4 = 77.14%;
h(“apple”, “electronics”) = (0.8×0.25 + 0.6×0.2)/Z = 0.32/1.4 = 22.86%.
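The computation of Block 530 may be sketched as follows, using the weights of the “apple” example. The sketch folds fi(x,y) into gi(x,y), as the text permits, and takes the click attribute weights as given (0.8 and 0.6, i.e., one minus the assumed error rate of each click model). The function name `category_confidence` and the region labels "offer" and "navigation" are hypothetical.

```python
def category_confidence(region_weights, category_weights_by_region):
    """Normalized weighted sum of category weights over all clickable regions."""
    scores = {}
    for region, w in region_weights.items():
        for category, g in category_weights_by_region[region].items():
            # Accumulate w_i * g_i(x, y) for each category y.
            scores[category] = scores.get(category, 0.0) + w * g
    z = sum(scores.values())  # normalization factor Z
    if z == 0:
        return {}
    return {category: score / z for category, score in scores.items()}


w = {"offer": 0.8, "navigation": 0.6}
g = {
    "offer": {"fruits": 0.75, "electronics": 0.25},
    "navigation": {"fruits": 0.8, "electronics": 0.2},
}
h = category_confidence(w, g)
```

For these inputs, the degrees of confidence come to approximately 0.7714 for “fruits” and 0.2286 for “electronics”, in agreement with the figures above.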
Block 540 separately determines target categories for the suggested queries based on the degrees of confidence h of each category for the suggested queries, and displays the suggested query and respective target categories. In this embodiment, implementations of Block 540 may include, but are not limited to, the following:
1. Categories having a degree of confidence greater than a set threshold are rendered as target categories for the suggested queries, and the suggested queries are displayed in a descending order of the degrees of confidence of the target categories. For example, the two target categories corresponding to the query “apple” are the category “fruits”, of which the degree of confidence is 77.14%, and the category “electronics”, of which the degree of confidence is 22.86%. Both categories have a degree of confidence greater than a set threshold of 20%. Therefore, when displaying the suggested term “apple”, the category “fruits” will be displayed first, followed by the category “electronics”. For example,
2. Categories having a degree of confidence greater than a set threshold are rendered as target categories for the suggested queries, and the suggested queries are displayed in groups based on types of the target categories. For example, for the initial query “apple”, its suggested queries “apple mobile phone”, “apple MP3” and “apple headphones” correspond to the category “mobile phones” (with degree of confidence as 56%), and the category “digital media players” (with degree of confidence as 44%) respectively, whose degrees of confidence are greater than the set threshold of 20%. Therefore, when displaying the above suggested queries, they will be displayed in groups according to different target categories. For example,
In practical applications, many flexible display methods may emerge as the business expands. The above two methods are examples for illustration only.
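The first display method may be sketched as follows: categories whose degree of confidence exceeds the threshold are kept and listed in descending order of confidence. The function names and the "query in category" label format are assumptions made for this example, since the actual display layout is not specified here.

```python
def select_target_categories(confidence, threshold=0.2):
    """Categories above the threshold, highest degree of confidence first."""
    kept = [(category, h) for category, h in confidence.items() if h > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def format_suggestions(query, confidence, threshold=0.2):
    """One display line per target category (the label format is hypothetical)."""
    targets = select_target_categories(confidence, threshold)
    return [f"{query} in {category}" for category, _ in targets]


lines = format_suggestions("apple", {"electronics": 0.2286, "fruits": 0.7714})
```

Raising the threshold to 0.5 in this example would leave “fruits” as the only target category.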
Further, when a suggested query selected by the user is used for a further search, the system may perform the search under the corresponding target category as opposed to searching under all the possible target categories, thus effectively reducing the amount of information to be searched and further improving the search efficiency.
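The effect of restricting a search to the target category may be illustrated with a small in-memory catalog. The catalog layout, the product titles and the function name `search` are all hypothetical, and a production system would query an index rather than scan a list.

```python
def search(products, query, target_category=None):
    """Titles containing the query, optionally limited to one category."""
    query = query.lower()
    results = []
    for product in products:
        if target_category is not None and product["category"] != target_category:
            continue  # skip products outside the selected target category
        if query in product["title"].lower():
            results.append(product["title"])
    return results


catalog = [
    {"title": "Fresh apple 1kg", "category": "fruits"},
    {"title": "Apple MP3 player", "category": "electronics"},
    {"title": "Apple phone case", "category": "electronics"},
]
all_hits = search(catalog, "apple")
fruit_hits = search(catalog, "apple", target_category="fruits")
```

Here the unrestricted search returns three titles, whereas the search restricted to “fruits” returns only one.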
The acquisition unit 602 is used for receiving an initial query input by a user. A suggested query corresponding to the input query is then obtained.
The first determination unit 604 determines at least two categories corresponding to the suggested query and at least two clickable regions usable for looking up the suggested query.
The second determination unit 606 separately determines a category weight associated with each obtained category in each clickable region for the suggested query and a click attribute weight associated with each clickable region.
The computation unit 608 separately computes a degree of confidence of each category for the suggested query based on the category weight associated with each obtained category and the click attribute weight associated with each clickable region.
The display unit 610 separately determines target categories for the suggested query based on the degree of confidence of each category for the suggested query and displays the suggested query and the target categories.
In short, the embodiments of the present disclosure establish a dictionary of suggestions based on a user query log, and develop category suggestions based on a user's click log. Therefore, in response to obtaining corresponding suggested queries based on an initial query (a query keyword) input from a user, a system may determine a target category for each suggested query based on the user's existing click behavior, and display the suggested queries and corresponding target categories at the same time. Accordingly, a guiding intention of each suggested query is displayed to the user based on the target categories, allowing the user to quickly determine his/her search intention based on the target categories of the suggested queries. This avoids interference from unrelated suggested queries, and thereby effectively improves the speed of information searching. Furthermore, the system takes advantage of performing a search under a target category corresponding to a suggested query that is selected by the user as opposed to performing searches under all categories. The amount of information to be searched is therefore greatly reduced, thus further improving the speed of information searching and reducing the processing workload of an associated server. The present disclosure may be applied in electronic products such as computers, wireless communication devices, etc.
The memory 703 may include computer-readable media in the form of volatile memory, such as random-access memory (RAM) and/or non-volatile memory, such as read only memory (ROM) or flash RAM. The memory 703 is an example of computer-readable media.
Computer-readable media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. As defined herein, computer-readable media does not include transitory media such as modulated data signals and carrier waves.
The memory 703 may include program units 705 and program data 706. In one embodiment, the program units 705 may include an acquisition unit 707, a first determination unit 708, a second determination unit 709, a computation unit 710 and a display unit 711. Details about these program units and any sub-units and/or modules thereof may be found in the foregoing embodiments.
It is noted that one skilled in the art can alter or modify the disclosed method, system and apparatus in many different ways without departing from the spirit and the scope of this disclosure. Accordingly, it is intended that the present disclosure covers all modifications and variations which fall within the scope of the claims of the present disclosure and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
201110138955.X | May 2011 | CN | national |
This application is a national stage application of an international patent application PCT/US12/39426, filed May 24, 2012, which claims priority to Chinese Patent Application No. 201110138955.X, filed on May 26, 2011, entitled “Method and Device for Providing Suggested Terms”, which applications are hereby incorporated by reference in their entirety.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US12/39426 | 5/24/2012 | WO | 00 | 7/13/2012 |