A keyword or phrase is a word or set of terms submitted by a user to a search engine when searching for a related web page/site on the World Wide Web. Search engines determine the relevancy of a web site based on the keywords and keyword phrases that appear on the page/site. Because a significant percentage of web site traffic results from use of search engines, proper keyword/phrase selection is vital to increasing site traffic to obtain desired site exposure. In general, promoters (e.g., advertisers) try to identify and select as many keywords as possible to increase site traffic. Techniques to identify keywords relevant to a web site for search engine result optimization include, for example, evaluation by a human being of web site content and purpose to identify relevant keyword(s). This evaluation may include the use of a keyword popularity tool. Such tools determine how many people submitted a particular keyword or phrase including the keyword to a search engine. Keywords relevant to the web site and determined to be used more often in generating search queries are generally selected for search engine result optimization with respect to the web site. Another typical technique for identifying keywords includes a computerized keyword suggestion tool that provides a list of keywords related to an input keyword. For example, the input keyword “car” may yield “car accessories,” “luxury cars,” etc. Each keyword identified by such a system is typically in the same language as the input keyword.
After identifying and selecting a set of keywords for search engine result optimization of the web site, a promoter may desire to advance a web site to a higher position in the search engine's results (e.g., as compared to displayed positions of other web site search engine results). To this end, the promoter bids on the keyword(s) to indicate how much the promoter will pay each time a user clicks on the promoter's listings associated with the keyword(s). In other words, keyword bids are pay-per-click bids. The larger the amount of the keyword bid as compared to other bids for the same keyword, the higher (e.g., more prominently with respect to significance) the search engine will display the associated web site in search results based on the keyword.
Embodiments of the invention provide multilingual keyword identification and selection. In response to an input keyword in one language from a user, one or more related keywords (e.g., translation candidates) in another language are identified. In one embodiment, the invention generates a list of the translation candidates as a function of the input keyword by applying morphological changes to the input keyword, translating the input keyword, and transliterating the input keyword. The translation candidates are validated and presented to the user for review and selection. The input keyword may relate to, for example, goods and/or services.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Other features will be in part apparent and in part pointed out hereinafter.
Corresponding reference characters indicate corresponding parts throughout the drawings.
In an embodiment, the invention provides cross-language suggestion of related keywords.
The process and system illustrated in
The exemplary operating environment illustrated in
Although described in connection with an exemplary computing system environment, aspects of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations. The computing system environment is not intended to suggest any limitation as to the scope of use or functionality of aspects of the invention. Moreover, the computing system environment should not be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment. Examples of well known computing systems, environments, and/or configurations that may be suitable for use in embodiments of the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, mobile telephones, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Embodiments of the invention may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include, but are not limited to, routines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types. Aspects of the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Referring next to
The method illustrated in
In one alternative embodiment, a click-through model is used to rank the translation candidates. For example, the translation candidates are ranked based on how many people selected each of the translation candidates. Another alternative to the ME model includes linear interpolation of the ranking criteria (e.g., linear regression and machine learning).
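The click-through ranking described above can be sketched as follows. This is a minimal illustration, assuming user selections are available as a flat log of chosen candidates; the function name, the log format, and the sample data are hypothetical and are not part of the described system.

```python
from collections import Counter


def rank_by_clicks(candidates, click_log):
    """Order candidates by how often users selected them, most selected first."""
    clicks = Counter(click_log)  # candidates never selected count as 0
    # sorted() is stable, so candidates with equal counts keep their input order.
    return sorted(candidates, key=lambda c: clicks[c], reverse=True)


# Hypothetical data: each log entry is one user selection.
candidates = ["pharmaceutical product", "drug", "medicine"]
click_log = ["drug", "medicine", "drug", "pharmaceutical product", "drug", "medicine"]

ranked = rank_by_clicks(candidates, click_log)
top_five = ranked[:5]  # e.g., present only the highest-ranked keywords
```

A log-based count keeps the sketch independent of any particular search engine; a production system would presumably normalize counts by how often each candidate was shown.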
The list of keywords is presented to the user for selection at 210. That is, the original input keyword is displayed, the related keywords in the original (e.g., first) language are displayed, and the related keywords in the target (e.g., second) language are displayed. In one alternative embodiment, the method selects one or more of the keywords for the user and presents the selected keywords. For example, the method may present the top five keywords in the ranking.
In another embodiment, the method identifies and presents keywords in the first language related to the input keyword to expand the list of translation candidates. In such an embodiment, there is no one-to-one mapping between the related keywords in the first language and the related keywords in the second language. These related keywords may be stored in unilingual related keyword tables. The related keywords in the first language may be determined or identified before, during, or after identifying the translation candidates. Determining related keywords in both the first and second languages (e.g., generating keyword clusters) improves the results of the method because there may not be a direct translation for the input keyword or a determined, related keyword in the first language (e.g., as determined by generating a keyword cluster in the first language). With the knowledge that one keyword whose context is known is related to another keyword, the context of the other keyword may be inferred. For example, with “voiture de luxe” as the input keyword and “Porsche” as a keyword determined to be related to the input keyword, the method translates “voiture de luxe” into “luxury car” but fails to directly translate “Porsche.” However, by combining the two unilingual related keyword tables, the method infers that “Porsche” is related to “luxury car.”
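One way to picture the inference in the "Porsche" example is the sketch below. It assumes the related keyword tables and the translation lookup are plain dictionaries; the function name, the table layout, and the English related keyword "luxury sedan" are illustrative assumptions only.

```python
def expand_keywords(input_keyword, translations, related_first, related_second):
    """Combine two unilingual related keyword tables around one input keyword.

    Returns (inferred, expanded):
      inferred - first-language related keywords that have no direct translation,
                 mapped to the translations of the input keyword
      expanded - second-language keywords: the translations plus their related keywords
    """
    targets = translations.get(input_keyword, [])

    inferred = {}
    for keyword in related_first.get(input_keyword, []):
        if keyword not in translations:
            # No direct translation: infer the relation through the input keyword.
            inferred[keyword] = list(targets)

    expanded = []
    for target in targets:
        expanded.append(target)
        expanded.extend(related_second.get(target, []))
    return inferred, expanded


# Hypothetical tables for the example in the text.
translations = {"voiture de luxe": ["luxury car"]}    # "Porsche" has no entry
related_fr = {"voiture de luxe": ["Porsche"]}         # first-language table
related_en = {"luxury car": ["luxury sedan"]}         # second-language table (made up)

inferred, expanded = expand_keywords("voiture de luxe", translations, related_fr, related_en)
```

Here `inferred` links "Porsche" to "luxury car" even though "Porsche" itself was never translated, which mirrors the inference described in the paragraph above.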
In one embodiment, one or more computer-readable media have computer-executable instructions for performing the method illustrated in
Referring next to
These results are then ranked (e.g., by an ME model) at 314 and the top results are determined. In this example, the term “product pharmaceutical” was ranked the lowest among the translation candidates and removed from the list. Keyword clusters are generated for the input French keyword at 318 and the English translation candidates at 316. The top translation candidates from 314, the French keyword cluster from 318, and the English keyword cluster from 316 are presented to the user as an expanded cross-language related keywords mapping list. From this list, the user may select particular keywords (in English) to use to promote a good or service associated with the input keyword.
Referring next to
An alternative procedure for identifying, ranking, and selecting keywords using web mining is shown in Appendix B. An example of the alternative procedure is also included in Appendix B.
Hardware, software, firmware, computer-executable components, computer-executable instructions, and/or the contents of
The order of execution or performance of the operations in embodiments of the invention illustrated and described herein is not essential, unless otherwise specified. That is, the operations may be performed in any order, unless otherwise specified, and embodiments of the invention may include additional or fewer operations than those disclosed herein. For example, it is contemplated that executing or performing a particular operation before, contemporaneously with, or after another operation is within the scope of aspects of the invention.
Embodiments of the invention may be implemented with computer-executable instructions. The computer-executable instructions may be organized into one or more computer-executable components or modules. Aspects of the invention may be implemented with any number and organization of such components or modules. For example, aspects of the invention are not limited to the specific computer-executable instructions or the specific components or modules illustrated in the figures and described herein. Other embodiments of the invention may include different computer-executable instructions or components having more or less functionality than illustrated and described herein.
When introducing elements of aspects of the invention or the embodiments thereof, the articles “a,” “an,” “the,” and “said” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
As various changes could be made in the above constructions, products, and methods without departing from the scope of aspects of the invention, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
A maximum entropy (ME) model may be used in one embodiment to rank the translation candidates. The ME model ranks the translation candidates with the following features.
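1. The chi-square statistic of translation candidate C and the input English named entity E, given in (1) below in the standard form for a 2×2 contingency table.

$$\chi^2(C, E) = \frac{N\,(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)} \qquad (1)$$

where: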
a=the number of web pages containing both C and E
b=the number of web pages containing C but not E
c=the number of web pages containing E but not C
d=the number of web pages containing neither C nor E
N=the total number of web pages, i.e., N=a+b+c+d
In this example, N is set to 4 billion; the value of N does not affect the ranking as long as it is positive. The model combines C and E into a single query and searches a search engine for Chinese web pages. The total number of pages reported on the result page as containing both C and E is a. C and E are then each used alone as a query to obtain the page counts Nc and Ne. Thus b=Nc−a, c=Ne−a, and d=N−a−b−c.
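The derivation of a, b, c, and d from search-engine page counts can be sketched in Python as follows. The sketch assumes the chi-square feature takes the standard 2×2 contingency-table form, N(ad−bc)²/((a+b)(a+c)(b+d)(c+d)); the function name and the small sample counts are hypothetical.

```python
def chi_square(n_both, n_c, n_e, total=4_000_000_000):
    """Chi-square association between candidate C and entity E from page counts.

    n_both - pages containing both C and E (a)
    n_c    - pages containing C (Nc)
    n_e    - pages containing E (Ne)
    total  - assumed total number of web pages (N)
    """
    a = n_both
    b = n_c - a              # pages with C but not E
    c = n_e - a              # pages with E but not C
    d = total - a - b - c    # pages with neither
    denominator = (a + b) * (a + c) * (b + d) * (c + d)
    if denominator == 0:
        return 0.0           # degenerate table: no evidence of association
    return total * (a * d - b * c) ** 2 / denominator


# Small made-up counts, chosen so the arithmetic is easy to check by hand.
score = chi_square(n_both=10, n_c=20, n_e=30, total=100)
```

With these sample counts, a=10, b=10, c=20, and d=60, so the score is 100·400²/(20·30·70·80) = 100/21.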
The process of ranking the translation candidates obtained from the dictionary or other source and selecting the translation candidates from this ranking through web mining is shown below. The process includes the following operations.
A. Format the query translation candidates obtained from the dictionary using a Boolean query.
B. Limit the search region using the source query; otherwise, the search engine returns only the most popular term combinations.
C. Submit the structured query to a web search engine with the language of the returned results set to the original language, and get the top 100 snippets from the search results.
D. Use an algorithm to analyze the top 100 snippets and get the top 50 term phrases sorted by phrase frequency.
E. Filter the term phrases, keeping each phrase that contains exactly one word for each word in the target language query.
F. If at least one phrase remains after filtering, go to operation G; otherwise, go to operation H.
G. Get the translation candidates and terminate.
H. Enumerate all the possible combinations of translation candidates and re-format the query as (a) target language query+one candidate and (b) “+candidate+” for each candidate of the combinations.
I. Search the two queries for each candidate in a web search engine and get the count returned by the search engine for each query.
J. Rank the candidates according to the combination of the two counts for each candidate:
Score = Alpha*Count(a) + (1−Alpha)*Count(b)    (1)
(Alpha=0.6, for example)
K. Return the top five translation candidates as the final result.
The following example illustrates the above exemplary procedure. In this example, the original language is French and the target language is English. The French query is “pages jaunes” and translation candidates from a dictionary include “page;hansard/yellow;yolk”. The Boolean query in operation A above is ((Page OR hansard) AND (yellow OR yolk)). The query from operation B above includes ‘“pages jaunes”+((Page OR hansard) AND (yellow OR yolk))’. After searching the structured query in a web search engine, retrieving the top 100 snippets from the search results, and using an algorithm to obtain the top 50 term phrases, the following phrases are obtained in this example: main page; yellow pages; yellow page; home page; blank page; white page. The filter in operation E keeps only the phrases that contain a candidate for each word of the query, so the translation result returned to the user is “yellow pages; yellow page”.
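Operations A, B, and E for the “pages jaunes” example can be sketched as below. The helper names are hypothetical, the candidate string format follows the example ("/" between source words, ";" between candidates), and the plural-tolerant word match is a crude assumption introduced only so that “yellow pages” survives the filter as it does in the example. The source does not specify how words are matched.

```python
def parse_candidates(spec):
    """'page;hansard/yellow;yolk' -> [['page', 'hansard'], ['yellow', 'yolk']]"""
    return [[w.strip() for w in group.split(";")] for group in spec.split("/")]


def boolean_query(groups):
    """Operation A: OR within each source word's candidates, AND across words."""
    return "(" + " AND ".join("(" + " OR ".join(g) + ")" for g in groups) + ")"


def restricted_query(source_query, groups):
    """Operation B: prepend the quoted source query to limit the search region."""
    return f'"{source_query}"+{boolean_query(groups)}'


def matches(word, candidate):
    """Assumed matching rule: exact match or a simple plural 's'."""
    word, candidate = word.lower(), candidate.lower()
    return word == candidate or word == candidate + "s"


def filter_phrases(phrases, groups):
    """Operation E: keep phrases with exactly one word per candidate group."""
    kept = []
    for phrase in phrases:
        words = phrase.split()
        if len(words) != len(groups):
            continue
        hits = [sum(any(matches(w, c) for c in g) for w in words) for g in groups]
        if all(h == 1 for h in hits):
            kept.append(phrase)
    return kept


groups = parse_candidates("page;hansard/yellow;yolk")
query = restricted_query("pages jaunes", groups)
phrases = ["main page", "yellow pages", "yellow page", "home page", "blank page", "white page"]
result = filter_phrases(phrases, groups)
```

Fetching snippets and extracting the top term phrases (operations C and D) depend on the search engine and the phrase-extraction algorithm, so they are left out of the sketch.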
In another example, the French query may be “fermer cette liste” and the translation candidates include “close; closing; shut; fasten/this; it; these; those/list; roll; register”. The Boolean query is ((close OR closing OR shut OR fasten) AND (this OR it OR these OR those) AND (list OR roll OR register)). After applying the algorithm in operation D and the filter in operation E, no phrase remains, so operation F passes control to operation H. In operation H, the translation candidates are enumerated to include the following: close this list, close it list, close these list, close those list, closing this list, closing it list, closing these list, etc. The query is re-formatted as “fermer cette liste+close this list” and “close this list”. An exemplary count for “fermer cette liste+close this list” is 688 and an exemplary count for “close this list” is 1390. The two counts are combined and the candidates are ranked in operation J above.
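The fallback path (operations H through K) for this example can be sketched as follows. The function names are hypothetical, the counts for “shut this list” are made up for illustration, and the page counts would in practice come from a web search engine rather than a dictionary.

```python
from itertools import product


def enumerate_candidates(groups):
    """Operation H: every combination of one candidate per source word."""
    return [" ".join(combo) for combo in product(*groups)]


def combined_score(count_a, count_b, alpha=0.6):
    """Operation J: Alpha*Count(a) + (1 - Alpha)*Count(b)."""
    return alpha * count_a + (1 - alpha) * count_b


def rank(counts, alpha=0.6, top_n=5):
    """Operations J and K: sort candidates by combined score, keep the top few."""
    scored = {cand: combined_score(a, b, alpha) for cand, (a, b) in counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]


groups = [
    ["close", "closing", "shut", "fasten"],
    ["this", "it", "these", "those"],
    ["list", "roll", "register"],
]
combos = enumerate_candidates(groups)   # 4 * 4 * 3 = 48 candidate phrases

# Counts for "close this list" are the exemplary values from the text;
# the second entry is invented purely to show the ranking.
counts = {
    "close this list": (688, 1390),
    "shut this list": (10, 20),
}
score = combined_score(688, 1390)       # 0.6*688 + 0.4*1390, about 968.8
best = rank(counts)
```

With Alpha=0.6, the exemplary counts give 0.6·688 + 0.4·1390 = 412.8 + 556 ≈ 968.8 for “close this list”.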