1. Field of the Invention
The present invention relates generally to the field of data processing, and more particularly to query formulation for information retrieval.
2. Description of the Related Art
A search engine is used to perform a search to retrieve information from web sites and services on the Internet. A user engages the search engine to perform a search through a query that contains one or more search terms. The results of the query depend on the selection of the search terms used in the query. Misspelled terms or typographical errors in a query often produce poor results since these search terms do not retrieve information pertinent to the user's query.
In order to improve the search results, a search engine may expand the query by including additional search terms in the query. These additional search terms may come from a dictionary or a thesaurus and identify synonyms for the search terms used in the query. The additional search terms are often a broader set of terms than originally intended so that the search produces a larger set of results. However, a larger set of results may not retrieve information of interest to the user and often results in excessive search time to scan the results for relevant information.
Accordingly, the choice of search terms used in a query is an important factor in generating search results that produce relevant information.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
A question conversion engine converts search terms used in a user's question into an expression that is more likely to produce pertinent search results. The question conversion engine may change the semantics of the question into an expression that contains search terms that will appear in the search results.
The question conversion engine utilizes a question parsing procedure to parse a user's question and attribute a part of speech identifier to each term and/or phrase used in the user's question. The more pertinent terms and phrases are identified. A phrase replacement procedure utilizes a term/replacement map to replace certain terms and phrases in the user's question with other terms and phrases that are more likely to produce relevant results. The resulting expression is then used by a search engine to search for the desired information.
The term/replacement map contains replacement phrase rules having a left hand side phrase that when matched is replaced by a right hand side phrase. The replacement phrase rules may be automatically generated from searches of sets of question and answer pairs, such as frequently asked questions (FAQ) and answers. A phrase replacement procedure may be utilized to analyze FAQs and answers to determine the most frequently used terms that appear in search results although not in a question.
These and other features and advantages will be apparent from a reading of the following detailed description and a review of the associated drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.
The communications network 104 facilitates communications between a computing device 102 and a server 106. The communications network 104 may be a local-area network, a wide-area network, or any combination thereof. In several embodiments, the communications network 104 may be the Internet. A server 106 may be any type of electronic computing device that is dedicated to running a service. A server 106 may be a web server, an application server, a file server, a database server, a web site, and the like.
The computing device 102 may include a question conversion engine 108 and a search engine 110. The question conversion engine 108 receives a user's question 120 and converts the question into a declarative expression rather than a question. The search engine 110 receives the declarative expression and searches for documents that contain the search terms in the declarative expression. By converting the question into a declarative expression, the question conversion engine 108 alters the semantics of the question.
The question conversion engine 108 may include a user interface 111, a question parsing procedure 112, a phrase replacement procedure 114, a parts-of-speech database 116, a term/phrase replacement map 118, and a phrase replacement procedure 122. The user interface 111 accepts input from the user, such as user questions 120 and user settings 121 for the question conversion engine 108. The user settings 121 may be used to enable and disable the question conversion engine 108.
The question parsing procedure 112 accepts a user's question 120 and parses it to determine the part of speech of each term or phrase in the user's question 120. The question parsing procedure 112 may utilize a parts-of-speech database 116 that contains frequently used words and the corresponding part of speech for each word. The parts-of-speech database 116 may be configured to recognize eight parts of speech: a noun, a verb, a pronoun, an adjective, an adverb, a preposition, a conjunction, and an interjection. However, the embodiments are not constrained to this particular configuration of the parts of speech and other variations may be used instead.
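By way of illustration only, the lookup performed by the question parsing procedure 112 might be sketched as follows. The table contents, the name `tag_question`, and the rule that unknown words default to nouns are assumptions made for this sketch and are not taken from the embodiments.

```python
# Hypothetical miniature of the parts-of-speech database 116:
# each frequently used word maps to one part of speech.
PARTS_OF_SPEECH = {
    "how": "adverb",
    "far": "adverb",
    "is": "verb",
    "it": "pronoun",
    "from": "preposition",
    "to": "preposition",
}


def tag_question(question: str) -> list[tuple[str, str]]:
    """Split a question into terms and attach a part-of-speech label to each."""
    terms = question.rstrip("?").split()
    # Assumption for this sketch: words missing from the table are nouns.
    return [(term, PARTS_OF_SPEECH.get(term.lower(), "noun")) for term in terms]


if __name__ == "__main__":
    for term, label in tag_question("How far is it from Boston to Chicago?"):
        print(f"{term}: {label}")
```

A production parser would also need to group multi-word phrases such as place names, which this word-by-word sketch does not attempt.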
The user's question 120 and the corresponding parts-of-speech annotations may then be passed to the phrase replacement procedure 114. The phrase replacement procedure 114 maps the user's question into a declarative expression. The phrase replacement procedure 114 may utilize the term/phrase replacement map 118 to construct an appropriate declarative expression that utilizes terms that may be found in the information that is retrieved.
The term/phrase replacement map 118 may be generated in a number of ways. For example, a team of developers may manually generate the term/phrase replacement map 118 based on exemplary questions and developer-generated responses. This manual approach offers the value of human intelligence at the expense of consuming a considerable amount of time and effort.
Alternatively, a phrase replacement procedure 122 may be executed offline to search for a large set of question and answer pairs, such as frequently asked questions (FAQ) documents and answers that are used by search engines across the Internet. The phrase replacement procedure 122 may parse the FAQ documents to analyze the terms and phrases used most often in answers that may be used to generate the rules for the term/phrase replacement map 118.
The term/phrase replacement map 118 may be embodied in the form of a context-free grammar that consists of multiple expressions. Each expression is configured as a rule having a left hand side phrase that maps into a right hand side phrase. When the terms and/or phrases in the user's question match the left hand side phrase of an expression, they are replaced with the right hand side phrase.
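The left hand side / right hand side substitution may be sketched as below. This is a simplification under stated assumptions: the rules are literal phrases held in a flat dictionary rather than a full context-free grammar, the "how far" rule borrows the example given later in this description, and the "how tall" rule, the `FILLER` word list, and the name `apply_rules` are hypothetical.

```python
import re

# Hypothetical replacement rules: left hand side phrase -> right hand side phrase.
REPLACEMENT_RULES = {
    "how far": "miles",
    "how tall": "height",
}

# Assumption for this sketch: these filler words are dropped from the result.
FILLER = {"is", "it", "from", "to", "the"}


def apply_rules(question: str) -> str:
    """Replace every matched left hand side phrase with its right hand side."""
    text = question.rstrip("?")
    for lhs, rhs in REPLACEMENT_RULES.items():
        pattern = r"\b" + re.escape(lhs) + r"\b"
        text = re.sub(pattern, rhs, text, flags=re.IGNORECASE)
    return " ".join(word for word in text.split() if word.lower() not in FILLER)


if __name__ == "__main__":
    print(apply_rules("How far is it from New York to Los Angeles?"))
```

A grammar-based map would match on parts of speech rather than on literal words, so a single rule could cover many differently worded questions.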
The question parsing procedure 112, the phrase replacement procedure 114, and the search engine 110 may each be embodied as a software application, procedure, program, module, and the like. The parts-of-speech database 116 and the term/phrase replacement map 118 may be embodied as a database, lookup table, hash table, and the like.
Although the system 100 shown in
In various embodiments, the system 100 described herein may comprise a computer-implemented system having multiple components, programs, procedures, and modules. As used herein, these terms are intended to refer to a computer-related entity comprising hardware, a combination of hardware and software, or software. For example, a component may be implemented as a process running on a processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium), an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server may be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers as desired for a given implementation. The embodiments are not limited in this manner.
Attention now turns to an example illustrating a question being converted for information retrieval. Referring to
Next, the phrase replacement procedure 114 searches the term/phrase replacement map 118 for one or more replacement phrases. As shown in
Attention now turns to a second example to illustrate the question conversion process on a question containing a negation. Referring to
Next, the phrase replacement procedure 114 searches the term/phrase replacement map 118 for one or more replacement phrases. In this particular example, the left hand side of the question is matched with the parts of speech in the left hand side of the rule and a declarative expression is formed in accordance with the right hand side of the rule. The right hand side of the rule has a prepositional phrase in quotes 310 (e.g., “<prepositional phrase>”) and a dash 312 before a noun (−) to denote a negated term (e.g., not Alaska).
As shown in
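The right hand side notation described in this example, a quoted phrase together with a dash-prefixed negated term, may be rendered as a search string along the following lines. The function name `build_expression`, the use of an ASCII hyphen for the dash, and the sample term "glaciers" with the phrase "in the United States" are hypothetical; only the negated term "Alaska" comes from the example above.

```python
from typing import Sequence


def build_expression(
    terms: Sequence[str],
    quoted_phrases: Sequence[str],
    negated_terms: Sequence[str],
) -> str:
    """Assemble plain terms, quoted phrases, and dash-prefixed negated terms."""
    parts = list(terms)
    # A phrase in quotes is to be matched as a unit.
    parts += [f'"{phrase}"' for phrase in quoted_phrases]
    # A dash before a term marks it as negated.
    parts += [f"-{term}" for term in negated_terms]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_expression(["glaciers"], ["in the United States"], ["Alaska"]))
```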
Attention now turns to a more detailed discussion of the operations for the embodiments with reference to various exemplary methods. It may be appreciated that the representative methods do not necessarily have to be executed in the order presented, or in any particular order, unless otherwise indicated. Moreover, various activities described with respect to the methods can be executed in serial or parallel fashion, or any combination of serial and parallel operations. The methods can be implemented using one or more hardware elements and/or software elements of the described embodiments or alternative embodiments as desired for a given set of design and performance constraints. For example, the methods may be implemented as logic (e.g., computer program instructions) for execution by a logic device (e.g., a general-purpose or specific-purpose computer).
Referring to
Referring to
The phrase replacement procedure 114 uses the user's question, the object of the request, and the requested information to construct a declarative expression using one or more replacement phrases from the term/phrase replacement map 118 (block 504). The declarative expression may then be passed on to the search engine 110 (block 504). The search engine 110 uses the declarative expression to search for the information and returns the search results to the user (block 506).
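The overall flow, from question to declarative expression to search results, may be sketched as follows. Every name here is an assumption made for illustration: `answer_question` stands in for the flow as a whole, `toy_convert` for the question conversion engine 108, `toy_search` for the search engine 110, and `DOCUMENTS` for the searchable content. The `conversion_enabled` flag models the user settings 121.

```python
from typing import Callable, List

# Hypothetical document collection standing in for searchable content.
DOCUMENTS = [
    "It is 3,500 miles from New York to Los Angeles.",
    "New York travel guide for first-time visitors.",
]


def toy_convert(question: str) -> str:
    """Stand-in converter: rewrites one hard-coded question pattern."""
    text = question.lower().rstrip("?")
    return text.replace("how far is it from", "miles").replace(" to ", " ")


def toy_search(expression: str) -> List[str]:
    """Stand-in search: return documents containing every term in the expression."""
    terms = expression.lower().split()
    return [doc for doc in DOCUMENTS if all(t in doc.lower() for t in terms)]


def answer_question(
    question: str,
    convert: Callable[[str], str],
    search: Callable[[str], List[str]],
    conversion_enabled: bool = True,
) -> List[str]:
    """Convert the question when enabled, then pass the expression to the search."""
    expression = convert(question) if conversion_enabled else question
    return search(expression)


if __name__ == "__main__":
    q = "How far is it from New York to Los Angeles?"
    print(answer_question(q, toy_convert, toy_search))
    print(answer_question(q, toy_convert, toy_search, conversion_enabled=False))
```

With conversion disabled, the toy search finds nothing because words such as "how" and "far" do not occur in the answer text, which is the situation the conversion is meant to address.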
Attention now turns to
The phrase replacement procedure 122 parses the text in the FAQ document to identify the question and the answer (block 604). The question parsing procedure 112 may be used to make this determination (block 604). In addition, the parts of speech of each term and phrase in the question are determined, as well as the parts of speech of each term and phrase in the answer (block 604).
The phrase replacement procedure 122 may then analyze the question and its answer to determine how frequently certain terms in the question appear in the answer (block 606). In addition, the phrase replacement procedure 122 may utilize a statistical technique to determine how frequently certain parts of speech occur in an answer when a certain term is used in a question (block 606). This analysis may then generate a rule or declarative expression that is added to the term/phrase replacement map 118 (block 608).
For example, the question “How far is it from New York to Los Angeles?” may be found in a FAQ document. The answer may contain the phrase “It is 3,500 miles from New York to Los Angeles.” The phrase replacement procedure 122 may generate a rule that uses the term “miles” in a question having the phrase “how far” based on an analysis that shows the term “miles” often appearing in the search results of questions containing the term “how far.” In this example, the question may be converted into the phrase “miles New York Los Angeles” since this phrase contains terms that are more likely to appear in the search result.
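One simple way to arrive at such a rule is to count, across many question and answer pairs, the terms that appear in answers but not in the corresponding questions. The sketch below uses plain counting in place of whatever statistical technique an embodiment may employ. The function names and the sample pairs other than the New York example are hypothetical, and the place names Springfield and Shelbyville are fictional.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Sample question and answer pairs; only the first comes from the description.
FAQ_PAIRS = [
    ("How far is it from New York to Los Angeles?",
     "It is 3,500 miles from New York to Los Angeles."),
    ("How far is Springfield from Shelbyville?",
     "Shelbyville is twenty miles away."),
    ("How tall is the tower?",
     "The tower is 300 feet high."),
]


def tokens(text: str) -> List[str]:
    """Lowercase alphabetic words only; digits and punctuation are ignored."""
    return re.findall(r"[a-z]+", text.lower())


def propose_replacement(pairs: List[Tuple[str, str]], trigger: str) -> Optional[str]:
    """Return the answer-only term seen most often for questions with the trigger."""
    counts: Counter = Counter()
    for question, answer in pairs:
        if trigger not in question.lower():
            continue
        # Count each answer-only term at most once per pair.
        counts.update(set(tokens(answer)) - set(tokens(question)))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(propose_replacement(FAQ_PAIRS, "how far"))
```

In this sample, "miles" is the only answer-only term shared by both "how far" pairs, so it is the proposed right hand side for a rule whose left hand side is "how far".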
The embodiments described herein are focused on improving the search terms used in a query so that the results contain the information that is requested, rather than pointers to web sites or documents that merely contain one or more of the search terms in the user's question. Searches posed in the form of a query may not produce optimum search results since the words in the query are not necessarily found in the retrieved information. Converting the user's question into a declarative expression containing those search terms that are more likely to appear in the retrieved information alters the semantics of the user's question. In this manner, the user obtains more relevant documents more readily and does not incur the additional expense of searching through irrelevant retrieved documents.
Attention now turns to a more detailed description of the components of computing device 102. Referring to
The computing device 102 may include a processor 124, a memory 126, a network interface 128, and a user input interface 130. The processor 124 may be any commercially available processor and may include dual microprocessors and multi-processor architectures. The network interface 128 facilitates wired or wireless communications between the computing device 102 and a communications network 104 in order to provide a communications path between the computing device 102 and the servers 106. The user input interface 130 accepts user input from input devices, such as a mouse, keyboard, touch screen, and the like.
The memory 126 may be any computer-readable storage media or computer-readable media that may store processor-executable instructions, procedures, applications, and data. The computer-readable media is a non-transitory media that does not pertain to propagated signals, such as a modulated data signal transmitted through a carrier wave. It may be any type of memory device (e.g., random access memory, read-only memory, etc.), magnetic storage, volatile storage, non-volatile storage, optical storage, DVD, CD, floppy drive, disk drive, flash memory, and the like. The memory 126 may also include one or more external storage devices or remotely located storage devices. The memory 126 may contain instructions and data as follows:
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
Various embodiments may be implemented using hardware elements, software elements, or a combination of both. Examples of hardware elements may include devices, components, processors, microprocessors, circuits, circuit elements, integrated circuits, application specific integrated circuits, programmable logic devices, digital signal processors, field programmable gate arrays, memory units, logic gates and so forth. Examples of software elements may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces, instruction sets, computing code, code segments, and any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, bandwidth, computing time, load balance, memory resources, data bus speeds and other design or performance constraints, as desired for a given implementation.