The following disclosure relates generally to search methods and associated systems, including tools for answering specific fact-based questions.
Computer systems can store a wealth of information; however, it can often be difficult to find or retrieve a specific fact or piece of information when desired. Many search engines allow a user to search for information by entering one or more keywords that may be of interest to the user. After a user submits a search request that contains the keywords, the search engine identifies documents or web pages that may be related to those search terms. Often, the search engine returns a large number of documents or web page addresses, many of which have little or nothing to do with the specific piece of information that the user was seeking. The user is then left to sort through the list of documents, links, and associated information to find the desired fact. This process can be cumbersome, frustrating, and time-consuming, especially when the user is looking for a single specific fact or fact set instead of general information about a topic.
The present invention is directed generally toward search methods and associated systems. One aspect of the invention is directed toward a computer-implemented searching method that includes receiving an input having a format (e.g., receiving a question). The method further includes finding a pattern that matches the format of the input using a rule set (e.g., a rule set that includes one or more context-free grammar rules). The method still further includes determining a subject of the input based on the pattern, finding a result record corresponding to the subject, and sending an output based on the result record. In certain embodiments, this process can provide a user with an effective and efficient way to quickly search for information (e.g., to answer a question) in a computing system environment.
In certain embodiments, the method can further include determining at least one qualifier based on the pattern and finding a result record corresponding to the subject and the at least one qualifier. In other embodiments, the method can further include finding multiple result records corresponding to the subject. The result records can include a relevancy element, and the method can further include sending an output based on a portion of the multiple result records and the relevancy elements. In still other embodiments, the method can further include determining a subject of the input based on the pattern and at least one synonym rule.
Another aspect of the invention is directed generally toward a computer-implemented searching method that includes receiving an input having a format and finding a pattern that matches the format of the input using a rule set. The method can further include determining if the pattern is suitable for use with a fact tool or at least one other tool. If the pattern is suitable for use with the fact tool, the method can still further include determining a subject of the input based on the pattern, finding a result record corresponding to the subject, and sending an output based on the result record. In certain embodiments, if the pattern is suitable for use with the fact tool, the method can further include determining at least one qualifier using the rule set and finding a result record corresponding to the subject and the at least one qualifier.
The following disclosure describes several embodiments of search methods and associated systems, including tools for answering specific fact-based questions. Specific details of several embodiments of the invention are described below to provide a thorough understanding of such embodiments. However, other details describing well-known structures and routines often associated with computer-based systems and computer-based searching methods are not set forth below to avoid unnecessarily obscuring the description of the various embodiments. Additionally, several flow diagrams and processes having process portions are described to illustrate various embodiments of the invention. It will be recognized, however, that these process portions can be performed in any order, and are not limited to the order described herein with reference to particular embodiments. Furthermore, those of ordinary skill in the art will understand that the invention may have other embodiments that include additional elements or lack one or more of the elements described below with reference to
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to
Computer 110 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media. It will be recognized that computer-readable media can store computer-executable instructions for performing at least a part of any or all process portions described herein.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation,
The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media, discussed above and illustrated in
A user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball, or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 195.
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device, or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
In further embodiments, the process 200 can also include presenting an input prompt to signal a user to enter an input (process portion 212). In certain embodiments, the process 200 can further include providing help information to a user to aid the user in formatting the input (process portion 214). In other embodiments, the process 200 can also include determining at least one qualifier based on the pattern, and finding a result record corresponding to the subject can include finding a result record corresponding to the subject and the at least one qualifier (process portion 216). In still other embodiments, the process 200 can further include presenting the output to a user (process portion 218). In certain embodiments, finding a result record corresponding to the subject can include finding multiple result records corresponding to the subject and the process 200 can further include receiving a command to send an output based on a selected number of the multiple result records (process portion 220). In still other embodiments, the subject or the subject and qualifier(s) can be determined simultaneously with finding a pattern that matches the format.
In the illustrated embodiment, receiving an input (process portion 202) can include receiving an input from a user through an input device (e.g., through a keyboard, mouse, and/or a microphone). In other embodiments, receiving an input (process portion 202) can include receiving an input from another source, for example, another computer application or process. As discussed above, in certain embodiments the process 200 can include presenting an input prompt to signal a user to enter an input (process portion 212) and/or providing help information to the user to aid the user in formatting the input (process portion 214).
In
In the illustrated embodiment, help information 365 is displayed above the input prompt 371 and includes the text, “Help: Enter a question in the same manner as you would ask a person the question.” In other embodiments, the help information can be provided via other methods, for example, in audio form. In certain embodiments, help information is continually displayed. In other embodiments, help information is only displayed in response to certain conditions (e.g., when requested by the user, when the user makes an invalid input, and/or when the process 200 cannot be completed using the input 370). In certain embodiments, the help information 365 includes a link (e.g., a link to a help utility program or process). In other embodiments, the help information 365 includes an interactive process. For example, in certain embodiments, the user can search a table of contents or index for help information. In other embodiments, the help utility leads the user through a series of questions to aid the user in performing certain tasks (e.g., formatting the input 370).
In the illustrated embodiment, the user has entered an input 370 via a keyboard that includes the text, “What was the population of China in 2004.” In other embodiments, the user can enter an input 370 via other methods (e.g., using an audio or voice input). The input 370 can include one or more portions. As discussed below in further detail, the input 370 can be parsed into multiple portions via the process 200 discussed above with reference to
Additionally, an input 370 can be formatted in various manners. For example, while the input 370 in the illustrated embodiment includes the phrase “What was the population of China in 2004,” the user could have entered an input that included the phrase “in 2004 what was the population of China.” Although these two phrases have similar meanings, they have different grammar structures and different formats (e.g., word order).
In the illustrated embodiment, the first and fifth rules 477a and 477e include patterns that can be compared to the input 370 (shown in ghosted characters) to find a pattern that matches the format of the input (process portion 204 discussed above with reference to
In certain embodiments, selected portions of the patterns in the rules 477 can be optional. In order for a specific pattern to match the format of the input 370, the input 370 can, but does not have to, contain portions that match the optional portions of the specific pattern. In
Additionally, in certain embodiments, selected portions of the pattern can include variable terms. In certain cases, the variable terms are limited to a selected number of specified items (e.g., specific word(s), letter(s), number(s), reference(s), and/or symbol(s)). In other cases, the variable terms can include any item. In
In the illustrated embodiment, the format of the input 370 matches the pattern of the first rule 477a. The {[whatis]} portion of the pattern corresponds to the “what was” portion of the input 370, the {the} portion of the pattern corresponds to the “the” portion of the input 370, the {[join]} portion of the pattern corresponds to the “of” portion of the input 370, and the {in} portion of the pattern corresponds to the “in” portion of the input 370. The input 370 also includes portions that are located or positioned in the input 370 to correspond with the [first qualifier], the [subject], and the {[second qualifier]} portions of the pattern. Accordingly, the pattern of the first rule 477a matches the input 370. In certain embodiments, the input 370 can match more than one pattern or rule 477. For example, in certain embodiments, the input 370 can be parsed differently when being matched to a different pattern (e.g., the input 370 can be divided into different portions or word groups to fit a different pattern). In some embodiments, the rules 477 can include additional features. For example, in certain embodiments, a pattern will be found to match the format of the input 370 only when the pattern matches the format and the portion of the input corresponding to the subject contains a certain item or group of items.
Because the pattern of the first rule 477a matches the input 370, a subject of the input 370 can be determined based on the pattern (process portion 206 discussed above with reference to
Multiple inputs 370 can match the pattern of the first rule 477a. For example, an input, “what is the population of China,” does not include the {in} and {[second qualifier]} portions of the first rule 477a and the input portion corresponding to the {[whatis]} portion is “what is” instead of “what was,” but the input “what is the population of China” matches the first rule 477a, with the “China” portion corresponding to the subject. Similarly, inputs that include “population of China,” and “population China” also match the pattern of the first rule 477a, with “China” corresponding to the subject. An input “what is the population of China Tex.” (e.g., what is the population of the city China in the state of Texas) also matches the pattern of the first rule 477a, with “China Tex.” corresponding to the subject and “population” corresponding to the first qualifier. Additionally, “what is the population of the People's Republic of China,” matches the pattern of the first rule 477a, with “the People's Republic of China” corresponding to the subject. Similarly, “what is the population of the PRC” matches the pattern of the first rule 477a, with “the PRC” corresponding to the subject. “China population” also matches the pattern of the first rule 477a, but with “population” corresponding to the subject and “China” corresponding to the first qualifier.
The input, “in 2004 what was the population of China” does not match the pattern of the first rule 477a, but does match the pattern of the fifth rule 477e. Using the fifth rule, “China” corresponds to the subject, “population” corresponds to the first qualifier, and “2004” corresponds to the second qualifier. Accordingly, although using different rules (e.g., the fifth rule 477e and the first rule 477a), the same subject and qualifier can be determined for the input “in 2004 what was the population of China” and the input “what was the population of China in 2004.” As discussed below in further detail, in certain embodiments this feature can allow the same result record to be found for both inputs.
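The matching behavior described above can be illustrated with a minimal sketch. It is not the rule syntax of the disclosure: regular expressions stand in for the grammar rules, the word lists behind [whatis] and [join] are assumptions, the layout of the fifth rule is inferred only from the example input, and the names `RULE_1`, `RULE_5`, and `parse` are hypothetical.

```python
import re

# Assumed word lists for the [whatis] and [join] variable terms.
WHATIS = r"what\s+(?:is|was|are|were)"
JOIN = r"(?:of|for)"

# Sketch of the first rule:
# {[whatis]} {the} [first qualifier] {[join]} [subject] {in} {[second qualifier]}
# The first qualifier may not be the word "in", so inputs that lead with
# "in 2004 ..." fall through to the fifth rule.
RULE_1 = re.compile(
    rf"^(?:{WHATIS}\s+)?(?:the\s+)?(?P<q1>(?!in\b)\w+)\s+(?:{JOIN}\s+)?"
    rf"(?P<subject>.+?)(?:\s+in\s+(?P<q2>\d{{4}}))?$",
    re.IGNORECASE,
)

# Sketch of the fifth rule (assumed layout):
# in [second qualifier] {[whatis]} {the} [first qualifier] {[join]} [subject]
RULE_5 = re.compile(
    rf"^in\s+(?P<q2>\d{{4}})\s+(?:{WHATIS}\s+)?(?:the\s+)?(?P<q1>\w+)\s+"
    rf"(?:{JOIN}\s+)?(?P<subject>.+)$",
    re.IGNORECASE,
)


def parse(text):
    """Return the matching rule name, subject, and qualifiers, or None."""
    cleaned = text.strip().rstrip("?").strip()
    for name, rule in (("rule_1", RULE_1), ("rule_5", RULE_5)):
        m = rule.match(cleaned)
        if m:
            return {
                "rule": name,
                "subject": m["subject"],
                "first_qualifier": m["q1"],
                "second_qualifier": m["q2"],  # None when the optional part is absent
            }
    return None


print(parse("What was the population of China in 2004"))
print(parse("in 2004 what was the population of China"))
```

In this sketch both inputs yield the subject "China", the first qualifier "population", and the second qualifier "2004", although each is matched by a different rule.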
Once a subject is determined, a result record corresponding to the subject can be found (process portion 208 discussed above with reference to
In other embodiments, a result records table can have more or fewer result records and/or the result records can have more, fewer, and/or different elements. For example, in certain embodiments a result records table can include links or references to other tables or data files. In other embodiments, the result records can be part of the rule set discussed above with reference to
Once the subject or a subject and at least one qualifier (e.g., a subject/qualifier(s) combination) have been identified, the result records table can be searched to find one or more corresponding result records. For example, in the illustrated embodiment a subject “China” and a qualifier “population” can correspond to the first result record 580a. An output can be sent (e.g., to a user or to another application) based on the result element 586 of the first result record 580a. For example, an output containing “The population of China is approximately 1.3 billion (source year) URL” can be sent to a user in response to an input that included “what is the population of China.” The “source year” can include the source (e.g., the name of an encyclopedia) on which the result element 586 is based and the date or year of that source. The “URL” can include one or more links to other tables, files, and/or sources (e.g., to a website) containing additional information that might be of interest to the user.
In certain cases, it can be desirable to return multiple results to a single query. For example, an input that includes “what is the population of China” can be a query about the population of the country China or the population of the city China in the state of Texas. Accordingly, the result records can contain references, pointers, and/or links to other records or tables. For example, in the illustrated embodiment, a subject of “China” and a qualifier of “population” can correspond to a first result record 580a. The first result record 580a can include a reference to the second result record 580b. The output can be based on both the first result record 580a and the second result record 580b. For example, the output can include “the population of China is approximately 1.3 billion (source year) URL; the population of China, Tex. is approximately 1,100 (source year) URL.” This feature can provide a user with an unambiguous answer to the user's query, even when there are ambiguities with respect to the user's query.
In other embodiments, input ambiguities can be handled using various methods and/or rules regarding finding a result record corresponding to a subject or subject/qualifier(s) combination. As illustrated above, in certain embodiments a result record corresponds to a subject or a subject/qualifier(s) combination only when all the identified subjects and qualifiers are contained in the result record. In other embodiments, a result record corresponds to a subject or subject/qualifier(s) combination when the subject and/or the subject and a selected number of qualifiers are contained in the result record. For example, in certain embodiments, the search process can be set up such that a result record is found to correspond to a subject or subject/qualifier(s) combination when the subject or the subject and first qualifier are contained in the result record, regardless of whether there are any other qualifiers. Accordingly, an output can be sent or returned based on some or all of the corresponding result records. In still other embodiments, the number of qualifiers that must be matched to find a corresponding result record can be fixed or vary with different factors (e.g., the pattern used to determine the subject and/or the number of qualifiers identified by the pattern).
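The two correspondence policies described above (all identified qualifiers must be contained in the record, or only a selected number of them) can be sketched as follows. The record layout, the `required` parameter, and the names `ResultRecord` and `find_records` are assumptions made for illustration; only the example values come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultRecord:
    subject: str
    qualifiers: tuple
    result: str


# Hypothetical table holding the two example records discussed above.
RECORDS = [
    ResultRecord("china", ("population",), "approximately 1.3 billion"),
    ResultRecord("china, tex.", ("population",), "approximately 1,100"),
]


def find_records(subject, qualifiers, required=None, records=RECORDS):
    """Find records for a subject/qualifier(s) combination.

    required=None -> every identified qualifier must be in the record.
    required=n    -> only the first n identified qualifiers must be in it.
    """
    needed = qualifiers if required is None else qualifiers[:required]
    return [
        r for r in records
        if r.subject == subject.lower()
        and all(q.lower() in r.qualifiers for q in needed)
    ]


# Strict policy: the record has no "2004" qualifier, so nothing corresponds.
print(find_records("China", ["population", "2004"]))
# Relaxed policy: only the subject and the first qualifier must be contained.
print(find_records("China", ["population", "2004"], required=1))
```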
Additionally, as shown in
In certain embodiments, the relevancy element 586 can be used to determine the order in which the result records will be used in the output and/or whether certain result records will be used at all. For example, in the illustrated embodiment the first result record 580a has a larger relevancy element 586 (e.g., 800) than that of the second result record 580b (e.g., 200). Accordingly, the first result record 580a was used first in the output discussed above.
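As a rough illustration of this ordering, the sketch below sorts records by a relevancy value and optionally drops records below a threshold. The dictionary layout, the `minimum` and `larger_first` parameters, and the function names are assumptions; the values 800 and 200 are the example values given above.

```python
def order_by_relevancy(records, minimum=0, larger_first=True):
    """Drop records below `minimum`, then order the rest by relevancy."""
    kept = [r for r in records if r["relevancy"] >= minimum]
    return sorted(kept, key=lambda r: r["relevancy"], reverse=larger_first)


def build_output(records, **options):
    """Join the ordered records into a single output string."""
    ordered = order_by_relevancy(records, **options)
    return "; ".join(
        f"the population of {r['subject']} is {r['result']}" for r in ordered
    )


records = [
    {"subject": "China, Tex.", "result": "approximately 1,100", "relevancy": 200},
    {"subject": "China", "result": "approximately 1.3 billion", "relevancy": 800},
]

print(build_output(records))
# the population of China is approximately 1.3 billion; the population of China, Tex. is approximately 1,100
```

Passing `larger_first=False` would model an arrangement in which smaller relevancy values take priority.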
In certain embodiments, the relevancy element 586 can include fixed values and/or smaller relevancy elements 586 can take priority over larger relevancy elements 586. In other embodiments, the relevancy elements 586 can have other arrangements. In certain embodiments, the relevancy elements 586 can include other items or values (e.g., a numeric or alphanumeric value or term can be used to order the use of the relevancy records 586). In other embodiments, the relevancy elements 586 can be computed based on the pattern used to determine the subject of the input. For example, in certain embodiments the result records 580 can have different values for the relevancy elements 586 depending on whether the pattern in the first rule 477a or the fifth rule 477e, discussed above with reference to
As discussed above, different inputs can include different terms or items that have similar meanings (e.g., synonyms). For example, a user who enters an input that includes “what is the population of China,” may be requesting the same information as another user who enters “what is the population of the PRC.” Accordingly, it can be desirable to account for synonyms when determining the subject of an input and/or when finding a result record.
In certain embodiments, the result records table can include synonyms for the subject(s) and/or qualifier(s). For example, if the subject of the input is “the People's Republic of China” or “the PRC,” the result records table can include a result record with the subject of “the People's Republic of China” and another result record with the subject of “the PRC.” Both result records can have result elements 586 similar to that of the first result record 580a that has “China” as a subject. In other embodiments, the subject of the first result record 580a can include “‘China’ or ‘the PRC’ or ‘the People's Republic of China’” and the result record can correspond to a subject that includes any of the three terms.
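The last arrangement, in which a single result record lists several equivalent subject terms, might look like the following sketch. The record layout and the names `normalize` and `corresponds` are hypothetical.

```python
def normalize(term):
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


# A single record whose subject element lists three equivalent terms.
RECORD_580A = {
    "subjects": {"china", "the prc", "the people's republic of china"},
    "qualifiers": {"population"},
    "result": "approximately 1.3 billion",
}


def corresponds(record, subject, qualifier):
    """True when the record contains both the subject and the qualifier."""
    return (
        normalize(subject) in record["subjects"]
        and normalize(qualifier) in record["qualifiers"]
    )


for subject in ("China", "the PRC", "the People's Republic of China"):
    print(subject, corresponds(RECORD_580A, subject, "population"))
```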
In still other embodiments, synonyms can be identified using a separate rule, separate table, separate database, or separate part of the result records table. For example, in certain embodiments determining the subject or subject/qualifier(s) combination of the input based on the pattern can include determining a subject of the input based on the pattern and the rule set (e.g., where the rule set includes one or more synonym rules, tables, and/or data). As shown in
In certain embodiments where there are multiple result records associated with a subject and/or a subject/qualifier(s) combination, it can also be desirable to base an output on a selected number of result records. For example, in some embodiments a user can select a number of result records on which the output will be based. In other embodiments, a process may base the output on a selected number of result records and/or only use result records having a selected range of relevancy elements. Although this feature can be applied to many or all of the embodiments described herein, it can be especially useful for inputs that are associated with finding the largest or smallest of items in a set.
For example, in certain embodiments an input can include a query that asks, “What are the three longest rivers in the world?” The input can match a pattern (e.g., a rule) and the pattern can be used to determine that a subject of the input is “rivers,” a first qualifier of the input is “longest,” and a second qualifier of the input is “world.” Additionally, a third qualifier and/or a command “three” can be identified and used to indicate the number of result records upon which the output should be based. In the illustrated embodiment, the pattern used to determine the subject and the qualifiers can be associated with one or more specific result records tables that contain result records corresponding to one or more lists of largest and/or smallest items.
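A minimal sketch of how such a count command could limit the result records is shown below. The number-word table, the default of one record, and the placeholder river names are assumptions; the list stands in for a ranked result records table and contains no real data.

```python
# Assumed mapping from number words to counts.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# Placeholder records, already ordered from longest to shortest.
LONGEST_RIVERS = ["River A", "River B", "River C", "River D", "River E"]


def select_count(command, default=1):
    """Turn a command such as "three" or "3" into a record count."""
    if command is None:
        return default
    command = command.strip().lower()
    if command.isdigit():
        return int(command)
    return NUMBER_WORDS.get(command, default)


def top_records(ranked, command=None):
    """Return the leading records selected by the count command."""
    return ranked[:select_count(command)]


print(top_records(LONGEST_RIVERS, "three"))
# ['River A', 'River B', 'River C']
```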
The result records table in
In
In other embodiments, the output can be derived by other processes and/or include other arrangements. For example, in certain embodiments portions of the input can be used to build an output string (e.g., the “the three longest rivers in the world” portion and the “are” portion of the input “What are the three longest rivers in the world?” can be used to build the “three longest rivers in the world are” portion of the output 895). In still other embodiments, the output can be sent and/or presented in other forms. For example, in certain embodiments the output can be sent to another computer application. In other embodiments, instead of displaying the output to a user, the output can be presented to the user in an audio format.
As shown in
A feature of some of the embodiments described above is that a process (e.g., a fact tool) can provide a method through which a user can quickly, effectively, and efficiently find selected information. An advantage of this feature is that information can be found in less time and with less frustration than with current methods. For example, as shown in
If the input is suitable for use with the fact tool, the process 1000 can further include determining one or more subjects (process portion 1008); determining one or more qualifiers, if any (process portion 1010); and determining if there are one or more corresponding result records (process portion 1012). If there is at least one corresponding result record, the process 1000 can further include sending one or more outputs based on at least one of the one or more result records (process portion 1014). In certain embodiments, the output can be sent in an XML format to facilitate use in or with another computer application. If there are no corresponding result records, the process 1000 can include returning nothing, sending a no result message, and/or providing help information to aid the user (process portion 1016). For example, in certain embodiments the process 1000 can provide help information to aid the user in formatting an input.
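The disclosure does not specify an XML schema for the output. Purely as an illustration, the sketch below serializes a result with Python's standard `xml.etree.ElementTree` module; the element names `factResult`, `subject`, `qualifier`, and `result` are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_xml(subject, qualifier, result):
    """Serialize one result as a small XML document (assumed element names)."""
    root = ET.Element("factResult")
    ET.SubElement(root, "subject").text = subject
    ET.SubElement(root, "qualifier").text = qualifier
    ET.SubElement(root, "result").text = result
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml("China", "population", "approximately 1.3 billion")
print(xml_text)

# A consuming application can read the fields back with the same module.
parsed = ET.fromstring(xml_text)
print(parsed.find("result").text)
```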
If the input format matches one or more known patterns (process portion 1004), but is not suitable for use by the fact tool (process portion 1006), the input (or portion of the input) can be sent to an appropriate tool (process portion 1018) and the process 1000 can return an answer using the appropriate tool, return nothing, send a no result message, and/or provide help information to aid the user (process portion 1020). If the input format does not match a known pattern (process portion 1004), the process 1000 can determine whether there is a question word (e.g., what, who, how, when, where, or why) or a question mark in the input (process portion 1022). If there is a question word or a question mark in the input, the process 1000 can provide help information to the user (process portion 1024). If there are no question words and/or question marks in the input, the process 1000 can return nothing, send a no result message, and/or provide help information to aid the user (process portion 1026). Accordingly, the process 1000 can provide an efficient and effective method of quickly finding selected information in a computing environment.
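The branching of the process 1000 can be summarized in a short sketch. The callables `match_pattern`, `fact_lookup`, and `other_tool`, the shape of the match dictionary, and the returned placeholder strings are all assumptions; where the description allows several alternatives (returning nothing, a no result message, or help), the sketch picks one.

```python
QUESTION_WORDS = {"what", "who", "how", "when", "where", "why"}


def process_input(text, match_pattern, fact_lookup, other_tool):
    """Route an input along the branches of process 1000 (sketch only).

    match_pattern(text) returns None, or a dict with a "tool" key and,
    for the fact tool, "subject" and "qualifiers" keys.
    """
    match = match_pattern(text)                        # portion 1004
    if match is None:
        words = {w.strip("?.,!").lower() for w in text.split()}
        if "?" in text or words & QUESTION_WORDS:      # portion 1022
            return "help"                              # portion 1024
        return "no result"                             # portion 1026
    if match["tool"] != "fact":                        # portion 1006
        return other_tool(match)                       # portions 1018, 1020
    records = fact_lookup(match["subject"], match["qualifiers"])  # 1008-1012
    if records:
        return records[0]                              # portion 1014
    return "no result"                                 # portion 1016
```

Passing the matcher and the tools in as arguments keeps the routing logic independent of any particular rule set or result records table.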
From the foregoing, it will be appreciated that specific embodiments of the invention have been described herein for purposes of illustration, but that various modifications may be made without deviating from the invention. For example, aspects of the invention described in the context of particular embodiments may be combined or eliminated in other embodiments. Although advantages associated with certain embodiments of the invention have been described in the context of those embodiments, other embodiments may also exhibit such advantages. Additionally, none of the embodiments need necessarily exhibit such advantages to fall within the scope of the invention. Accordingly, the invention is not limited except as by the appended claims.