Not applicable.
Not applicable.
Although computer systems can store a wealth of information, it can often be difficult for users to find or retrieve a specific fact or piece of information. For example, users often wish to quickly find specific facts or answers to specific fact-based questions, such as, for instance, “what is the population of China.” A variety of search engines currently exist that allow users to search for information by entering a search input comprising one or more keywords that may be of interest to the user. After receiving a search request from a user, a search engine identifies documents and/or web pages that are relevant based on the keywords. Often, the search engine returns a large number of documents or web page addresses, many of which have little or nothing to do with the specific piece of information that the user was seeking. The user is then left to sift through the list of documents, links, and associated information to find the desired fact. This process can be cumbersome, frustrating, and time consuming, especially when the user is looking for a single specific fact or fact set instead of general information about a given topic.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Embodiments of the present invention relate to an unbounded redundant discrete fact data store. The data store stores discrete facts with information for identifying the appropriate discrete fact for a search query. In particular, for each discrete fact, the data store may include a subject of the discrete fact and zero or more indicators representing zero or more facets of the subject corresponding with the discrete fact, thereby facilitating the look-up of discrete facts based on search queries. Additionally, zero or more subject classifications may be included for the subject of each discrete fact. Further, zero or more parent/child relationships between a discrete fact's subject and one or more other subjects may be included in the data store. The subject classifications and subject parent/child relationships provide relationships between the discrete facts, further facilitating searching across domains of discrete facts.
The present invention is described in detail below with reference to the attached drawing figures, wherein:
The subject matter of the present invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” and/or “block” may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.
Embodiments of the present invention relate to an unbounded redundant discrete fact data store. The data store is structured such that answers are stored individually as discrete facts rather than multiple answers being grouped and stored together as a single entry. Additionally, the information required to look up each discrete fact is stored with that discrete fact. In particular, the core of the data store comprises subject-indicator-fact sets. Each discrete fact represents a particular facet of a particular subject. Accordingly, the data store includes an appropriate subject and indicator (i.e., relevant facet of the subject) for each discrete fact, facilitating the look-up of discrete facts in response to search requests. The data store is further structured such that each subject may have zero or more classifications. Additionally, each subject may have zero or more parent/child relationships with other subjects. As such, subjects may be attached in the data store in a consistent hierarchy. Further, alternatives for subjects, indicators, and classifications may be provided within the data store such that search queries may match more intelligently, flexibly, and safely.
Embodiments of the present invention provide, among other things, a data store that is optimized for scalability and look-up. Additionally, it allows for an unbounded number of discrete facts to be stored. Further, the data store may be redundant in that multiple copies of the data may be stored to ensure a very high degree of availability in the case of scattered hardware failure. The data store provides for the look-up of discrete facts, thereby facilitating the ability to return answers to fact-based questions. By providing subject classifications and subjects in a consistent hierarchy, the data store provides relationships between discrete facts from many different domains. Accordingly, when subjects are correctly classified and attached in a consistent hierarchy in the data store, it becomes possible to search across domains of facts. Further, new facts may be computed based on discrete facts in the data store. While embodiments of the present invention are described herein primarily in the context of searching, further embodiments may support browse or navigation scenarios through subjects with the same classifications or through parent/child relationships.
Accordingly, in one aspect, an embodiment of the invention is directed to one or more computer-readable media having stored thereon a data structure for storing discrete facts and information for identifying one or more discrete facts in response to a search query. The data structure includes a first data field containing data representing a discrete fact. The data structure also includes a second data field containing data representing a subject that corresponds with the discrete fact. The data structure further includes a third data field containing data representing an indicator, the indicator representing a facet of the subject that corresponds with the discrete fact.
In another aspect of the invention, an embodiment is directed to one or more computer-readable media having stored thereon a data structure for storing discrete facts and information for identifying one or more discrete facts in response to a search query. The data structure includes a first data field containing data representing a discrete fact; a second data field containing data representing a subject that corresponds with the discrete fact; a third data field containing data representing an indicator, the indicator representing a facet of the subject that corresponds with the discrete fact; a fourth data field containing data representing one or more classifications for the subject that corresponds with the discrete fact; and a fifth data field containing data representing one or more relationships between the subject that corresponds with the discrete fact and one or more other subjects.
In a further aspect, an embodiment of the invention is directed to one or more computer-readable media having stored thereon a data structure for storing discrete facts and information for identifying one or more discrete facts in response to a search query. The data structure includes a first data field containing data representing a discrete fact; a second data field containing data representing a subject that corresponds with the discrete fact and one or more alternatives for the subject; a third data field containing data representing an indicator and one or more alternatives for the indicator, the indicator and the one or more alternatives for the indicator representing a facet of the subject that correspond with the discrete fact; a fourth data field containing data representing one or more classifications for the subject that corresponds with the discrete fact and one or more alternatives for at least one of the one or more classifications; and a fifth data field containing data representing one or more relationships between the subject that corresponds with the discrete fact and one or more other subjects.
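By way of illustration only, the five data fields described above can be pictured as a single record type. The following is a minimal sketch in Python, not the claimed implementation. The class name FactRecord, its field names, the matches helper, and all sample values (including the alternatives and the placeholder fact string) are assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FactRecord:
    """Hypothetical record mirroring the five data fields (all names are assumptions)."""
    fact: str                                   # first field: the discrete fact
    subject: str                                # second field: the subject
    indicator: str                              # third field: facet of the subject
    subject_alternatives: List[str] = field(default_factory=list)
    indicator_alternatives: List[str] = field(default_factory=list)
    classifications: List[str] = field(default_factory=list)  # fourth field
    parent_subjects: List[str] = field(default_factory=list)  # fifth field

    def matches(self, subject: str, indicator: str) -> bool:
        """Case-insensitive match on the subject/indicator or any of their alternatives."""
        subjects = {s.lower() for s in [self.subject, *self.subject_alternatives]}
        indicators = {i.lower() for i in [self.indicator, *self.indicator_alternatives]}
        return subject.lower() in subjects and indicator.lower() in indicators


record = FactRecord(
    fact="<population value>",  # placeholder, not real data
    subject="China",
    indicator="population",
    subject_alternatives=["People's Republic of China"],  # illustrative alternative
    indicator_alternatives=["number of inhabitants"],     # illustrative alternative
    classifications=["country"],
    parent_subjects=["Asia"],
)
```

In this sketch, a query resolved to the subject "china" and the indicator "Population" would match the record, as would a query using either alternative.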
Having briefly described an overview of the present invention, an exemplary operating environment for the present invention is described below.
Referring initially to
The invention may be described in the general context of computer code or machine-usable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program modules, including routines, programs, objects, components, data structures, etc., refer to code that performs particular tasks or implements particular abstract data types. The invention may be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, more specialty computing devices, etc. The invention may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.
With reference to
Computing device 100 typically includes a variety of computer-readable media. By way of example, and not limitation, computer-readable media may comprise Random Access Memory (RAM); Read Only Memory (ROM); Electrically Erasable Programmable Read Only Memory (EEPROM); flash memory or other memory technologies; CD-ROM, digital versatile disks (DVD) or other optical or holographic media; magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices; carrier wave; or any other medium that can be used to encode desired information and be accessed by computing device 100.
Memory 112 includes computer-storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, nonremovable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, optical-disc drives, etc. Computing device 100 includes one or more processors that read data from various entities such as memory 112 or I/O components 120. Presentation component(s) 116 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc.
I/O ports 118 allow computing device 100 to be logically coupled to other devices including I/O components 120, some of which may be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.
Referring now to
As used herein, the term “subject” represents a person, place, or thing in which a searcher may be interested. For example,
In operation, the subject-indicator-fact sets may be used to determine answers to search queries. When a user enters a search query, the query may be parsed to determine the relevant subject, indicator, and any other qualifiers for the search. For instance, grammars may be provided for pulling out a subject, indicator, and any other qualifiers from a particular search query. Examples of such grammars are further described in U.S. patent application Ser. No. 11/059,014, filed Feb. 15, 2005, which is herein incorporated by reference in its entirety. The subject, indicator, and any other qualifiers extracted from a query may then be used to search the fact records and return a discrete fact matching the query. For example, a user may provide a search input that includes “what is the population of China.” A grammar may be used to determine that the subject is “China” and the indicator is “population.” Based on this determination, the data store 200 of
Other qualifiers beyond a subject and indicator may also be used to filter fact records and determine an appropriate answer for a query. For instance, instead of the previous query input, a user may provide an input that includes “what was the population of China in 1975.” In addition to determining that the subject is “China” and the indicator is “population,” a grammar may determine that the query further includes the qualifier “1975,” which represents a specific date. This qualifier may be used in conjunction with the subject “China” and indicator “population” to determine an appropriate answer. For instance, the facts table 202 may include a number of fact records having the subject “China” and indicator “population.” However, each of these records may include further fact meta data 214, such as a valid date. By matching the qualifier “1975” from the search query to fact meta data 214, the appropriate fact may be retrieved from the facts table 202 and provided as an answer to the query.
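The look-up flow described above, in which a query is parsed into a subject, an indicator, and an optional date qualifier that is then matched against fact meta data, might be sketched as follows. This is a simplified illustration under stated assumptions: the single regular expression stands in for the grammars referenced above (which are not reproduced here), the names Fact, parse, and lookup are hypothetical, the fact values are placeholders rather than real data, and choosing the most recent record when no date qualifier is present is an assumption of this sketch.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Fact(NamedTuple):
    subject: str
    indicator: str
    fact: str
    valid_year: Optional[int] = None  # stands in for the "valid date" meta data


# Toy grammar (an assumption): "what is/was the <indicator> of <subject> [in <year>]"
QUERY = re.compile(
    r"what (?:is|was) the (?P<indicator>.+?) of (?P<subject>.+?)"
    r"(?: in (?P<year>\d{4}))?\??$",
    re.IGNORECASE,
)

FACTS: List[Fact] = [
    Fact("China", "population", "<population in 1975>", 1975),  # placeholder value
    Fact("China", "population", "<population in 2000>", 2000),  # placeholder value
]


def parse(query: str) -> Optional[Tuple[str, str, Optional[int]]]:
    """Extract (subject, indicator, year qualifier) from a query, or None if no match."""
    m = QUERY.match(query.strip())
    if m is None:
        return None
    year = m.group("year")
    return m.group("subject").lower(), m.group("indicator").lower(), int(year) if year else None


def lookup(facts: List[Fact], query: str) -> Optional[str]:
    """Return the discrete fact matching the parsed subject, indicator, and qualifier."""
    parsed = parse(query)
    if parsed is None:
        return None
    subject, indicator, year = parsed
    candidates = [f for f in facts
                  if f.subject.lower() == subject and f.indicator.lower() == indicator]
    if year is not None:
        # Match the date qualifier against the fact meta data.
        candidates = [f for f in candidates if f.valid_year == year]
    if not candidates:
        return None
    # Assumption: with no qualifier, prefer the most recent record.
    return max(candidates, key=lambda f: f.valid_year or 0).fact
```

With these placeholder records, the query "what was the population of China in 1975" selects the 1975 record, while "what is the population of China" falls back to the most recent record.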
While the subject-indicator-fact sets shown in
Referring to
The facts table 302 in
While the data store 300 of
With respect to subject classifications, each subject may have zero, one, or many classifications. For example, the subject “World” 402 has the classification “Planet” 404. Although each subject in
The addition of subject classifications enhances the data store by providing a mechanism for grouping and sorting facts. In effect, the subject classifications create relationships between discrete facts that allow the data store to search across different domains and readily match answers to more complex search queries. By way of example, a user may provide a search input that includes “what is the state with the largest population.” Based on the input, facts having a subject classified as a “state” may be grouped together and compared to determine which has the largest population.
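As a rough illustration of the grouping and comparison just described, the following Python sketch groups facts by subject classification and then compares them. The subjects "State A", "State B", and "City C", their numeric values, and the function name largest_by_classification are all hypothetical placeholders, not data from the data store.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical subject classifications (placeholders).
CLASSIFICATIONS: Dict[str, List[str]] = {
    "State A": ["state"],
    "State B": ["state"],
    "City C": ["city"],
}

# Hypothetical subject-indicator-fact sets with made-up values.
FACTS: List[Tuple[str, str, int]] = [
    ("State A", "population", 100),
    ("State B", "population", 250),
    ("City C", "population", 900),
]


def largest_by_classification(classification: str, indicator: str) -> Optional[str]:
    """Group facts whose subject has the classification, then return the top subject."""
    grouped = [
        (subject, value)
        for subject, ind, value in FACTS
        if ind == indicator and classification in CLASSIFICATIONS.get(subject, [])
    ]
    if not grouped:
        return None
    return max(grouped, key=lambda pair: pair[1])[0]
```

Here "City C" has the largest value overall but is excluded because it is not classified as a "state", so the comparison is made only within the grouped subjects.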
Similar to the alternatives provided for subjects and indicators in the data store of
In addition to providing classifications for subjects, the data structure shown in
The inclusion of parent/child relationships further enhances the data store by providing another mechanism for sorting and grouping discrete facts. Similar to subject classifications, parent/child relationships effectively create relationships between discrete facts. For example, a user may provide a search input that includes “what is the longest river in Washington.” Based on the input, facts having subjects classified as “rivers” may be filtered to only those having “Washington” as a parent subject. The facts remaining after this filtering may then be compared to provide an appropriate answer to the search query.
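The filtering described in this example, first by classification and then by parent subject, followed by a comparison, might be sketched as follows. This is an illustrative sketch only: the Subject class, the placeholder subjects ("River X", "River Y", "River Z", "Lake W"), the parent "Another State", and the length values are assumptions and do not represent real data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Subject:
    """Hypothetical subject entry with classifications and parent subjects."""
    name: str
    classifications: List[str] = field(default_factory=list)
    parents: List[str] = field(default_factory=list)


SUBJECTS: List[Subject] = [
    Subject("River X", ["river"], ["Washington"]),
    Subject("River Y", ["river"], ["Washington"]),
    Subject("River Z", ["river"], ["Another State"]),
    Subject("Lake W", ["lake"], ["Washington"]),
]

# Made-up "length" facts keyed by subject name.
LENGTHS: Dict[str, float] = {
    "River X": 120.0,
    "River Y": 300.0,
    "River Z": 900.0,
    "Lake W": 50.0,
}


def longest(classification: str, parent: str) -> Optional[str]:
    """Filter by classification, then by parent subject, then compare lengths."""
    candidates = [
        s.name
        for s in SUBJECTS
        if classification in s.classifications
        and parent in s.parents
        and s.name in LENGTHS
    ]
    return max(candidates, key=lambda name: LENGTHS[name], default=None)
```

In this sketch "River Z" is the longest river overall but is filtered out because its parent is not "Washington".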
In some cases, a parent/child relationship may have a valid date range placed on it to show that the relationship only existed during a certain period. For instance, the parent/child relationship between the subject “United States” 414 and the subject “California” 416 has a date range of “9/9/1850—present” indicating that the relationship is only valid during that time period. A valid date range on parent/child relationships may be useful for determining answers to date-specific search queries. For example, a user may enter a search query that includes “what states were included in the United States in 1820,” in which case the valid date range would preclude California from being included in the answer. Alternatively, a search query that includes “what states were included in the United States in 1860” would result in an answer including California based on the date range for the parent/child relationship.
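A valid date range on a parent/child relationship might be checked as in the following sketch. The Relationship class and the children_on function are hypothetical names. The only relationship included is the one given above between "United States" and "California", with an open-ended range represented here by None (an assumption of this sketch).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Relationship:
    """Hypothetical parent/child relationship with a valid date range."""
    parent: str
    child: str
    valid_from: date
    valid_to: Optional[date] = None  # None stands for "present"


RELATIONSHIPS: List[Relationship] = [
    Relationship("United States", "California", date(1850, 9, 9)),
]


def children_on(parent: str, when: date) -> List[str]:
    """Return the children whose relationship to the parent is valid on the given date."""
    return [
        r.child
        for r in RELATIONSHIPS
        if r.parent == parent
        and r.valid_from <= when
        and (r.valid_to is None or when <= r.valid_to)
    ]
```

Under these assumptions, a date in 1820 yields no children for "United States" in this one-entry example, while a date in 1860 yields "California".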
As described above, subject classifications and parent/child relationships facilitate searches by grouping and creating relationships between discrete facts, thereby allowing the data structure 400 to readily provide answers to more complex search queries. Accordingly, discrete facts may be filtered and compared to provide an answer to a search. Additionally, the data structure allows new facts to be computed based on discrete facts. For example, a user may provide a search input that includes “what is the average GDP of countries in Asia.” The data store may not store the answer to this query as a discrete fact, but may store the GDP of individual countries. Accordingly, based on the search input, the data store may be searched and facts having subjects with the classification “country” and “Asia” as a parent subject may be grouped together. The average GDP may then be calculated based on the relevant discrete facts.
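Computing a new fact from stored discrete facts, as in the average GDP example, might look like the following sketch. The countries, parents, and GDP figures are placeholders chosen only to make the arithmetic easy to follow, and the function name average_indicator is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical subjects with a classification and a parent subject (placeholders).
SUBJECTS: Dict[str, Dict[str, List[str]]] = {
    "Country A": {"classifications": ["country"], "parents": ["Asia"]},
    "Country B": {"classifications": ["country"], "parents": ["Asia"]},
    "Country C": {"classifications": ["country"], "parents": ["Europe"]},
    "City D": {"classifications": ["city"], "parents": ["Asia"]},
}

# Made-up discrete facts for the indicator "GDP" (arbitrary units).
GDP: Dict[str, float] = {
    "Country A": 10.0,
    "Country B": 20.0,
    "Country C": 100.0,
    "City D": 5.0,
}


def average_indicator(classification: str, parent: str) -> Optional[float]:
    """Group discrete facts by classification and parent, then compute their average."""
    values = [
        GDP[name]
        for name, info in SUBJECTS.items()
        if classification in info["classifications"]
        and parent in info["parents"]
        and name in GDP
    ]
    return mean(values) if values else None
```

The average itself is never stored. It is derived at query time from the two discrete facts whose subjects are classified as "country" and have "Asia" as a parent.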
One skilled in the art will recognize that the subjects, parent/child relationships, classifications, and alternatives shown in
In some embodiments of the present invention, particular parent/child relationships may be defined within the data structure as required relationships, in which both the parent subject and the child subject must be present in the query to be considered a valid subject match on the child subject. Examples of required parent/child relationships may be illustrated in the context of Nobel prizes with reference to
Facts stored by a data store in accordance with embodiments of the present invention may be derived from a variety of different data sources. By way of example only and not limitation, some data may be obtained from feed sources, while other data may be obtained by crawling the Internet. In some cases, the data store may support real-time data (e.g., current stock price quotes, sport statistics for current games, etc.). Because such types of data may be continuously changing, it may not be realistic to store the actual data. Instead, a pointer to the actual data may be provided in the data store as opposed to constantly updating the stored data. When a search query results in a particular fact comprising real-time data, the data may be retrieved at that time based on the pointer in the data store, and the retrieved data may be provided as an answer.
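The pointer-based handling of real-time data might be sketched as follows. In this illustration the pointer is modeled as a callable that is invoked only when the fact is requested. The class name, field names, the in-memory LIVE_FEED dictionary standing in for an external source, and the symbol "EXAMPLE" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Stand-in for an external real-time source (an assumption for this sketch).
LIVE_FEED: Dict[str, str] = {"EXAMPLE": "<quote at time 1>"}


@dataclass
class FactRecord:
    """Hypothetical record holding either a stored fact or a pointer to live data."""
    subject: str
    indicator: str
    value: Optional[str] = None
    pointer: Optional[Callable[[], str]] = None

    def resolve(self) -> str:
        """Return the stored fact, or follow the pointer to fetch the current data."""
        if self.pointer is not None:
            return self.pointer()
        if self.value is None:
            raise ValueError("record has neither a stored value nor a pointer")
        return self.value


stock = FactRecord("EXAMPLE", "stock price", pointer=lambda: LIVE_FEED["EXAMPLE"])
static = FactRecord("World", "classification", value="Planet")
```

Because the data is fetched when resolve is called, a later change in the live source is reflected in the answer without any update to the record itself.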
As can be understood, embodiments of the present invention provide an unbounded redundant discrete fact data store that facilitates the look-up of discrete facts in response to search queries. The present invention has been described in relation to particular embodiments, which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present invention pertains without departing from its scope.
From the foregoing, it will be seen that this invention is one well adapted to attain all the ends and objects set forth above, together with other advantages which are obvious and inherent to the system and method. It will be understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations. This is contemplated by and is within the scope of the claims.