The present invention relates generally to a method for improving the relevance of search results by considering the context of the query as well as its arguments.
As computers and networks grow and multiply, and as the amount of data being gathered and probed increases exponentially, search engines have become indispensable tools for most aspects of business.
Search engines turn vast reservoirs of meaningless data into invaluable information. It is the capability of these engines to separate the wheat from the chaff that powers the great databases of the world, which in turn power most information management systems: supply and demand, CRM, e-commerce, payroll, accounting, documentation, file management, customization, ad-serving and many other types of systems.
Search technology has become increasingly strategic for all aspects of business. It has become a formidable money-maker for various technology and media players on the internet, and is at the top of the priority list for companies like Microsoft, Google, Yahoo and AOL, among a myriad of other ventures of all sizes.
Search technology is at the heart of the commerce and culture revolution of our times, and as the volume of data and the number of queries grow, so does the importance of the relevance of their results. Relevant results are defined herein as “having some sensible or logical connection with something else, for example, a matter being discussed or investigated.” Hence, if what we are looking for are “relevant” results, meaning results that have a sensible or logical connection to something else, it becomes obvious that the “something else” has to be a consideration in the query.
Many initiatives and ideas aimed at improving the relevance of results have emerged in the last few years, the most influential and widely discussed of them being the Google search algorithm. By taking into consideration the number of links connecting to a given page, and the number of people who find it useful or interesting, Google tackled relevancy head on. Searches are no longer performed in a vacuum; they take into consideration earlier searches and connections between the data that were not considered previously.
The present application extends the contextual nature of the search by considering the context in which the search arguments were found.
It is an object of the present invention to enhance the relevance of search results by considering additional data surrounding queried text. Preferably, this is achieved by delivering search functionality within other applications instead of as a text entry box with no relation to the context in which the query arguments are originally found.
Prior to the current invention, searches have been performed more or less in the following fashion:
It becomes clear from the above description that the string that is used for the query is removed from its context and pasted into another application (or another website) before the search is performed. This removal from context hinders the search engine's ability to render relevant results, since relevance is by definition a function of context and context is no longer available.
To solve this problem, the present invention brings search capabilities to the original document, whether it is a web page, a Microsoft Word file, a database file or any other kind of data. Thus, it is possible to consider the text surrounding the selection.
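By way of illustration only, the following Python sketch shows one way the sentence and paragraph surrounding a selection might be located once the selection's character offsets within the document are known. It assumes, as simplifications not taken from this specification, that paragraphs are separated by blank lines and that a sentence ends with '.', '!' or '?' followed by whitespace or the end of the text; the names `SelectionContext` and `extract_context` are hypothetical.

```python
import re
from typing import NamedTuple


class SelectionContext(NamedTuple):
    """The selected text together with the sentence and paragraph around it."""
    selection: str
    sentence: str
    paragraph: str


def extract_context(document: str, start: int, end: int) -> SelectionContext:
    """Locate the sentence and paragraph containing document[start:end].

    Simplifying assumptions: paragraphs are separated by a blank line, and
    sentences end with '.', '!' or '?' followed by whitespace or end of text.
    """
    if not 0 <= start < end <= len(document):
        raise ValueError("selection offsets out of range")

    # Paragraph: from the blank line before the selection to the one after it.
    para_start = document.rfind("\n\n", 0, start)
    para_start = 0 if para_start == -1 else para_start + 2
    para_end = document.find("\n\n", end)
    para_end = len(document) if para_end == -1 else para_end
    paragraph = document[para_start:para_end]

    # Sentence: last terminator before the selection to first terminator after it.
    rel_start, rel_end = start - para_start, end - para_start
    sent_start, sent_end = 0, len(paragraph)
    for match in re.finditer(r"[.!?](?=\s|$)", paragraph):
        if match.end() <= rel_start:
            sent_start = match.end()
        elif match.end() >= rel_end:
            sent_end = match.end()
            break

    return SelectionContext(
        selection=document[start:end],
        sentence=paragraph[sent_start:sent_end].strip(),
        paragraph=paragraph.strip(),
    )
```

The sentence and paragraph returned in this way correspond to the inputs of the sentence and paragraph analysis branches described in the detailed description.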
Some embodiments of the current invention could achieve this by using “Shvitzer” technology, as disclosed in U.S. Provisional Application No. 60/517,586, the disclosure of which is incorporated herein by reference in its entirety. Such an embodiment allows the search function to be included in the contextual menu deployed by highlighting text on a web page.
One embodiment of the present invention is activated by dragging the selection onto a specific area of the screen.
Other embodiments take the functionality to the application level, adding it to menus or palettes, and empowering users to conduct searches directly from a specific application.
Another embodiment takes the form of a specialized application that is activated in any other program by use of macros or mouse/key combinations.
Alternatively, the current invention could be integrated at the operating system level, making the functionality available throughout the entire system.
In all embodiments, the current invention allows for the contextualization of the query string, so that the search engine can use contextual information to enhance the search itself.
It is contemplated that, in some embodiments of the present invention, the selected text could be submitted along with the surrounding text to the search engine, so as to keep the search in context. Other embodiments, like the currently preferred one, could use any of the widely available web-based search engines to refine the examination in a succession of individual searches that are defined by an algorithm. This embodiment benefits from the fact that any search engine can be used, without the need for modifying it. A currently preferred embodiment uses Google as the search engine.
Those skilled in the art will realize that considering the surrounding sentence and paragraph in addition to the selected text allows for a number of variations in the search algorithm in order to customize and tweak the results of the search.
Those skilled in the art will also appreciate that the invention is not limited to the use of a single search engine, but may make use of multiple search engines simultaneously, applying a contextualization algorithm to the various results returned.
The foregoing brief description, as well as further objects, features, and advantages of the present invention, will be understood more completely from the following detailed description of a presently preferred, but nonetheless illustrative embodiment, with reference being had to the accompanying drawing, in which:
The following nomenclature is used in the description below:
The logic flow described in
The process continues at block 107, where the text in its entirety (or just the paragraph) is compared with a list of words that should not be considered in the analysis. These are words that are considered irrelevant for a number of reasons (e.g., prepositions and articles). Next, at block 10, the paragraph, the sentence and the selection are identified and each is subjected to a different path of analysis, as seen in blocks 111, 112 and 113.
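A minimal Python sketch of the filtering performed at block 107 follows. The contents of the stop list are an assumption (the specification names only prepositions and articles as examples), the regular-expression tokenizer is a simplification, and `STOP_WORDS`, `tokenize` and `remove_stop_words` are hypothetical names.

```python
import re

# Hypothetical, deliberately small stop list: prepositions, articles and a few
# other function words. A real list would be longer and language-specific.
STOP_WORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "with", "by",
    "from", "and", "or", "is", "are", "was", "were",
})


def tokenize(text: str) -> list[str]:
    """Split text into word tokens, keeping their original capitalization."""
    return re.findall(r"[A-Za-z][A-Za-z'-]*", text)


def remove_stop_words(text: str, stop_words: frozenset = STOP_WORDS) -> list[str]:
    """Return the tokens of `text` that are not on the stop list (case-insensitive)."""
    return [tok for tok in tokenize(text) if tok.lower() not in stop_words]
```

Capitalization is preserved in the surviving tokens because the later paragraph analysis distinguishes proper nouns from common nouns.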
The paragraph analysis begins at block 111 and goes on to block 115, where the syntax is examined and proper nouns are identified. The number of proper nouns is considered at block 117: if it exceeds a predetermined amount, flow jumps to block 121; otherwise, block 119 identifies all common nouns in the paragraph and adds them to the list of proper nouns already identified in block 115. The process resumes at block 121, where a list of nouns is compiled. The list includes only the proper nouns or all nouns in the paragraph, depending on whether the number of proper nouns exceeds the predetermined amount.
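The branching at blocks 115 to 121 might be sketched as follows. Identifying nouns normally calls for a part-of-speech tagger, so this sketch substitutes two plainly simplified heuristics: a proper noun is taken to be any capitalized word that does not open a sentence, and common nouns are looked up in a caller-supplied word set. The `threshold` parameter plays the role of the "predetermined amount" of block 117; its value, like every function name here, is hypothetical.

```python
import re

WORD = r"[A-Za-z][A-Za-z'-]*"


def split_sentences(paragraph: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def proper_nouns(paragraph: str) -> list[str]:
    """Heuristic stand-in for block 115: capitalized words not opening a sentence."""
    found: list[str] = []
    for sentence in split_sentences(paragraph):
        for word in re.findall(WORD, sentence)[1:]:  # skip the sentence's first word
            if word[0].isupper() and word not in found:
                found.append(word)
    return found


def noun_list(paragraph: str, common_nouns: set[str], threshold: int) -> list[str]:
    """Blocks 117-121: keep only proper nouns if there are more than `threshold`
    of them; otherwise also add the paragraph's common nouns (block 119)."""
    nouns = proper_nouns(paragraph)
    if len(nouns) > threshold:
        return nouns
    for word in re.findall(WORD, paragraph):
        if word.lower() in common_nouns and word not in nouns:
            nouns.append(word)
    return nouns
```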
Block 123 represents the process by which the list of nouns is divided into groups. The number of words per group may vary. Each group is passed on to block 125, where the groups are submitted to a search engine as separate queries. The process then merges into the sentence analysis branch at block 131.
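The grouping and submission at blocks 123 and 125 can be sketched in a few lines of Python. The search engine is modeled as a hypothetical callable that takes a query string and returns a ranked list of results; no particular engine API is implied, and `group_size` stands in for the variable number of words per group.

```python
from typing import Callable, Iterable

# Hypothetical engine interface: query string in, ranked list of results out.
SearchFn = Callable[[str], list[str]]


def group_words(words: list[str], group_size: int) -> list[list[str]]:
    """Block 123: split the noun list into consecutive groups of `group_size`."""
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return [words[i:i + group_size] for i in range(0, len(words), group_size)]


def run_group_queries(groups: Iterable[list[str]], search: SearchFn) -> list[list[str]]:
    """Block 125: submit each group to the engine as a separate query."""
    return [search(" ".join(group)) for group in groups]
```

Keeping the engine behind a plain callable reflects the point, made elsewhere in the specification, that an unmodified off-the-shelf engine can be used.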
The sentence analysis branch begins at block 127, continuing from block 112. Block 127 groups the words of the sentence into query strings of a few words each. The list of query strings is passed on to block 129, where each string is submitted to a search engine as a separate query. At block 131, the results of these individual queries are compared with the results of the paragraph analysis. Words that appear in both sets of results are passed on to block 133, where each word is assigned a score (based on whether it is a proper noun, how many times it appears, how close to the selection it is found, how often it was queried before, etc.); the scored words are then organized into a list at block 135.
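The following Python sketch illustrates blocks 127 to 135 under stated assumptions. Result lists are treated as text snippets, and the scoring weights (total frequency, a bonus of 2 for proper nouns, and 1/(1 + d) for a word d tokens away from the selection) are hypothetical choices made for illustration; the query-history criterion is omitted. All function names are hypothetical.

```python
import re
from collections import Counter


def sentence_queries(words: list[str], size: int = 3) -> list[str]:
    """Block 127: group the sentence's words into query strings of `size` words."""
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def word_counts(results: list[str]) -> Counter:
    """Count lower-cased words across a list of result snippets."""
    counts: Counter = Counter()
    for snippet in results:
        counts.update(w.lower() for w in re.findall(r"[A-Za-z][A-Za-z'-]*", snippet))
    return counts


def score_shared_words(
    paragraph_results: list[str],
    sentence_results: list[str],
    proper_nouns: set[str],
    distance: dict[str, int],
) -> list[tuple[str, float]]:
    """Blocks 131-135: keep words present in both result sets, score, and rank them.

    Hypothetical scoring: total frequency, +2 if the word is in `proper_nouns`
    (lower-cased), and +1/(1 + d) where d = distance[word] in tokens from the
    selection. The query-history criterion is not modeled.
    """
    para_counts = word_counts(paragraph_results)
    sent_counts = word_counts(sentence_results)
    scored: list[tuple[str, float]] = []
    for word in para_counts.keys() & sent_counts.keys():  # block 131: intersection
        score = float(para_counts[word] + sent_counts[word])
        if word in proper_nouns:
            score += 2.0
        if word in distance:
            score += 1.0 / (1 + distance[word])
        scored.append((word, score))
    # Block 135: highest score first; ties broken alphabetically for stability.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored
```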
Next, at block 137, the top words from the list are sent to block 139. Block 139 merges the result of the above process with the original selection coming directly from block 113, and it assembles a query with the selection plus the top words from the paragraph and sentence analyses. Next, at block 141, the query is submitted to a search engine, which returns its results at block 143. The process ends at block 145.
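Blocks 137 to 143 might be sketched as below. The sketch assumes the ranked list from block 135 is a list of (word, score) pairs with the best word first, skips context words the selection already contains, and models the engine as a hypothetical callable; `build_final_query`, `contextual_search` and the default `top_n = 3` are illustrative assumptions.

```python
from typing import Callable

# Hypothetical engine interface: query string in, ranked list of results out.
SearchFn = Callable[[str], list[str]]


def build_final_query(selection: str, ranked: list[tuple[str, float]], top_n: int = 3) -> str:
    """Blocks 137-139: append the `top_n` best context words to the selection,
    skipping words that the selection already contains."""
    selected = {w.lower() for w in selection.split()}
    extra = [word for word, _score in ranked if word.lower() not in selected][:top_n]
    return " ".join([selection, *extra])


def contextual_search(selection: str, ranked: list[tuple[str, float]],
                      search: SearchFn, top_n: int = 3) -> list[str]:
    """Blocks 141-143: submit the assembled query and return the engine's results."""
    return search(build_final_query(selection, ranked, top_n))
```

For example, a selection of "Jaguar" found in a paragraph about car manufacturing would be sent as "Jaguar cars coventry" rather than "Jaguar" alone, steering the engine away from results about the animal.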
Depending on the embodiment, the selected text could be submitted along with the surrounding text to the search engine, so as to keep the search in context. This, of course, would require a specially designed browser that would parse the text into paragraphs, sentences and the selected keyword. In another embodiment, the selected text and surrounding text could be submitted to any of the widely available web-based search engines to refine the examination in a succession of individual searches that are defined by an algorithm. This embodiment benefits from the fact that any search engine can be used, without the need for modifying it. A currently preferred embodiment uses Google as the search engine.
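For the first embodiment, in which the selection travels to the engine together with its surroundings, the parsed pieces could be serialized into a single request body. The JSON encoding, the field names and the function name `build_context_payload` below are hypothetical; the specification does not define a wire format.

```python
import json


def build_context_payload(selection: str, sentence: str, paragraph: str) -> str:
    """Serialize the selection together with the text that surrounds it.

    Field names are hypothetical; the checks enforce that the selection lies
    inside the sentence and the sentence inside the paragraph.
    """
    if selection not in sentence or sentence not in paragraph:
        raise ValueError("selection must lie in sentence, and sentence in paragraph")
    return json.dumps(
        {"selection": selection, "sentence": sentence, "paragraph": paragraph},
        ensure_ascii=False,
    )
```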
Although preferred embodiments of the invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that many additions, modifications and substitutions are possible, without departing from the scope and spirit of the invention.
This application claims the benefit of priority under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 60/538,759, filed Jan. 23, 2004. This application is a continuation of International Application PCT/US2005/002323, filed Jan. 24, 2005, designating the United States of America and published in English as WO 2005/070019 on Aug. 4, 2005. Both of these applications are hereby incorporated by reference in their entirety.
Provisional Application No. | Date | Country
---|---|---
60538759 | Jan 2004 | US

Relation | Application No. | Date | Country
---|---|---|---
Parent | PCT/US05/02323 | Jan 2005 | US
Child | 11487720 | Jul 2006 | US