In a typical computer-implemented search setting, a user inputs what they want to find by typing search query terms into a query box. A search application associated with a search source performs the search by finding search result items such as Internet web sites, documents, and so on, which correspond to the search query terms. The discovered items are typically ranked according to their relevance to the search terms. To accomplish this ranking, search sources employ a variety of ranking programs, which are often their own proprietary schemes. The result is that a search performed by one search source will often produce different results than a search performed by another search source, even though both use the same search query terms. In addition, when two different search sources identify the same search result item, it is sometimes ranked differently owing to the diverse ranking schemes employed.
Embodiments described herein for combining and re-ranking results of a search performed by multiple search sources involve inputting the results from two or more different search sources, applying a uniform ranking system to all the different result sets, and then combining and presenting the results to a user. In one general embodiment, combining and re-ranking results of the same search performed by multiple search sources is accomplished by first inputting the results of the search from the sources. The search results take the form of a list of search result items that have been ranked by the source that performed the search. While some of the search result items included by each search source may also be found in the search results produced by another source, generally the results produced by the sources will vary from one another. The rank of each search result item is based on its perceived relevance as determined using a ranking scheme employed by the search source. Re-ranking of the combined search results is based on the rankings of one of the search sources, which has been designated as the primary search source. The one or more other search sources are considered secondary search sources.
Once the search results have been input, a ranking standard is established based on the differences in rank between consecutively ranked search result items in the results input from the primary search source. The search result items from each secondary search source are then re-ranked based on this ranking standard to create a common ranking scheme for all the search result items input from the primary and secondary search sources. In addition, duplicate search result items are eliminated. The remaining primary and secondary search result items are then provided to the user.
It should also be noted that this Summary is provided to introduce a selection of concepts, in a simplified form, that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The specific features, aspects, and advantages of the disclosure will become better understood with regard to the following description, appended claims, and accompanying drawings where:
In the following description of search results combining technique embodiments, reference is made to the accompanying drawings which form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the technique may be practiced. It is understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the technique.
In general, the re-ranking of combined search results involves inputting the results from two or more different search sources, applying a uniform ranking system to all the different result sets, and then combining, de-duping, and presenting the results to a user.
Before exemplary embodiments for combining and re-ranking results of a search performed by multiple search sources (hereinafter referred to as search results combining technique embodiments) are described, a general description of a suitable system environment in which portions thereof may be implemented will be described. Referring to
In one general embodiment, combining and re-ranking results of a search performed by multiple search sources is accomplished as shown in the process flow diagram of
Once the search results have been input, a ranking standard is established based on the differences in rank between consecutively ranked search result items in the results from the primary search source (202). The search result items from each secondary search source are re-ranked based on this ranking standard to create a common ranking scheme for all the search result items input from both the primary and secondary search sources (204). In addition, duplicate search result items are eliminated (206). The remaining primary and secondary search result items are then provided to the user (208).
The foregoing actions will now be described in more detail in the sections to follow.
As disclosed previously, a ranking standard is established using the search results from the primary search source. Referring to
It is next determined if a pair of consecutive search result items having a rank delta that exceeds the stop gap value exists in the search results input from the primary search source (308). This determination presumes the search result items are arranged in a descending order according to their ranks. If one or more such pairs exist, the first occurring pair is identified (310), and all result items ranked lower than the higher ranked of the identified pair are eliminated from the search results (312). After eliminating the lower ranked search result items, or if no pair was identified, an average rank delta is then computed from rank deltas computed between all the remaining consecutively-ranked search result items of the primary search source (314). This average rank delta is designated as the ranking standard (316).
It is noted that in a situation where the first occurring pair of consecutive search result items having a rank delta that exceeds the stop gap value involves the first two items in the rank-ordered search results, only the first search result item would be retained. In such a case, no rank delta can be calculated, so a prescribed default rank delta value is used as the average rank delta. For instance, the smallest rank unit of the ranking scheme employed by the primary search source can serve as the default rank delta. As an example, suppose the ranks assigned by the ranking scheme of the primary search source range from 1 to 64K, and are always whole numbers. In such a case, a suitable value for the default rank delta is 1. Any other ranking system from a primary search source could be translated into an equivalent range. Also, the default rank delta is configurable and should be set depending on the range of ranks used by the search sources.
It is also noted that the prescribed default rank delta value can be used in a situation where the remaining search result items from the primary search source all have the same rank. Thus, the prescribed default rank delta value is used instead of a zero average rank delta.
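By way of illustration only, the establishment of the ranking standard described above might be sketched as follows. This is a minimal sketch rather than a definitive implementation. The function name `ranking_standard` and its parameters are hypothetical, ranks are assumed to be plain numbers, and the stop gap value is assumed to be supplied by the caller, since its derivation is not reproduced in this passage.

```python
from typing import List, Tuple


def ranking_standard(
    ranks: List[float], stop_gap: float, default_delta: float = 1.0
) -> Tuple[List[float], float]:
    """Return (retained primary ranks, average rank delta).

    Hypothetical helper: `stop_gap` is taken as an input because its
    computation is not shown in the surrounding text.
    """
    # The determination presumes descending order by rank.
    ordered = sorted(ranks, reverse=True)

    # Find the first consecutive pair whose rank delta exceeds the stop gap
    # and drop every item ranked lower than the higher-ranked item of the pair.
    for i in range(len(ordered) - 1):
        if ordered[i] - ordered[i + 1] > stop_gap:
            ordered = ordered[: i + 1]
            break

    # Average the rank deltas between the remaining consecutive items.
    deltas = [a - b for a, b in zip(ordered, ordered[1:])]
    average = sum(deltas) / len(deltas) if deltas else 0.0

    # Fall back to the default when no delta exists (one item retained)
    # or when all remaining items share the same rank (zero average).
    if average == 0:
        average = default_delta
    return ordered, average
```

For example, with ranks of 100, 95, 90, 40 and 35 and a stop gap value of 20, the pair (90, 40) is the first to exceed the stop gap, so only 100, 95 and 90 are retained and the ranking standard is 5.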
1.2 Re-Ranking the Search Result Items from the Secondary Sources
The search result items from each secondary search source are re-ranked using the aforementioned ranking standard to create a common ranking scheme for all the search result items from the primary and secondary search sources. This involves separately re-ranking the items from each secondary source as follows.
Referring to
It is noted that in one embodiment of the re-ranking procedure, where the secondary source under consideration can assign the same rank to more than one item, all the search result items having the same initial rank are re-ranked with the same newly assigned rank. Thus, when the first of a series of items having the same initial rank is re-ranked with a certain value, the others having that same initial rank are also re-ranked with that same value.
It is further noted that in one embodiment of the re-ranking procedure, it is possible that not all the search result items from a secondary source will be re-ranked. In this embodiment, before the re-ranking occurs, all but a prescribed number (e.g., 10) of the highest ranked search result items from the secondary search source under consideration are eliminated. Thus, if the number of search result items from the secondary search source under consideration exceeds the prescribed number, some items are eliminated rather than being re-ranked.
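The two variations just noted (items with the same initial rank receiving the same new rank, and retention of only a prescribed number of the highest ranked items) might be sketched as follows. This is an illustrative sketch under stated assumptions. In particular, the spacing rule used here, in which each successive distinct initial rank is placed one ranking standard below the previous one starting from a caller-supplied `start_rank`, is a hypothetical stand-in and is not taken from the description. The function and parameter names are likewise hypothetical.

```python
from typing import List, Tuple


def rerank_secondary(
    items: List[Tuple[str, float]],
    start_rank: float,
    standard: float,
    keep: int = 10,
) -> List[Tuple[str, float]]:
    """Re-rank (identifier, initial_rank) pairs from one secondary source."""
    # Keep only the prescribed number of highest ranked items.
    ordered = sorted(items, key=lambda item: item[1], reverse=True)[:keep]

    reranked = []
    new_rank = start_rank
    previous_initial = None
    for identifier, initial in ordered:
        # Assumed spacing rule: step down by the ranking standard only when
        # the initial rank changes, so tied items share the same new rank.
        if previous_initial is not None and initial != previous_initial:
            new_rank -= standard
        reranked.append((identifier, new_rank))
        previous_initial = initial
    return reranked
```

With items a and b both initially ranked 9, item c ranked 5 and item d ranked 1, a starting rank of 100, a ranking standard of 5 and a prescribed number of 3, items a and b are both re-ranked to 100, item c is re-ranked to 95, and item d is eliminated.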
As disclosed previously, duplicate search result items among the search result items from the primary and secondary search sources are eliminated. Referring to
The result of the identification action is to identify duplicate item sets each having two or more duplicate search result items. A previously unselected set is selected (502), and the highest ranking item in the set is identified (504). Next, all but the identified highest ranking search result item in the selected set are eliminated (506).
Optionally, when the lower ranking search result items in a set of duplicate items are eliminated, the rank of the remaining item can be increased. This recognizes the fact that more than one search source found the item relevant. To this end, the rank of the remaining search result item in each former set of duplicate search result items is increased by a prescribed amount (508). For example, the rank can be increased by two times the average rank delta. It is noted that the optional nature of this last action is indicated in
It is next determined if all the identified duplicate item sets have been selected (510). If not, process actions 502 through 510 are repeated. Otherwise the procedure ends.
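The duplicate elimination procedure, including the optional rank increase, might be sketched as follows. This is a minimal sketch that assumes duplicates can be recognized by a shared key such as a URL, which is an assumption made for illustration. The function name `eliminate_duplicates` and the `boost` parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def eliminate_duplicates(
    items: List[Tuple[str, float]], boost: float = 0.0
) -> List[Tuple[str, float]]:
    """Keep the highest ranked item of each duplicate set.

    `items` holds (key, rank) pairs from all sources on the common scale.
    `boost` is the optional prescribed increase applied to an item that
    survived a duplicate set (for example, two times the average rank delta).
    """
    best: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for key, rank in items:
        counts[key] = counts.get(key, 0) + 1
        if key not in best or rank > best[key]:
            best[key] = rank

    # Only items that actually had duplicates receive the optional boost.
    merged = [
        (key, (rank + boost) if counts[key] > 1 else rank)
        for key, rank in best.items()
    ]
    return sorted(merged, key=lambda item: item[1], reverse=True)
```

With an average rank delta of 5, passing `boost=10` corresponds to the example increase of two times the average rank delta. Leaving `boost` at its default of 0 omits the optional action.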
The ranks assigned to individual search result items can also be optionally adjusted under certain circumstances. More particularly, the rank of a search result item from the primary source, or the newly-assigned rank of an item from a secondary source, can be increased or decreased based on circumstances indicative of an item's relevance.
For example, in one embodiment, each search result item is inspected to determine if a search term used in the search producing the item is found in both a body of the item and in metadata associated with the item. If so, the rank of the search result item is increased by a prescribed amount (e.g., two times the average rank delta).
Further, in one embodiment, each search result item is inspected to determine if a search term used in the search producing the item is found only in metadata associated with the item. If so, the rank of the search result item is decreased by a prescribed amount (e.g., two times the average rank delta).
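The two optional adjustments above might be combined in a sketch such as the following. The function name `adjust_rank` is hypothetical, and the use of case-insensitive substring matching to decide whether a search term is "found" is an assumption made for illustration.

```python
def adjust_rank(
    rank: float, term: str, body: str, metadata: str, amount: float
) -> float:
    """Raise or lower a rank by `amount` depending on where `term` occurs."""
    needle = term.lower()
    in_body = needle in body.lower()
    in_metadata = needle in metadata.lower()

    if in_body and in_metadata:
        # Term found in both the body and the metadata: increase the rank.
        return rank + amount
    if in_metadata and not in_body:
        # Term found only in the metadata: decrease the rank.
        return rank - amount
    # All other cases leave the rank unchanged.
    return rank
```

Here `amount` would be the prescribed amount, for instance two times the average rank delta.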
Another optional procedure involves eliminating search result items that are deemed to be unacceptable based on prescribed acceptability criteria. This is sometimes referred to as culling “blacklisted” items. For example, a list of previously determined unacceptable items can be loaded into a cache of the search application from a specified source (e.g., a dedicated table in a database). The search result items can then be checked against that list, and any matching items removed. Blacklisted domains can be identified by content managers who find unacceptable results by conducting searches, by users who report bad site results, or by companies that produce lists of bad sites and provide updates on a periodic basis. These can be manually entered into a database table. Unwanted search result items can be culled from the search results produced by the primary or secondary search sources, or both. In addition, this culling can occur either when the search results are input, before any processing, or once the processing is complete. In the latter case, the search result items would not be culled until after the re-ranking, duplicate elimination and any optional rank adjustments have been made. As there will be fewer items to screen, doing the culling after the processing has the advantage of reducing the screening costs. However, doing the culling before processing means that fewer items will need to be processed, thereby saving on the processing costs.
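The culling of blacklisted items might be sketched as follows, assuming for illustration that each search result item is identified by a URL and that the blacklist is a set of domain names already loaded into memory. The function name `cull_blacklisted` is hypothetical, and matching on the exact host name is an assumption. An implementation could equally match on full URLs or parent domains.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def cull_blacklisted(urls: Iterable[str], blacklisted_domains: Iterable[str]) -> List[str]:
    """Return the URLs whose host is not on the blacklist, in original order."""
    blocked = {domain.lower() for domain in blacklisted_domains}
    kept = []
    for url in urls:
        # hostname is None for strings without a network location.
        host = (urlparse(url).hostname or "").lower()
        if host not in blocked:
            kept.append(url)
    return kept
```

The same function can be applied either to the raw results as they are input or to the final list after re-ranking, duplicate elimination and rank adjustment, which reflects the trade-off between processing costs and screening costs discussed above.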
The re-ranked and possibly rank-adjusted search result items remaining from those input from the primary and secondary search sources are provided to the user. Generally, this entails displaying the search result items.
More particularly, referring to
A brief, general description of a suitable computing environment in which portions of the search results combining technique embodiments described herein may be implemented will now be described. The technique embodiments are operational with numerous general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Device 10 may also contain communications connection(s) 22 that allow the device to communicate with other devices. Device 10 may also have input device(s) 24 such as keyboard, mouse, pen, voice input device, touch input device, camera, etc. Output device(s) 26 such as a display, speakers, printer, etc. may also be included. All these devices are well known in the art and need not be discussed at length here.
The search results combining technique embodiments described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The embodiments described herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It is noted that any or all of the aforementioned embodiments throughout the description may be used in any combination desired to form additional hybrid embodiments. In addition, although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.