The present invention relates generally to software tools for searching. In particular, the present invention is a software system for optimizing search results from files that index names.
A significant proportion of queries that library users submit to online catalogs include names. Names are used for author searches and are frequently used for subject searches. Because online catalog queries frequently include names, various attempts to optimize search results have been made.
Name search optimization techniques used in search engines and similar applications often include a phonetic algorithm or comparable technique that indexes words by their pronunciation. “Soundex” is a well-known phonetic algorithm for indexing names by their sound when pronounced in English. Names with the same pronunciation are encoded to the same string so that matching occurs despite minor differences in spelling. While Soundex and other commonly used search optimization routines tend to produce satisfactory results for a portion of name queries, they generally fail to produce satisfactory results for the full range of query categories (i.e., partial, errant, complete) presented. Therefore, there is a need for an improved system and method for optimizing the results of name searches.
The present invention is an improved system and method for name searching. The invention processes name queries by iteratively applying a two-step process to conduct progressively fuzzier searches until all candidate names have been found or the requested maximum (a definable system parameter) has been met or exceeded. The invention uses a step-wise process that operates slightly differently for 1) names in the form of last (or family) name, first (or personal) name; and 2) other cases. At each step, if the results produced meet or exceed the requested maximum, the system stops and presents its results.
The system comprises software routines and a database of names with special indexes to support name-based searches. A software search routine applies progressively fuzzier searches and then ranks the results, eliminating unsuitable names, until a predetermined number of candidate names has been found. The series of steps used to score and rank potential results tends to produce a high proportion of satisfactory results across a wider range of commonly occurring categories of queries. The results are presented in ranked order, with the most relevant names positioned high in the results list. The invention is designed for deployment in software applications where names are indexed and it is desirable to return highly relevant results whether the user's query is incomplete, errant, or an exact match for the name as indexed. The invention may be used in any computerized information retrieval system or service that indexes and presents names in search results.
The improved system of the present invention for name searching operates by executing a series of steps to score and rank potential results. The system comprises two primary components: a database of names with special indexes to support searches according to the present invention and a software search routine that applies progressively fuzzier or approximate searches and then ranks the results, eliminating unsuitable names, until a predetermined number of candidate names has been found.
Referring to
The “spelling forgiveness” or misspelling algorithm is applied to single words (e.g., FamilyName or FirstName). First, the letters in the word are sorted. The sorting generates a key that forgives swapped letters. For each letter in the key, a new key is generated with that letter missing. This technique forgives single letter omissions. For example, the term ‘ralph’ generates the keys ‘ahlpr’, ‘ahlp’, ‘ahlr’, ‘ahpr’, ‘alpr’ and ‘hlpr’. At search time, a search for the term ‘ralhp’ generates the key ‘ahlpr’ and matches records that contain ‘ralph.’
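The key generation described above can be illustrated with a minimal Python sketch. The function names `misspelling_keys`, `build_index`, and `lookup`, and the in-memory dictionary used as the index, are assumptions made for illustration only; the text does not specify how the special indexes are stored.

```python
def misspelling_keys(word: str) -> list[str]:
    """Return the sorted-letter key followed by each single-deletion key."""
    base = "".join(sorted(word.lower()))  # sorting forgives swapped letters
    keys = [base]
    for i in range(len(base)):
        variant = base[:i] + base[i + 1:]  # dropping one letter forgives an omission
        if variant not in keys:            # repeated letters yield duplicate keys
            keys.append(variant)
    return keys


def build_index(names: list[str]) -> dict[str, set[str]]:
    """Map every generated key to the set of names that produced it."""
    index: dict[str, set[str]] = {}
    for name in names:
        for key in misspelling_keys(name):
            index.setdefault(key, set()).add(name)
    return index


def lookup(index: dict[str, set[str]], term: str) -> set[str]:
    """At search time only the sorted-letter key of the query is generated."""
    return index.get("".join(sorted(term.lower())), set())


index = build_index(["ralph", "alice"])
print(misspelling_keys("ralph"))  # ['ahlpr', 'hlpr', 'alpr', 'ahpr', 'ahlr', 'ahlp']
print(lookup(index, "ralhp"))     # swapped letters: {'ralph'}
print(lookup(index, "raph"))      # omitted letter:  {'ralph'}
```

Because the deletion keys are generated only at indexing time, a query with a missing letter still resolves with a single exact-match lookup on its sorted key.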
A table of common nicknames is used to generate nicknames for formal names (e.g., Bob for Robert) and formal names for nicknames (e.g., Joan and Joanne for Jo).
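A two-way nickname lookup of this kind might be sketched as follows. The table contents are limited to the pairs named in the text, and the names `NICKNAME_PAIRS` and `name_variants` are hypothetical; the actual table and its storage format are not given.

```python
from collections import defaultdict

# Hypothetical sample rows (nickname, formal name), taken from the examples in the text.
NICKNAME_PAIRS = [
    ("bob", "robert"),
    ("jo", "joan"),
    ("jo", "joanne"),
]

# Build both directions once so either form can be expanded at search time.
formal_to_nicknames: dict[str, set[str]] = defaultdict(set)
nickname_to_formals: dict[str, set[str]] = defaultdict(set)
for nickname, formal in NICKNAME_PAIRS:
    formal_to_nicknames[formal].add(nickname)
    nickname_to_formals[nickname].add(formal)


def name_variants(first_name: str) -> set[str]:
    """Return the name itself plus any nicknames or formal names tied to it."""
    name = first_name.lower()
    return (
        {name}
        | formal_to_nicknames.get(name, set())
        | nickname_to_formals.get(name, set())
    )


print(sorted(name_variants("Robert")))  # ['bob', 'robert']
print(sorted(name_variants("Jo")))      # ['jo', 'joan', 'joanne']
```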
The software search routine receives as input, from a computer user or other source, a name to find and the maximum number of candidate names needed. It then iteratively applies a two-step process, using progressively fuzzier searches, until all candidate names have been found or the requested maximum has been exceeded. In an example embodiment of the present invention, the first step is a search whose results the database ranks by popularity, returning the top 250 records. In the second step, the software routine further ranks those records according to how closely the name in each record matches the name in the search. Candidate names that are not sufficiently close are eliminated.
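The iterative two-step process can be sketched in Python under several assumptions. The closeness measure (the `SequenceMatcher` ratio from the standard `difflib` module), the 0.6 cutoff, the `(name, popularity)` record shape, and the idea of passing each fuzziness level as a callable are all illustrative choices; the text states only the top-250 limit and the overall flow.

```python
from difflib import SequenceMatcher
from typing import Callable

TOP_N = 250          # record limit stated in the example embodiment
MIN_CLOSENESS = 0.6  # assumed cutoff; the text does not give a threshold

Record = tuple[str, int]  # (name, popularity), an assumed record shape


def closeness(query: str, name: str) -> float:
    """Assumed similarity measure: difflib ratio in the range [0, 1]."""
    return SequenceMatcher(None, query.lower(), name.lower()).ratio()


def two_step(query: str, records: list[Record]) -> list[str]:
    # Step 1: keep the most popular records (done by the database in the text).
    popular = sorted(records, key=lambda r: r[1], reverse=True)[:TOP_N]
    # Step 2: score by closeness, drop names that are not close enough, re-rank.
    scored = [(closeness(query, name), name) for name, _ in popular]
    kept = [pair for pair in scored if pair[0] >= MIN_CLOSENESS]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in kept]


def progressive_search(
    query: str,
    levels: list[Callable[[str], list[Record]]],
    max_results: int,
) -> list[str]:
    """Run each search level, least fuzzy first, until enough names are found."""
    found: list[str] = []
    for search in levels:
        for name in two_step(query, search(query)):
            if name not in found:
                found.append(name)
        if len(found) >= max_results:  # stop once the requested maximum is met
            break
    return found[:max_results]


# Toy search levels standing in for the database's exact and fuzzy indexes.
exact = lambda q: [("jobs, steve", 10)]
fuzzy = lambda q: [("jobs, steven", 50), ("smith, anne", 900)]
print(progressive_search("jobs, steve", [exact, fuzzy], 1))  # ['jobs, steve']
print(progressive_search("jobs, steve", [exact, fuzzy], 5))  # ['jobs, steve', 'jobs, steven']
```

In the second call the popular but dissimilar record is eliminated by the closeness cutoff, which is the role the text assigns to the second ranking step.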
Referring again to
Ranking is also completed at each step and unsuitable names are eliminated. The search results are then presented to the user 114. The number of results may be controlled by setting a parameter. The search method of
A full name search according to the present invention for the name ‘jobs, steve’ produces the following results.
As the above search results indicate, records based on variants of the last and first names as well as exact matches for the last and first names are returned.
In another example, the following results are returned for the name ‘tressle, jim.’
Once again, records based on variants of the last and first names are returned. If the results parameter is a large number, the user may see many variants that were considered in the search.
As the example searches indicate, the database indexes and software search routine of the present invention assist a computer user who does not know how to spell the name of a person of interest by considering and presenting to the computer user many variants. The special indexes and search routine of the present invention help a computer user to locate relevant name records, even if the user misspells the name in a variety of ways or uses a nickname instead of the formal name. The computer user may peruse the results on the computer display and find the person of interest in the results, even with little knowledge about how the person's name is spelled. The spelling variants that are considered in the search help to locate many records, any one of which may have information about the actual person of interest.
While certain exemplary embodiments are described in detail above, the scope of the application is not to be considered limited by such disclosure, and modifications are possible without departing from the spirit of the invention as evidenced by the following claims: