Determining a known character string equivalent to a query string

Description

BACKGROUND

A. Technical Field

The present invention is related to text equivalencing, and more particularly, to text equivalencing for human language text, genetic sequences text, or computer language text.

B. Background of the Invention

In many contexts finding equivalent text is desirable. Equivalent texts are two pieces of text that are intended to be exactly the same, at least one of which contains a misspelling, a typographical error, or a difference in representation such as using a symbolic representation of a word or a different word order. One context where finding equivalent text is desirable is in identifying a song or book by its textual identifiers such as title, artist or author, album name or publisher name, etc. Another such context is in identifying genetic sequences. In these and other contexts, it is desirable to be able to identify equivalent texts because typographical errors or misspellings can occur. In particular, misspellings occur with a high frequency with foreign-sounding text or names, or when the text is the product of voice transcription. Also, there are times when certain words, such as “the,” are omitted or added erroneously, or a character is replaced by a word or vice versa, for example “&” to “and” or “@” to “at.”

In the context of text equivalencing for textual information related to music, there are many ways the text ends up close, but not exactly the same. For example, when a song is downloaded the information can be typed in manually by the user, thus increasing the chance for error. In the context of a software music player it is generally important to identify the track of music by the textual information. If the textual information cannot be determined it is difficult to identify the music track.

One method of identifying an equivalent text is by applying a simple set of heuristics to the text and then comparing it to known texts. The simple set of heuristics can overcome problems of variable amounts of white space and omitting or adding words such as “the” at the beginning of an album or book name. However, such heuristics fall short when it comes to typographical errors or misspellings. It is not possible to apply a heuristic for every possible mistyping or misspelling based on every possible pronunciation of every word in the text. The problem is further compounded when the words are not actual language words, but instead are proper names, genetic sequences, acronyms, or computer commands. In those contexts, predicting the mistakes and forming the heuristics is more difficult than when the words are language words.

What is needed is a system and method of text equivalencing that avoids the above-described limitations and disadvantages. What is further needed is a system and method for text equivalencing that is more accurate and reliable than prior art schemes.

SUMMARY OF THE INVENTION

The present invention provides a text equivalencing engine capable of determining when one string of characters or text is equivalent or probably equivalent to another string of characters or text. In one embodiment, the string of characters is textual information describing a track of music such as a song title, artist name, album name or any combination of these attributes. In another embodiment, the string of characters is textual information describing a book or magazine such as title of the book or magazine, author name, or publisher. In another embodiment, the text is a genetic sequence or the text is a computer program listing. One skilled in the art will recognize that the techniques of the present invention may be applied to any type of text or character string. The present invention can determine when two strings of characters are equivalent even when there are misspellings or typographical errors that are not addressed by a set of heuristics. The present invention can be used to accurately perform text equivalencing in most cases. Thus, the present invention overcomes the above-described problems of misspellings and typographical errors.

The text equivalencing is performed by first applying a set of heuristics. The set of heuristics modifies the text to be equivalenced by a set of rules. The rules are designed to eliminate common mistakes such as eliminating or adding whitespace, adding or deleting the word “the,” changing “&” to “and” and vice versa, and a few common misspellings and typographical errors.

The modified strings are compared to known strings of characters or text. If a match is found, the text equivalencing process is exited successfully. If an exact match is not found, the string of characters is separated into sub-strings. The sub-strings can be of any length. In one embodiment, the sub-strings are typically three characters long and are referred to as 3-grams. In another embodiment, the length of the sub-strings is adaptively determined.

Any information retrieval technique can be applied to the sub-strings to perform accurate text equivalencing. In one embodiment, the information retrieval technique weights the string of characters and scores other known strings of characters using a technique known as Inverse Document Frequency (IDF) as applied to the sub-strings in each string of characters. If the highest scoring known string of characters scores above a first threshold, t₁, then that known string of characters is accepted as a match to the string of characters and all other known strings are considered not to be matches. If one or more of the scores is between the first threshold, t₁, and a second threshold, t₂, the string of characters is accepted as a match after a user manually confirms the match and all known strings scoring below t₂ are discarded. If no known strings score above t₂ and there is a set of known strings scoring between the second threshold, t₂, and a third threshold, t₃, all strings scoring in this range are presented to the user as options and all strings scoring below t₃ are discarded. The user can select one of the strings as a match. If the scores are all below the third threshold, t₃, the strings are ignored and a match is not determined. Once a match is determined, the matching string can be updated to reflect the fact that the string of characters is equivalent to it.

As can be seen from the above description, the present invention may be applied to many different domains, and is not limited to any one application. Many techniques of the present invention may be applied to text equivalencing in any domain.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a functional architecture for text equivalencing.

FIG. 2 is a flow diagram of a method of text equivalencing in accordance with the present invention.

FIG. 3 is a flow diagram of a method of database update in accordance with the present invention.

FIG. 4 is a flow diagram of a method of grouping characters to form sub-strings.

FIG. 5 is a flow diagram of a method of scoring text and strings of characters using the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The following description of preferred embodiments of the present invention is presented in the context of a text equivalencing engine for any string of characters. In some embodiments, it may be implemented for use in identifying mistyped or misspelled song titles, artist names or album titles for an Internet-based jukebox or for identifying genetic sequences. One skilled in the art will recognize that the present invention may be implemented in many other domains and environments, both within the context of text equivalency, and in other contexts. Accordingly, the following description, while intended to be illustrative of a particular implementation, is not intended to limit the scope of the present invention or its applicability to other domains and environments. Rather, the scope of the present invention is limited and defined solely by the claims.

Now referring to FIG. 1, there is shown a block diagram of a functional architecture for text equivalencing. Also, referring to FIG. 2, there is shown a method of text equivalencing. Text equivalencing is useful for determining when one string of characters is equivalent to another string of characters. A character may be any letter, number, space, or other symbol that can be inserted into text. In an Internet jukebox or personalized radio system, it is typically important to be able to identify a track of music by its title, artist, and album name because such a system makes recommendations based on a history of listener behavior. In other domains it is also important to be able to determine when one piece of text is equivalent to another. For example, in genetic sequencing it is useful to determine when two genetic sequences are equivalent. Also, in computer programming, text equivalencing can be important as a reverse engineering tool for documenting large programs.

The text equivalencing engine 140 takes a string of characters 105 and determines a text equivalent 135 to the string of characters. A heuristics module 110 modifies the character string by applying a set of rules to the string of characters 205. In one embodiment, there are typically about 50 heuristics that are applied. However, there could be fewer or greater than 50 heuristics applied depending on the system and the type of text.

In one embodiment, the heuristics include omitting whitespace, adding or deleting the word “the,” converting “&” to “and” and vice versa, and any other heuristic as may be appropriate or desired. Also, if the string of characters is words in a human language, a heuristic could be applied that changes the order of the words in the string such as in the modification of “Einstein, Albert” to “Albert Einstein.” In one embodiment, once the heuristics have been applied, a database containing known character strings is searched by comparator module 117 to find any exact match to the character string or any of the new character strings created by modifying the character string in accordance with the heuristics 205. If such a match 119 is found 215, text equivalencing engine 140 ceases searching for equivalents and the database update module 145 performs database update 235, a process by which the set of known character strings is updated to indicate that the string of characters is an equivalent text.
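The heuristic pass followed by an exact lookup can be sketched as below. This is a minimal illustration only: the function names `normalize` and `exact_match`, the four rules chosen, and the sample entries in `KNOWN` are assumptions made for the example and are not taken from the specification. As a simplification, the sketch reduces every string to a single canonical form rather than generating several modified strings.

```python
import re

def normalize(text):
    """Apply a few illustrative rewrite rules and return one canonical form."""
    s = text.lower()
    s = s.replace("&", " and ")                    # symbol to word
    m = re.fullmatch(r"\s*([^,]+?)\s*,\s*([^,]+?)\s*", s)
    if m:                                          # "einstein, albert" -> "albert einstein"
        s = f"{m.group(2)} {m.group(1)}"
    s = re.sub(r"^\s*the\s+", "", s)               # drop a leading "the"
    return re.sub(r"\s+", "", s)                   # omit all whitespace

# Hypothetical table of known strings, keyed by canonical form.
KNOWN = {normalize(k): k for k in
         ["Albert Einstein", "The Beatles", "Simon and Garfunkel"]}

def exact_match(query):
    """Return the known string whose canonical form equals the query's, or None."""
    return KNOWN.get(normalize(query))
```

Under these assumptions, `exact_match("Einstein, Albert")` and `exact_match("Simon & Garfunkel")` both find their known strings, while a misspelling such as "Teh Beatles" yields `None` and would fall through to the sub-string stage.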

If no match 118 is found 215, the character strings 115 are divided into sub-strings. Sub-strings 125 are formed in the sub-string formation module 120 by selecting frequently occurring groups of characters 225. The formation of sub-strings is further described below in reference to FIG. 4. In one embodiment, the sub-strings 125 are 3-grams. A 3-gram is a sub-string 125 that is three characters in length. In another embodiment, the sub-strings 125 can be of any length and are selected by frequency in the corpus of known strings.
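For the fixed-length embodiment, forming 3-grams amounts to sliding a three-character window over the string. The helper name `three_grams` is hypothetical, and returning an empty list for strings shorter than three characters is an assumption of this sketch.

```python
def three_grams(text):
    """Return every overlapping three-character sub-string of text, in order."""
    return [text[i:i + 3] for i in range(len(text) - 2)]

# A single mistyped character disturbs only the 3-grams that cover it,
# so a misspelled string still shares sub-strings with the intended one.
shared = set(three_grams("beatles")) & set(three_grams("beetles"))
```

Here "beatles" and "beetles" still have the 3-grams "tle" and "les" in common, which is what lets a retrieval technique score them as close.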

The sub-strings 125 are input into an information retrieval module 130. The information retrieval module can perform any accurate information retrieval technique to retrieve an equivalent text 135 to the character string 105 from a database of known character strings 230. Any information retrieval technique may be implemented. Examples of such information retrieval techniques are discussed in G. Salton & M. McGill, “Introduction to Modern Information Retrieval,” McGraw-Hill, 1983 and P. Willet and K. Sparck Jones (ed), “Readings in Information Retrieval,” Morgan Kaufmann, 1997.

In one embodiment, module 130 employs an information retrieval technique 230 using a score relative to each known string of characters in the database. The score is based on a term-weighting scheme described below in reference to FIG. 5. After the information retrieval module 130 determines an equivalent text or a set of candidate equivalent texts 230, database update module 145 typically updates a database of known equivalent texts to indicate that the string of characters is equivalent 235. The database update module 145 is further described below with reference to FIG. 3.

The most likely equivalent string of characters is determined by the retrieval score as produced by the information retrieval module 130. The score is evaluated based on three threshold values, t₁, t₂, and t₃. Typically, t₁ > t₂ > t₃, creating four regions: greater than t₁, between t₁ and t₂, between t₂ and t₃, and less than t₃. If the highest of the scores is greater than a first predetermined threshold value, t₁, the text with that score is accepted as an equivalent to the string of characters without any manual intervention. If the highest of the scores is between the first threshold, t₁, and a second predetermined threshold, t₂, that text is accepted as an equivalent to the string of characters after a user manually confirms the match. If there are one or more scores between the second threshold value, t₂, and a third predetermined threshold value, t₃, each text scoring in that range is presented to the user. One piece of text is accepted as an equivalent to the string of characters only after being selected by the user. If all the scores are below the third threshold, t₃, no equivalent is found. In one embodiment, the threshold values are empirically determined to optimize the amount of manual labor needed to build a table of text equivalences.
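The four score regions can be sketched as a small decision function. The function name `classify`, the action labels, and the treatment of scores that fall exactly on a threshold (strict comparisons are used here) are assumptions of this sketch; the specification does not state how boundary values are handled.

```python
def classify(scored, t1, t2, t3):
    """Map (known_string, score) pairs to an action and its candidate strings.

    Assumes t1 > t2 > t3. Returns one of:
      ("accept", [best])   highest score above t1, no manual step
      ("confirm", [best])  highest score between t2 and t1, user confirms
      ("choose", options)  scores between t3 and t2, user selects one
      ("no_match", [])     every score at or below t3
    """
    if not scored:
        return ("no_match", [])
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    best, best_score = ranked[0]
    if best_score > t1:
        return ("accept", [best])
    if best_score > t2:
        return ("confirm", [best])
    # At this point no score exceeds t2, so anything above t3 is an option.
    options = [name for name, score in ranked if score > t3]
    if options:
        return ("choose", options)
    return ("no_match", [])
```

With hypothetical thresholds of 0.9, 0.7, and 0.4, a best score of 0.95 is accepted outright, 0.8 requires confirmation, and scores of 0.6 and 0.5 are both offered to the user.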

Now referring to FIG. 3, there is shown a method of performing database update 235. Database update is performed 235 when an equivalent but not identical text has been found and accepted 305. If an equivalent text is determined, the database update module 145 performs database update 235. In one embodiment, database update involves updating 310 the database that stores the known strings of characters to indicate that the string of characters is an equivalent text. The database update module 145 takes into consideration any user confirmation or selection that is made in accepting the document as equivalent text. Thus, if the same string of characters 105 is input into the text equivalencing engine 140 on a subsequent run, there will be no need to run through the entire text equivalencing engine 140.
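One way to picture the database update is as an alias table that maps each accepted query string to its known string, so that a repeated query is resolved by a direct lookup. The class name `EquivalenceTable` and its methods are hypothetical; the specification does not describe a storage layout, and an in-memory dictionary stands in for the database here.

```python
class EquivalenceTable:
    """In-memory stand-in for the database of known strings and their equivalents."""

    def __init__(self, known_strings):
        self._known = set(known_strings)
        # Every known string is trivially equivalent to itself.
        self._alias = {k: k for k in self._known}

    def lookup(self, query):
        """Return the known string recorded for query, or None if unseen."""
        return self._alias.get(query)

    def record(self, query, known):
        """Store that query was accepted as an equivalent of known."""
        if known not in self._known:
            raise KeyError(f"not a known string: {known!r}")
        self._alias[query] = known
```

After `record("Teh Beatles", "The Beatles")`, a later `lookup("Teh Beatles")` returns the known string immediately, without the heuristic or retrieval stages.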

Now referring to FIG. 4, there is shown a method of forming variable length sub-strings from a string of characters. Sub-strings are formed from a series of characters in a given string of characters by extending the sub-strings based on the frequency of occurrence of the extended sub-strings. In one embodiment, there are two thresholds that together define whether a sub-string is considered “frequently appearing” in each document. The threshold values are chosen such that the words yield an accurate, fast, and memory-efficient result of identifying a character string. The first threshold, for the purpose of sub-string formation, is a minimum number of appearances of a sub-string in a document. In one embodiment, the second threshold, for the purpose of sub-string formation, is a maximum number of appearances of a sub-string in a document. A sub-string is considered to be “frequently appearing” if its frequency lies between the thresholds for sub-string formation. In an alternate embodiment, only one of the two thresholds for sub-string formation is used.

When sub-strings are formed, initially, each character is assumed to be a sub-string 405. Then, in one embodiment of the present invention, the system looks for extensions of frequently appearing sub-strings 410 formed by adding one character. These are also considered sub-strings. In one embodiment, the system then looks for frequently appearing 3-grams 415 (three-character sub-strings). In one embodiment, the system then looks for frequently appearing 4-grams 415. Finally, the system looks for frequently appearing n-grams 415, where n is any positive integer value greater than 2. In one embodiment, in the event of overlap between a sub-string and any longer sub-string, the system favors the longer sub-string 425 and presumes that the shorter did not occur. The result is a series of the longest frequently appearing n-grams. In another embodiment, the system favors 3-grams, instead of favoring the longer sub-strings. In another embodiment, frequency of occurrence is determined relative to a separate large corpus of strings.

The following is a hypothetical example of sub-string formation for an arbitrary string of letters from the alphabet. The letters could be any kind of symbol, but for this hypothetical are only alphabetical letters. Normally the entire corpus of known strings would be used to build a dictionary of possible sub-strings, but this example is a short string in order to simplify the description.

- a g e c q r t 1 m s d s p 1 c q r r a g e t 1 p r a g e s 1

Assume for this hypothetical example that the threshold for purposes of sub-string formation is 2. Each character is initially taken to be a sub-string 405. The character pair “ag” appears 3 times, making it a frequently appearing sub-string 410, according to the threshold value for sub-string formation. The 3-gram, “age,” appears 3 times making it a frequently appearing sub-string 415. The 4-gram “rage” also occurs twice. The sub-strings “agec,” “aget,” and “ages” each only appear once. Therefore no more characters should be added and the 4-gram “rage” is the longest n-gram that can be formed to meet the frequently appearing sub-string criteria. The same process could be performed on the second character in the string, “g.” However, since “ge” overlaps with “rage”, the “ge” sub-string would be eliminated when favoring longer sub-strings 420.

The same process could be performed to form the sub-string “cqr” as to form the sub-string “rage.” After the system finished forming sub-strings on this string the following sub-strings would be formed: “rage” appearing 2 times, “cqr” appearing 2 times, “t1” appearing 2 times, and a number of 1-grams appearing various numbers of times. Thus the decomposition of the larger string into sub-strings would be:

- a/g/e/cqr/t1/m/s/d/s/p/1/cqr/rage/t1/p/rage/s/1
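The hypothetical example above can be reproduced with a short sketch. It is one possible reading of the procedure, not the specification's own algorithm: it counts every n-gram of two or more characters, keeps those appearing at least `min_count` times, discards any that are contained in a longer frequent n-gram, and then splits the string greedily from left to right, preferring the longest match. The function names and the `max_n` cap are assumptions of the sketch, and only the minimum threshold is applied.

```python
from collections import Counter

def longest_frequent_ngrams(text, min_count=2, max_n=8):
    """Return the frequent n-grams (n >= 2) not contained in a longer frequent one."""
    counts = Counter()
    for n in range(2, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    frequent = {g for g, c in counts.items() if c >= min_count}
    # Favor longer sub-strings: drop anything that sits inside a longer one.
    return {g for g in frequent
            if not any(g != h and g in h for h in frequent)}

def decompose(text, vocab):
    """Split text left to right, taking the longest vocab entry at each position."""
    by_length = sorted(vocab, key=len, reverse=True)
    pieces, i = [], 0
    while i < len(text):
        for g in by_length:
            if text.startswith(g, i):
                pieces.append(g)
                i += len(g)
                break
        else:
            pieces.append(text[i])   # fall back to a single character
            i += 1
    return pieces

example = "agecqrt1msdsp1cqrraget1prages1"
vocab = longest_frequent_ngrams(example)
pieces = decompose(example, vocab)
```

For the example string, `vocab` contains "rage", "cqr", and "t1", and joining `pieces` with "/" gives the decomposition shown above. Note that the leading "age" is split into single characters because "age" is absorbed by the longer "rage", which does not occur at that position.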

An information retrieval technique is performed on the sub-strings to find an equivalent text. In one embodiment, the information retrieval technique involves the sub-strings and scoring known pieces of text that are also divided into sub-strings. In one embodiment, the present invention employs an inverse document frequency method for scoring sub-strings in the string of characters and known text.

Referring now to FIG. 5, there is shown a flow diagram of a method of weighting sub-strings and scoring texts according to the present invention. The method illustrated in FIG. 5 is shown in terms of matching sub-strings in a text equivalencing system. One skilled in the art will recognize that the method may be adapted and applied to many domains and techniques.

A total number of texts N is determined 504. The system also determines a text frequency for each sub-string j (the number of texts in which sub-string j occurred, or TF_j = Σ_i (k_ij > 0), where k_ij is the number of times that text i contains sub-string j) 504.
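The two quantities determined at this step can be computed in one pass over the known texts. In this sketch each known text is assumed to be already divided into a list of sub-strings; the function name `text_frequency` is hypothetical.

```python
from collections import Counter

def text_frequency(texts):
    """Return (N, TF) where TF[j] is the number of texts containing sub-string j.

    texts is a list of sub-string lists, one list per known text.
    """
    tf = Counter()
    for subs in texts:
        tf.update(set(subs))   # a text counts once per sub-string, however often it repeats
    return len(texts), tf

n_texts, tf = text_frequency([["bea", "eat", "bea"], ["eat", "ats"], ["xyz"]])
```

Here "bea" appears twice in the first text but contributes only 1 to its text frequency, while "eat" occurs in two texts and so has a text frequency of 2.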

Weights are computed for query sub-strings and for sub-strings of each equivalent text. The weights are computed according to a product of up to three components: l=the number of times a sub-string appears in the string of characters; g=the number of times a sub-string appears in the known text; and n=a normalizing factor based on the entire weight vector.

The first weighting factor, l, is a local weighting factor. It represents the frequency of the sub-string within a string of characters. It may be represented and defined according to the following alternatives as well as others known in the art of information retrieval:

- l_T=k_ij=Number of times sub-string j occurs in the string i; or
- l_L=log (k_ij+1); or
- l_x=1 (a constant, used if this weighting factor is not to be considered).
- l may be adjusted to optimize performance for different kinds of strings.

The second weighting factor, g, represents the frequency of the sub-strings within all the known texts. It may be represented and defined according to the following alternatives as well as others known in the art of information retrieval:

$g_{I} = \log \frac{N + 1}{{TF}_{j} + 1}$

(inverse text frequency, i.e., the log of the total number of texts divided by the text frequency for sub-string j); or

- g_x=1 (a constant, used if this weighting factor is not to be considered).
- g may be adjusted in a similar manner as is l to optimize performance.
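The inverse text frequency option is easy to check numerically. The sketch below assumes the natural logarithm, since the base is not stated, and the function name `global_weight` is hypothetical.

```python
import math

def global_weight(n_texts, tf_j):
    """Inverse text frequency: log((N + 1) / (TF_j + 1)), natural log assumed."""
    return math.log((n_texts + 1) / (tf_j + 1))

rare = global_weight(999, 4)      # sub-string found in 4 of 999 texts
common = global_weight(999, 499)  # sub-string found in 499 of 999 texts
```

A sub-string that occurs in every known text receives a weight of zero, so it contributes nothing to a score, while a rare sub-string receives a large weight.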

The third weighting factor, n, represents a normalizing factor, which serves to reduce the bias that tends to give long texts higher scores than short ones. Using a normalizing factor, a short, specifically relevant text should score at least as well as a longer text with more general relevance. n may be represented and defined according to the following alternatives as well as others known in the art of information retrieval:

$n_{C} = \frac{1}{\sqrt{\sum_{j} {(l_{ij})}^{2} {(g_{j})}^{2}}}$; or

- n_x=1 (a constant, used if this weighting factor is not to be considered).
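Putting the three factors together, a weight vector for one string can be sketched as follows. The sketch assumes the log local weight, the inverse text frequency global weight, and the normalizing factor that scales the vector to unit length, with natural logarithms throughout. The function name `weight_vector` and the choice to return all zeros when every raw weight is zero are assumptions of this example.

```python
import math
from collections import Counter

def weight_vector(subs, n_texts, tf):
    """Return {sub-string: weight} using log local, inverse-frequency global,
    and unit-length normalization.

    subs    list of sub-strings for one string (query or known text)
    n_texts total number of known texts, N
    tf      mapping from sub-string to its text frequency TF_j
    """
    counts = Counter(subs)                                  # k_ij
    raw = {j: math.log(k + 1)                               # local factor
              * math.log((n_texts + 1) / (tf.get(j, 0) + 1))  # global factor
           for j, k in counts.items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    if norm == 0.0:
        return {j: 0.0 for j in raw}                        # nothing distinctive to weight
    return {j: v / norm for j, v in raw.items()}            # normalizing factor applied

tf = {"bea": 1, "eat": 2, "the": 3}
vec = weight_vector(["bea", "eat", "the", "bea"], 3, tf)
```

In this small example "the" occurs in all three texts and ends up with zero weight, while "bea", which is both repeated in the string and rare across texts, dominates the vector.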

By employing the above-described combination of three weighting factors in generating weights for sub-strings and scores for texts, the present invention avoids the problems of overstating frequently appearing sub-strings and overstating coincidental co-occurrence. If a sub-string is frequently occurring, the second weighting factor will tend to diminish its overpowering effect. In addition, the effect of coincidental co-occurrence is lessened by the normalization factor.

In one embodiment, the system of the present invention generates scores as follows. Each sub-string j 502 in query string 501 i is weighted 503 using the general formula q_ij = l_ij g_j n_i, where l, g, n are defined above. The weight w_ij for each sub-string 505 j in each known text i is obtained 506 using the general formula w_ij = l_ij g_j n_i, where l, g, n are defined above. The specific options for defining l, g and n can be chosen based on the desired results. In one embodiment of the present invention, it is desired to preserve diversity, to give rarity a bonus, and to normalize. In that embodiment, the weighting options L, I, and C may be chosen. Weighting option L acts to preserve diversity, option I acts to give rarity a bonus, and C acts to normalize the results. Thus, in that embodiment the weighting would be equal to l_L g_I n_C. In other embodiments, different weighting options may be used. In one embodiment, the present invention uses the same weighting for the query sub-strings as it does for scoring the known texts. In an alternative embodiment, different weighting options can be chosen for the query sub-strings and for scoring the known texts.

A score for a text is determined by taking the dot product 507 of the query vector and the document vector

$\sum_{j} q_{kj} w_{ij} .$
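With weight vectors stored as sparse mappings from sub-string to weight, the dot product only needs to visit sub-strings present in the query. The names `score` and `rank` are hypothetical, and the sample vectors are made up for illustration.

```python
def score(query_vec, text_vec):
    """Dot product of two sparse weight vectors keyed by sub-string."""
    return sum(w * text_vec.get(j, 0.0) for j, w in query_vec.items())

def rank(query_vec, known_vecs):
    """Return (known_string, score) pairs, highest score first."""
    scored = [(name, score(query_vec, vec)) for name, vec in known_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

query = {"tle": 0.6, "les": 0.8}
known = {
    "beatles": {"tle": 0.6, "les": 0.8},
    "eagles": {"les": 1.0},
    "queen": {"que": 1.0},
}
ranking = rank(query, known)
```

Because the vectors are normalized to unit length, identical vectors score about 1, vectors with no sub-strings in common score 0, and partial overlap falls in between.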

In one embodiment of the present invention, the above-described weighting factors are used to create the vector terms in order to improve the results of the scoring process. The score falls into one of four categories 508 and the results are output 509, depending on the value of the score, as described below. If the highest score is greater than a first threshold value, t₁, the text is accepted as an equivalent text. In one embodiment, this range of scores is considered the gold range. If the highest score is between the first threshold value, t₁, and a second threshold value, t₂, that text is accepted as an equivalent text after manual user confirmation. In one embodiment, this range of scores is considered the silver range. If one or more scores fall in a range between the second threshold value, t₂, and a third threshold value, t₃, each of the texts scoring in that range is presented to the user. The user can select a text to be accepted as equivalent text. In one embodiment, this range of scores is considered the bronze range. If all the texts score below the third threshold value, t₃, all of the texts are ignored and an equivalent text is not determined.

In one embodiment, the threshold values are adjusted empirically to give the desired accuracy. In one embodiment, the desired accuracy is about 100% in the gold range. In one embodiment, the desired accuracy in the silver range is 70-80% for the top one text equivalent and 90% in the top five text equivalents. In one embodiment, the bronze range is about 20-40% accurate and 50% in the top five.

Once the query sub-strings have been weighted and the known texts have been scored by computing the dot product of two vectors, the text or texts with the highest score or highest scores are determined. An equivalent text is selected using the threshold values described above.

From the above description, it will be apparent that the invention disclosed herein provides a novel and advantageous system and method for text equivalencing. The foregoing discussion discloses and describes merely exemplary methods and embodiments of the present invention. As will be understood by those familiar with the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. For example, the invention may be applied to other domains and environments, and may be employed in connection with additional applications where text equivalencing is desirable. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

Claims

1. A computer-implemented method comprising: modifying a query string of characters using a predetermined set of heuristics; performing a character-by-character comparison of the modified query string with at least one known string of characters in a corpus in order to locate an exact match for the modified query string; and responsive to not finding an exact match, performing the following steps in order to locate an equivalent for the modified query string: forming a plurality of sub-strings of characters from the query string, the sub-strings having varying lengths such that at least two of the formed sub-strings differ in length; and using an information retrieval technique on the sub-strings formed from the query string to identify a known string of characters equivalent to the query string, wherein the information retrieval technique further comprises: weighting the sub-strings; scoring known strings of characters; and retrieving information associated with the known string having the highest score.
2. The method of claim 1, further comprising, responsive to the highest score being greater than a first threshold, automatically accepting the known string having the highest score as an exact match.
3. The method of claim 1, further comprising, responsive to the highest score being less than a second threshold and greater than a first threshold, presenting the known string having the highest score to a user for manual confirmation.
4. The method of claim 1, further comprising, responsive to the highest score being less than a second threshold and greater than a third threshold, presenting the known string having the highest score to a user to select the equivalent string.
5. The method of claim 1, wherein forming a plurality of sub-strings of characters comprises successively extending sub-strings based on frequency of occurrence in the modified query string.
6. The method of claim 1, wherein the query string is selected from the group consisting of a song title, a song artist, an album name, a book title, an author's name, a book publisher, a genetic sequence, and a computer program.
7. The method of claim 1, wherein the predetermined set of heuristics comprises removing whitespace from the query string.
8. The method of claim 1, wherein the predetermined set of heuristics comprises removing a portion of the query string.
9. The method of claim 1, wherein the predetermined set of heuristics comprises replacing a symbol in the query string with an alternate representation for the symbol.
10. The method of claim 1 further comprising storing a database entry indicating that the query string is an equivalent of the identified known string.
11. The computer-implemented method of claim 1, wherein the length of a sub-string is determined based on one or more character sequences identified in the modified query string and a corresponding frequency of occurrence for each identified character sequence.
12. The computer-implemented method of claim 1, wherein a weight for a given sub-string is based at least in part on a number of times the sub-string occurs in the query.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application claims priority from U.S. patent application Ser. No. 09/846,823 for “Relationship Discovery Engine,” filed Apr. 30, 2001, the disclosure of which is incorporated herein by reference. The present application also claims priority from provisional U.S. Patent Application Ser. No. 60/201,622, for “Recommendation Engine,” filed May 3, 2000, the disclosure of which is incorporated herein by reference.


