Claims
- 1. A method comprising the steps of:
creating a document stack from at least one word in a handwritten document; creating a query stack from a query; and determining a measure between the document stack and the query stack.
- 2. The method of claim 1, wherein:
the at least one word comprises a plurality of words; the document stack corresponds to one of the plurality of words in the handwritten document; the query comprises a plurality of query words and at least one operator; the query stack corresponds to one of the plurality of query words; and the step of determining a measure further comprises the step of, for each query stack, determining a measure between the query stack and each document stack in the handwritten document.
- 3. The method of claim 2, wherein each document stack comprises a plurality of document scores, and wherein the method further comprises the step of optimizing each of the document scores for the document stacks.
- 4. The method of claim 1, wherein the measure quantifies an amount of similarity between the document stack and the query stack.
- 5. The method of claim 1, wherein the query is handwritten, typewritten, or partially handwritten and partially typewritten.
- 6. The method of claim 5, wherein the query is typewritten, and wherein the step of creating a query stack comprises creating a query stack for each query word of the query, wherein each query stack comprises a corresponding word from the query and an associated high word score for this word, and wherein each query stack comprises a plurality of other words having zero word scores associated therewith.
- 7. The method of claim 5, wherein the query is typewritten, and wherein the step of creating a query stack comprises creating a query stack for each query word of the query, wherein each query stack comprises a corresponding word from the query and an associated high word score for this word, and wherein each query stack comprises at least one other word having a small word score associated therewith.
- 8. The method of claim 1, wherein the measure is selected from the group consisting of a dot product measure, an Okapi measure, a score-based keyword measure, a rank-based keyword measure, a measure using n-grams, and a measure using edit distances.
- 9. The method of claim 1, wherein each query stack and document stack comprises a plurality of scores, wherein the measure is a dot product measure defined as follows
- 10. The method of claim 1, wherein each stack is not constrained to words in a vocabulary, wherein each of the words in a query stack or document stack are comprised of a number of n-grams, wherein probabilities are determined for each n-gram of the query stack and document stack, and wherein the probabilities of the n-grams are used in the measure.
- 11. The method of claim 1, wherein each of the query and document stacks comprises a plurality of words, wherein the measure uses edit distances to compare words in the query stack to words in the document stack.
- 12. The method of claim 1, further comprising the step of determining a document score for the handwritten document by using the measure.
- 13. A method comprising the steps of:
for each of a plurality of documents, performing the following steps:
creating a document stack from at least one word in a text document; creating a query stack from a query; determining a measure between the document stack and the query stack; and scoring the documents based on the measure, thereby creating a document score; and displaying each document whose document score meets a predetermined threshold.
- 14. The method of claim 13, wherein the query is a handwritten query.
- 15. The method of claim 13, wherein the query is a typewritten query.
- 16. A method for retrieving a subset of handwritten documents from a set of handwritten documents, each of the handwritten documents having a plurality of document stacks associated therewith, the method comprising the steps of:
a) creating at least one query stack from a query comprising one or more words, wherein each word is handwritten or typed; b) selecting a handwritten document from the set of handwritten documents; c) selecting a document stack from the selected handwritten document; d) determining a measure between the at least one query stack and the selected document stack; e) performing steps (c) and (d) for at least one document stack associated with the selected handwritten document; f) performing steps (b), (c), and (d) for each handwritten document of the set of handwritten documents; g) scoring each of the handwritten documents in the set of handwritten documents by using the query and the measures, thereby creating a number of document scores; and h) selecting the subset of handwritten documents for display by using the document scores.
- 17. The method of claim 16, wherein step (h) further comprises the step of selecting handwritten documents that are above a predetermined threshold.
- 18. The method of claim 17, wherein the predetermined threshold is selected from the group consisting of a rank threshold and a score threshold.
- 19. The method of claim 16, wherein each document stack comprises a plurality of word scores, and wherein the method further comprises the step of:
i) optimizing each of the word scores for the document stacks.
- 20. The method of claim 16, wherein the measure quantifies similarity between the document stack and the query stack.
- 21. The method of claim 16, wherein at least one of the words of the query is typewritten, and wherein step (a) further comprises the step of creating a query stack for each of the at least one words of the query, wherein each query stack comprises a corresponding word from the query and an associated high word score for this word, and wherein each query stack comprises a plurality of other words having zero word scores associated therewith.
- 22. The method of claim 16, wherein at least one of the words of the query is typewritten, and wherein step (a) further comprises the step of creating a query stack for each of the at least one words of the query, wherein each query stack comprises a corresponding word from the query and an associated high word score for this word, and wherein each query stack comprises at least one other word having a small word score associated therewith.
- 23. The method of claim 16, wherein the measure is selected from the group consisting of a dot product measure, an Okapi measure, a score-based keyword measure, a rank-based keyword measure, a measure using n-grams, and a measure using edit distances.
- 24. The method of claim 16, wherein each stack is not constrained to words in a vocabulary, wherein each of the words in a query stack or document stack are comprised of a number of n-grams, wherein probabilities are determined for each n-gram of the query stack and document stack, and wherein the probabilities of the n-grams are used in the measure.
- 25. The method of claim 16, wherein each of the query and document stacks comprises a plurality of words, wherein the measure uses edit distances to compare words in the query stack to words in the document stack.
- 26. A method comprising the steps of:
creating a first word stack, by using a first handwriting recognizer, from at least one word; creating a second word stack, by using a second handwriting recognizer, from the at least one word; and comparing the first and second word stacks with a third word stack to determine whether a handwritten document should be retrieved.
- 27. The method of claim 26, wherein:
the at least one word is at least one handwritten word from the handwritten document; the first word stack comprises a first document stack; the second word stack comprises a second document stack; and the third word stack is a query stack determined from at least one query word.
- 28. The method of claim 26, wherein:
the at least one word is at least one word from a query; the first word stack comprises a first query stack; the second word stack comprises a second query stack; and the third word stack is a document stack determined from at least one handwritten word in the handwritten document.
- 29. The method of claim 26, further comprising the steps of:
configuring a handwriting recognizer into a first configuration to create the first handwriting recognizer; and configuring the handwriting recognizer into a second configuration to create the second handwriting recognizer, wherein the first and second configurations are different.
- 30. The method of claim 29, wherein the first configuration comprises a configuration caused by selecting a constraint from the group consisting essentially of an uppercase letter constraint, a lowercase letter constraint, a recognize digits constraint, a language constraint, a constraint wherein characters and words are recognized only if in a vocabulary, and a constraint wherein characters and words are hypothesized when not in a vocabulary, and wherein the second configuration comprises a configuration caused by selecting a constraint from the group consisting essentially of an uppercase letter constraint, a lowercase letter constraint, a recognize digits constraint, a language constraint, a constraint wherein characters and words are recognized only if in a vocabulary, and a constraint wherein characters and words are hypothesized when not in a vocabulary.
- 31. The method of claim 26, wherein the step of comparing further comprises the step of merging the first and second word stacks to create a fourth word stack that is compared with the third word stack.
- 32. The method of claim 26, wherein the first handwriting recognizer has a first configuration, wherein the second handwriting recognizer has a second configuration, and wherein the first and second configurations are different.
- 33. The method of claim 32, wherein the first configuration comprises a configuration caused by selecting a constraint from the group consisting essentially of an uppercase letter constraint, a lowercase letter constraint, a recognize digits constraint, a language constraint, a constraint wherein characters and words are recognized only if in a vocabulary, and a constraint wherein characters and words are hypothesized when not in a vocabulary, and wherein the second configuration comprises a configuration caused by selecting a constraint from the group consisting essentially of an uppercase letter constraint, a lowercase letter constraint, a recognize digits constraint, a language constraint, a constraint wherein characters and words are recognized only if in a vocabulary, and a constraint wherein characters and words are hypothesized when not in a vocabulary.
- 34. A computer system comprising:
a memory that stores computer-readable code; and a processor operatively coupled to the memory, the processor configured to implement the computer-readable code, the computer-readable code configured to: create a document stack from at least one word in a handwritten document; create a query stack from a query; and determine a measure between the document stack and the query stack.
- 35. A computer system comprising:
a memory that stores computer-readable code; and a processor operatively coupled to the memory, the processor configured to implement the computer-readable code, the computer-readable code configured to: create a first word stack, by using a first handwriting recognizer, from at least one word; create a second word stack, by using a second handwriting recognizer, from the at least one word; and compare the first and second word stacks with a third word stack to determine whether a handwritten document should be retrieved.
- 36. An article of manufacture comprising:
a computer readable medium having computer-readable code means embodied thereon, the computer-readable program code means comprising: a step to create a document stack from at least one word in a handwritten document; a step to create a query stack from a query; and a step to determine a measure between the document stack and the query stack.
- 37. An article of manufacture comprising:
a computer readable medium having computer-readable code means embodied thereon, the computer-readable program code means comprising: a step to create a first word stack, by using a first handwriting recognizer, from at least one word; a step to create a second word stack, by using a second handwriting recognizer, from the at least one word; and a step to compare the first and second word stacks with a third word stack to determine whether a handwritten document should be retrieved.
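By way of illustration only, the steps recited in the claims above (creating query and document stacks, determining a measure between them, scoring documents, and selecting those that meet a threshold) might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the representation of a stack as a word-to-score mapping, the function names, the plain sum-of-products form of the dot product, the rule of taking the best-matching document stack per query word, and the max-score merge of two recognizer outputs are all hypothetical choices made for the example. Query operators are ignored.

```python
from typing import Dict, List

# Assumption: a "stack" is modeled as a mapping from candidate word to score.
Stack = Dict[str, float]


def typed_query_stack(word: str, high_score: float = 1.0) -> Stack:
    """Stack for a typewritten query word: the word itself gets a high score.

    Every other word is implicitly scored zero because it is absent.
    """
    return {word.lower(): high_score}


def merge_stacks(first: Stack, second: Stack) -> Stack:
    """Merge stacks from two recognizer configurations.

    Assumption: the merged stack keeps the higher score for each word.
    """
    merged = dict(first)
    for word, score in second.items():
        merged[word] = max(score, merged.get(word, 0.0))
    return merged


def dot_product(query: Stack, document: Stack) -> float:
    """Sum of score products over words present in both stacks."""
    return sum(score * document.get(word, 0.0) for word, score in query.items())


def score_document(query_stacks: List[Stack], document_stacks: List[Stack]) -> float:
    """Assumption: each query word contributes its best match in the document."""
    return sum(
        max((dot_product(q, d) for d in document_stacks), default=0.0)
        for q in query_stacks
    )


def retrieve(
    query_stacks: List[Stack],
    documents: Dict[str, List[Stack]],
    threshold: float,
) -> List[str]:
    """Return names of documents whose score meets the threshold, best first."""
    scores = {
        name: score_document(query_stacks, stacks)
        for name, stacks in documents.items()
    }
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return [name for name in ranked if scores[name] >= threshold]


# Hypothetical recognizer output: one stack per handwritten word.
documents = {
    "note_a": [{"cat": 0.6, "cut": 0.3, "eat": 0.1}, {"sat": 0.7, "set": 0.3}],
    "note_b": [{"dog": 0.8, "dig": 0.2}],
}
query = [typed_query_stack("cat")]
hits = retrieve(query, documents, threshold=0.5)
```

With the hypothetical data shown, only `note_a` contains a stack in which "cat" carries a score at or above the threshold, so it is the only document selected. A score threshold is used here; a rank threshold would instead truncate the `ranked` list to a fixed length.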
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is related to United States Provisional Patent Application entitled “Adaptive Recognition Improvement Using Modified N-Best Lists: The Use of Handwritten Word Recognition Characteristics to Improve Handwritten Word Recognition Accuracy,” filed Feb. 22, 2001, by inventors Kwok and Perrone, Serial No. 60/271,012, and incorporated by reference herein.
[0002] This application claims the benefit of United States Provisional Application Number 60/327,604, filed Oct. 4, 2001.
Provisional Applications (2)
| Number | Date | Country |
| --- | --- | --- |
| 60271012 | Feb 2001 | US |
| 60327604 | Oct 2001 | US |