Claims
- 1. A method for facilitating recognition of content of a body of text, the method comprising:
filtering the content of a body of text to remove elements of the content; determining a recognition representation of the content of such body based upon the filtered subtext.
- 2. A method as recited in claim 1, wherein the filtering is text-sifting.
- 3. A method as recited in claim 1, wherein the determining comprises calculating the recognition representation as a hash value that identifies the content in the body.
- 4. A method as recited in claim 1, wherein the determining comprises calculating the recognition representation as a hash value that is proximally similar to other bodies of text having similar semantic content.
- 5. A method as recited in claim 1, wherein the filtering comprises removing superfluous elements from the content of the body.
- 6. A computer comprising one or more computer-readable media having computer-executable instructions that, when executed by the computer, perform the method as recited in claim 1.
- 7. One or more computer-readable media having computer-executable instructions that, when executed by a computer, perform the method as recited in claim 1.
- 8. A method for facilitating recognition of content of a body of text, the method comprising:
obtaining a body of text; determining a self-synchronized recognition representation of the content of such body.
- 9. A method as recited in claim 8, wherein the self-synchronized recognition representation is derived from a subset of the content of the body of text.
- 10. A method as recited in claim 8, wherein the self-synchronized recognition representation is derived from a subset of the content of the body of text, wherein the subset excludes superfluous elements of the content of the body of text.
- 11. A method as recited in claim 8, wherein the self-synchronized recognition representation is derived from a subset of the content of the body of text.
- 12. One or more computer-readable media having computer-executable instructions that, when executed by a computer, perform the method as recited in claim 8.
- 13. A computer comprising one or more computer-readable media having computer-executable instructions that, when executed by the computer, perform the method as recited in claim 8.
- 14. A method for facilitating recognition of content of a body of text, the method comprising:
filtering the content of a body of text to select a subset of content of such body; determining a recognition representation of the content of such body based upon the selected subtext.
- 15. A method as recited in claim 14, wherein the filtering is text-sifting.
- 16. A method as recited in claim 14 further comprising storing the recognition representation in a database, the recognition representation being associated with the body of text from which it was determined.
- 17. A method as recited in claim 14, wherein the determining comprises calculating the recognition representation as a hash value that identifies the content in the body.
- 18. A method as recited in claim 14, wherein the determining comprises calculating the recognition representation as a hash value that is proximally similar to other bodies of text having similar semantic content.
- 19. A method as recited in claim 14, wherein the filtering comprises removing elements from the content of the body.
- 20. A method as recited in claim 14, wherein the filtering comprises removing superfluous elements from the content of the body.
- 21. A method as recited in claim 14, wherein the filtering comprises removing elements from the content of the body, wherein at least some of the elements removed are associated with a format of the content of the body.
- 22. A method as recited in claim 19, wherein the removing comprises:
converting white space in the body of text into single spaces; purging all content of the body of text that is not letters or spaces; converting all content of the body of text into one form of capitalization.
- 23. A method as recited in claim 19, wherein the removing comprises:
referencing a list of common words; purging all words from the body of text that are on the list of common words.
- 24. A method as recited in claim 14, wherein the filtering comprises cryptographically extracting the subset of text of such body.
- 25. A method as recited in claim 14, wherein the subset has a fixed size that is independent of size of the subset's body of text.
- 26. A method as recited in claim 14, wherein the subset has a variable size that is dependent upon size of the subset's body of text.
- 27. A method as recited in claim 14, wherein the filtering comprises:
removing superfluous elements from the content of the body to produce filtered text; cryptographically extracting the subset of text of such body from the filtered text.
- 28. A method as recited in claim 14 further comprising comparing recognition representations of text of at least two bodies of text.
- 29. A method as recited in claim 28 further comprising indicating a match if recognition representations of text of at least two bodies of text substantially match.
- 30. A method as recited in claim 14 further comprising:
comparing a recognition representation of text of a body of text with recognition representations of text of a group of bodies; grouping the body with the group if all compared recognition representations are proximally similar.
- 31. A computer comprising one or more computer-readable media having computer-executable instructions that, when executed by the computer, perform the method as recited in claim 14.
- 32. One or more computer-readable media having computer-executable instructions that, when executed by a computer, perform the method as recited in claim 14.
- 33. A method for facilitating detection of textual similarity, the method comprising:
comparing recognition representations of text of at least two bodies of text, wherein such recognition representations are computed by:
text sifting text of the bodies of text to select a subset of text for each body; determining such recognition representation of the text for each body based upon the selected subtext of each body; indicating a match if recognition representations of the text of at least two of the bodies substantially match.
- 34. A method as recited in claim 33, wherein the determining comprises calculating the recognition representation as a hash value that identifies the content of the body.
- 35. A method as recited in claim 33, wherein the text sifting comprises cryptographically extracting the subset of text of such body.
- 36. A method as recited in claim 33, wherein the text sifting comprises:
removing superfluous elements from the text of a body to produce filtered text; cryptographically extracting the subset of text of such body from the filtered text.
- 37. A computer comprising one or more computer-readable media having computer-executable instructions that, when executed by the computer, perform the method as recited in claim 33.
- 38. One or more computer-readable media having computer-executable instructions that, when executed by a computer, perform the method as recited in claim 33.
- 39. A method of manipulating content of a source body of text, the method comprising:
obtaining a source body of text; generating content of a target body of text by deriving the content of the target body from the source body; wherein the content of the target body has a self-synchronized recognition representation that does not substantially match a self-synchronized recognition representation of the content of the source body.
- 40. A method as recited in claim 39, wherein the content of the target body has a self-synchronized recognition representation that does not match a self-synchronized recognition representation of the content of the source body.
- 41. A method as recited in claim 39, wherein the self-synchronized recognition representations are determined by producing a hash value of a subset of the content of a body, wherein the subset excludes superfluous elements.
- 42. A text recognition system, comprising:
a text retriever for obtaining a body of text; a text sifter for selecting a subset of text of such body; a recognition representation determiner for determining a recognition representation of the text of such body based upon the selected subtext.
- 43. A system as recited in claim 42 further comprising a database for storing the recognition representation in association with the body of text from which it was determined.
- 44. A system as recited in claim 42, wherein the determiner comprises a calculator to calculate the recognition representation as a hash value that identifies the content of the body.
- 45. A system as recited in claim 42, wherein the determiner comprises a calculator to calculate the recognition representation as a hash value that is proximally similar to other bodies of text having similar semantic content.
- 46. A system as recited in claim 42, wherein the text sifter comprises an extractor for cryptographically extracting the subset of text of such body.
- 47. A system as recited in claim 42 further comprising a comparator for comparing recognition representations of text of at least two bodies of text.
- 48. A system as recited in claim 42 further comprising:
a comparator for comparing recognition representations of text of at least two bodies of text; an indicator for indicating a match if recognition representations of text of at least two bodies of text substantially match.
- 49. A system as recited in claim 42 further comprising:
a comparator for comparing a recognition representation of text of a body of text with recognition representations of text of a group of bodies; a categorizer for grouping the body with the group if all compared recognition representations are proximally similar.
- 50. One or more computer-readable media having stored thereon a data structure, comprising a library containing bodies of text where at least one body is associated with a recognition representation determined by the system as recited in claim 42.
- 51. One or more computer-readable media having stored thereon a data structure, comprising:
a first data field containing a body of text; a second data field derived from the first field by text sifting the text of such body to select a subset of text of such body and determining a recognition representation of the text of such body based upon the selected subtext; a third data field functioning to delimit the end of the data structure.
- 52. One or more computer-readable media having computer-executable instructions that, when executed by a computer, perform a method comprising:
obtaining a body of text; text sifting the text of such body to select a subset of text of such body; determining a recognition representation of the text of such body based upon the selected subtext.
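For illustration only, the filtering steps recited in claims 22 and 23, the hash-based recognition representation of claims 3 and 17, and the comparison of claims 28 and 29 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the `COMMON_WORDS` list, the restriction to ASCII letters, the choice of SHA-256, and the use of exact hash equality as the matching test are all hypothetical. The sketch does not attempt the cryptographic extraction of claim 24 or the proximally similar hash of claims 4 and 18.

```python
import hashlib
import re

# Hypothetical list of common words; the claims refer only to
# "a list of common words" without enumerating it.
COMMON_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is"})


def remove_format_elements(text: str) -> str:
    """Apply the three removal steps recited in claim 22."""
    text = re.sub(r"\s+", " ", text)        # white space -> single spaces
    text = re.sub(r"[^A-Za-z ]", "", text)  # keep only (ASCII) letters and spaces
    return text.lower()                     # one form of capitalization


def purge_common_words(text: str) -> str:
    """Drop every word found on the common-word list (claim 23)."""
    return " ".join(w for w in text.split() if w not in COMMON_WORDS)


def recognition_representation(text: str) -> str:
    """Hash the sifted subtext; SHA-256 is an assumed choice of hash."""
    sifted = purge_common_words(remove_format_elements(text))
    return hashlib.sha256(sifted.encode("utf-8")).hexdigest()


def is_match(body_a: str, body_b: str) -> bool:
    """Indicate a match when both representations are equal.

    Exact equality is an assumption; the claims speak of representations
    that "substantially match".
    """
    return recognition_representation(body_a) == recognition_representation(body_b)
```

Under these assumptions, two bodies that differ only in spacing, punctuation, capitalization, or listed common words yield the same representation, while a change to any remaining word yields a different one.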
RELATED APPLICATIONS
[0001] This application is a continuation of and claims priority to U.S. patent application Ser. No. 09/843,255, filed Apr. 24, 2001, the disclosure of which is incorporated by reference herein.
Continuations (1)
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 09843255 | Apr 2001 | US |
| Child | 10893769 | Jul 2004 | US |