Claims
- 1. A method for comparing at least two sequences of letters from a finite alphabet by using a set of at least two sampling templates comprising the steps of:
(i) sampling at least a first sequence horizontally by at least a first sampling template from said set of sampling templates;
(ii) sampling at least a second sequence vertically by said first sampling template;
(iii) creating a first set of bins for said first sampling template;
(iv) assigning said first sequence to at least one bin from said first set of bins;
(v) assigning said second sequence to at least one bin from said first set of bins; and
(vi) detecting presence of at least said first sequence and said second sequence in the same bin from said first set of bins;
whereby said presence of said first sequence and said second sequence in the same bin indicates atomic similarity between said first and said second sequences.
- 2. The method of claim 1, further comprising the step of
(vii) collating atomic similarities, whereby the collating of at least two distinct atomic similarities between said first sequence and said second sequence indicates similarity between said first and said second sequences.
- 3. The method of claim 1, wherein said first and said second sequences are nucleic acid sequences.
- 4. The method of claim 1, wherein said first and said second sequences are amino acid sequences.
- 5. The method of claim 1, wherein said first and said second sequences are texts in English language.
- 6. The method of claim 1, wherein said two sequences are contained within at least a first supersequence.
- 7. The method of claim 6, wherein said two sequences are at least 500 letters long and overlap by at least 250 basepairs.
- 8. The method of claim 1, wherein said two sequences are not contained within a common supersequence.
- 9. The method of claim 1, wherein at least one of the sampling templates comprises a set of contiguous diagonally spaced cells.
- 10. The method of claim 1, wherein at least one of said sampling templates comprises a set of noncontiguous diagonally spaced cells.
- 11. The method of claim 1, wherein at least one of said sampling templates comprises a set of at least five noncontiguous diagonally spaced cells.
- 12. The method of claim 1, wherein each bin from said first set of bins corresponds to a discrete countable value.
- 13. The method of claim 12, wherein said discrete countable value is a subsequence.
- 14. The method of claim 13, wherein said sampling of at least a first sequence horizontally by at least the first sampling template produces the same subsequence as said sampling of at least the second sequence vertically by said first sampling template, and wherein said first and said second sequences are assigned to a bin that corresponds to said subsequence.
- 15. The method of claim 2, wherein atomic similarities corresponding to at least two sampling templates along one diagonal are collated.
- 16. The method of claim 2, wherein atomic similarities corresponding to at least two sampling templates along a set of at least three neighboring diagonals are collated.
- 17. The method of claim 1, wherein said first set of bins for said first sampling template resides in random access memory of a first computer, and sets of bins for a second sampling template used for the same comparing reside in random access memory of a second computer.
- 18. The method of claim 1, wherein said assigning of said first sequence to at least one bin from said first set of bins is performed by a first CPU, and said assigning of said second sequence to at least one bin from said first set of bins is performed by a second CPU.
- 19. The method of claim 18, wherein said first CPU and said second CPU are components of a multi-processor computer.
- 20. The method of claim 18, wherein said first CPU and said second CPU are components of a computer cluster.
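For illustration only, the steps of claims 1, 2, and 14 may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: a sampling template is assumed to be a tuple of offsets along a diagonal of the comparison matrix, the first sequence is taken as the horizontal axis and the second as the vertical axis, each bin is assumed to be keyed by the sampled subsequence, and a diagonal is identified by the difference between the vertical and horizontal positions. The names `sample`, `atomic_similarities`, `collate`, and `min_hits` are hypothetical and do not appear in the claims.

```python
from collections import defaultdict


def sample(seq, template, start):
    """Return the letters of seq at start + offset for each template offset.

    Returns None when the template would run past the end of the sequence.
    """
    if start + max(template) >= len(seq):
        return None
    return "".join(seq[start + offset] for offset in template)


def atomic_similarities(seq_a, seq_b, template):
    """Bin both sequences by one template and report shared bins.

    Each bin is keyed by a sampled subsequence (assumption, per claims 12-14)
    and holds two lists: positions in seq_a and positions in seq_b.
    A pair (i, j) is reported when both sequences fall into the same bin.
    """
    bins = defaultdict(lambda: ([], []))
    for which, seq in enumerate((seq_a, seq_b)):
        for start in range(len(seq)):
            key = sample(seq, template, start)
            if key is not None:
                bins[key][which].append(start)

    hits = []
    for a_positions, b_positions in bins.values():
        for i in a_positions:
            for j in b_positions:
                hits.append((i, j))
    return hits


def collate(hits_per_template, min_hits=2):
    """Count distinct atomic similarities per diagonal (j - i).

    Only diagonals with at least min_hits distinct atomic similarities are
    kept; min_hits=2 mirrors the "at least two" wording of claim 2.
    """
    per_diagonal = defaultdict(set)
    for template_index, hits in enumerate(hits_per_template):
        for i, j in hits:
            per_diagonal[j - i].add((template_index, i, j))
    return {d: len(s) for d, s in per_diagonal.items() if len(s) >= min_hits}


# Hypothetical inputs: one contiguous and one noncontiguous template.
templates = [(0, 1, 2), (0, 2, 3)]
first, second = "GATTACA", "CCGATTACA"
all_hits = [atomic_similarities(first, second, t) for t in templates]
print(collate(all_hits))
```

In this sketch each template receives its own set of bins, so the bins for different templates could in principle be held on different machines or filled by different processors, as contemplated in claims 17 through 20; the sketch itself runs in a single process.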
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This non-provisional application claims benefit of priority to copending U.S. Provisional Patent Application Serial No. 60/359,833 filed Feb. 27, 2002, the entire contents of which are incorporated herein by reference.
Provisional Applications (1)
Number | Date | Country
60359833 | Feb 2002 | US