Claims
- 1. A method for analyzing a plurality of transcript sequences in a cluster comprising:
aligning the transcript sequences with genomic sequences; and determining whether the clusters need to be modified according to the aligning.
- 2. The method of claim 1 wherein the step of determining comprises classifying a cluster as a chimeric cluster if the cluster is aligned to two separate locations in the genomic sequence.
- 3. The method of claim 2 wherein the chimeric cluster has at least 5% of its sequences aligned to each of the two separate locations.
- 4. The method of claim 3 wherein the chimeric cluster has at least 10% of its sequences aligned to each of the two separate locations.
- 5. The method of claim 4 wherein the chimeric cluster has at least 20% of its sequences aligned to each of the two separate locations.
- 6. The method of claim 5 wherein the chimeric cluster has at least 30% of its sequences aligned to each of the two separate locations.
- 7. The method of claims 4 or 5 further comprising subclustering the chimeric clusters; realigning subclusters to the genomic sequence; and analyzing the realigning to determine chimeric clusters.
- 8. The method of claim 7 wherein the process is repeated until no chimeric cluster is detected.
- 9. The method of claim 1 wherein the step of determining comprises detecting clusters with consensus which overlap in genomic space.
- 10. The method of claim 9 further comprising merging the clusters with consensus which overlap in genomic space.
- 11. The method of claim 1 wherein the step of determining comprises detecting clusters with consensus within 1000 bases and on the same strand.
- 12. The method of claim 11 further comprising merging the clusters with consensus within 1000 bases and on the same strand.
- 13. A method for trimming a transcript sequence comprising: aligning the transcript sequence to its corresponding genomic sequence; and removing a side sequence of the transcript sequence if the side is poorly aligned with the genomic sequence.
- 14. The method of claim 13 wherein the transcript sequence aligns with the genomic sequence with at least 80% identity.
- 15. The method of claim 14 wherein the transcript sequence aligns with the genomic sequence with at least 90% identity.
- 16. A method of designing a nucleic acid probe array comprising:
aligning a plurality of transcript sequences in a cluster to their corresponding genomic sequence; modifying the clusters according to their aligning to the genomic sequence to obtain at least one modified cluster; and selecting probes targeting the at least one modified cluster.
- 17. The method of claim 16 wherein the step of modifying comprises subclustering chimeric clusters.
- 18. The method of claim 17 wherein a cluster is classified as a chimeric cluster if the cluster is aligned to two separate locations in the genomic sequence.
- 19. The method of claim 18 wherein the chimeric cluster has at least 5% of its sequences aligned to each of the two separate locations.
- 20. The method of claim 19 wherein the chimeric cluster has at least 10% of its sequences aligned to each of the two separate locations.
- 21. The method of claim 20 wherein the chimeric cluster has at least 20% of its sequences aligned to each of the two separate locations.
- 22. The method of claim 21 wherein the chimeric cluster has at least 30% of its sequences aligned to each of the two separate locations.
- 23. The method of claim 16 wherein the step of modifying comprises merging the clusters with consensus which overlap in genomic space.
- 24. The method of claim 16 further comprising merging the clusters with consensus within 1000 bases and on the same strand.
- 25. A method of designing a nucleic acid probe array comprising:
aligning a transcript sequence to its corresponding genomic sequence; trimming a side of the transcript sequence to obtain a trimmed transcript sequence if the side of the transcript sequence is poorly aligned with the genomic sequence; and selecting probes targeting the trimmed transcript sequence or clusters including the trimmed transcript sequence.
- 26. A computer readable medium comprising computer-executable instructions for performing the method comprising:
aligning transcript sequences from a cluster with genomic sequences; and determining whether the clusters need to be modified according to the aligning.
- 27. The computer readable medium of claim 26 wherein the step of determining comprises classifying a cluster as a chimeric cluster if the cluster is aligned to two separate locations in the genomic sequence.
- 28. The computer readable medium of claim 27 wherein the chimeric cluster has at least 5% of its sequences aligned to each of the two separate locations.
- 29. The computer readable medium of claim 28 wherein the chimeric cluster has at least 10% of its sequences aligned to each of the two separate locations.
- 30. The computer readable medium of claim 29 wherein the chimeric cluster has at least 20% of its sequences aligned to each of the two separate locations.
- 31. The computer readable medium of claim 30 wherein the chimeric cluster has at least 30% of its sequences aligned to each of the two separate locations.
- 32. The computer readable medium of claims 29, 30 or 31 further comprising
subclustering the chimeric clusters; realigning subclusters to the genomic sequence; and analyzing the realigning to determine chimeric clusters.
- 33. The computer readable medium of claim 32 wherein the process is repeated until no chimeric cluster is detected.
- 34. The computer readable medium of claim 33 wherein the step of determining comprises detecting clusters with a consensus that overlaps in the genomic space.
- 35. The computer readable medium of claim 34 further comprising merging the clusters with consensus which overlap in genomic space.
- 36. The computer readable medium of claim 26 wherein the step of determining comprises detecting clusters with consensus within 1000 bases and on the same strand.
- 37. The computer readable medium of claim 36 further comprising merging the clusters with consensus within 1000 bases and on the same strand.
- 38. A computer readable medium comprising computer-executable instructions for performing the method comprising: aligning a transcript sequence to its corresponding genomic sequence; and removing a side sequence of the transcript sequence if the side is poorly aligned with the genomic sequence.
- 39. The computer readable medium of claim 38 wherein the transcript sequence aligns with the genomic sequence with at least 80% identity.
- 40. The computer readable medium of claim 39 wherein the transcript sequence aligns with the genomic sequence with at least 90% identity.
- 41. A computer readable medium comprising computer-executable instructions for performing the method comprising:
aligning a plurality of transcript sequences in a cluster to their corresponding genomic sequence; modifying the cluster according to their aligning to the genomic sequence to obtain at least one modified cluster; and selecting probes targeting the at least one modified cluster.
- 42. The computer readable medium of claim 41 wherein the step of modifying comprises subclustering a chimeric cluster.
- 43. The computer readable medium of claim 42 wherein a cluster is classified as a chimeric cluster if the cluster is aligned to two separate locations in the genomic sequence.
- 44. The computer readable medium of claim 43 wherein the chimeric cluster has at least 5% of its sequences aligned to each of the two separate locations.
- 45. The computer readable medium of claim 44 wherein the chimeric cluster has at least 10% of its sequences aligned to each of the two separate locations.
- 46. The computer readable medium of claim 45 wherein the chimeric cluster has at least 20% of its sequences aligned to each of the two separate locations.
- 47. The computer readable medium of claim 46 wherein the chimeric cluster has at least 30% of its sequences aligned to each of the two separate locations.
- 48. The computer readable medium of claim 47 wherein the step of modifying comprises merging the clusters with consensus which overlap in genomic space.
- 49. The computer readable medium of claim 48 further comprising merging the clusters with consensus within 1000 bases and on the same strand.
- 50. A computer readable medium comprising computer-executable instructions for performing the method of
aligning a transcript sequence to its corresponding genomic sequence; trimming a side of the transcript sequence to obtain a trimmed transcript sequence if the side of the transcript sequence is poorly aligned with the genomic sequence; and selecting probes targeting the trimmed transcript sequence or clusters including the trimmed transcript sequence.
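By way of illustration only, the chimeric-cluster test and the merge test recited in the claims above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Consensus`, `supported_loci`, `is_chimeric`, and `should_merge` are hypothetical, each sequence is assumed to have already been assigned a single genomic location label by some aligner, consensus coordinates are assumed to be half-open intervals, and overlapping consensus sequences are assumed to be merged only when they lie on the same strand.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Consensus:
    """Hypothetical record: genomic span of a cluster consensus, half-open [start, end)."""
    chrom: str
    strand: str
    start: int
    end: int


def supported_loci(locus_per_sequence, min_fraction=0.05):
    """Return the loci that each receive at least min_fraction of the cluster's sequences.

    locus_per_sequence holds one location label per aligned sequence (an assumption).
    """
    total = len(locus_per_sequence)
    if total == 0:
        return []
    counts = Counter(locus_per_sequence)
    return sorted(locus for locus, n in counts.items() if n / total >= min_fraction)


def is_chimeric(locus_per_sequence, min_fraction=0.05):
    """Classify a cluster as chimeric if two or more separate loci are supported."""
    return len(supported_loci(locus_per_sequence, min_fraction)) >= 2


def should_merge(a, b, max_gap=1000):
    """True if two consensus spans overlap or lie within max_gap bases on the same strand."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return False
    # A negative or zero gap means the spans overlap or abut.
    gap = max(a.start, b.start) - min(a.end, b.end)
    return gap <= max_gap


# Example: 6 sequences align to one location and 4 to another.
cluster = ["locusA"] * 6 + ["locusB"] * 4
print(is_chimeric(cluster, min_fraction=0.30))
print(should_merge(Consensus("chr1", "+", 100, 500), Consensus("chr1", "+", 1200, 1800)))
```

The `min_fraction` parameter corresponds to the 5%, 10%, 20%, and 30% thresholds recited in the dependent claims, and `max_gap` to the 1000-base distance. A cluster flagged as chimeric would then be subclustered and realigned, with the test repeated until no chimeric cluster remains.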
RELATED APPLICATIONS
[0001] This application is related to U.S. patent application Ser. No. 09/721,042, filed on Nov. 21, 2000, entitled “Methods and Computer Software Products for Predicting Nucleic Acid Hybridization Affinity”; U.S. patent application Ser. No. 09/718,295, filed on Nov. 21, 2000, entitled “Methods and Computer Software Products for Selecting Nucleic Acid Probes”; U.S. patent application Ser. No. 09/745,965, filed on Dec. 21, 2000, entitled “Methods For Selecting Nucleic Acid Probes”; U.S. patent application Ser. No. ______, Attorney Docket No. 3439, filed on Dec. 21, 2001; and U.S. patent application Ser. No. ______, Attorney Docket No. 3440, filed on Dec. 21, 2001. All the cited applications are incorporated herein by reference in their entireties for all purposes.