Claims
- 1. A method of annotating biomolecular sequences according to a hierarchy of interest, the method comprising:
(a) computationally constructing a dendrogram having multiple nodes, said dendrogram representing the hierarchy of interest, wherein each node of said multiple nodes of said dendrogram is annotated by at least one keyword; (b) computationally assigning each biomolecular sequence of the biomolecular sequences to a specific node of said multiple nodes of said dendrogram to thereby generate assigned biomolecular sequences; and (c) computationally classifying each of said assigned biomolecular sequences to nodes hierarchically higher than said specific node, thereby annotating biomolecular sequences according to the hierarchy of interest.
- 2. The method of claim 1, wherein the biomolecular sequences are selected from the group consisting of polypeptide sequences and polynucleotide sequences.
- 3. The method of claim 2, wherein said polynucleotides are selected from the group consisting of genomic sequences, expressed sequence tags, contigs, complementary DNA (cDNA) sequences, pre-messenger RNA (pre-mRNA) sequences, and mRNA sequences.
- 4. The method of claim 1, wherein the biomolecular sequences are selected from the group consisting of annotated biomolecular sequences, unannotated biomolecular sequences and partially annotated biomolecular sequences.
- 5. The method of claim 1, further comprising homology clustering of the biomolecular sequences prior to step (b).
- 6. The method of claim 1, wherein said dendrogram is selected from the group consisting of a graph, a list, a map and a matrix.
- 7. The method of claim 1, wherein the hierarchy of interest is selected from the group consisting of a tissue expression hierarchy, a developmental expression hierarchy, a pathological expression hierarchy, a cellular expression hierarchy, an intracellular expression hierarchy, a taxonomical hierarchy and a functional hierarchy.
- 8. The method of claim 1, wherein each node of said multiple nodes is a parental node in an additional hierarchy of interest.
- 9. The method of claim 8, further comprising classifying the biomolecular sequences of said parental node according to said additional hierarchy of interest.
- 10. The method of claim 1, wherein each of the biomolecular sequences is a member of a sequence contig.
- 11. The method of claim 1, further comprising the step of confirming annotations of said assigned biomolecular sequence in-vivo and/or in-vitro prior to or following step (c).
- 12. A method of identifying differentially expressed biomolecular sequences, the method comprising:
(a) computationally constructing a dendrogram having multiple nodes, said dendrogram representing the hierarchy of interest, wherein each node of said multiple nodes of said dendrogram is annotated by at least one keyword; (b) computationally assigning each biomolecular sequence of the biomolecular sequences to a specific node of said multiple nodes of said dendrogram to thereby generate assigned biomolecular sequences; (c) computationally classifying each of said assigned biomolecular sequences to nodes hierarchically higher than said specific node, to thereby generate annotated biomolecular sequences; and (d) identifying annotated biomolecular sequences assigned to a portion of said multiple nodes, thereby identifying differentially expressed biomolecular sequences.
- 13. The method of claim 12, wherein the biomolecular sequences are selected from the group consisting of polypeptide sequences and polynucleotide sequences.
- 14. The method of claim 13, wherein said polynucleotides are selected from the group consisting of genomic sequences, expressed sequence tags, contigs, complementary DNA (cDNA) sequences, pre-messenger RNA (pre-mRNA) sequences, and mRNA sequences.
- 15. The method of claim 12, wherein the biomolecular sequences are selected from the group consisting of annotated biomolecular sequences, unannotated biomolecular sequences and partially annotated biomolecular sequences.
- 16. The method of claim 12, further comprising homology clustering of the biomolecular sequences prior to step (b).
- 17. The method of claim 12, wherein said dendrogram is selected from the group consisting of a graph, a list, a map and a matrix.
- 18. The method of claim 12, wherein the hierarchy of interest is selected from the group consisting of a tissue expression hierarchy, a developmental expression hierarchy, a pathological expression hierarchy, a cellular expression hierarchy, an intracellular expression hierarchy, a taxonomical hierarchy and a functional hierarchy.
- 19. The method of claim 12, wherein each node of said multiple nodes is a parental node in an additional hierarchy of interest.
- 20. The method of claim 19, further comprising recursively classifying the biomolecular sequences of said parental node according to said additional hierarchy of interest.
- 21. The method of claim 12, wherein each of the biomolecular sequences is a member of a sequence contig.
- 22. The method of claim 12, further comprising the step of confirming differential expression of the differentially expressed biomolecular sequences in-vivo and/or in-vitro following step (d).
- 23. A computer readable storage medium comprising a database stored in a retrievable manner, said database including files each containing data of a specific node of a dendrogram, said data including biomolecular sequence information and biomolecular sequence annotations, wherein said biomolecular sequence annotations are selected from the group consisting of contig description, tissue specific expression, pathological specific expression, functional features, parameters for ontological annotation assignment, cellular localization, database sequence source and functional alterations.
- 24. The computer readable storage medium of claim 23, wherein said database further includes information pertaining to generation of said data and/or potential uses of said data.
- 25. The computer readable storage medium of claim 23, wherein said database includes the files set forth in enclosed CD-ROMs 1, 2 and/or 3.
- 26. The computer readable storage medium of claim 23, wherein the medium is selected from the group consisting of a magnetic storage medium, an optical storage medium and an optico-magnetic storage medium.
- 27. The computer readable storage medium of claim 23, wherein said database is a relational database.
- 28. The computer readable storage medium of claim 23, wherein said database is a hierarchical database.
- 29. A system for generating a database of annotated biomolecular sequences, the system comprising a processing unit, said processing unit executing a software application configured for:
(a) constructing a dendrogram having multiple nodes, said dendrogram representing a hierarchy of interest, wherein each node of said multiple nodes of said dendrogram is annotated by at least one keyword; (b) assigning each biomolecular sequence of the biomolecular sequences to a specific node of said multiple nodes of said dendrogram to thereby generate assigned biomolecular sequences; (c) classifying each of said assigned biomolecular sequences to nodes hierarchically higher than said specific node, to thereby generate annotated biomolecular sequences; and (d) storing sequence annotations and sequence information of the annotated biomolecular sequences, thereby generating the database of annotated biomolecular sequences.
- 30. The system of claim 29, wherein the biomolecular sequences are selected from the group consisting of polypeptide sequences and polynucleotide sequences.
- 31. The system of claim 30, wherein said polynucleotides are selected from the group consisting of genomic sequences, expressed sequence tags, contigs, complementary DNA (cDNA) sequences, pre-messenger RNA (pre-mRNA) sequences, and mRNA sequences.
- 32. The system of claim 29, wherein the biomolecular sequences are selected from the group consisting of annotated biomolecular sequences, unannotated biomolecular sequences and partially annotated biomolecular sequences.
- 33. The system of claim 29, wherein said software application is further configured for homology clustering of the biomolecular sequences prior to step (b).
- 34. The system of claim 29, wherein said dendrogram is selected from the group consisting of a graph, a list, a map and a matrix.
- 35. The system of claim 29, wherein said hierarchy of interest is selected from the group consisting of a tissue expression hierarchy, a developmental expression hierarchy, a pathological expression hierarchy, a cellular expression hierarchy, an intracellular expression hierarchy, a taxonomical hierarchy and a functional hierarchy.
- 36. The system of claim 29, wherein each node of said multiple nodes is a parental node in an additional hierarchy of interest.
- 37. The system of claim 36, wherein said software application is further configured for classifying the biomolecular sequences of said parental node according to said additional hierarchy of interest.
- 38. The system of claim 29, wherein each of the biomolecular sequences is a member of a sequence contig.
- 39. A method of identifying sequence features unique to differentially expressed mRNA splice variants, the method comprising:
(a) computationally identifying unique sequence features in each splice variant of alternatively spliced expressed sequences; and (b) identifying differentially expressed splice variants of said alternatively spliced expressed sequences, thereby identifying sequence features unique to differentially expressed mRNA splice variants.
- 40. The method of claim 39, wherein step (b) is effected by qualifying annotations associated with said alternatively spliced expressed sequences.
- 41. The method of claim 40, further comprising scoring said annotations associated with said alternatively spliced expressed sequences according to:
(i) prevalence of said alternatively spliced expressed sequences in normal tissues; (ii) prevalence of said alternatively spliced expressed sequences in pathological tissues; (iii) prevalence of said alternatively spliced expressed sequence in total tissues; and (iv) number of tissues and/or tissue types expressing said alternatively spliced expressed sequences.
- 42. The method of claim 39, wherein step (b) is effected by identifying said unique sequence feature.
- 43. The method of claim 39, wherein said unique sequence feature is selected from the group consisting of a donor-acceptor concatenation, an alternative exon, an exon and a retained intron.
- 44. The method of claim 39, wherein said identifying unique sequence features in each splice variant of an alternatively spliced expressed sequence is effected by expressed sequence alignment.
- 45. A computer readable storage medium comprising data stored in a retrievable manner, said data including sequence information of sequence features unique to differentially expressed mRNA splice variants as set forth in files “Transcripts_nucleotide_seqs_part1”, “Transcripts_nucleotide_seqs_part2”, “Transcripts_nucleotide_seqs_part3.new”, and/or “Protein.seqs” of enclosed CD-ROMs 1 and/or 2, and sequence annotations as set forth in annotation categories “#TAA_CD” and/or “#TAA_TIS”, in the file “Summary_table.new” of enclosed CD-ROM 3.
- 46. The computer readable storage medium of claim 45, wherein said database further includes information pertaining to generation of said data and potential uses of said data.
- 47. The computer readable storage medium of claim 45, wherein said medium is selected from the group consisting of a magnetic storage medium, an optical storage medium and an optico-magnetic storage medium.
- 48. The computer readable storage medium of claim 45, wherein said database further includes information pertaining to gain and/or loss of function of said differentially expressed mRNA splice variants or polypeptides encoded thereby.
- 49. A system for generating a database of sequence features unique to differentially expressed mRNA splice variants, the system comprising a processing unit, said processing unit executing a software application configured for:
(a) identifying unique sequence features in each splice variant of alternatively spliced expressed sequences; (b) identifying differentially expressed splice variants of said alternatively spliced expressed sequences, thereby identifying sequence features unique to differentially expressed mRNA splice variants; and (c) storing the sequence features unique to the differentially expressed mRNA splice variants, thereby generating the database of sequence features unique to differentially expressed mRNA splice variants.
- 50. The system of claim 49, wherein step (b) is effected by qualifying annotations associated with said alternatively spliced expressed sequences.
- 51. The system of claim 50, further configured for scoring said annotations associated with said alternatively spliced expressed sequences according to:
(i) prevalence of said alternatively spliced expressed sequences in normal tissues; (ii) prevalence of said alternatively spliced expressed sequences in pathological tissues; (iii) prevalence of said alternatively spliced expressed sequence in total tissues; and (iv) number of tissues and/or tissue types expressing said alternatively spliced expressed sequences.
- 52. The system of claim 49, wherein step (b) is effected by identifying said unique sequence feature.
- 53. The system of claim 49, wherein said unique sequence feature is selected from the group consisting of a donor-acceptor concatenation, an alternative exon, an exon and a retained intron.
- 54. The system of claim 49, wherein said identifying unique sequence features in each splice variant of an alternatively spliced expressed sequence is effected by expressed sequence alignment.
- 55. A kit useful for detecting differentially expressed polynucleotide sequences, the kit comprising at least one oligonucleotide being designed and configured to be specifically hybridizable with a polynucleotide sequence selected from the group consisting of sequence files “Transcripts_nucleotide_seqs_part1”, “Transcripts_nucleotide_seqs_part2”, and/or “Transcripts_nucleotide_seqs_part3.new” of enclosed CD-ROMs 1 and/or 2 under moderate to stringent hybridization conditions.
- 56. The kit of claim 55, wherein said at least one oligonucleotide is labeled.
- 57. The kit of claim 55, wherein said at least one oligonucleotide is attached to a solid substrate.
- 58. The kit of claim 57, wherein said solid substrate is configured as a microarray and wherein said at least one oligonucleotide includes a plurality of oligonucleotides each being capable of hybridizing with a specific polynucleotide sequence of the polynucleotide sequences set forth in the files “Transcripts_nucleotide_seqs_part1”, “Transcripts_nucleotide_seqs_part2” and/or “Transcripts_nucleotide_seqs_part3.new” of enclosed CD-ROMs 1 and/or 2.
- 59. The kit of claim 58, wherein each of said plurality of oligonucleotides is attached to said microarray in a regio-specific manner.
- 60. The kit of claim 55, wherein said at least one oligonucleotide is designed and configured for DNA hybridization.
- 61. The kit of claim 55, wherein said at least one oligonucleotide is designed and configured for RNA hybridization.
- 62. A method of annotating biomolecular sequences, the method comprising:
(a) computationally clustering the biomolecular sequences according to a progressive homology range, to thereby generate a plurality of clusters each being of a predetermined homology of said homology range; and (b) assigning at least one ontology to each cluster of said plurality of clusters, said at least one ontology being: (i) derived from an annotation preassociated with at least one biomolecular sequence of each cluster, and/or (ii) generated from analysis of said at least one biomolecular sequence of each cluster, thereby annotating the biomolecular sequences.
- 63. The method of claim 62, wherein the biomolecular sequences are selected from the group consisting of polynucleotide sequences and polypeptide sequences.
- 64. The method of claim 62, wherein said homology range is between 99% and 35%.
- 65. The method of claim 62, wherein said analysis of said at least one biomolecular sequence includes literature text mining.
- 66. The method of claim 62, wherein said analysis of said at least one biomolecular sequence includes cellular localization prediction.
- 67. The method of claim 62, wherein said analysis of said at least one biomolecular sequence includes homology analysis.
- 68. The method of claim 62, wherein said at least one ontology is selected from the group consisting of molecular biology, microbiology, developmental biology, immunology, virology, biochemistry, physiology, pharmacology, medicine, bioinformatics, cell biology, endocrinology, structural biology, mathematics, chemistry, plant sciences, neurology, genetics, biology, ecology, genomics, cheminformatics, computer sciences, statistics, physics and artificial intelligence.
- 69. The method of claim 62, wherein said ontology includes a subontology.
- 70. The method of claim 62, further comprising scoring said at least one ontology assigned to a cluster of said plurality of clusters according to:
(i) a degree of homology characterizing said cluster; and (ii) relevance of annotation to information obtained from literature text mining.
- 71. The method of claim 62, further comprising generating a sequence profile to each cluster of said plurality of clusters following step (b).
- 72. A system for generating a database of annotated biomolecular sequences, the system comprising a processing unit, said processing unit executing a software application configured for:
(a) clustering the biomolecular sequences according to a progressive homology range, to thereby generate a plurality of clusters each being of a predetermined homology of said homology range; and (b) assigning at least one ontology to each cluster of said plurality of clusters, said at least one ontology being:
(i) derived from an annotation preassociated with at least one biomolecular sequence of each cluster, and/or (ii) generated from analysis of said at least one biomolecular sequence of each cluster, to thereby annotate the biomolecular sequences; and (c) storing sequence annotations and sequence information of the annotated biomolecular sequences, thereby generating said database of annotated biomolecular sequences.
- 73. The system of claim 72, wherein the biomolecular sequences are selected from the group consisting of polynucleotide sequences and polypeptide sequences.
- 74. The system of claim 72, wherein said homology range is between 99% and 35%.
- 75. The system of claim 72, wherein said analysis of said at least one biomolecular sequence includes literature text mining.
- 76. The system of claim 72, wherein said analysis of said at least one biomolecular sequence includes cellular localization prediction.
- 77. The system of claim 72, wherein said analysis of said at least one biomolecular sequence includes homology analysis.
- 78. The system of claim 72, wherein said at least one ontology is selected from the group consisting of molecular biology, microbiology, developmental biology, immunology, virology, biochemistry, physiology, pharmacology, medicine, bioinformatics, cell biology, endocrinology, structural biology, mathematics, chemistry, plant sciences, neurology, genetics, zoology, ecology, genomics, cheminformatics, computer sciences, statistics, physics and artificial intelligence.
- 79. The system of claim 72, wherein said ontology includes a subontology.
- 80. The system of claim 72, further comprising scoring said at least one ontology assigned to a cluster of said plurality of clusters according to:
(i) a degree of homology characterizing said cluster; and (ii) relevance of annotation to information obtained from literature text mining.
- 81. The system of claim 72, further comprising generating a sequence profile to each cluster of said plurality of clusters following step (b).
- 82. A computer readable storage medium comprising a database stored in a retrievable manner, said database including sequence information as set forth in files “Transcripts_nucleotide_seqs_part1”, “Transcripts_nucleotide_seqs_part2”, “Transcripts_nucleotide_seqs_part3.new” and/or “Protein.seqs” of enclosed CD-ROMs 1 and/or 2, and sequence ontological annotations in #GO_P, #GO_F and/or #GO_C annotation categories in file “Summary_table.new” of enclosed CD-ROM 3.
- 83. The computer readable storage medium of claim 82, wherein said database further includes information pertaining to generation of said data and potential uses of said data.
- 84. The computer readable storage medium of claim 82, wherein the medium is selected from the group consisting of a magnetic storage medium, an optical storage medium and an optico-magnetic storage medium.
- 85. A computer readable storage medium comprising a database stored in a retrievable manner, said database including biomolecular sequence information as set forth in files “Transcripts_nucleotide_seqs_part1”, “Transcripts_nucleotide_seqs_part2”, “Transcripts_nucleotide_seqs_part3.new” and/or “Protein.seqs” of enclosed CD-ROMs 1 and/or 2, and biomolecular sequence annotations as set forth in file “Summary_table.new” of enclosed CD-ROM 3.
- 86. The computer readable storage medium of claim 85, wherein said database further includes information pertaining to generation of said data and potential uses of said data.
- 87. The computer readable storage medium of claim 85, wherein the medium is selected from the group consisting of a magnetic storage medium, an optical storage medium and an optico-magnetic storage medium.
- 88. The computer readable storage medium of claim 85, wherein said sequence annotations are selected from the group consisting of contig description, position of unique sequence features, tissue specific expression, pathological specific expression, functional features, parameters for ontological annotation assignment, cellular localization, database sequence source and functional alterations.
- 89. A method of diagnosing colon cancer in a subject, the method comprising identifying in the subject the presence or absence of a biomolecular sequence selected from the group consisting of SEQ ID NOs: 4, 39, 24-28, 35-38, 12 and 29-31, wherein presence of said biomolecular sequence indicates colon cancer in the subject.
- 90. A method of diagnosing lung cancer in a subject, the method comprising identifying in the subject the presence or absence of a biomolecular sequence selected from the group consisting of SEQ ID NOs: 15, 18, 21 and 32, wherein presence of said biomolecular sequence indicates lung cancer in the subject.
- 91. A method of diagnosing Ewing sarcoma in a subject, the method comprising identifying in the subject the presence or absence of a biomolecular sequence as set forth in SEQ ID NO: 7, wherein presence of said biomolecular sequence indicates Ewing sarcoma in the subject.
- 92. A computer readable storage medium comprising data stored in a retrievable manner, said data including sequence information of differentially expressed biomolecular sequences as set forth in files “Transcripts_nucleotide_seqs_part1”, “Transcripts_nucleotide_seqs_part2”, “Transcripts_nucleotide_seqs_part3.new”, and/or “Protein.seqs” of enclosed CD-ROMs 1 and/or 2, and sequence annotations as set forth in annotation categories “SA” and/or “RA”, in the file “Summary_table.new” of enclosed CD-ROM 3.
- 93. The computer readable storage medium of claim 92, wherein said database further includes information pertaining to generation of said data and potential uses of said data.
- 94. The computer readable storage medium of claim 92, wherein said medium is selected from the group consisting of a magnetic storage medium, an optical storage medium and an optico-magnetic storage medium.
- 95. The computer readable storage medium of claim 92, wherein said database further includes information pertaining to gain and/or loss of function of said differentially expressed mRNA splice variants or polypeptides encoded thereby.
- 96. A computer readable storage medium comprising data stored in a retrievable manner, said data including sequence information of biomolecular sequences exhibiting gain of function or loss of function as set forth in files “Transcripts_nucleotide_seqs_part1”, “Transcripts_nucleotide_seqs_part2”, “Transcripts_nucleotide_seqs_part3.new”, and/or “Protein.seqs” of enclosed CD-ROMs 1 and/or 2, and sequence annotations as set forth in annotation category “DN”, in the file “Summary_table.new” of enclosed CD-ROM 3.
- 97. The computer readable storage medium of claim 96, wherein said database further includes information pertaining to generation of said data and potential uses of said data.
- 98. The computer readable storage medium of claim 96, wherein said medium is selected from the group consisting of a magnetic storage medium, an optical storage medium and an optico-magnetic storage medium.
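
For illustration only, steps (a) to (c) of claim 1 can be sketched as below. This is a minimal sketch and not the claimed implementation. The three-node tissue hierarchy, the sequence identifier, the free-text description, and the rule of matching node keywords against words of that description are all assumptions made for the example; the claims do not specify how a sequence is matched to a node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """One dendrogram node, annotated by at least one keyword."""
    name: str
    keywords: Set[str]
    parent: Optional["Node"] = None
    sequences: Set[str] = field(default_factory=set)


def build_dendrogram() -> Dict[str, Node]:
    """Step (a): a hypothetical hierarchy, tissue -> digestive -> colon."""
    tissue = Node("tissue", {"tissue"})
    digestive = Node("digestive", {"digestive", "gut"}, parent=tissue)
    colon = Node("colon", {"colon", "colorectal"}, parent=digestive)
    return {n.name: n for n in (tissue, digestive, colon)}


def depth(node: Node) -> int:
    """Number of edges between the node and the root."""
    d = 0
    while node.parent is not None:
        node = node.parent
        d += 1
    return d


def assign(nodes: Dict[str, Node], seq_id: str, description: str) -> Optional[Node]:
    """Step (b): assign the sequence to the deepest node whose keyword
    appears in the description (assumed matching rule)."""
    words = set(description.lower().split())
    matches = [n for n in nodes.values() if n.keywords & words]
    if not matches:
        return None
    node = max(matches, key=depth)
    node.sequences.add(seq_id)
    return node


def classify_upward(node: Node, seq_id: str) -> List[str]:
    """Step (c): also classify the sequence to every hierarchically
    higher node, returning the names of the ancestors visited."""
    visited = []
    current = node.parent
    while current is not None:
        current.sequences.add(seq_id)
        visited.append(current.name)
        current = current.parent
    return visited


if __name__ == "__main__":
    nodes = build_dendrogram()
    leaf = assign(nodes, "seq_A", "EST library from colon adenocarcinoma")
    if leaf is not None:
        print(leaf.name, classify_upward(leaf, "seq_A"))
```

Storing the parent pointer on each node keeps the upward classification a simple walk to the root, so a sequence assigned to a specific node is automatically retrievable from every broader node above it.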
Parent Case Info
[0001] This application claims the benefit of priority from U.S. Provisional Patent Application Nos. 60/322,285, filed Sep. 14, 2001; 60/322,359, filed Sep. 14, 2001; 60/322,506, filed Sep. 14, 2001; 60/324,524, filed Sep. 26, 2001; 60/354,242, filed Feb. 6, 2002; 60/371,494, filed Apr. 11, 2002; 60/384,096, filed May 31, 2002; and 60/397,784, filed Jul. 24, 2002. This application is filed with a request for non-publication.
Provisional Applications (8)
| Number | Date | Country |
| --- | --- | --- |
| 60322285 | Sep 2001 | US |
| 60322359 | Sep 2001 | US |
| 60322506 | Sep 2001 | US |
| 60324524 | Sep 2001 | US |
| 60354242 | Feb 2002 | US |
| 60371494 | Apr 2002 | US |
| 60384096 | May 2002 | US |
| 60397784 | Jul 2002 | US |