Claims
- 1. A method of identifying a nucleic acid sequence, the method comprising:
(a) providing one or more input populations of nucleic acid molecules; (b) partitioning the one or more populations of nucleic acid molecules to produce one or more subpopulations of nucleic acid molecules, each subpopulation comprising a common subsequence or biological characteristic or physical characteristic; (c) partitioning said subpopulations to produce one or more partitioned fractions; (d) constructing a library from each said partitioned fraction, wherein said library comprises specific nucleic acid fragments arrayed to retain mapping information of its original location in the partitioned fraction; (e) pooling said specific fragments from said library, wherein the pooling further provides for mapping a fragment's original location; (f) partitioning said pooled fragments to provide one or more selected subsets; (g) identifying a specific fragment from one or more selected subsets; (h) deconvoluting said subsets, thereby mapping the specific fragment to its original location; and (i) optionally sequencing said specific fragment, thereby identifying a nucleic acid sequence.
- 2. The method of claim 1 wherein said one or more input populations of nucleic acid molecules is chosen from the group consisting of messenger RNA, cDNA, and genomic DNA.
- 3. The method of claim 2 wherein said genomic DNA is isolated from a plurality of cell types.
- 4. The method of claim 3 wherein said cell types are mixed tissue types.
- 5. The method of claim 1 wherein said one or more subpopulations of nucleic acid molecules are modified by at least one step selected from:
i) digesting said nucleic acid molecules with at least two restriction endonucleases; ii) ligating an adapter oligonucleotide to one or more ends of a digestion product; and iii) amplifying a ligated product from a PCR-mediated amplification reaction.
- 6. The method of claim 1 further comprising amplifying one or more subpopulations of nucleic acid molecules of step (b).
- 7. The method of claim 1 further comprising amplifying one or more partitioned fractions of nucleic acid molecules of step (c).
- 8. The method of claim 1 wherein said specific nucleic acid fragments are distinguishable from members of the library by physical and biochemical properties.
- 9. The method of claim 8 wherein said physical and biochemical properties are selected from molecular weight, molecular size, terminal nucleotide sequences, exact migratory pattern, ionic charge, or affinity.
- 10. The method of claim 1 wherein said nucleic acid fragments comprise a unique identifier for each input population source.
- 11. The method of claim 1 wherein said specific fragments are identified by size.
- 12. The method of claim 1 wherein said one or more populations of nucleic acids are isolated from pooled or unpooled samples.
- 13. The method according to claim 1 wherein said array is traced on a pooling map.
- 14. The method according to claim 1 further comprising correlating said selected subsets to said pooling map prior to sequencing.
- 15. The method of claim 1, wherein nucleic acids are processed using multiplexing methodology.
- 16. The method of claim 15, wherein each input population comprises a unique signature sequence.
- 17. The method of claim 1, wherein the population of nucleic acid molecules is amplified using a first primer and a second primer.
- 18. The method of claim 1, wherein said library is generated from at least one partitioned fraction selected from the 5′ end of nucleic acid molecules, the internal regions of nucleic acid molecules, and the 3′ end of nucleic acid molecules.
- 19. The method of claim 1, wherein step (c) comprises two or more partitioning means.
- 20. The method of claim 19, wherein said partitioning means is selected from the group consisting of gel electrophoresis; high pressure liquid chromatography; size selection; separation based on physical and/or biochemical properties including molecular weight, molecular size, terminal nucleotide sequences, exact migratory pattern, and the like; elution; gel slicing; nucleotide subsequence probing; restriction digest; ligating an adapter oligonucleotide to one or more ends of the fragment; hybridizing; and amplification.
- 21. The method of claim 1, wherein the population of nucleic acid molecules comprises a normalized population of nucleic acids.
- 22. The method of claim 1 wherein said partitioning step is electrophoretic separation of said one or more subpopulations.
- 23. The method of claim 22 wherein the partitioning step is accomplished by polyacrylamide gel electrophoresis or liquid chromatography.
- 24. The method of claim 22 wherein the partitioning step is accomplished by agarose gel electrophoresis.
- 25. The method of claim 1, wherein the input populations are from normal or diseased tissue, said tissues either untreated or drug treated.
- 26. The method of claim 1, wherein said partitioning step optionally comprises hybridizing a probe nucleic acid sequence to the population of nucleic acids.
- 27. The method of claim 1, wherein the identifying of a nucleic acid comprises sizing a first insert from one or more nucleic acid fragments in the cloned library subset array and comparing its size to the size of a second fragment generated by the same partition method in the same library subset array, and determining the probability that the nucleic acid fragments represent clones of identical sequences.
- 28. The method of claim 1, wherein the subpopulations of nucleic acid molecules differ by fewer than 20, 15, 12, 8, 6 or 4 nucleotides in length.
- 29. The method of claim 1, further comprising ligating adapter oligonucleotides to the termini of the digested cDNA molecules, thereby producing ligation products.
- 30. The method of claim 1, wherein step (b) comprises generating one or more subpopulations from at least one input population having been probed by at least one restriction enzyme, each subpopulation being produced by recognition of one or more target nucleotide subsequences in said nucleic acid by said restriction enzyme, wherein the output of the restriction digest is a representation of (i) the length between occurrences of target nucleotide subsequences in said nucleic acid, and (ii) the identities of said target nucleotide subsequences in said nucleic acid; said method further comprising dividing said sample of nucleic acids into a plurality of portions and performing the steps of claim 1 individually on a plurality of said portions, wherein a different one or more restriction digests are used with each portion.
- 31. The method of claim 30, wherein the restriction enzyme is a Type II or Type IIS restriction endonuclease.
- 32. The method of claim 31, wherein the restriction enzyme recognizes any one or more of a 4, 5, 6, 7 or 8 nucleotide recognition sequence.
- 33. The method of claim 30, wherein a first nucleic acid sequence is identified by comparing the size of one or more digestion products produced by a member of the subpopulation of nucleic acid sequences to the sizes of a second fragment generated by the same restriction enzyme or enzymes in said reference nucleic acid or nucleic acids.
- 34. The method of claim 1, wherein the library is prepared by a process comprising:
i) ligating the one or more subpopulations of nucleic acid molecules to a vector to form a population of ligated vector-insert molecules; ii) transforming the population of vector-insert nucleic acid molecules into a host cell; iii) culturing the host cell under conditions allowing for replication of the vector-insert; iv) recovering the vector-insert from said host; v) characterizing the insert by a biological or physical property; vi) comparing the biological or physical property of the insert to the biological or physical properties of fragments generated by the same method in said reference nucleic acid or nucleic acids; vii) determining the probability that the compared fragments are the same; and viii) optionally sequencing the selected fragments; thereby identifying a nucleic acid fragment comprising one or more unique sequences in the library.
- 35. The method of claim 34, wherein at least a portion of the first nucleic acid sequence is determined and compared to the nucleotide sequence of one or more reference nucleic acid sequences.
- 36. The method of claim 34, wherein the determining step comprises hybridizing the first nucleic acid sequence to one or more of the reference nucleic acid sequences.
- 37. The method of claim 21, wherein normalizing the representation of a nucleic acid sequence in a population of nucleic acid sequences comprises the steps of:
providing an input population of cDNA molecules derived from a population of RNA molecules, wherein said cDNA population comprises a first nucleic acid sequence and a second nucleic acid sequence distinct from the first nucleic acid sequence, and wherein said first nucleic acid sequence is present at a higher level in said population than said second nucleic acid sequence; partitioning said cDNA population into one or more subpopulations of nucleic acid sequences, wherein said partitioning comprises digesting the cDNA population with one or more restriction enzymes; and lowering the level of said first nucleic acid sequence relative to the level of said second nucleic acid sequence in the subpopulation of nucleic acid sequences, thereby normalizing the representation of nucleic acid sequences in said population of nucleic acid sequences.
- 38. A method for producing a population of nucleic acid molecules enriched for 5′ regions of mRNA molecules for analysis by the method of claim 1, the method comprising:
providing a population of RNA molecules, said population including RNA molecules having a 5′ terminal Gppp cap structure and a 5′ terminal phosphate group; contacting said population of RNA molecules with a phosphatase under conditions that result in removal of the 5′ terminal phosphate group while leaving the 5′ terminal Gppp cap structure intact; inactivating said phosphatase; contacting the population of RNA molecules with a pyrophosphatase under conditions that result in the removal of the 5′ terminal Gppp and the formation of a 5′ phosphate group; annealing an oligonucleotide in the presence of an RNA ligase to form a hybrid molecule; and forming a cDNA from said oligonucleotide.
- 39. A method of identifying an RNA sequence in a sample comprising a plurality of RNA sequences, the method comprising:
synthesizing cDNA copies of a plurality of RNA species to form a cDNA sample; determining the size of one or more of said cDNA molecules in said cDNA sample; comparing the size of said sample with the size of a reference nucleic acid according to the method of claim 1; and thereby identifying the cDNA sequence.
- 40. The method of claim 39, wherein said cDNA molecules are digested with one or more restriction enzymes prior to the determining step.
- 41. The method of claim 40, further comprising ligating adapter oligonucleotides to the termini of the digested cDNA molecules prior to the determining step.
- 42. The method of claim 39, wherein said identifying step comprises comparing the size of one or more digestion products produced by one or more said cDNA molecules to a reference nucleic acid or nucleic acids.
- 43. The method of claim 39, further comprising the steps of:
assembling the plurality of nucleic acid sequences to provide an assembled sequence; and determining whether the assembled sequence is absent in a reference set of one or more reference nucleic acid sequences; whereby, if the assembled sequence is absent from the reference set, the assembled sequence is a novel nucleic acid sequence.
- 44. The method of claim 1, wherein the partitioning step optionally comprises one or more processes selected from:
a) isolating nucleic acid sequences from different cell types; b) separating the nucleic acid sequences in the subpopulation by physical properties; c) amplifying a specific subpopulation of nucleic acid sequences; d) amplifying 5′ terminal sequences of the nucleic acid sequences; e) amplifying interior sequences of the nucleic acid sequences; f) amplifying 3′ terminal sequences of the nucleic acid sequences; g) partitioned subtraction screening; h) length selection by lariat formation; i) use of identical primers; j) use of shortened primers; k) use of intermediate annealing temperature; and l) use of modified cycle times.
- 45. The method of claim 37, wherein the normalization step comprises processes selected from the group consisting of partitioned subtraction screening, length selection by lariat formation, use of identical primers, use of shortened primers, use of intermediate annealing temperature, use of modified cycle times, and use of a 5′-capped end.
- 46. The method of claim 1, wherein the input population comprises multiple input sources.
- 47. The method of claim 46, wherein different input sources are selected from the group consisting of: tissue type, cell type, treatment condition, disease state, and organism type.
- 48. The method of claim 39, wherein the input population comprises multiple input sources.
- 49. The method of claim 39, wherein nucleic acids are processed using multiplexing.
- 50. The method of claim 47, wherein nucleic acids are processed using multiplexing methodology.
- 51. A method of identifying a nucleic acid sequence, the method comprising:
(a) providing one or more populations of nucleic acid molecules comprising at least one set of nucleic acid sequences; (b) partitioning the one or more populations of nucleic acid molecules to produce one or more subpopulations of nucleic acid molecules, each subpopulation comprising a common subsequence or biological characteristic or physical characteristic; (c) partitioning said subpopulations to produce one or more partitioned fractions; (d) constructing a library from said partitioned fractions wherein said library comprises specific nucleic acid fragments distributed in an array; (e) pooling said specific fragments from said library to provide pooled fragments that are mapped; (f) sizing said pooled fragments to provide multiplex sized sets; (g) deconvoluting said multiplexed sets; and (h) identifying a nucleic acid sequence from said multiplexed sets.
- 52. A method of identifying a nucleic acid sequence, the method comprising:
(a) partitioning the one or more populations of nucleic acid molecules to produce one or more subpopulations of nucleic acid molecules, each subpopulation comprising a common subsequence or biological characteristic or physical characteristic; (b) pooling said subpopulations of nucleic acid molecules; (c) fractionating said pooled subpopulations to produce a plurality of fractions; (d) cloning said fractions to provide a library of fractions wherein said library comprises specific nucleic acid fragments distributed in an array; (e) sizing said pooled fragments to provide multiplex sized sets; (f) deconvoluting said multiplexed sets; and (g) sequencing one or more fragments in said multiplex set to provide a nucleic acid sequence.
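The pooling, sizing, and deconvolution steps recited in claims 1, 13, 14, 51 and 52 can be illustrated with a minimal sketch, assuming a simple two-dimensional row/column pooling design over a plate-arrayed library in which each well holds one clone characterized only by its insert size. The claims do not fix a pooling geometry; the row/column scheme, the function names `make_pools` and `deconvolute`, and the size tolerance `tol` are hypothetical choices for illustration, not the patented procedure.

```python
from itertools import product

# Hypothetical 2-D pooling scheme (illustrative assumption, not from the claims):
# each clone sits in one well (row, col) of an arrayed library, and every row
# and every column is pooled and sized as a separate multiplexed set.


def make_pools(plate):
    """plate maps (row, col) -> insert size in bp; returns row and column pools,
    each a dict of pool index -> set of sizes observed in that pool."""
    row_pools, col_pools = {}, {}
    for (r, c), size in plate.items():
        row_pools.setdefault(r, set()).add(size)
        col_pools.setdefault(c, set()).add(size)
    return row_pools, col_pools


def deconvolute(row_pools, col_pools, target_size, tol=0):
    """Map a fragment size back to candidate wells: a well is a candidate when
    both its row pool and its column pool contain a size within `tol` bp."""
    def hits(pools):
        return {k for k, sizes in pools.items()
                if any(abs(s - target_size) <= tol for s in sizes)}
    # Cross product of positive rows and positive columns, sorted for stable output.
    return sorted(product(hits(row_pools), hits(col_pools)))
```

For example, with `plate = {(0, 0): 120, (0, 1): 245, (1, 0): 310, (1, 1): 180}`, a 245 bp fragment seen in row pool 0 and column pool 1 deconvolutes to the single well `(0, 1)`. If the same size occurs in two wells on different rows and columns, the cross product also returns "ghost" wells, so under this simple scheme a candidate still has to be confirmed (for example by re-sizing or sequencing it individually, as in the optional sequencing step of claim 1).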
RELATED APPLICATIONS
[0001] This application is a continuation of U.S. Ser. No. 09/614,505, filed Jul. 11, 2000, which is a continuation-in-part of U.S. Ser. No. 09/417,386, filed Oct. 13, 1999, which claims priority to U.S. Ser. No. 60/115,109, filed Jan. 8, 1999. All of these applications are incorporated herein by reference in their entirety.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60115109 | Jan 1999 | US |
Continuations (1)
| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | 09614505 | Jul 2000 | US |
| Child | 10426179 | Apr 2003 | US |
Continuations-in-Part (1)
| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | 09417386 | Oct 1999 | US |
| Child | 09614505 | Jul 2000 | US |