Methods of enriching for and identifying polymorphisms

FIELD OF THE INVENTION

The present invention relates in general to nucleic acid sequence analysis, and in particular to methods which facilitate the identification of sequence polymorphisms.

BACKGROUND OF THE INVENTION

Genomic amplification strategies using the polymerase chain reaction (PCR; Mullis & Faloona, 1987, Meth. Enzymol. 155:335) are employed to facilitate the identification of polymorphic sequences. PCR is used to amplify regions of genomic DNA that carry potential polymorphisms. One method hybridizes the PCR products to allele-specific hybridization probes (Saiki et al., 1986, Nature 324:163). Other methods utilize oligonucleotide primers that either match or mismatch the targeted polymorphism (Newton et al., 1989, Nucleic Acids Res. 17:2503).

With methods that hybridize the PCR product to an allele-specific probe, PCR is used to reduce the complexity of the DNA sample being assayed for the polymorphic marker and to increase the number of copies of the polymorphism-bearing DNA. If 100,000 polymorphic markers were to be assayed per genome, it would be very expensive to perform 100,000 individual PCR reactions. Some advances have been made to multiplex PCR reactions (Chamberlain et al., 1988, Nucl. Acids Res. 16: 11141), and the degree of multiplexing of the PCR has been scaled up, followed by hybridization to an array of allele-specific probes (Wang et al., 1998, Science 280: 1077). However, in the studies by Wang et al., the percentage of PCR products that successfully amplified decreased as the number of PCR primers added to the reaction increased. When approximately 100 primer pairs were used, about 90% of the PCR products were successfully amplified. When the number of primer pairs was increased to about 500, about 50% of the PCR products were successfully amplified. Another disadvantage with multiplex PCR is that individual primer pairs must be synthesized for each polymorphic target. Genotyping DNA with 100,000 polymorphism targets would require, in theory, 200,000 different PCR primers. Not only is the synthesis of such primers costly and time-consuming, but not all primer designs succeed in producing a desired PCR product. Therefore, considerable time and energy may be spent optimizing the primer designs.
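The arithmetic behind these figures can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the two success rates are the approximate values quoted above for about 100 and about 500 primer pairs, applied uniformly to every reaction as a simplifying assumption.

```python
import math

# Approximate success rates quoted in the text for two multiplexing levels.
# Applying them uniformly to every reaction is an assumption of this sketch.
REPORTED_SUCCESS = {100: 0.90, 500: 0.50}


def primers_needed(n_targets: int) -> int:
    """One forward and one reverse primer per polymorphic target."""
    return 2 * n_targets


def reactions_needed(n_targets: int, pairs_per_reaction: int) -> int:
    """Number of multiplex reactions required to cover every target."""
    return math.ceil(n_targets / pairs_per_reaction)


def expected_products(n_targets: int, pairs_per_reaction: int) -> float:
    """Expected number of targets amplified at the quoted success rate."""
    return n_targets * REPORTED_SUCCESS[pairs_per_reaction]


if __name__ == "__main__":
    targets = 100_000
    print("primers required:", primers_needed(targets))
    for level in sorted(REPORTED_SUCCESS):
        print(
            "pairs per reaction:", level,
            "reactions:", reactions_needed(targets, level),
            "expected products:", expected_products(targets, level),
        )
```

Under these assumptions, raising the multiplexing level reduces the number of reactions but also reduces the number of targets expected to amplify, which is the trade-off described above.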

Hatada et al. have cleaved genomic DNA with a rarely cutting restriction enzyme, separated the cleaved DNA by gel electrophoresis, again cleaved the separated DNA with a second restriction enzyme in the gel, and again separated the DNA in a second dimension by electrophoresis (Hatada et al., 1991, Proc. Natl. Acad. Sci. USA 88: 9523). According to the Hatada et al. method, one then examines the two-dimensional pattern of DNA spots using DNA from different individuals. Differences in DNA migration patterns result from sequence or nucleotide methylation differences in the restriction enzyme recognition sequences.

Hayashizaki et al. (Hayashizaki et al., 1992, Genomics 14:733) use solid-phase adapters specific for restriction fragment ends to physically separate a subset of fragments from genomic DNA. After purification of the adapter-bound DNA fraction away from the rest of the genomic DNA, the bound DNA is separated from the adapters by cleaving again with the restriction enzyme used for the adapter ligation. The DNA released from the adapters is then cloned into a replication vector to make a gene library.

Others have used DNA binding factors to reduce the complexity of populations of synthetic oligonucleotides with stretches of randomized sequences, with the aim of elucidating the consensus binding sequences of the proteins (Mavrothalassitis et al., 1990, DNA Cell Biol., 9:783; Blackwell & Weintraub, 1990, Science, 250: 1104; Woodring et al., 1993, Trends Biol. Sci. 18: 77; Hardenbol & Van Dyke, 1996, Proc. Natl. Acad. Sci. U.S.A., 93: 2811).

There is a need in the art for improved methods of identifying polymorphic sequences.

SUMMARY

The invention encompasses a method of enriching for and identifying a nucleic acid sequence difference with respect to a reference sequence comprising: a) contacting a nucleic acid sample with a molecule comprising a sequence-specific binding activity under conditions which permit specific binding, wherein the sample comprises a subset of nucleic acid molecules having a sequence that binds to the sequence-specific binding activity, and wherein a bound subset of nucleic acid molecules is retained by the sequence-specific binding activity, such that the subset of bound nucleic acid molecules is enriched for molecules comprising the sequence recognized by the sequence-specific binding activity; and b) detecting a sequence difference with respect to a reference sequence in the subset of nucleic acid molecules.

In a preferred embodiment of the invention, the molecule comprising sequence-specific binding activity is selected from the group consisting of: transcription factors or DNA binding domains thereof; proteins with zinc-finger DNA binding domains; restriction endonuclease DNA recognition domains; sequence-specific antibodies; oligonucleotides complementary to an adapter ligated to a population of DNA molecules; nucleic acid molecules; aptamers; peptide nucleic acid molecules; peptides; and affinity resins which recognize DNA having a particular G+C content or methylation status.

In a preferred embodiment of the invention, the sequence-specific binding activity is bound to a solid support.
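The two steps of this method, enrichment by sequence-specific binding followed by detection of a difference with respect to a reference, can be illustrated computationally. The sketch below is an in-silico analogy only, not a description of the binding chemistry: the motif, the fragment sequences, and the function names are hypothetical, binding is modeled as a simple pattern match, and the retained fragment is assumed to be already aligned to a reference of equal length.

```python
import re


def enrich(fragments, motif_regex):
    """Retain only fragments containing the motif (a stand-in for binding)."""
    pattern = re.compile(motif_regex)
    return [f for f in fragments if pattern.search(f)]


def differences(sample, reference):
    """List (position, reference_base, sample_base) for each mismatch.

    Assumes the two sequences are pre-aligned and of equal length.
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must be the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


if __name__ == "__main__":
    # Hypothetical motif and fragments, chosen only for illustration.
    motif = "GGGCGG"
    fragments = ["ATGGGCGGTA", "ATATATATAT", "CCGGGCGGAT"]
    bound = enrich(fragments, motif)
    print(bound)
    print(differences(bound[0], "ATGGGCGGTT"))
```

Only the fragments carrying the motif are compared with the reference, which is the sense in which enrichment reduces the amount of sequence that must be examined.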

The invention also encompasses a method of identifying nucleic acid sequence differences with respect to a reference sequence comprising: a) cleaving a nucleic acid sample from one or more individuals with one or more sequence-specific cleavage agents to produce nucleic acid fragments; b) operatively linking the fragments of step (a) with molecules capable of being replicated; c) introducing the linked molecules of step (b) into a system capable of replicating only a subset of the linked molecules, and replicating the subset to form a collection of replicated molecules; and d) detecting one or more nucleic acid sequence differences with respect to a reference sequence in the members of the collection of step (c) with a method capable of detecting one or more nucleotide differences with respect to a reference sequence.

In a preferred embodiment, the system capable of replicating the linked molecules comprises host cells and the collection of replicated molecules comprises a library.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises DNA sequencing.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises denaturing HPLC.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises electrophoresis capable of detecting conformational differences in the nucleic acids.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises a protein capable of detecting mismatches between duplexed strands of nucleic acid.

In a preferred embodiment, the sequencing is performed using primers that hybridize to the molecules capable of being replicated.

In a preferred embodiment, the system capable of replicating the linked molecules comprises in vitro replication of the linked molecules.

In a preferred embodiment, the in vitro replication comprises a step utilizing primers for nucleic acid polymerization that hybridize specifically to the molecules capable of being replicated.

In a preferred embodiment, the in vitro replication comprises a step utilizing primers for nucleic acid polymerization that hybridize specifically to sequences comprising both a segment of the molecules capable of being replicated and the fragment ends of a subset of the nucleic acid molecules linked to the molecules capable of being replicated.

In a preferred embodiment, the one or more cleavage agents may be one or more restriction endonucleases. It is preferred that at least one of the restriction endonucleases cleaves DNA infrequently.

In a preferred embodiment, the infrequently cleaving restriction endonuclease is selected from the group consisting of AscI, BssHII, EagI, NheI, NotI, PacI, PmeI, RsrII, SalI, SbfI, SfiI, SgrAI, SpeI, SrfI, and SwaI restriction endonucleases.
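Why these enzymes cleave infrequently can be illustrated with a rough estimate. The sketch below assumes a random sequence with equal frequencies of the four bases, which real genomes do not satisfy (for example, CpG-containing sites tend to be rarer still in mammalian DNA). The recognition sequences are those given in standard enzyme catalogs rather than in the text above, and the function name is hypothetical.

```python
from fractions import Fraction

# Number of bases matched by each IUPAC symbol used in the sites below.
DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "W": 2, "N": 4}

# Recognition sequences from standard enzyme catalogs (not from the text).
RECOGNITION_SITES = {
    "AscI": "GGCGCGCC",
    "BssHII": "GCGCGC",
    "EagI": "CGGCCG",
    "NheI": "GCTAGC",
    "NotI": "GCGGCCGC",
    "PacI": "TTAATTAA",
    "PmeI": "GTTTAAAC",
    "RsrII": "CGGWCCG",
    "SalI": "GTCGAC",
    "SbfI": "CCTGCAGG",
    "SfiI": "GGCCNNNNNGGCC",
    "SgrAI": "CRCCGGYG",
    "SpeI": "ACTAGT",
    "SrfI": "GCCCGGGC",
    "SwaI": "ATTTAAAT",
}


def expected_spacing(site: str) -> Fraction:
    """Mean distance in base pairs between occurrences of a site.

    Assumes a random sequence with equal base frequencies.
    """
    spacing = Fraction(1)
    for symbol in site:
        spacing *= Fraction(4, DEGENERACY[symbol])
    return spacing


if __name__ == "__main__":
    for enzyme, site in sorted(RECOGNITION_SITES.items()):
        print(enzyme, site, int(expected_spacing(site)))
```

Under this simplified model, an enzyme with eight fully specified bases is expected to cut about once per 65,536 base pairs, compared with once per 256 base pairs for a four-base site.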

The invention also encompasses a method of identifying nucleic acid sequence differences with respect to a reference sequence comprising: a) cleaving a nucleic acid sample from one or more individuals with one or more sequence-specific cleavage agents to produce nucleic acid fragments, wherein the ends of only a subset of the fragments comprise sequences capable of being operatively linked to a separation element; b) operatively linking the subset of step (a) with the separation element; c) separating the linked molecules; and d) detecting one or more nucleic acid sequence differences with respect to a reference sequence in the members of the separated molecules of step (c) with a method capable of detecting one or more nucleotide differences with respect to a reference sequence.
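Steps (a) through (c) of this method can be sketched in silico as a double digest in which only fragments bearing an end produced by the infrequently cleaving agent are carried forward. The sketch makes several simplifying assumptions that are not part of the method itself: a single strand is modeled, each agent is taken to cut at the first base of its recognition sequence, and the sequences and function names are hypothetical.

```python
def site_positions(seq, site):
    """Start index of every occurrence of site, including overlapping ones."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def double_digest(seq, rare_site, frequent_site):
    """Cut seq at both sites; return (fragment, left_end, right_end) tuples.

    Simplification: each agent cuts at the first base of its site.
    """
    cuts = {p: "rare" for p in site_positions(seq, rare_site)}
    for p in site_positions(seq, frequent_site):
        cuts.setdefault(p, "frequent")
    boundaries = [(0, "terminus")] + sorted(cuts.items()) + [(len(seq), "terminus")]
    fragments = []
    for (start, left), (end, right) in zip(boundaries, boundaries[1:]):
        if end > start:  # skip empty pieces when a site sits at position 0
            fragments.append((seq[start:end], left, right))
    return fragments


def linkable_subset(fragments):
    """Fragments with at least one end generated by the rare cutter."""
    return [f for f, left, right in fragments if "rare" in (left, right)]


if __name__ == "__main__":
    # Hypothetical sequence with one eight-base site and two four-base sites.
    seq = "TTGATCAAAAGCGGCCGCTTTTGATCCC"
    pieces = double_digest(seq, "GCGGCCGC", "GATC")
    print(pieces)
    print(linkable_subset(pieces))
```

Only the fragments flanking the rare site are retained, which mirrors the requirement that the ends of only a subset of the fragments be capable of linkage to the separation element.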

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises DNA sequencing.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises denaturing HPLC.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises electrophoresis capable of detecting conformational differences in the nucleic acids.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises a protein capable of detecting mismatches between duplexed strands of nucleic acid.

In a preferred embodiment, the sequencing is performed using primers that hybridize to the sequences capable of being operatively linked to a separation element.

In a preferred embodiment, the one or more cleavage agents are one or more restriction endonucleases. It is preferred that at least one restriction endonuclease cleaves DNA infrequently.

In a preferred embodiment, the infrequently cleaving restriction endonuclease is selected from the group consisting of AscI, BssHII, EagI, NheI, NotI, PacI, PmeI, RsrII, SalI, SbfI, SfiI, SgrAI, SpeI, SrfI, and SwaI restriction endonucleases.

The invention also encompasses a method of enriching for and identifying nucleic acid sequence differences with respect to a reference sequence comprising: a) fragmenting a nucleic acid sample from one or more individuals to an average fragment length; b) physically separating a subset of the nucleic acid fragments generated in step (a) based on the presence or absence of a particular nucleotide sequence within the fragments; c) operatively linking the subset of step (b) with molecules capable of being replicated; d) introducing the linked molecules of step (c) into a system capable of replicating the linked molecules, and replicating the linked molecules to form a collection of replicated molecules; and e) detecting a nucleic acid sequence difference with respect to a reference sequence in the collection of replicated molecules of step (d) using a method capable of detecting one or more nucleotide differences with respect to a reference sequence.

In a preferred embodiment, the system capable of replicating the linked molecules comprises host cells and the collection of replicated molecules comprises a library.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises DNA sequencing.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises denaturing HPLC.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises electrophoresis capable of detecting conformational differences in the nucleic acids.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises a protein capable of detecting mismatches between duplexed strands of nucleic acid.

In a preferred embodiment, the DNA sequencing is performed using primers that hybridize to the molecules capable of being replicated.

In a preferred embodiment, the system capable of replicating the linked molecules comprises in vitro replication of the linked molecules.

In a preferred embodiment, the in vitro replication comprises a step utilizing primers for nucleic acid polymerization that hybridize specifically to the molecules capable of being replicated.

In a preferred embodiment, the in vitro replication is repeated one or more times to increase the enrichment of the linked molecules.

In a preferred embodiment, the method used to physically separate a subset of fragments comprises using a sequence-specific binding molecule.

In a preferred embodiment, the sequence-specific binding molecule is a protein.

In a preferred embodiment, the one or more cleavage agents are restriction endonucleases.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises DNA sequencing.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises denaturing HPLC.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises electrophoresis capable of detecting conformational differences in the nucleic acids.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises a protein capable of detecting mismatches between duplexed strands of nucleic acid.

In a preferred embodiment, the DNA sequencing is performed using primers that hybridize to the molecules capable of being replicated.

In a preferred embodiment, the method used to physically separate a subset of fragments comprises using a sequence-specific binding molecule.

In a preferred embodiment, the sequence-specific binding molecule is a protein.

The invention also encompasses a method of enriching for and identifying nucleic acid sequence differences with respect to a reference sequence comprising: a) hybridizing a nucleic acid sample from one or more individuals with oligonucleotide primers under conditions wherein each of the primers permits extension by a polymerase of two or more different sequences, and wherein the sequences replicated by extension of the primers comprise regions where there are known sequence differences between individuals of the species being examined; b) extending the oligonucleotide primers hybridized in step (a) to form an enriched collection of replicated molecules; and c) detecting one or more nucleic acid sequence differences in the members of the collection with respect to a reference sequence with a method capable of detecting one or more nucleotide differences with respect to a reference sequence.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises DNA sequencing.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises denaturing HPLC.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises electrophoresis capable of detecting conformational differences in the nucleic acids.

In a preferred embodiment, the method capable of detecting one or more nucleotide differences comprises a protein capable of detecting mismatches between duplexed strands of nucleic acid.

In a preferred embodiment, the DNA sequencing is performed using primers that hybridize to the primers hybridized in step (a) and extended in step (b).

In a preferred embodiment, steps (a) and (b) are repeated one or more times to increase the enrichment of the enriched collection of replicated molecules.

In a preferred embodiment, the method further comprises, after step (b) and before step (c) the step of hybridizing a second set of primers that hybridize specifically to sequences comprising both a segment of the first set of primers and a segment of the replicated portion of the molecules generated in step (b).
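The condition in step (a) that each primer permits extension of two or more different sequences can be checked in silico by counting the sites at which a primer could anneal. The sketch below is illustrative only: it requires a perfect match, ignores hybridization conditions, and uses a hypothetical template, primer, and function names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def occurrences(seq, query):
    """Start index of every occurrence of query, including overlapping ones."""
    hits = []
    start = seq.find(query)
    while start != -1:
        hits.append(start)
        start = seq.find(query, start + 1)
    return hits


def priming_sites(template, primer):
    """Sites where the primer matches either strand exactly.

    "+" positions index the template as given; "-" positions index its
    reverse complement. Perfect matching is an assumption of this sketch.
    """
    plus = [("+", p) for p in occurrences(template, primer)]
    minus = [("-", p) for p in occurrences(reverse_complement(template), primer)]
    return plus + minus


def primes_multiple_sequences(template, primer):
    """True when the primer could initiate extension at two or more sites."""
    return len(priming_sites(template, primer)) >= 2


if __name__ == "__main__":
    # Hypothetical template and primer, chosen only for illustration.
    template = "TTGGACCCGGATT"
    print(priming_sites(template, "GGA"))
    print(primes_multiple_sequences(template, "GGA"))
```

A primer that matches at two or more positions would, under these assumptions, give rise to two or more different extension products from a single oligonucleotide.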

The invention also encompasses a method of enriching for and identifying nucleic acid sequence differences with respect to a reference sequence comprising: a) fragmenting a nucleic acid sample from one or more individuals; b) physically separating a subset of the nucleic acid fragments based on the size of the fragments; c) operatively linking the subset of step (b) with molecules capable of being replicated; d) introducing the linked subset of molecules of step (c) into a system capable of replicating the linked subset of molecules, and replicating the subset of linked molecules to form an enriched collection of replicated molecules; and e) detecting one or more nucleotide sequence differences in the members of the collection of step (d) with a method capable of detecting one or more nucleotide differences with respect to a reference sequence.
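The size-based separation of step (b) and the resulting reduction in complexity can be sketched as follows. The size window, the fragment lengths, and the function names are hypothetical and serve only to illustrate the calculation.

```python
def size_select(fragments, min_len, max_len):
    """Keep fragments whose length falls within the inclusive size window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


def retained_fraction(fragments, selected):
    """Fraction of the total bases that remains after selection."""
    total = sum(len(f) for f in fragments)
    if total == 0:
        return 0.0
    return sum(len(f) for f in selected) / total


if __name__ == "__main__":
    # Hypothetical fragments of 50, 300, 450 and 1200 bases.
    fragments = ["A" * 50, "C" * 300, "G" * 450, "T" * 1200]
    subset = size_select(fragments, 200, 500)
    print([len(f) for f in subset])
    print(retained_fraction(fragments, subset))
```

The retained fraction gives a simple measure of how far the selected subset reduces the amount of sequence carried into the linking and replication steps.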