Amplification-based cloning method

Abstract
The invention provides a method for isolating a nucleic acid molecule of interest from a nucleic acid library, which involves generating multiple copies of the nucleic acid molecule of interest present in the library and an enrichment step to remove template nucleic acid molecules. The method of the invention is more efficient than conventional cloning methods and can be used to identify and isolate both known and novel nucleic acid molecules.
Description


FIELD OF THE INVENTION

[0002] The invention concerns a method for the enrichment and isolation of genes of interest from recombinant DNA libraries.



BACKGROUND OF THE INVENTION

[0003] Recombinant DNA (including cDNA and genomic) libraries consist of a large number of recombinant DNA clones, each containing a different segment of foreign DNA. In order to ensure that a recombinant cDNA library contains at least one copy of each mRNA in the cell, it generally needs to include between about 500,000 and 1,000,000 independent cDNA clones. Current Protocols in Molecular Biology, Ausubel et al., editors, Greene Publishing Associates and Wiley-Interscience, New York, 1991, vol. 1, Unit 5.8.1. Similarly, a genomic library with a base of about 700,000 clones is required to obtain a complete library of mammalian DNA. Ausubel et al., supra, Unit 5.7.1. While the frequency of different genes in any particular library varies, most genes will be present at a frequency of about 1 part in 10³ to 10⁶. A particularly rare mRNA will be represented by a single clone out of 10⁶ clones, while the majority of the genes will be present at a frequency of 1 in 10⁴ to 10⁵ clones.
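
As a rough illustration of why libraries of this size are needed, the number of independent clones N required to include a sequence present at frequency f with probability P is commonly estimated as N = ln(1 − P) / ln(1 − f). This estimate is not taken from the references cited above; the function name, the 99% probability and the example frequency in the sketch below are assumptions chosen for illustration only.

```python
import math


def clones_needed(frequency: float, probability: float = 0.99) -> int:
    """Clones required so that a sequence present at `frequency`
    appears at least once with the given probability (illustrative)."""
    if not 0.0 < frequency < 1.0 or not 0.0 < probability < 1.0:
        raise ValueError("frequency and probability must lie strictly between 0 and 1")
    # P(at least one copy) = 1 - (1 - f)^N  =>  N = ln(1 - P) / ln(1 - f)
    return math.ceil(math.log(1.0 - probability) / math.log1p(-frequency))


# Hypothetical example: a cDNA present at 1 part in 10^5
n = clones_needed(1e-5)
print(n)
```

Under these assumptions, a sequence present at 1 part in 10⁵ calls for a library on the order of several hundred thousand clones, which is in line with the library sizes quoted above.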


[0004] The identification and isolation of any desired recombinant DNA clone from among such a daunting number of total clones is not an easy task. Over the past 25 years, several cloning methods have been developed. In most cases, desired clones are identified by screening DNA libraries with nucleic acid probes, ligands or antibodies. Usually, libraries are introduced into host cells and plated out, and the colonies are transferred to nitrocellulose filters and hybridized to ³²P-labeled probes or bound to antibodies. Such filter hybridization methods (see Sambrook et al., 1989, infra) do not involve an enrichment step. In order to clone a particular gene, sometimes as many as one million clones must be screened. Subtractive hybridization techniques have also been used to isolate target DNA. In this technique, the cDNA molecules created from a first population of cells are hybridized to cDNA or RNA of a second population of cells in order to “subtract out” those cDNA molecules that are complementary to nucleic acid molecules present in the second population, i.e., those that reflect nucleic acid molecules present in both populations, thereby leaving only molecules unique to the population of interest.


[0005] Inverse polymerase chain reaction (IPCR) has been described (see Ochman et al., Genetic applications of an inverse PCR reaction, Genetics 120: 621 (1988)). In IPCR, the primers are oriented in the reverse direction of the usual orientation in conventional PCR, i.e., the two primers extend away from each other. Inverse PCR was originally used to amplify uncharacterized sequences immediately flanking transposable elements. Inverse PCR has been used to isolate a target gene; however, there is no selection or enrichment for the target gene in conventional inverse PCR protocols, resulting in a high background of colonies.


[0006] Li et al., in U.S. Pat. Nos. 5,500,356 and 5,789,166, describe a method of isolating a desired target nucleic acid from a nucleic acid library that involves the use of biotinylated probes and enzymatic repair-cleavage to eliminate the parental template nucleic acid of the library. This method requires a single stranded nucleic acid library (M13 phagemid library). If the library consists of double stranded plasmids, single strands have to be prepared initially. A biotinylated oligonucleotide probe is hybridized to a target sequence within the single-stranded molecules. This hybridized complex is then captured on avidin-coated beads and the library recovered from the beads by denaturation of the hybridized molecules. This selection eliminates undesired single-stranded phagemid DNA. This method of cloning does not involve amplification of the target gene before the selection step. The recovered single-stranded DNA is converted to ds DNA in the presence of dNTPs (but not dUTP) and then the mixture is digested with the enzyme HhaI, which digests away residual ss DNA that contains dUTP. Transformation and isolation of the desired molecule follow.


[0007] PCR based site-directed mutagenesis has been used to create a desired mutation such as a point mutation, deletion or insertion. In the site-directed mutagenesis method of Bauer et al., U.S. Pat. No. 5,789,166, the starting DNA template for the PCR amplification is typically a homogeneous population of plasmids all containing the one insert of interest that is to be mutated. Both oligonucleotide primers for such a PCR reaction are mutagenic primers which must contain the desired mutation; for point mutations, these primers are designed to contain at least one mismatched base relative to the template which upon primer extension will result in the desired mutation of the target gene. For the point mutations, the primers are overlapping, i.e., they need to anneal to the same sequence on opposite strands of the plasmid. For deletion mutagenesis, the primers are designed such that there is a gap between the 5′ ends of the primer pair. Thus, the product of the primer extension has a gap in the sequence of the target gene corresponding to the sequence to be deleted. Mutated plasmids containing the desired mutation are selected for and transformed into competent bacteria.


[0008] From the above discussion, it is apparent that there is a need for a cloning method that is versatile, easier to perform and less laborious, that can provide higher throughput, and is economical. The present invention overcomes the limitations of conventional cloning methods and provides additional advantages that will be apparent from the detailed description below.



SUMMARY OF THE INVENTION

[0009] The invention provides a method of amplifying and isolating a nucleic acid molecule of interest from a mixture of nucleic acid molecules, comprising:


[0010] (i) providing a recombinant nucleic acid library with a heterogeneous population of methylated, circular nucleic acid molecules as template molecules;


[0011] (ii) annealing a first and a second primer to complementary strands of the circular nucleic acid molecule, to produce an annealed mixture, wherein


[0012] the two primers, in the 5′ to 3′ direction, extend in opposite directions relative to each other during polymerase chain reaction;


[0013] the 5′ ends of the two primers are adjacent to each other; and


[0014] wherein each primer is identical in sequence to its corresponding sequence in the nucleic acid molecule of interest;


[0015] (iii) subjecting the annealed mixture to polymerase chain reaction, thereby producing an amplified mixture containing linear amplicons;


[0016] (iv) digesting the amplified mixture with an enzyme that selectively cleaves methylated DNA, thereby eliminating the template molecules and enriching the nucleic acid molecule of interest; and


[0017] (v) isolating the nucleic acid molecule of interest.


[0018] In one embodiment, the first and second primers are provided phosphorylated on their 5′ ends. In this embodiment, the method will comprise a ligation step after the PCR step but prior to the digesting step, to ligate the linear amplicons to produce circular replicons. Alternatively, after the digestion step, amplicons larger than the size of the cloning vector of the nucleic acid library are isolated such as by gel purification, and the isolated amplicons ligated.


[0019] In a preferred embodiment, the enzyme used to digest the amplified mixture is a restriction endonuclease. In a specific embodiment, the restriction endonuclease is Dpn I.


[0020] Multiple recombinant nucleic acid libraries can be mixed into a single reaction mix for polymerase chain reaction. In addition, multiple nucleic acid molecules of interest can be cloned simultaneously by applying aliquots of a solution of the mixed libraries to wells of a 96-well microtiter plate and performing the polymerase chain reaction on the microtiter plate.


[0021] In one embodiment, the nucleic acid library is a double-stranded DNA library wherein the DNA is methylated. Preferred DNA libraries are human cDNA or human genomic DNA libraries. Preferably, the nucleic acid molecule of interest is represented in the library at a frequency of greater than 5×10⁵, even more preferably, at a frequency of equal to or greater than 1×10⁶.


[0022] The method of the above embodiments further comprises the step of transforming the circular replicons into a suitable host cell after the digesting step, to generate clones. In one embodiment, the host cell is a competent bacterial host cell. The clones are then screened to identify the clone containing the nucleic acid molecule of interest. In an alternative embodiment, the screening step is omitted and the clones are directly sequenced. Sequencing is performed on nucleic acid isolated from the clones. Nucleic acid from multiple clones can be pooled for the sequencing step.


[0023] The invention provides a method of amplifying and isolating a nucleic acid molecule of interest from a recombinant nucleic acid library, comprising:


[0024] (i) providing a recombinant nucleic acid library with a heterogeneous population of methylated, circular nucleic acid molecules as template molecules;


[0025] (ii) annealing a first and a second primer to complementary strands of the circular nucleic acid molecule, to produce an annealed mixture, wherein


[0026] the two primers extend in opposite directions relative to each other during polymerase chain reaction;


[0027] the 5′ ends of the two primers are phosphorylated and adjacent to each other; and


[0028] wherein each primer is identical in sequence to its corresponding sequence in the nucleic acid molecule of interest;


[0029] (iii) subjecting the annealed mixture to polymerase chain reaction, thereby producing an amplified mixture containing amplicons;


[0030] (iv) ligating the amplicons in a ligation mix to produce circular replicons;


[0031] (v) subjecting the ligation mix after ligation to digestion with an enzyme that selectively cleaves methylated nucleic acid, thereby eliminating the template molecules and enriching the nucleic acid molecule of interest;


[0032] (vi) transforming the replicons into a suitable host cell to generate transformed clones; and


[0033] (vii) isolating the nucleic acid molecule of interest.


[0034] In one embodiment of the preceding method, a screening step is provided to screen the transformed clones to identify the clone containing the nucleic acid molecule of interest.


[0035] In any of the above embodiments, greater than 50 cDNA libraries can be provided in a single mix for polymerase chain reaction.







BRIEF DESCRIPTION OF THE FIGURES

[0036] FIG. 1 is a flow chart illustrating the FLIP cloning method up to the restriction digestion (with Dpn I in this case) selection step, as described in detail in Example 1. The shaded boxes flanking the vector sequence represent the target gene sequences.


[0037] FIG. 2 is a flow chart showing one alternative embodiment to the above FLIP procedure, as performed in Example 2.


[0038] FIG. 3 shows the nucleotide sequence of Incyte clone 509 1511H (SEQ ID NO: 4), as used in Example 2.


[0039] FIG. 4 shows a nucleotide sequence (SEQ ID NO: 11) of a native sequence DNA98853 polypeptide cDNA (nucleotides 1-903). Also presented is the position of three cysteine-rich repeats encoded by nucleotides 10-126, 133-252 and 259-357 as underlined. The putative transmembrane domain of the protein is encoded by nucleotides 409-474 in the figure. See Example 2.


[0040] FIG. 5 shows the amino acid sequence (SEQ ID NO: 12) derived from nucleotides 4-900 of the nucleotide sequence shown in FIG. 4. A potential transmembrane domain exists between and including amino acids 137 to 158 in the figure. See Example 2.







DETAILED DESCRIPTION OF THE INVENTION

[0041] A. Definitions


[0042] An “amplicon” is a product of a polymerase chain reaction extended from the two primers of the primer pair used in the reaction.


[0043] A “replicon” is a nucleic acid molecule capable of being replicated in a suitable host cell. Generally, a replicon will have an origin of replication for replication in a compatible host cell.


[0044] The nucleic acid molecule of interest is isolated from a mixture or library of cloned molecules. The clones comprise DNA or RNA or mixed polymer molecules that may be either single-stranded or double-stranded. Typically, the library of clones will comprise plasmids or other vectors (such as viral vectors) prepared using recombinant DNA methods, i.e., a recombinant nucleic acid library, to contain a fragment of DNA or RNA derived from a nucleic acid source such as cells of a cell line, or primary cells or a tissue. The cells may be prokaryotic or eukaryotic cells (including animals, humans, yeast and higher plants). The nucleic acid library will contain a heterogeneous population of clones. In preferred embodiments, the nucleic acid molecule of interest is a gene comprising an open reading frame and may further include 5′ and/or 3′ untranslated regions (UT). Where the examples use the term “target gene” or “gene of interest”, it will be understood they encompass “nucleic acid molecule of interest”.


[0045] Unless otherwise indicated, the term “DNA” is used to refer collectively to genomic DNA and cDNA, prepared from any source, including bacteria, plant cells, and mammalian cells, preferably cells of higher primates, such as monkeys or humans, most preferably humans.


[0046] The phrase “recombinant DNA library” is used to refer collectively to genomic and cDNA libraries. Preferably, a “recombinant DNA library” contains a substantially complete representation of all genomic or cDNA sequences from a particular cell or tissue source.


[0047] For genomic DNA libraries, the “frequency” of any given gene is the ratio of a particular gene fragment of a given length to the total number of base pairs (bp) present in the genome. For example, if the genome consists of 3×10⁹ base pairs, a particular 3000-bp gene of interest will be present at a frequency of 1 part in 10⁶. The “frequency” of a particular cDNA within a cDNA library is expressed by the ratio of its mRNA to the total poly(A) containing RNA. This ratio is usually unaffected by the process of copying the mRNA into cDNA.
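
The genomic example above can be reproduced with a few lines of arithmetic. The following is only a sketch; the function name is hypothetical and is introduced here for illustration.

```python
def genomic_frequency(fragment_bp: int, genome_bp: int) -> float:
    """Frequency of a gene fragment in a genomic library:
    fragment length divided by total genome length, both in base pairs."""
    if fragment_bp <= 0 or genome_bp <= 0:
        raise ValueError("lengths must be positive")
    return fragment_bp / genome_bp


# The example from the text: a 3000-bp gene in a genome of 3 x 10^9 bp
freq = genomic_frequency(3_000, 3_000_000_000)
one_in = round(1 / freq)  # express the frequency as "1 part in N"
print(f"frequency = {freq:g}, i.e. 1 part in {one_in:,}")
```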


[0048] The technique of “polymerase chain reaction,” or “PCR,” as used herein generally refers to a procedure wherein minute amounts of a specific piece of nucleic acid, RNA and/or DNA, are amplified as described in U.S. Pat. No. 4,683,195 issued Jul. 28, 1987. Generally, some sequence information from the region of interest or beyond needs to be available, such that oligonucleotide primers can be designed; these primers will be identical or similar in sequence to opposite strands on the template to be amplified. Generally, the PCR method involves repeated cycles of primer extension synthesis, using two primers capable of hybridizing preferentially to a template nucleic acid comprising the nucleotide sequence to be amplified. If the template nucleic acid is DNA, the DNA to be amplified is denatured by heating the sample. In the presence of DNA polymerase and excess deoxynucleotide triphosphates, oligonucleotides that anneal specifically to the target sequence prime new DNA synthesis. PCR produces an “amplified mixture” in which the individual amplicons increase exponentially in amount with respect to the number of cycles of primer extension. PCR can be used to amplify specific RNA sequences, specific DNA sequences from total genomic DNA, and cDNA transcribed from total cellular RNA, bacteriophage or plasmid sequences, etc. See, generally, Mullis et al., Cold Spring Harbor Symp. Quant. Biol., 51:263 (1987); Erlich, ed., PCR Technology, (Stockton Press, NY, 1989); Wang & Mark, in PCR Protocols, pp. 70-75 (Academic Press, 1990); Scharf, in PCR Protocols, pp. 84-98; Kawasaki & Wang, in PCR Technology, pp. 89-97 (Stockton Press, 1989).


[0049] A “primer”, as used herein, is a single-stranded oligonucleotide that can be extended by the covalent addition of nucleotide monomers during the template-dependent polymerization reaction catalyzed by a polymerase. The oligonucleotide primers are generally at least 15 nucleotides in length, preferably between about 25 and about 45 bases, more preferably between about 25 and 35 nucleotides, even more preferably between 20 and 24 nucleotides.


[0050] A pair of oligonucleotide primers is used in the inverse PCR step. The two primers of the pair are “adjacent” to each other in the sense that they do not overlap or have any gap in between. “Adjacent” primers are illustrated in FIG. 3 (see left and right primer). As used herein, forward and reverse when used to describe a primer, refer to the direction in which the 3′ end of the primer is pointing relative to the target sequence. As used herein, a “forward” primer will extend downstream towards the 3′ end of the gene. A reverse primer extends in the opposite direction of the forward primer during primer extension. As an illustration, assuming a nucleic acid fragment arbitrarily numbered base 1-32 serves as a template for the design of the primer pair, and the left primer from 5′ to 3′ is identical to the upper strand nucleotides 16 down to 1, then the right primer is identical to the lower strand nucleotides 17-32. In this way, the primers do not have gaps or overlaps over the stretch of template sequence.
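
One way to picture the adjacency requirement is the sketch below. It assumes the common convention that the known sequence is written 5′ to 3′ on the upper strand, that the left primer anneals to the upper strand (so it reads as the reverse complement of the bases to the left of the split point), and that the right primer reads as the upper strand to the right of the split point. The function names, the 32-base example sequence and the split point are hypothetical and are not taken from the figures.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def adjacent_primers(known: str, split: int,
                     left_len: int, right_len: int) -> Tuple[str, str]:
    """Return (left, right) primers, both written 5'->3'.

    The left primer covers known[split-left_len:split] and extends leftward;
    the right primer covers known[split:split+right_len] and extends rightward.
    Their 5' ends meet at `split`, so there is no gap and no overlap.
    """
    if split - left_len < 0 or split + right_len > len(known):
        raise ValueError("primers must lie within the known sequence")
    left = reverse_complement(known[split - left_len:split])
    right = known[split:split + right_len]
    return left, right


# Hypothetical 32-base stretch of known sequence, split between bases 16 and 17
known = "ATGGCTAGCTAGGATCCGATTACGCTAGCTAA"
left, right = adjacent_primers(known, split=16, left_len=16, right_len=16)
print(left, right)
```

Because the two primers tile the known stretch exactly, rejoining the reverse complement of the left primer with the right primer reconstructs the original sequence. The primers need not be the same length; unequal values of `left_len` and `right_len` work the same way.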


[0051] Two sequences are “complementary” to one another if they are capable of hybridizing to one another to form a stable anti-parallel, double-stranded nucleic acid structure.


[0052] In a “ligation” reaction, the ligase enzyme catalyzes the formation of a phosphodiester bond between juxtaposed 5′ phosphate and 3′ hydroxyl termini in two nucleic acid fragments. To ligate the nucleic acid fragments together, their ends must be compatible. In some cases, the ends will be directly compatible after endonuclease digestion. However, it may be necessary to first convert the staggered ends commonly produced after endonuclease digestion to blunt ends to make them compatible for ligation. To blunt ends, the DNA is treated in a suitable buffer with Klenow fragment of DNA polymerase I or T4 DNA polymerase in the presence of the four deoxyribonucleotide triphosphates. The DNA is then purified such as by phenol-chloroform extraction and ethanol precipitation. The DNA fragments that are to be ligated together are mixed in solution in about equimolar amounts. The solution will include ligase buffer, and a ligase such as T4 DNA ligase at about 10 units per 0.5 μg of DNA. If the DNA is to be ligated into a vector, the vector is first linearized by digestion with the appropriate restriction endonuclease(s). The linearized fragment is then treated with bacterial alkaline phosphatase, or calf intestinal phosphatase to prevent self-ligation during the ligation step.


[0053] The term “enriching” or “enrichment” is used to refer to the increase in the frequency of occurrence of a particular nucleic acid molecule within a recombinant nucleic acid library after application of the enrichment step (also referred to herein as selection) of the present invention, relative to the frequency of the particular nucleic acid molecule within the same recombinant nucleic acid library prior to the application of the enrichment step, e.g. the frequency of occurrence of a particular cDNA within a recombinant cDNA library. Accordingly, the degree of enrichment (fold enrichment) is expressed as the ratio of the frequency of occurrence of a particular cDNA within a recombinant DNA library after application of the enrichment method of the present invention, to the frequency of the particular cDNA or corresponding genomic DNA within the library prior to the application of the enrichment method.
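
The fold enrichment defined above is a simple ratio of frequencies. The following sketch shows the calculation; the function name and the before and after frequencies are hypothetical values chosen only to illustrate the arithmetic.

```python
def fold_enrichment(freq_before: float, freq_after: float) -> float:
    """Ratio of a clone's frequency after the enrichment step
    to its frequency in the same library before the step."""
    if freq_before <= 0 or freq_after < 0:
        raise ValueError("freq_before must be positive and freq_after non-negative")
    return freq_after / freq_before


# Hypothetical numbers: 1 clone in 10^6 before selection, 1 clone in 5 afterwards
enrichment = fold_enrichment(freq_before=1e-6, freq_after=0.2)
print(f"{enrichment:.3g}-fold enrichment")
```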


[0054] “Transformation” means introducing DNA into a host cell so that the DNA is replicable, either as an extrachromosomal element or chromosomal integrant. Transformation is usually performed by electroporation (Miller et al., Proc. Natl. Acad. Sci. USA 85, 856-860 (1988)), CaCl2 transfection (Mandel and Higa, J. Mol. Biol. 53, 159-162 (1970); Shigekawa and Dower, BioTechniques 6, 742-751 (1988)), DEAE-dextran technique (eukaryotic cells, Lopata et al., Nucleic Acids Res. 12, 5707 (1984)), and liposome-mediated transfection (Felgner et al., Proc. Natl. Acad. Sci. USA 84, 7413-7417 (1987)). Unless otherwise provided, the method used herein for transformation of E. coli is electroporation.


[0055] The terms “transformant”, “transformed host cell” and “transformed” refer to the introduction of DNA into a cell. The cell is termed a “host cell”, and it may be a prokaryotic or a eukaryotic cell. Typical prokaryotic host cells include various strains of E. coli. Typical eukaryotic host cells are mammalian, such as Chinese hamster ovary cells or human embryonic kidney 293 cells. The introduced DNA sequence may be from the same species as the host cell or a different species from the host cell, or it may be a hybrid DNA sequence, containing some foreign and some homologous DNA.


[0056] The term “plate” is used to refer to petri dishes or 96-well microtiter dishes filled with solid medium used to grow separated bacterial colonies or plaques. The terms “plating” or “plating out” refer to the placement of bacteria or phage on plates so that colonies or plaques are formed.


[0057] In the context of the present invention the expressions “cell”, “cell line”, and “cell culture” are used interchangeably, and all such designations include progeny. It is also understood that all progeny may not be precisely identical in DNA content, due to deliberate or inadvertent mutations. Mutant progeny that have the same function or biological property as screened for in the originally transformed cell are included.


[0058] B. Preferred Embodiments


[0059] The present invention provides a cloning and selection method called Full Length Inverse PCR (“FLIP”), also referred to herein as inverse long distance PCR because of the ability of this method to isolate long genes. FLIP is a very rapid and high throughput method of isolating an entire clone, vector plus insert, of a specific nucleic acid molecule from any nucleic acid library which was propagated in a host cell that methylates the nucleic acid library. The FLIP cloning method amplifies a target gene or nucleotide sequence and generates a highly purified population of the target gene.


[0060] There are several advantages of FLIP over conventional cloning methods. The FLIP method is easy to perform, employing standard molecular biology techniques (PCR, ligation, digestion, and DNA hybridization procedures) that ultimately generate a population of clones representing a single gene. The FLIP method is versatile in applications, provides high throughput, and is economical. FLIP allows the cloning of novel as well as known genes; this is in contrast to T/A cloning and standard PCR cloning, which require that the full length gene be known in order to clone it. Library array cloning techniques such as the HUCL (Human Universal cDNA Library; Stratagene, catalog #937811 & 937820 for HUCL Array I Membranes) cloning system are expensive and/or labor intensive compared to FLIP, and do not lend themselves to high throughput applications. Using the FLIP method, one can simultaneously amplify and clone as many as 96 or more separate target genes in a single FLIP procedure in a few days. A unique advantage of FLIP is that several cDNA libraries can be mixed together and effectively screened in a single FLIP-IPCR reaction. To date, we have mixed 56 separate cDNA libraries into a single mix and routinely and successfully screen this library mixture to isolate specific genes. This mixture of 56 libraries generates a library with a clonal complexity of several hundred million different cDNA clones. Another advantage is that the FLIP procedure can use double stranded cDNA libraries, whereas other methods require single stranded cDNA libraries. Single stranded libraries are typically less complex, have shorter insert sizes than their double stranded counterparts and are more laborious to produce. Thus far, the length of the nucleic acid molecule able to be cloned using the FLIP methodology is only limited by the size of the DNA molecules of the library itself. FLIP enables amplification of rare genes. While the frequency of different genes in any particular library varies, most genes will be present at a frequency of about 1 part in 10³ to 10⁶. A particularly rare mRNA will be represented by a single DNA out of about 5×10⁵ or more clones, while the majority of the genes will be present at a frequency of 1 in 10⁴ to 10⁵ clones. In a preferred embodiment, FLIP uses 5′ phosphorylated primers instead of non-phosphorylated primers in the inverse PCR step, which allows the amplicon to self-ligate and generate a circular molecule. Due to the use of 5′ phosphorylated primers and the fact that the gene of interest is amplified together with the vector sequences, a single ligation event of the ends of the linear amplicon is sufficient to regenerate a replicon that can then be propagated and amplified further in a host cell. More advantageously, due to the selection step, FLIP will typically amplify the gene of interest to high purity instead of generating a positive clone in a sea of background clones. These and other advantages of the FLIP method will be apparent from the following description of the procedure.


[0061] In one embodiment, the FLIP method involves the following steps:


[0062] (i) Inverse PCR to amplify the entire plasmid containing the nucleic acid of interest plus vector sequences from a nucleic acid library or mixture of libraries;


[0063] (ii) Ligation to circularize the amplicon;


[0064] (iii) Enzyme digest to eliminate the parental template plasmids of the library;


[0065] (iv) Transformation into host cells to amplify the resultant amplified plasmids;


[0066] (v) Screening the transformants to isolate the clone with the target gene; and


[0067] (vi) Sequencing to identify the target gene.
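
The logic of steps (i) through (iv) above can be sketched as a toy model. This is only an illustration under simplifying assumptions: every molecule is reduced to three properties, amplification is represented by a fixed copy number rather than exponential growth, and ligation and digestion are treated as complete. All class and function names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Molecule:
    insert: str       # name of the cloned insert
    methylated: bool  # True for library plasmids grown in a methylating host
    circular: bool


def inverse_pcr(library: List[Molecule], target: str, copies: int) -> List[Molecule]:
    """Templates persist; each target-bearing template also yields
    unmethylated, linear amplicons (vector plus insert)."""
    amplicons = [Molecule(m.insert, methylated=False, circular=False)
                 for m in library if m.insert == target
                 for _ in range(copies)]
    return library + amplicons


def ligate(mix: List[Molecule]) -> List[Molecule]:
    """Self-ligate every molecule into a circle (idealized)."""
    return [replace(m, circular=True) for m in mix]


def digest_methylated(mix: List[Molecule]) -> List[Molecule]:
    """Remove methylated (template) molecules, as a methylation-dependent
    enzyme would in the idealized case."""
    return [m for m in mix if not m.methylated]


def transform(mix: List[Molecule]) -> List[Molecule]:
    """Only circular molecules are propagated in the host cell."""
    return [m for m in mix if m.circular]


library = [Molecule(name, True, True) for name in ("geneA", "geneB", "target", "geneC")]
clones = transform(digest_methylated(ligate(inverse_pcr(library, "target", copies=8))))
print({c.insert for c in clones})
```

In this idealized model only the newly synthesized, unmethylated copies of the target plasmid survive the digest, which is the enrichment the method relies on. In practice digestion and ligation are not complete, which is why screening or sequencing follows.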


[0068] It will be understood that certain steps can be performed in a different order and certain steps are optional. For example, although the above order of ligation before enzyme digestion is preferred in the present embodiment of FLIP, the amplicons can be treated with Dpn I before the ligation step. In another embodiment, the screening step after transformation can be omitted. Because FLIP can generate highly purified populations of the target gene, it is possible to directly sequence FLIP reaction products by transforming the plasmids remaining after the restriction digest selection, preparing the plasmid DNA from the total population of transformed bacteria, and sequencing the plasmid DNA directly using gene specific primers. To directly sequence FLIP reaction products, the concentration of the target gene is preferably at least 20% of the total genes in the FLIP reaction product mixture. If the concentration of target gene in the FLIP reaction product is below 20% of total genes, the mixture can still be sequenced, such as by first PCR-amplifying the target gene using a gene specific primer and a vector primer and then sequencing the amplicon. Sequencing of complex mixtures of FLIP reactions will allow for the rapid determination of novel genes without the need to isolate several purified clones.
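
The choice between the two sequencing routes described above can be written as a simple decision rule. This is a sketch only; the function name and the returned labels are hypothetical, and the 20% figure is the preferred threshold stated in the text rather than a hard limit.

```python
def sequencing_route(target_fraction: float, threshold: float = 0.20) -> str:
    """Pick a sequencing route from the fraction of target gene
    in the FLIP reaction product mixture."""
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    if target_fraction >= threshold:
        # Pooled plasmid DNA is sequenced with gene specific primers
        return "direct"
    # Amplify with a gene specific primer and a vector primer, then sequence
    return "pcr_then_sequence"


for fraction in (0.65, 0.20, 0.05):
    print(fraction, sequencing_route(fraction))
```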


[0069] In an alternative embodiment, following the inverse PCR step, the PCR mixture is treated with an enzyme, typically a restriction enzyme, to digest away the template plasmids, followed by gel purification of PCR amplified products that are larger than the size of the library cloning vector. The gel purified fragments are then self-ligated and transformed into competent bacterial cells. Colonies are then screened for the target gene and plasmids prepared from the positive clones are sequenced.


[0070] The steps of the FLIP method will now be described in more detail.


[0071] (i) Inverse PCR Step


[0072] The starting material for inverse PCR amplification is methylated, circular nucleic acid, the source of which can be one or more nucleic acid libraries. The library can be methylated in vivo or in vitro. The DNA library can be a cDNA or a genomic DNA library. The construction of plasmid, cosmid, and phagemid cDNA libraries, or genomic libraries, has been described, see, e.g., Sambrook, J. et al., In: Molecular Cloning, A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989). Usually, the target gene or sequence of interest is amplified from a cDNA library. The vector into which the DNA fragments of the library are inserted will typically contain an origin of replication to allow it to replicate in the appropriate host cell, and additionally, at least one selectable marker. For example, the vector for replication in E. coli may include the Col E1 origin of replication and the ampicillin resistance gene as a selectable marker. A β-galactosidase gene may also be included as an additional selectable marker.


[0073] With FLIP, the inverse PCR can be performed on a mixture of cDNA libraries, which eliminates the steps of initially screening individual libraries to predetermine which library contains the gene of interest and then selecting that library for amplification. This saves time and labor. The FLIP method has been successfully used to amplify and isolate target genes from a mixture of 8, 10, 15, and even up to 56 separate cDNA libraries in one mix. There is no apparent limit to the number of libraries that can be screened in a single reaction. For example, 96 different target genes can be screened in a single PCR reaction on a microtiter plate, with each well of the 96 well microtiter plate containing a mixture of 15 libraries. Nucleic acid molecules of interest can be amplified from a single mix containing greater than 40, preferably greater than 50, even more preferably greater than 60 libraries. The 56 cDNA libraries prepared from different tissues and cell lines mixed into a single mix generate a library with a clonal complexity of several hundred million different cDNA clones. Several specific genes have been isolated from the 56-library mix, some of which were not successfully isolated by traditional methods. In FLIP, the entire plasmid containing the target gene and vector sequences is amplified by inverse PCR.


[0074] For each target gene or nucleic acid sequence of interest, a pair of synthetic oligonucleotide primers is needed for the IPCR step. The primers are each complementary to opposite strands of a stretch of known sequence of the target gene to be isolated. The known sequence can be any sequence that is an indicator of a potential nucleic acid molecule of interest; it may show homology to a domain or a highly conserved region of a known gene. For genomic DNA, one might look for an exon and identify a fragment to make the primers. The known sequence can be an EST (Expressed Sequence Tag) which is usually about 300-600 bases. For the purposes of FLIP, a short region of known sequence, e.g., about 30 bp, is sufficient. The primers for FLIP inverse PCR are identical in sequence to the corresponding stretch of sequences in the target gene; the primer sequence has no mismatch to the template, unlike mutagenic oligonucleotides that intentionally incorporate a mismatch nucleotide in site-directed mutagenesis protocols. The pair of oligonucleotide primers for the inverse PCR are designed in such a way that the primers are adjacent to each other and the 5′ to 3′ direction of each primer extends in opposite directions relative to each other. The primers do not overlap and they do not have to be the same length.


[0075] The primers added to the IPCR reaction mix do not have to be but are preferably phosphorylated on the 5′ ends to enable self-ligation of the linear amplicon to generate a replicon. If the primers are not initially phosphorylated, the 5′ ends of the amplicon strands can be phosphorylated with a kinase in vitro before ligation.


[0076] The oligonucleotide primers are generally at least 15 nucleotides in length, preferably between about 25 and about 45 bases, more preferably between about 25 and 35 nucleotides, even more preferably between 20 and 24 nucleotides. The amount of PCR primers used can vary widely, by as much as 10-fold. Usually, the primers are used at a final concentration of about 1 μM (See e.g., Molecular Cloning: A Laboratory Manual, J. Sambrook et al., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989, section 14.15). The melting temperature (Tm) can be varied. In one embodiment, the primers have a Tm of 68-71° C. to optimize specific hybridization to the target sequence. In a preferred embodiment, the primers of each primer pair have Tm values within 1° C. of each other. Other parameters to be taken into account in the design of PCR primers, such as GC content, are as taught in the art. The oligonucleotide primers can be synthesized by known methods and are available custom-designed from commercial sources.
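
As a rough illustration, the length and melting temperature criteria above can be expressed as a simple check. The function names are hypothetical, and the melting temperatures are passed in by the caller rather than computed, because different tools use different formulas. The inputs are the Toll 6 primers and melting temperatures reported in Example 1.

```python
# Sketch of a primer-pair sanity check. Thresholds mirror the text above;
# function names are hypothetical. Melting temperatures are supplied by the
# caller, not calculated here.

def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def check_primer_pair(primer_a, primer_b, tm_a, tm_b,
                      min_len=15, tm_window=(68.0, 71.0), max_tm_gap=1.0):
    """Return a dict of pass/fail flags for the criteria in the text."""
    low, high = tm_window
    return {
        "long_enough": len(primer_a) >= min_len and len(primer_b) >= min_len,
        "tm_in_window": low <= tm_a <= high and low <= tm_b <= high,
        "tm_matched": abs(tm_a - tm_b) <= max_tm_gap,
    }


# Toll 6 primers from Example 1 (5' phosphate omitted).
SNR1 = "GCTATCCTAAAGGGTTGTTCTTCTTCAGAGCAT"
SNF1 = "CACTGCAACATCATGACCAAAGACAAAGA"

report = check_primer_pair(SNR1, SNF1, tm_a=69.3, tm_b=69.8)
print(report)
print(round(gc_fraction(SNR1), 2), round(gc_fraction(SNF1), 2))
```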


[0077] The oligonucleotide primers are annealed to the template strands and first and second DNA strands corresponding to the two DNA strands of the closed circular template DNA are synthesized in an exponential cyclic amplification reaction. The oligonucleotide primers are extended during temperature cycling by using an appropriate polymerase, preferably a thermostable or thermophilic polymerase. A thermostable polymerase can catalyze nucleotide addition at temperatures of between about 50° C. and about 100° C. Exemplary thermostable polymerases are described in European Patent Application No. 0258017, incorporated herein by reference. Thermophilic DNA polymerases useful in FLIP include Pfu (Pfu Turbo from Stratagene, La Jolla Calif.), the Vent® DNA polymerases (New England BioLabs, Beverly, Mass.), and Platinum Pfx (Life Technologies, Rockville, Md.). Thermostable DNA polymerases with high fidelity (having proofreading properties) and low error rates are preferred. Pfu and Vent polymerases both have an integral 3′ to 5′ exonuclease proofreading activity that enables the polymerase to correct nucleotide misincorporation errors.


[0078] The annealing temperature for annealing of the primers to the template strands can be varied. Preferably, the annealing temperature is between 65° C. and 71° C., more preferably 65° C., which provides a high degree of specificity to the gene of interest. Lowering the annealing temperature below 65° C. can reduce specificity.


[0079] The number of PCR cycles can be varied; 5, 10, 15, 20 and 23 cycles have been used herein with success. Twenty PCR cycles give approximately a million-fold amplification, assuming the product doubles in each cycle. For a common gene that is well represented in the cDNA library, 5-10 cycles may be adequate. The number of PCR cycles that will optimize isolation of a possibly rare target gene while maintaining a low PCR-induced mutation rate is preferred and can be determined by routine methods.
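
The relation between cycle number and amplification can be illustrated with a short calculation. It assumes ideal doubling of the product in every cycle, which is an upper bound rather than a measured efficiency; the function name is hypothetical.

```python
# Sketch: theoretical fold amplification for the cycle numbers mentioned above,
# assuming perfect doubling in every cycle (an idealization).

def ideal_fold_amplification(cycles: int) -> int:
    """Return 2**cycles, the fold increase under perfect doubling."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return 2 ** cycles


for n in (5, 10, 15, 20, 23):
    print(f"{n:>2} cycles: {ideal_fold_amplification(n):,}-fold")
```

Twenty cycles give 1,048,576-fold under this assumption, which is the basis of the "million fold" figure.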


[0080] FLIP can be used advantageously to isolate full length known genes. For this, primers for the IPCR step are designed near the 5′ ATG start codon. For the screening step, the probe can be designed to the desired 3′ end, e.g., at the stop codon. This will favor the isolation of any clones containing the full length sequence between the ATG start and the desired 3′ end. FLIP is also useful for amplifying long genes because the DNA polymerases used (e.g., Pfu, Pfx) can polymerize long fragments. Pfu polymerase can polymerize primer extension up to 10 kb, while Pfx can apparently amplify genomic templates up to 12 kb and plasmid templates up to 20 kb. The isolation of the Toll 6 gene which is 4.2 kb in length (open reading frame plus 5′ and 3′ UT) plus the pRK5D vector of about 5.1 kb adding up to a total amplicon length of about 9.3 kb, illustrates the capabilities of the FLIP cloning method.
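
Because inverse PCR copies the whole plasmid, the amplicon length is the insert length plus the vector length. The following sketch checks that sum against the polymerase limits quoted above. The constant and function names are hypothetical, and the limits are simply the approximate figures stated in the text, not manufacturer specifications.

```python
# Sketch: compare a whole-plasmid amplicon with the polymerase limits quoted
# in the text. Names are hypothetical; limits are the approximate values above.

PFU_MAX_BP = 10_000          # "up to 10 kb" for Pfu
PFX_PLASMID_MAX_BP = 20_000  # "plasmid templates up to 20 kb" for Pfx


def amplicon_length_bp(insert_bp: int, vector_bp: int) -> int:
    """Inverse PCR amplifies insert plus vector as one linear product."""
    return insert_bp + vector_bp


toll6_amplicon = amplicon_length_bp(insert_bp=4_200, vector_bp=5_100)
print(toll6_amplicon)                         # total length in bp
print(toll6_amplicon <= PFU_MAX_BP)           # within the quoted Pfu range
print(toll6_amplicon <= PFX_PLASMID_MAX_BP)   # within the quoted Pfx range
```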


[0081] (ii) Ligation


[0082] The PCR reaction generates a linear, double-stranded DNA amplicon that contains the target gene plus the vector sequences. In a ligation reaction, this linear amplicon is then self-ligated via the 5′-phosphorylated ends contributed by the primers to regenerate the replicon, which is able to replicate in the appropriate host cell.


[0083] (iii) Enzyme Digest


[0084] To select against the parental templates of the library and enrich for the PCR amplified products, the circularized amplicons are then digested with a selection enzyme to which the template plasmids are susceptible. This selection step reduces the background of clones for screening. Typically, the cDNA libraries are propagated in methylation positive bacteria. DNA isolated from almost all E. coli strains is dam methylated and is therefore susceptible to digestion by a restriction endonuclease specific for a methylated recognition sequence. Template DNA can also be methylated in vitro. Therefore, at least one restriction enzyme which digests methylated but not unmethylated DNA can be used to remove the parental template plasmids. Preferably, the restriction enzyme that cleaves methylated DNA is a frequent cutter, i.e., has a recognition site shorter than 6 bases, preferably a 4 base pair recognition site. In one embodiment, the restriction enzyme specific for methylated DNA is Dpn I. Dpn I is a 4-base cutter with the recognition sequence 5′ GmATC 3′; it is specific for methylated and hemimethylated DNA. The vector pRK5D has 23 Dpn I sites. Thus, even if the Dpn I digestion was only 5% effective, the enzyme would still cut all of the methylated template vectors at least once, making the digested vectors incapable of transforming bacteria, and therefore all transformed bacteria would represent unmethylated, circularized amplicons. Other methylation-requiring endonucleases include McrBC (NEB, Beverly, Mass.).
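
Candidate Dpn I sites in a vector can be counted by searching for the GATC motif. The sketch below does this for a circular molecule, including a site that spans the point where the sequence is linearized for storage. The function name and the toy sequence are hypothetical; the sequence is not pRK5D. Whether a given motif is actually cut depends on its methylation state, which a sequence search cannot determine.

```python
# Sketch: count GATC motifs (candidate Dpn I sites) on a circular plasmid.
# The toy sequence is hypothetical and is not the pRK5D vector.

SITE = "GATC"


def count_sites_circular(plasmid: str, site: str = SITE) -> int:
    """Count occurrences of `site` on a circular sequence.

    The first len(site) - 1 bases are appended to the end so that a site
    spanning the linearization point is also found. GATC cannot overlap
    itself, so the non-overlapping str.count is sufficient.
    """
    plasmid = plasmid.upper()
    wrapped = plasmid + plasmid[:len(site) - 1]
    return wrapped.count(site)


toy_plasmid = "ATCAAGATCTTTG"  # one internal site, one spanning the junction
print(toy_plasmid.count(SITE))            # linear search only
print(count_sites_circular(toy_plasmid))  # circular search
```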


[0085] (iv) Transformation. The resultant plasmid replicons after restriction enzyme treatment can be amplified and propagated in suitable host cells, typically competent bacterial cells. Transformation of competent bacteria can be performed by any method well known in the art and described, e.g., in Sambrook et al., supra. Transformation methods include lipofection, heat shock, electroporation, calcium phosphate co-precipitation, rubidium chloride or polycation (such as DEAE-dextran)-mediated transfection. Competent bacteria for transformation are commercially available and transformation of these cells can be performed following the instructions of the manufacturer. The transformation method that provides optimal transformation frequency is favored. In a preferred embodiment, transformation is by electroporation. Transformation can be performed efficiently in a 96-well format.


[0086] (v) Screening


[0087] Typically, the FLIP method will amplify the target gene so that it is the major species in the final FLIP reaction product. The transformants, if they are bacteria, are then plated on agar plates, typically under appropriate drug selection depending on the specific selectable marker on the vector. The resultant bacterial colonies are then screened for presence of the target gene by routine methods. Colonies can be screened, e.g., by PCR or colony hybridization.


[0088] (vi) From the screening, positive clones are identified and DNA is prepared from the clones for sequencing. Sequencing is performed using routine methods as previously described, e.g., in Sambrook et al. 1989, supra. Sequencing can be performed on DNA preparations of individual positive clones or pools of clones.


[0089] The invention will be more fully understood by reference to the following examples, which are intended to illustrate the invention but not to limit its scope. All literature and patent citations are expressly incorporated by reference.



C. EXAMPLES


Example 1


Isolation of cDNA Clones Encoding Toll 6 Gene

[0090] The Toll 6 gene (Genbank Accession# AB020807) has a known sequence of 2760 bp corresponding to its open reading frame (ORF). Using the FLIP methodology in the present experiment, a cDNA clone was isolated which contained the Toll 6 gene that included, in addition to the ORF, the 5′ and 3′ UT. The total length of the isolated Toll 6 gene was 4.2 kb; the vector pRK5D used was 5.1 kb, thus adding to a total length of 9.3 kb of the DNA molecule amplified by IPCR and isolated.


[0091] The Toll 6 gene was cloned using FLIP as follows. Two adjacent 5′ phosphorylated primers, 128185.snr1 and 128185.snf1, were designed on opposite strands with melting temperatures of 69.3° C. and 69.8° C. respectively. These primers were used in an inverse PCR reaction. The PCR primer sequences were as follows:


[0092] 128185.snr1 (SEQ ID NO.1)>pGCTATCCTAAAGGGTTGTTCTTCTTCAGAGCAT and


[0093] 128185.snf1 (SEQ ID NO.2)>pCACTGCAACATCATGACCAAAGACAAAGA.


[0094] In a 50 μl reaction, the following reagents were added: 50 ng of a bone marrow cDNA library in the vector pRK5D, which was propagated in methylation positive bacteria; 50 picomoles of each PCR primer; 10 nmoles of each deoxynucleotide triphosphate; 5 μl of Pfu 10× buffer (Stratagene, La Jolla Calif.); and 1 μl of Pfu Turbo (Stratagene, La Jolla Calif.). The plasmid vector pRK5 (4,661 bp) has been described (EP 307,247 published Mar. 15, 1989); pRK5D (5117 bp) is a derivative of pRK5.
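
The final concentrations implied by these amounts follow from a simple unit conversion, since 1 pmol per μl equals 1 μM. The helper name below is hypothetical.

```python
# Sketch: final concentrations in the 50 ul reaction described above.
# 1 pmol/ul = 1e-12 mol / 1e-6 L = 1e-6 mol/L = 1 uM.

def micromolar(amount_pmol: float, volume_ul: float) -> float:
    """Convert an amount in pmol and a volume in ul to a concentration in uM."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_pmol / volume_ul


REACTION_UL = 50

primer_uM = micromolar(50, REACTION_UL)       # 50 pmol of each primer
dntp_uM = micromolar(10_000, REACTION_UL)     # 10 nmol = 10,000 pmol of each dNTP

print(f"each primer: {primer_uM} uM")
print(f"each dNTP:   {dntp_uM} uM")
```

The primer value agrees with the final concentration of about 1 μM given in the general description.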


[0095] The PCR cycle conditions were one cycle at 94° C. for 3 minutes, then 94° C. for 30 seconds, 65° C. for 30 seconds, 72° C. for 13 minutes for 20 cycles. The PCR reaction generated a linear 5′ phosphorylated amplicon that contained the Toll 6 cDNA insert plus the pRK5D vector.
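
For planning purposes, the programmed time of this cycling profile can be added up as follows. The calculation counts only the hold times stated above and ignores ramping between temperatures; the variable names are hypothetical, and the 9.3 kb amplicon length is the figure given for the Toll 6 plasmid.

```python
# Sketch: total programmed hold time for the cycling profile above,
# ignoring temperature ramp times. All times are in minutes.

INITIAL_DENATURATION_MIN = 3.0
DENATURE_MIN = 0.5    # 94 C for 30 seconds
ANNEAL_MIN = 0.5      # 65 C for 30 seconds
EXTEND_MIN = 13.0     # 72 C for 13 minutes
CYCLES = 20
AMPLICON_KB = 9.3     # Toll 6 insert plus pRK5D vector

per_cycle_min = DENATURE_MIN + ANNEAL_MIN + EXTEND_MIN
total_min = INITIAL_DENATURATION_MIN + CYCLES * per_cycle_min
extension_min_per_kb = EXTEND_MIN / AMPLICON_KB

print(f"per cycle: {per_cycle_min} min")
print(f"total:     {total_min} min ({total_min / 60:.1f} h)")
print(f"extension: {extension_min_per_kb:.2f} min per kb")
```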


[0096] Next, 10 μl of the completed PCR reaction was ligated in a 100 μl reaction containing the following other reagents: 10 μl 10× T4 DNA ligase buffer (New England BioLabs, Beverly, Mass.), 4 μl T4 DNA ligase (New England BioLabs, Beverly, Mass.), and 76 μl H2O. The ligation was allowed to incubate at ambient temperature for 1 hour on the bench top.


[0097] After 1 hour of ligation, 2 μl of the restriction enzyme Dpn I (New England BioLabs, Beverly, Mass.) was added to the ligation reaction and the digestion was allowed to continue for 1 hour at 37° C. Dpn I will specifically digest methylated DNA and not unmethylated DNA; therefore, the original bone marrow cDNA library which was used as a template will be digested, leaving only the Toll 6/vector amplicon intact. After the completion of the digestion, the sample was cleaned using the QIAquick PCR purification kit (Qiagen, Valencia, Calif.), eluted in 30 μl of elution buffer or H2O, and then ethanol precipitated. The pellet was resuspended in 2 μl of H2O and the entire sample was then used to transform bacteria.


[0098] Transformation was done by electroporation into DH10B ElectroMAX competent bacteria (Life Technologies, Rockville, Md.). The transformed bacteria were plated on Luria broth agar plates and colonies were allowed to grow overnight at 37° C.


[0099] The next day, the colonies were lifted onto a nylon membrane, denatured, renatured and probed with a 32P-ATP kinase-labeled, Toll 6-specific probe. The sequence of the Toll 6 specific probe was


[0100] 128185.p1 (SEQ ID NO.3)>GTTAGCCTGCCAGTTAGAGACAGCCCA.


[0101] Positive colonies were sent to sequencing to confirm that the sequence did not contain point mutations introduced by the PCR reaction.


[0102] Other known genes of different lengths and base composition were cloned following the procedure as described in this Example. These genes are identified by GenBank Accession Nos. as indicated in Table 1.


[0103] Table 2 shows the results of using FLIP to isolate 22 novel (unknown) target genes from a single library, or from a mixture of 8 different libraries in a single IPCR reaction mix. The novel genes are identified by DNA#. Table 2 compares the percentage of total plated clones that are positive on probing for the target sequence and the overall success rate of isolating the sought-after genes. CFU stands for bacterial colony forming units when plated.
TABLE 1

Gene                            Genbank accession #
ADAMS19                         AF134707
Human secreted protein 155      NM_007126
Peflin                          AB026628
Human TNCB                      M18217
PLA-2                           AF058921
OX40                            AR048669
Placental protein               AF051315
FGF16                           AB009391
YLAT-1                          AJ130718
CGI-128 protein                 AF151886
CD82                            NM_002231
hPTTG GPI anchored protein      Z48042
Sorting nexin 9                 AF121859
TACC3                           AF093543
Inosine 5′ monophosphate        NM_000884
KIAA1036                        AB028959


[0104]

TABLE 2

FLIP (8-mix libraries)                            FLIP
CFU (per ml)    +         % +       Gene #        CFU (per ml)    +       % +
1,670           820       49        01            65              25      38
1,750           1,400     80        02            175             150     86
900             0         0         03            20              0       0
25,200          18,000    71        04            167             110     66
110             0         0         05            32              0       0
125             0         0         06            22              0       0
130             2         1.5       07            22              12      55
300             0         0         08            250             10      4
4,120           0         0         09            2,590           0       0
3,000           1,400     47        10            180             150     83
140             17        12        11            82              0       0
100             0         0         12            45              0       0
1,160           780       67        13            32              20      63
5,400           302       5.6       14            25              0       0
2,000           400       20        15            90              2       2.2
40              0         0         16            7               0       0
16,000          0         0         17            440             0       0
417             33        7.9       18            62              5       8.1
150             15        10        19            122             0       0
2,000           2         0.1       20            1,930           260     13
1,030           650       63        21            30              2       6.6
1,400           30        2.1       22            1,680           440     26

                        FLIP (mixed)          FLIP
Success Rate            13/21 (62%)           11/21 (52%)
Average % Positive      31% (STD 30.11)       39% (STD 33.11)
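
The "% +" column of Table 2 is the number of positive colonies expressed as a percentage of the colony forming units plated. A minimal sketch of that calculation, using a hypothetical helper name and two rows of Table 2 as inputs, is:

```python
# Sketch: percent positive = positives / CFU * 100, as used for the "% +"
# column of Table 2. The helper name is hypothetical.

def percent_positive(cfu: float, positives: float) -> float:
    """Return positives as a percentage of plated colony forming units."""
    if cfu <= 0:
        raise ValueError("CFU must be positive")
    return 100.0 * positives / cfu


# Gene 01, 8-mix libraries: 1,670 CFU with 820 positives.
gene01_mix = percent_positive(1670, 820)
# Gene 02, single library: 175 CFU with 150 positives.
gene02_single = percent_positive(175, 150)

print(round(gene01_mix, 1))
print(round(gene02_single, 1))
```

Rounded to whole numbers, these reproduce the corresponding tabulated entries.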












Example 2


Isolation of cDNA Clones Encoding Human DNA98853 Polypeptide

[0105] Based upon the DNA sequence of Incyte clone 509 151 1H (SEQ ID NO:4) shown in FIG. 3 (from the Incyte Pharmaceuticals LIFESEQ™ database), oligonucleotides were synthesized to identify by PCR, a cDNA library that contained the sequence of interest. These oligonucleotides were:
Forward primer: 5′ GAGGGGGCTGGGTGAGATGTG 3′ (509-1) (SEQ ID NO:5)
Reverse primer: 5′ TGCTTTTGTACCTGCGAGGAGG 3′ (509-4AS) (SEQ ID NO:6)


[0106] To isolate the full length coding sequence for DNA98853 polypeptide, the FLIP procedure (also referred to as inverse long distance PCR) was carried out (see FIG. 2). The PCR primers generally ranged from 20 to 30 nucleotides. For inverse long distance PCR, primer pairs were designed in such a way that the 5′ to 3′ direction of each primer pointed away from each other.


[0107] A pair of inverse long distance PCR primers for cloning DNA98853 was synthesized:
Primer 1 (left primer): 5′ pCATGGTGGGAAGGCCGGTAACG 3′ (509-P5) (SEQ ID NO:7)
Primer 2 (right primer): 5′ pGATTGCCAAGAAAATGAGTACTGGGACC 3′ (509-P6) (SEQ ID NO:8)


[0108] In the inverse long distance PCR reaction, the template is a plasmid cDNA library. As a result, the PCR products contain the entire vector sequence in the middle with insert sequences of interest at both ends. After the PCR reaction, the PCR mixture was treated with Dpn I, which digests only the template plasmids, followed by agarose gel purification of PCR products larger than the size of the library cloning vector. Since the primers used in the inverse long distance PCR were also 5′-phosphorylated, the purified products were then self-ligated and transformed into E. coli competent cells. Colonies were screened by PCR using a 5′ vector primer and an appropriate gene specific primer to identify clones with longer 5′ sequence. Plasmids prepared from positive clones were sequenced. If necessary, the process could be repeated to obtain more 5′ sequence based on the new sequence obtained from the previous round.


[0109] The purpose of inverse long distance PCR in this experiment is to obtain the complete sequence of the gene of interest. The clone containing the full length coding region was then obtained by conventional PCR.


[0110] The following primer pair was synthesized to clone the full length coding region of DNA98853:


[0111] Forward primer:


[0112] 5′ ggaggatcgatACCATGGATTGCCAAGAAAATGAG 3′ (Cla-MD-509) (SEQ ID NO:9)


[0113] Reverse primer:


[0114] 5′ ggaggagcggccgcttaAGGGCTGGGAACTTCAAAGGGCAC (509.TAA.not) (SEQ ID NO: 10)


[0115] For cloning purposes, a Cla I site and a Not I site were included in the forward primer and reverse primer respectively.


[0116] To ensure the accuracy of the PCR products, independent PCR reactions were performed and several cloned products were sequenced.


[0117] DNA sequencing of the clones isolated as described above gave the full-length DNA sequence for DNA98853 polypeptide (herein designated as DNA98853-1739) (SEQ ID NO: 11) and the derived protein sequence for DNA98853 polypeptide (SEQ ID NO: 12).


[0118] FIG. 1 shows the nucleotide sequence of the ORF of DNA98853. The entire cDNA sequence is much longer and includes 5′ and 3′ UT (untranslated region). Clone DNA98853-1739 was deposited with the ATCC under the Budapest Treaty on Apr. 6, 1999 and assigned ATCC Deposit No. ATCC 203906. American Type Culture Collection (ATCC) is located at 10801 University Boulevard, Manassas, Va. 20110-2209. Clone DNA98853 contains a single open reading frame with an apparent translational initiation site at nucleotide positions 4-6 and ending at the stop codon at nucleotide positions 901-903 (FIG. 4). The predicted polypeptide precursor is 299 amino acids long (FIG. 5) (SEQ ID NO: 12). The full-length DNA98853 polypeptide protein shown in FIG. 5 has an estimated molecular weight of about 33 kilodaltons and a pI of about 4.72. A potential N-glycosylation site exists between amino acids 74 and 77 of the amino acid sequence shown in FIG. 5. A potential N-myristoylation site exists between amino acids 24 and 29 of the amino acid sequence shown in FIG. 5. Potential casein kinase II phosphorylation sites exist between amino acids 123-126, 185-188, 200-203, 252-255, 257-260, 271-274, and 283-286 of the amino acid sequence shown in FIG. 5. A potential transmembrane domain exists between amino acids 137 and 158 of the sequence shown in FIG. 5. It is presently believed that the polypeptide does not include a signal sequence.
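
The stated polypeptide length follows directly from the ORF coordinates. The short sketch below, with a hypothetical function name, takes the first nucleotide of the start codon and the last nucleotide of the stop codon and returns the number of encoded amino acids.

```python
# Sketch: amino acid count implied by ORF coordinates (1-based, inclusive).
# The stop codon is part of the ORF span but encodes no residue.

def orf_amino_acids(start_first_nt: int, stop_last_nt: int) -> int:
    """Return the number of amino acids encoded between the given positions."""
    span = stop_last_nt - start_first_nt + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    codons = span // 3
    return codons - 1  # subtract the stop codon


# Start codon at positions 4-6, stop codon at positions 901-903.
length_aa = orf_amino_acids(4, 903)
print(length_aa)
```

The span of 900 nucleotides corresponds to 300 codons, of which 299 encode amino acids, in agreement with the predicted precursor length.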


[0119] Analysis of the amino acid sequence of the full-length DNA98853 polypeptide suggests that portions of it possess homology to members of the tumor necrosis factor receptor family, thereby indicating that DNA98853 polypeptide may be a novel member of the tumor necrosis factor receptor family. There are three apparent extracellular cysteine-rich domains characteristic of the TNFR family [see, Naismith and Sprang, Trends Biochem. Sci., 23:74-79 (1998)], of which the first two CRDs have 6 cysteines while the third CRD has 4 cysteines.


[0120] Conclusion


[0121] The results of these experiments demonstrated that the present FLIP method has a high success rate in efficiently cloning a wide variety of genes.



REFERENCES

[0122] The practice of the present invention will employ, unless otherwise indicated, conventional techniques of PCR, molecular biology and the like, which are within the skill of the art. Such techniques are explained fully in the literature. See e.g., Molecular Cloning: A Laboratory Manual, (J. Sambrook et al., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989); Current Protocols in Molecular Biology (F. Ausubel et al. eds., 1987 and updated); Essential Molecular Biology (T. Brown ed., IRL Press 1991); Gene Expression Technology (Goeddel ed., Academic Press 1991); Methods for Cloning and Analysis of Eukaryotic Genes (A. Bothwell et al. eds., Bartlett Publ. 1990); Gene Transfer and Expression (M. Kriegler, Stockton Press 1990); Recombinant DNA Methodology II (R. Wu ed., Academic Press 1995); PCR: A Practical Approach (M. McPherson et al., IRL Press at Oxford University Press 1991); Oligonucleotide Synthesis (M. Gait ed., 1984); Cell Culture for Biochemists (R. Adams ed., Elsevier Science Publishers 1990); Gene Transfer Vectors for Mammalian Cells (J. Miller & M. Calos eds., 1987).


Claims
  • 1. A method of isolating a nucleic acid molecule of interest from a mixture of nucleic acid molecules, comprising: (i) providing a recombinant nucleic acid library comprising a heterogeneous population of circular template nucleic acid molecules modified such that they are selectively digestible by an enzyme; (ii) contacting the library with a first primer and a second primer, said primers capable of annealing to complementary strands of a nucleic acid molecule of interest present in the library to produce an annealed mixture, and wherein the two primers extend in opposite directions relative to each other during primer extension; and the 5′ ends of the two primers are adjacent to each other; (iii) subjecting the annealed mixture to conditions under which primer extension occurs, thereby producing a reaction mixture containing linear primer extension products; (iv) digesting the mixture containing linear primer extension products with the enzyme of (i), wherein said enzyme selectively digests template nucleic acid molecules but not the primer extension products, resulting in a population enriched for the nucleic acid molecule of interest.
  • 2. The method of claim 1, wherein the two primers are phosphorylated on their 5′ ends.
  • 3. The method of claim 2, further comprising ligating the linear primer extension products prior to the digesting step to produce circular primer extension products.
  • 4. The method of claim 2, further comprising, after the digesting step, isolating the primer extension products and ligating the isolated products.
  • 5. The method of claim 4, wherein the primer extension products are isolated by gel purification.
  • 6. The method of claim 1, wherein the enzyme is a restriction endonuclease.
  • 7. The method of claim 6, wherein the template circular nucleic acid molecules of (i) are methylated and the enzyme is a restriction endonuclease.
  • 8. The method of claim 1, wherein the library comprises nucleic acid molecules obtained from two or more tissue sources.
  • 9. The method of claim 1, wherein the nucleic acid library is a double-stranded DNA library.
  • 10. The method of claim 1, wherein the nucleic acid library is a human cDNA library.
  • 11. The method of claim 1, wherein the recombinant DNA library is a human genomic DNA library.
  • 12. The method of claim 1, wherein the nucleic acid molecule of interest is represented in the library at a frequency of equal to or less than one in 5×10^5 clones.
  • 13. The method of claim 1, wherein the nucleic acid molecule of interest is represented in the library at a frequency of equal to or less than one in 1×10^6 clones.
  • 14. The method of claim 3, further comprising transforming the circular primer extension products into a suitable host cell after the digesting step, to generate clones.
  • 15. The method of claim 14, wherein the host cell is a competent bacterial host cell.
  • 16. The method of claim 14, further comprising screening the clones to identify the clone containing the nucleic acid molecule of interest.
  • 17. The method of claim 14, further comprising sequencing the nucleic acid isolated from the clones.
  • 18. The method of claim 8, wherein the library is provided as a microarray and two or more nucleic acid molecules of interest are isolated simultaneously.
  • 19. The method of claim 1, wherein multiple DNA libraries are provided in a single mix.
  • 20. The method of claim 19, wherein greater than 50 cDNA libraries are in the single mix.
Parent Case Info

[0001] This is a continuation application filed under 37 CFR 1.53(b) of application serial no. 10/119,466, filed Apr. 9, 2002, which is a continuation claiming priority to application Ser. No. 09/480,782 filed Jan. 10, 2000, which claims priority under Section 119(e) to provisional application No. 60/128,849 filed on Apr. 12, 1999, which applications are incorporated herein in their entirety by reference.

Provisional Applications (1)
Number Date Country
60128849 Apr 1999 US
Continuations (2)
Number Date Country
Parent 10119466 Apr 2002 US
Child 10413053 Apr 2003 US
Parent 09480782 Jan 2000 US
Child 10119466 Apr 2002 US