Methods and compositions of the invention relate to nucleic acid assembly, and particularly to multiplex nucleic acid assembly reactions.
Recombinant and synthetic nucleic acids have many applications in research, industry, agriculture, and medicine. Recombinant and synthetic nucleic acids can be used to express and obtain large amounts of polypeptides, including enzymes, antibodies, growth factors, receptors, and other polypeptides that may be used for a variety of medical, industrial, or agricultural purposes. Recombinant and synthetic nucleic acids also can be used to produce genetically modified organisms including modified bacteria, yeast, mammals, plants, and other organisms. Genetically modified organisms may be used in research (e.g., as animal models of disease, as tools for understanding biological processes, etc.), in industry (e.g., as host organisms for protein expression, as bioreactors for generating industrial products, as tools for environmental remediation, for isolating or modifying natural compounds with industrial applications, etc.), in agriculture (e.g., modified crops with increased yield or increased resistance to disease or environmental stress, etc.), and for other applications. Recombinant and synthetic nucleic acids also may be used as therapeutic compositions (e.g., for modifying gene expression, for gene therapy, etc.) or as diagnostic tools (e.g., as probes for disease conditions, etc.).
Numerous techniques have been developed for modifying existing nucleic acids (e.g., naturally occurring nucleic acids) to generate recombinant nucleic acids. For example, combinations of nucleic acid amplification, mutagenesis, nuclease digestion, ligation, cloning and other techniques may be used to produce many different recombinant nucleic acids. Chemically synthesized polynucleotides are often used as primers or adaptors for nucleic acid amplification, mutagenesis, and cloning.
Techniques also are being developed for de novo nucleic acid assembly whereby nucleic acids are made (e.g., chemically synthesized) and assembled to produce longer target nucleic acids of interest. For example, different multiplex assembly techniques are being developed for assembling oligonucleotides into larger synthetic nucleic acids that can be used in research, industry, agriculture, and/or medicine.
Aspects of the invention relate to multiplex nucleic acid assembly reactions. In some embodiments, methods, compositions, devices and systems of the invention are useful for enhancing the fidelity of nucleic acid assembly reactions. Aspects of the invention relate to the use of one or more mismatch binding proteins to enrich an assembled nucleic acid sample for nucleic acids having a correct sequence. In some embodiments, an enrichment procedure is performed under conditions that promote the formation of a sliding clamp configuration of a mismatch binding protein.
In one aspect, the invention provides methods for preparing a target nucleic acid by contacting a sample of double-stranded nucleic acids with a MutS or MutS homolog in the presence of ADP for a time and under conditions that allow the MutS or MutS homolog to bind to heteroduplex nucleic acids. The MutS homolog may be of human, murine, rat, Drosophila, yeast, Saccharomyces cerevisiae, or other origin. The MutS homolog may be selected from the group consisting of MSH2, MSH3, MSH4, MSH5 and MSH6, which form dimers, e.g., MSH2:MSH3, MSH2:MSH6, and MSH4:MSH5. In some embodiments, ATP is provided at a concentration that is greater than the concentration of ADP in the sample (e.g., about 40 times greater than the concentration of ADP). In some embodiments, ATP concentration is increased in the sample to a concentration that promotes the formation of a clamped form of the MutS or MutS homolog (e.g., to at least 10 μM, 400 μM, or 1 mM). In some embodiments, the double-stranded nucleic acids are double-stranded oligonucleotides. The sample may be enriched for homoduplex nucleic acids by increasing the ratio of unbound nucleic acids to nucleic acids that are bound to the MutS or MutS homolog (e.g., to the sliding clamp form of the MutS or MutS homolog). In some embodiments, heteroduplex nucleic acids are preferentially removed by removing nucleic acids bound to the MutS or MutS homolog (e.g., to the sliding clamp form of the MutS or MutS homolog). For example, in some embodiments, the nucleic acids are removed by cleaving the nucleic acids bound to the MutS or MutS homolog. In some embodiments, the nucleic acids are removed by exposing the sample to a material that binds to the MutS or MutS homolog. In yet other embodiments, the nucleic acids are removed by filtering the sample through a filter (e.g., a nitrocellulose filter). In some embodiments, the double-stranded nucleic acids are linear and are blocked at each end.
In some embodiments, the double-stranded nucleic acids are circular (e.g., they have been circularized). In some embodiments, the double-stranded nucleic acids are synthetically assembled nucleic acids. In some embodiments, the double-stranded nucleic acids are circularized by cloning into a vector. In some embodiments, the double-stranded nucleic acids are circularized before increasing the ATP concentration. In some embodiments, the double-stranded nucleic acids are circularized before the sample is contacted with the MutS or MutS homolog. In some embodiments, double-stranded nucleic acids that are not bound to the MutS or MutS homolog are isolated. In some embodiments, the ratio of unbound to bound nucleic acids is increased by selectively amplifying double-stranded nucleic acids that are not bound to a MutS or MutS homolog. In some circumstances where double-stranded nucleic acids are circularized, the double-stranded nucleic acids are linearized in the sample after nucleic acids bound to the MutS or MutS homolog are removed. In some embodiments, the double-stranded nucleic acids range from 100 to 800 bases in length (e.g., about 400 bases long). In some embodiments, the enriched homoduplex nucleic acids are cloned into a vector, which can in some circumstances be used to transform a host cell. In some embodiments, cells are transformed with double-stranded nucleic acids that are not bound to the MutS or MutS homolog. In some embodiments, cells are transformed with a sample that has been enriched for double-stranded nucleic acids that are not bound to the MutS or MutS homolog.
The assembly reaction may include a polymerase and/or a ligase. In some embodiments the assembly reaction involves two or more cycles of denaturing, annealing, and extension conditions. In some embodiments, the target nucleic acid may be amplified, sequenced or cloned after it is made. In some embodiments, a host cell may be transformed with the assembled target nucleic acid. The target nucleic acid may be integrated into the genome of the host cell. In some embodiments, the target nucleic acid may encode a polypeptide. The polypeptide may be expressed (e.g., under the control of an inducible promoter). The polypeptide may be isolated or purified. A cell transformed with an assembled nucleic acid may be stored, shipped, and/or propagated (e.g., grown in culture).
In another aspect, the invention provides methods of obtaining target nucleic acids by sending sequence information and delivery information to a remote site. The sequence may be analyzed at the remote site. The starting nucleic acids may be designed and/or produced at the remote site. The starting nucleic acids may be assembled in a process involving an enrichment using a mismatch binding protein under conditions that promote formation of a sliding clamp conformation of the protein at the remote site. In some embodiments, the starting nucleic acids, an intermediate product in the assembly reaction, and/or the assembled target nucleic acid may be shipped to the delivery address that was provided.
Other aspects of the invention provide systems for designing starting nucleic acids and/or for assembling the starting nucleic acids to make a target nucleic acid. Other aspects of the invention relate to methods and devices for automating a multiplex oligonucleotide assembly reaction that involves an enrichment using a sliding clamp form of a mismatch binding protein. Yet further aspects of the invention relate to business methods of marketing one or more methods, systems, and/or automated procedures that involve nucleic acid enrichment using a sliding clamp form of a mismatch binding protein.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims. The claims provided below are hereby incorporated into this section by reference.
Aspects of the invention relate to enhancing nucleic acid assembly procedures by using one or more mismatch binding proteins to reduce the amount or frequency of error containing nucleic acids generated during an assembly reaction. According to the invention, one or more mismatch binding proteins (e.g., MutS or a MutS homolog) may be used to recognize and remove heteroduplex nucleic acids generated during a fidelity optimization procedure that may be implemented to remove error-containing nucleic acids from a nucleic acid assembly reaction. In some embodiments of the invention, MutS or a MutS homolog may be a thermostable counterpart thereof, e.g., Taq MutS. In aspects of the invention, one or more mismatch binding proteins may be used under conditions that promote a stable association between the mismatch binding protein(s) and one or more heteroduplex nucleic acids. For example, conditions that promote the formation of a sliding clamp form of MutS or MutS homolog may be used. Under certain conditions, increased stability of mismatch binding proteins associated with heteroduplex nucleic acids may be used to remove a relatively higher number or percentage of error-containing molecules from a nucleic acid assembly reaction.
Accordingly, aspects of the invention may be useful for increasing the fidelity of a nucleic acid assembly reaction (e.g., increasing the proportion of assembled nucleic acids that have a desired predetermined target sequence). As a result, fewer error correction, screening, and/or sequencing steps may be used when assembling a predetermined target nucleic acid from a plurality of starting nucleic acids. In other aspects, increased fidelity of the assembly procedure provides for greater flexibility in the choice of starting nucleic acids. For example, starting nucleic acids (e.g., oligonucleotides) with higher sequence error rates may be tolerated more readily in an assembly procedure that has a higher fidelity. Therefore, aspects of the invention may be useful to increase the throughput rate of a nucleic acid assembly procedure and/or reduce the number of steps or amounts of reagent used to generate a correctly assembled nucleic acid. In certain embodiments, aspects of the invention may be useful in the context of automated nucleic acid assembly to reduce the time, number of steps, amount of reagents, and other factors required for the assembly of each correct nucleic acid. Accordingly, these and other aspects of the invention may be useful to reduce the cost and time of one or more nucleic acid assembly procedures.
Aspects of the invention may be used in conjunction with in vitro and/or in vivo nucleic acid assembly procedures. According to aspects of the invention, a nucleic acid assembly reaction may involve the assembly of a plurality of nucleic acids (e.g., polynucleotides, oligonucleotides, etc.) to form a longer nucleic acid product. Methods and compositions of the invention may be used to remove error containing nucleic acid products from a pool of assembled nucleic acids generated using any of a variety of nucleic acid assembly procedures. Non-limiting examples of assembly reactions are described herein and illustrated in
According to the invention, a preparation of de novo assembled nucleic acids may contain one or more subsets of nucleic acids that have one or more sequence errors in addition to nucleic acids that have the correct desired sequence. In some embodiments, the sequence errors may be present on only one strand of a double-stranded heteroduplex nucleic acid molecule. In addition, or alternatively, the sequence errors may be present on both strands of a double-stranded homoduplex error-containing nucleic acid molecule. According to the invention, denaturing and reannealing reactions may be used to promote the formation of heteroduplex error-containing nucleic acids each incorporating one strand of an error-free nucleic acid and one strand of an error-containing nucleic acid. The amount or percentage of heteroduplex formation may depend on the relative amounts of homoduplex error-containing nucleic acids and homoduplex error-free nucleic acids that were denatured and reannealed. However, regardless of the percentage or amount of heteroduplexes that are formed, their removal using a mismatch binding protein may enrich the nucleic acid sample for error-free nucleic acids. According to aspects of the invention, this enrichment process may be more effective when a mismatch binding protein is used under conditions that promote a stable interaction between the mismatch binding protein and heteroduplex-containing nucleic acid.
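The dependence of heteroduplex formation on the composition of the starting pool may be illustrated with a simple calculation. The following Python sketch assumes, as a simplification that is not part of the description above, that denaturation is complete and that each strand reanneals at random with any complementary strand. The function name and the example value of 0.8 are hypothetical.

```python
def reanneal_duplex_fractions(correct_strand_fraction):
    """Expected duplex fractions after complete denaturation and random reannealing.

    Assumption (illustrative only): strands pair at random, so the pairing
    probabilities are simple products of the strand fractions.
    """
    c = correct_strand_fraction
    if not 0.0 <= c <= 1.0:
        raise ValueError("correct_strand_fraction must be between 0 and 1")
    e = 1.0 - c  # fraction of strands carrying at least one error
    return {
        # Both strands error-free: the homoduplexes to be enriched.
        "error_free_homoduplex": c * c,
        # One error-free strand paired with one error-containing strand.
        "error_free_with_error_strand": 2.0 * c * e,
        # Two error-containing strands: a heteroduplex unless both strands
        # happen to carry the same complementary error.
        "two_error_strands": e * e,
    }


fractions = reanneal_duplex_fractions(0.8)  # hypothetical starting pool
for name, value in fractions.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the share of error-containing strands that pair with an error-free strand grows as the pool becomes more accurate, which is one way to read the statement that heteroduplex formation depends on the relative amounts of the two kinds of homoduplex.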
In some embodiments, certain conditions that promote the formation of a sliding clamp form of MutS or a MutS homolog may be used. For example, a heterogeneous pool of double-stranded DNA molecules comprising homoduplex and heteroduplex polynucleotides can be separated into a fraction enriched with error-free homoduplex polynucleotides and a fraction enriched with mismatch-containing heteroduplex polynucleotides using a method that takes advantage of the properties of MutS and/or one of its homologs. MutS and its homologs recognize and selectively bind to double-stranded heteroduplex polynucleotides having one or more mismatched nucleotides (e.g., due to a nucleotide change, a deletion, or an insertion on one strand). In the presence of ADP, MutS specifically binds to a mismatched site of a heteroduplex polynucleotide. A subsequent addition of ATP promotes dissociation of MutS from the mismatched site. However, MutS remains tightly associated with the polynucleotide in the form of a sliding clamp that can diffuse along the polynucleotide (Gradia et al., 1999, Mol Cell, 3:255-61). According to the invention, MutS remains tightly associated with a heteroduplex polynucleotide under these conditions provided that the polynucleotide does not have a free end where MutS may dissociate or “fall off.” In some embodiments, the ends of the polynucleotide may be blocked to prevent MutS dissociation. In some embodiments, the polynucleotide may be circularized to prevent MutS dissociation. The resulting MutS-bound heteroduplex may be removed from the nucleic acid sample as described in more detail herein.
In act 510, the sequence information may be analyzed to determine an assembly strategy. This may involve determining whether the target nucleic acid will be assembled as a single fragment or if several intermediate fragments will be assembled separately and then combined in one or more additional rounds of assembly to generate the target nucleic acid. Once the overall assembly strategy has been determined, input nucleic acids (e.g., oligonucleotides) for assembling the one or more nucleic acid fragments may be designed. The sizes and numbers of the input nucleic acids may be based in part on the type of assembly reaction (e.g., the type of polymerase-based assembly, ligase-based assembly, chemical assembly, or combination thereof) that is being used for each fragment. The input nucleic acids also may be designed to avoid 5′ and/or 3′ regions that may cross-react incorrectly and be assembled to produce undesired nucleic acid fragments. Other structural and/or sequence factors also may be considered when designing the input nucleic acids. In certain embodiments, some of the input nucleic acids may be designed to incorporate one or more specific sequences (e.g., primer binding sequences, restriction enzyme sites, etc.) at one or both ends of the assembled nucleic acid fragment.
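By way of illustration only, the design of input oligonucleotides for a polymerase-based assembly may be sketched as a tiling of the target sequence into overlapping oligonucleotides on alternating strands. The following Python sketch is one hypothetical approach and is not presented as the design procedure of the invention. The function names, the oligonucleotide length, the overlap length, and the example sequence are assumptions chosen for the example, and the sketch does not screen for cross-reacting 5′ or 3′ regions or other structural factors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def design_overlapping_oligos(target, oligo_len=40, overlap=15):
    """Tile a target into overlapping oligos on alternating strands.

    Returns a list of (start, strand, sequence) tuples, where start is the
    position on the top strand and each sequence is written 5' to 3'.
    Hypothetical sketch: only length and overlap are considered.
    """
    if not target:
        raise ValueError("target must not be empty")
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        end = min(start + oligo_len, len(target))
        top = target[start:end]
        # Even-numbered oligos come from the top strand, odd-numbered oligos
        # from the bottom strand, so neighbours anneal through the overlap.
        if index % 2 == 0:
            oligos.append((start, "+", top))
        else:
            oligos.append((start, "-", reverse_complement(top)))
        if end == len(target):
            break
        start += step
        index += 1
    return oligos


# Arbitrary example sequence, used only to exercise the function.
target = (
    "ATGGCTAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGAC"
    "GGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCC"
)
for start, strand, seq in design_overlapping_oligos(target):
    print(start, strand, len(seq), seq)
```

In a sketch of this kind, the oligonucleotide length and overlap would be varied according to the assembly chemistry chosen for each fragment, and a fuller design step would add the sequence checks described above.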
In act 520, the input nucleic acids are obtained. These may be synthetic oligonucleotides that are synthesized on-site or obtained from a different site (e.g., from a commercial supplier). In some embodiments, one or more input nucleic acids may be amplification products (e.g., PCR products), restriction fragments, or other suitable nucleic acid molecules. Synthetic oligonucleotides may be synthesized using any appropriate technique as described in more detail herein. It should be appreciated that synthetic oligonucleotides often have sequence errors. Accordingly, oligonucleotide preparations may be selected or screened to remove error-containing molecules as described in more detail herein.
In act 530, an assembly reaction may be performed for each nucleic acid fragment. For each fragment, the input nucleic acids may be assembled using any appropriate assembly technique (e.g., a polymerase-based assembly, a ligase-based assembly, a chemical assembly, or any other multiplex nucleic acid assembly technique, or any combination thereof). An assembly reaction may result in the assembly of a number of different nucleic acid products in addition to the predetermined nucleic acid fragment. Accordingly, in some embodiments, an assembly reaction may be processed to remove incorrectly assembled nucleic acids (e.g., by size fractionation) and/or to enrich correctly assembled nucleic acids (e.g., by amplification, optionally followed by size fractionation). In some embodiments, correctly assembled nucleic acids may be amplified (e.g., in a PCR reaction) using primers that bind to the ends of the predetermined nucleic acid fragment. It should be appreciated that act 530 may be repeated one or more times. For example, in a first round of assembly a first plurality of input nucleic acids (e.g., oligonucleotides) may be assembled to generate a first nucleic acid fragment. In a second round of assembly, the first nucleic acid fragment may be combined with one or more additional nucleic acid fragments and used as starting material for the assembly of a larger nucleic acid fragment. In a third round of assembly, this larger fragment may be combined with yet further nucleic acids and used as starting material for the assembly of yet a larger nucleic acid. This procedure may be repeated as many times as needed for the synthesis of a target nucleic acid. Accordingly, progressively larger nucleic acids may be assembled. At each stage, nucleic acids of different sizes may be combined. At each stage, the nucleic acids being combined may have been previously assembled in a multiplex assembly reaction. 
However, at each stage, one or more nucleic acids being combined may have been obtained from different sources (e.g., PCR amplification of genomic DNA or cDNA, restriction digestion of a plasmid or genomic DNA, or any other suitable source). It should be appreciated that nucleic acids generated in each cycle of assembly may contain sequence errors if they incorporated one or more input nucleic acids with sequence error(s). Accordingly, a fidelity optimization procedure may be performed after a cycle of assembly in order to remove or correct sequence errors. It should be appreciated that fidelity optimization may be performed after each assembly reaction when several successive cycles of assembly are performed. However, in certain embodiments fidelity optimization may be performed only after a subset (e.g., 2 or more) of successive assembly reactions are complete. In some embodiments, no fidelity optimization is performed.
Accordingly, act 540 is an optional fidelity optimization procedure. Act 540 may be used in some embodiments to remove nucleic acid fragments that seem to be correctly assembled (e.g., based on their size or restriction enzyme digestion pattern) but that may have incorporated input nucleic acids containing sequence errors as described herein. For example, since synthetic oligonucleotides may contain incorrect sequences due to errors introduced during oligonucleotide synthesis, it may be useful to remove nucleic acid fragments that have incorporated one or more error-containing oligonucleotides during assembly. In some embodiments, one or more assembled nucleic acid fragments may be sequenced to determine whether they contain the predetermined sequence or not. This procedure allows fragments with the correct sequence to be identified. However, in some embodiments, other techniques may be used to remove error-containing nucleic acid fragments. It should be appreciated that error-containing nucleic acids may be double-stranded homoduplexes having the error on both strands (i.e., incorrect complementary nucleotide(s), deletion(s), or addition(s) on both strands), because the assembly procedure may involve one or more rounds of polymerase extension (e.g., during assembly or after assembly to amplify the assembled product) during which an input nucleic acid containing an error may serve as a template thereby producing a complementary strand with the complementary error. In certain embodiments, a preparation of double-stranded nucleic acid fragments may be suspected to contain a mixture of nucleic acids that have the correct sequence and nucleic acids that incorporated one or more sequence errors during assembly. In some embodiments, sequence errors may be removed using a technique that involves denaturing and reannealing the double-stranded nucleic acids.
In some embodiments, single strands of nucleic acids that contain complementary errors may be unlikely to reanneal together if nucleic acids containing each individual error are present in the nucleic acid preparation at a lower frequency than nucleic acids having the correct sequence at the same position. Rather, error-containing single strands may reanneal with a complementary strand that contains no errors or that contains one or more different errors. As a result, error-containing strands may end up in the form of heteroduplex molecules in the reannealed reaction product. Nucleic acid strands that are error-free may reanneal with error-containing strands or with other error-free strands. Reannealed error-free strands form homoduplexes in the reannealed sample. Accordingly, by removing heteroduplex molecules from the reannealed preparation of nucleic acid fragments, the amount or frequency of error-containing nucleic acids may be reduced. Any suitable method for removing heteroduplex molecules may be used, including chromatography, electrophoresis, selective binding of heteroduplex molecules, etc. In some embodiments, mismatch binding proteins that selectively (e.g., specifically) bind to heteroduplex nucleic acid molecules may be used. One example includes using MutS, a MutS homolog, or a combination thereof to bind to heteroduplex molecules. In E. coli, the MutS protein, which appears to function as a homodimer, serves as a mismatch recognition factor. In eukaryotes, at least three MutS homolog (MSH) proteins have been identified, namely MSH2, MSH3, and MSH6, and they form heterodimers. For example, in the yeast Saccharomyces cerevisiae, the MSH2-MSH6 complex (also known as MutSα) recognizes base mismatches and single nucleotide insertion/deletion loops, while the MSH2-MSH3 complex (also known as MutSβ) recognizes insertions/deletions of up to 12-16 nucleotides, although they exert substantially redundant functions.
A mismatch binding protein may be obtained from recombinant or natural sources. A mismatch binding protein may be heat-stable. In some embodiments, a thermostable mismatch binding protein from a thermophilic organism may be used. Examples of thermostable DNA mismatch binding proteins include, but are not limited to: Tth MutS (from Thermus thermophilus); Taq MutS (from Thermus aquaticus); Apy MutS (from Aquifex pyrophilus); Tma MutS (from Thermotoga maritima); any other suitable MutS; or any combination of two or more thereof.
According to aspects of the invention, protein-bound heteroduplex molecules (e.g., heteroduplex molecules bound to one or more MutS proteins) may be removed from a sample using any suitable technique (binding to a column, a filter, a nitrocellulose filter, etc., or any combination thereof). In some embodiments, a sliding clamp form of MutS may be used to further enhance the effectiveness of this error removal procedure. It should be appreciated that this procedure may not be 100% efficient. Some errors may remain for at least one of the following reasons. Depending on the reaction conditions, not all of the double-stranded error-containing nucleic acids may be denatured. In addition, some of the denatured error-containing strands may reanneal with complementary error-containing strands to form an error containing homoduplex. Also, the MutS/heteroduplex interaction and the MutS/heteroduplex removal procedures may not be 100% efficient. Accordingly, in some embodiments the fidelity optimization act 540 may be repeated one or more times after each assembly reaction. For example, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more cycles of fidelity optimization may be performed after each assembly reaction. In some embodiments, the nucleic acid is amplified after each fidelity optimization procedure. It should be appreciated that each cycle of fidelity optimization will remove additional error-containing nucleic acid molecules. However, the proportion of correct sequences is expected to reach a saturation level after a few cycles of this procedure.
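The effect of repeating an imperfect removal step may be illustrated with a toy model. The following Python sketch makes several assumptions that are not part of the description above: denaturation is complete, strands reanneal at random, every duplex containing at least one error-containing strand is treated as a heteroduplex, a fixed fraction of heteroduplexes is removed in each cycle, and amplification after each cycle converts a fixed fraction of correct molecules into error-containing molecules. All function names, parameter names, and numerical values are hypothetical.

```python
def fidelity_cycle(correct_fraction, removal_efficiency):
    """Correct-strand fraction after one denature/reanneal/removal cycle (toy model)."""
    c = correct_fraction
    e = 1.0 - c
    keep = 1.0 - removal_efficiency  # fraction of heteroduplexes that escape removal
    correct_homoduplex = c * c       # never removed in this model
    mixed = 2.0 * c * e * keep       # one correct strand and one error strand, escaped
    error_pairs = e * e * keep       # two error strands, treated as heteroduplexes
    remaining = correct_homoduplex + mixed + error_pairs
    if remaining == 0.0:
        return 0.0
    # Half of the strands in an escaped mixed duplex are correct.
    return (correct_homoduplex + 0.5 * mixed) / remaining


def simulate_cycles(start_fraction, removal_efficiency, amplification_fidelity, cycles):
    """Return the correct-strand fraction before and after each cycle."""
    history = [start_fraction]
    c = start_fraction
    for _ in range(cycles):
        # Removal step followed by an amplification that is assumed to leave
        # only the stated fraction of correct molecules unchanged.
        c = fidelity_cycle(c, removal_efficiency) * amplification_fidelity
        history.append(c)
    return history


history = simulate_cycles(0.5, removal_efficiency=0.9, amplification_fidelity=0.99, cycles=6)
for cycle, fraction in enumerate(history):
    print(f"after cycle {cycle}: {fraction:.4f}")
```

In this toy model the gain per cycle shrinks quickly, which is consistent with the expectation that the proportion of correct sequences levels off after a few cycles. The model ignores error-containing homoduplexes that re-form on reannealing and any incomplete denaturation, both of which are noted above as reasons the procedure may not be fully efficient.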
In some embodiments, the size of an assembled nucleic acid that is fidelity optimized (e.g., using MutS or a MutS homolog) may be determined by the expected number of sequence errors that are suspected to be incorporated into the nucleic acid during assembly. For example, an assembled nucleic acid product should include error free nucleic acids prior to fidelity optimization in order to be able to enrich for the error free nucleic acids. Accordingly, error screening (e.g., using MutS or a MutS homolog) should be performed on shorter nucleic acid fragments when input nucleic acids have higher error rates. In some embodiments, one or more nucleic acid fragments of between about 200 and about 800 nucleotides (e.g., about 200, about 300, about 400, about 500, about 600, about 700 or about 800 nucleotides in length) are assembled prior to fidelity optimization. After assembly, the one or more fragments may be exposed to one or more rounds of fidelity optimization as described herein. In some embodiments, several assembled fragments may be ligated together (e.g., to produce a larger nucleic acid fragment of between about 1,000 and about 5,000 bases in length, or larger), and optionally cloned into a vector, prior to fidelity optimization as described herein.
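The relationship between fragment length, input error rate, and the presence of error-free molecules may be illustrated with a short calculation. The following Python sketch assumes, for illustration only, that errors occur independently at each position with a constant per-base rate. The function names and the example rate of 0.002 errors per base are hypothetical and are not taken from the description above.

```python
import math


def error_free_fraction(length_nt, per_base_error_rate):
    """Expected fraction of strands of a given length that carry no error.

    Assumes independent errors at a constant per-base rate (illustrative only).
    """
    if length_nt < 0:
        raise ValueError("length_nt must be non-negative")
    if not 0.0 <= per_base_error_rate <= 1.0:
        raise ValueError("per_base_error_rate must be between 0 and 1")
    return (1.0 - per_base_error_rate) ** length_nt


def max_length_for_fraction(per_base_error_rate, min_error_free_fraction):
    """Longest length whose expected error-free fraction stays at or above the minimum."""
    if not 0.0 < per_base_error_rate < 1.0:
        raise ValueError("per_base_error_rate must be strictly between 0 and 1")
    if not 0.0 < min_error_free_fraction <= 1.0:
        raise ValueError("min_error_free_fraction must be in (0, 1]")
    return math.floor(
        math.log(min_error_free_fraction) / math.log(1.0 - per_base_error_rate)
    )


rate = 0.002  # hypothetical per-base error rate
for length in (200, 400, 800):
    print(length, round(error_free_fraction(length, rate), 3))
print("longest fragment with at least 25% error-free strands:",
      max_length_for_fraction(rate, 0.25))
```

Under these assumptions the error-free fraction falls geometrically with length, which is consistent with performing error screening on shorter fragments when the input nucleic acids have higher error rates.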
At act 550, an output nucleic acid is obtained. As discussed herein, several rounds of act 530 and/or 540 may be performed to obtain the output nucleic acid, depending on the assembly strategy that is implemented. The output nucleic acid may be amplified, cloned, stored, etc., for subsequent uses at act 560. In some embodiments, an output nucleic acid may be cloned with one or more other nucleic acids (e.g., other output nucleic acids) for subsequent applications. Subsequent applications may include one or more research, diagnostic, medical, clinical, industrial, therapeutic, environmental, agricultural, or other uses.
It should be appreciated that analogues of ATP and/or ADP may be used in embodiments of the invention. In some embodiments, a non-hydrolyzable ATP analogue may be used (e.g., ATPγS). However, any other suitable ATP and/or ADP analogue may be used.
The amount of a MutS or MutS homolog that may be added to a reaction may be from about 0.5 μM to about 5 μM. In some embodiments, the molar ratio of protein to nucleic acid may be about 1:1 or greater (e.g., about 4:1 or more). However, lower or higher amounts may be used.
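The relationship between a molar ratio and a working protein concentration may be illustrated with a short calculation. The following Python sketch converts a double-stranded DNA mass concentration into a molar concentration using an average mass of about 650 g/mol per base pair, which is a common approximation and an assumption of this example, and then computes the MutS concentration for a chosen protein-to-nucleic-acid ratio. The function names and example values are hypothetical, and the sketch does not address whether the ratio refers to MutS monomers or dimers.

```python
AVERAGE_BP_MASS_G_PER_MOL = 650.0  # approximate average mass of one base pair (assumption)


def dsdna_micromolar(ng_per_ul, length_bp, bp_mass=AVERAGE_BP_MASS_G_PER_MOL):
    """Convert a dsDNA concentration in ng/uL to micromolar for a given length."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    # 1 ng/uL equals 1e-3 g/L; dividing by g/mol gives mol/L; 1e6 converts to uM.
    return ng_per_ul * 1000.0 / (length_bp * bp_mass)


def muts_micromolar_for_ratio(dna_micromolar, protein_to_dna_ratio):
    """MutS concentration needed to reach a protein:DNA molar ratio."""
    return dna_micromolar * protein_to_dna_ratio


def within_stated_range(muts_micromolar, low=0.5, high=5.0):
    """Check a value against the range of about 0.5 uM to about 5 uM given in the text."""
    return low <= muts_micromolar <= high


dna_um = dsdna_micromolar(ng_per_ul=100.0, length_bp=400)  # hypothetical sample
muts_um = muts_micromolar_for_ratio(dna_um, protein_to_dna_ratio=4.0)
print(f"DNA: {dna_um:.3f} uM, MutS at 4:1: {muts_um:.3f} uM, "
      f"within stated range: {within_stated_range(muts_um)}")
```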
As discussed herein, the sliding clamp form of MutS or MutS homolog may remain associated with the heteroduplex nucleic acid for a longer period of time if the ends of the nucleic acid are blocked. In some embodiments, the ends may be blocked by circularizing the nucleic acid molecules from the assembled nucleic acid sample. This may be achieved by the addition of a ligase and appropriate buffer. In some embodiments, the nucleic acid molecules may be ligated into a vector (see
In act 640, the sample is enriched for nucleic acids that are not bound to MutS or the MutS homolog. In some embodiments, nucleic acids bound to MutS or a MutS homolog are removed from the sample. In some embodiments, ATP may be removed from the sample once the ADP-ATP exchange occurs and a sliding clamp is formed. However, in some embodiments, ATP may remain in the sample. In some embodiments, the sample may be filtered or screened using any appropriate method that distinguishes between MutS-bound (or MutS homolog-bound) nucleic acids and unbound nucleic acids. In some embodiments, a separation method may be based on the size difference between the bound and unbound products. Separation may include gel separation, filtration, or any other suitable method. In some embodiments, separation may be based on different electrochemical properties of a bound nucleic acid versus an unbound nucleic acid. For example, separation may be based on differences in charge (e.g., between unbound nucleic acid and nucleic acid associated with one or more copies of a MutS or MutS homolog). In some embodiments, separation may be based on an affinity technique. For example, an antibody or other binding agent that specifically binds to one or more epitopes on MutS or a MutS homolog may be used to isolate and/or remove bound nucleic acids. In certain embodiments, the binding agent may be attached to a support (e.g., a column, a bead, a gel, a matrix, or other suitable support). In certain embodiments, a MutS or MutS homolog may be modified to include a specific antigen (e.g., a known epitope or other tag) that may be used for enrichment (e.g., a specific binding agent that binds to the epitope may be used to remove MutS or MutS homolog-bound nucleic acids from a sample). In some circumstances, multiple types of MutS, MutS homolog or mixture thereof may be employed.
In some embodiments, MutS or MutS homolog-bound nucleic acids may be preferentially removed by filtering a sample through a suitable filter. A suitable filter may be a nitrocellulose membrane, a polyvinylidene fluoride (PVDF) membrane, or any other suitable membrane.
In some embodiments, MutS or the MutS homolog may be cross-linked to the heteroduplex nucleic acid. In certain embodiments, MutS or the MutS homolog may be covalently bound to a nuclease that preferentially degrades the heteroduplex nucleic acid it is associated with. In some embodiments, MutS or the MutS homolog (e.g., in the sliding clamp form) may interfere with amplification (e.g., PCR, LCR, rolling circle, etc.). Accordingly, act 640 may involve amplifying the nucleic acid in the sample using a technique that preferentially amplifies unbound nucleic acids. In some embodiments, rolling circle amplification may be used for preferentially amplifying circularized nucleic acids in act 640. In some embodiments, unbound nucleic acids may be preferentially amplified in vivo. A reaction sample from act 630 may be transformed directly into a host cell and unbound nucleic acids may be preferentially replicated in vivo. In some embodiments, a MutS or MutS homolog may be bound to a toxin. As a result, bound nucleic acids may be cytotoxic or cytostatic.
In act 650, the enriched nucleic acid sample may be processed for subsequent applications. In some embodiments, the nucleic acids may be cloned and sequenced in order to identify one or more nucleic acids with the correct desired sequence. These nucleic acids may be amplified and used in downstream applications as described herein.
In some embodiments, a recombinase (e.g., RecA) or nucleic acid binding protein may be used to increase the fidelity of one or more assembly reactions. In some embodiments, a heat stable RecA protein may be included in one or more reagents or steps of a multiplex nucleic acid assembly reaction. A heat stable RecA protein is disclosed, for example, in Shigemori et al., 2005, Nucleic Acids Research, Vol. 33, No. 14, e126. Heat stable RecA proteins may be from one or more thermophilic organisms (e.g., Thermus thermophilus or other thermophilic organisms). Heat stable RecA proteins also may be isolated as sequence variants of one or more heat sensitive RecA proteins.
In some aspects, methods and compositions of the invention may be useful to reduce the amount or frequency of error-containing nucleic acids generated in a nucleic acid assembly reaction. However, in other aspects, methods and compositions of the invention may be useful to modify the reaction composition or conditions. For example, increased fidelity may allow less starting material to be used (e.g., fewer starting nucleic acids, less polymerase, less ligase, fewer nucleotides, etc.).
In some embodiments, starting nucleic acids with higher error rates may be used since the presence of errors may be overcome, at least in part, using one or more fidelity optimization techniques described herein. This may result in certain cost savings since cheaper nucleic acids (e.g., cheaper oligonucleotides) may be used. In addition, a sliding clamp method of the invention may obviate the need for one or more nucleic acid purification step(s) prior to assembly. Furthermore, a sliding clamp method may be particularly useful for filtering high numbers of errors that may be introduced by polymerases that are used for amplification of GC-rich regions (e.g., AccuPrime polymerase from Invitrogen).
In certain embodiments, the starting nucleic acids may include sequences that have certain secondary structures or repeat sequences at their 5′ and/or 3′ ends, since any incorrect pairing between starting nucleic acids that may result from the presence of secondary structures or repeat sequences in the 5′ or 3′ ends may be reduced by the use of one or more post-assembly fidelity-optimization techniques (e.g., using a sliding clamp technique as described herein). This allows for greater flexibility in the design of starting nucleic acids.
In some embodiments, starting nucleic acids may be designed with shorter overlapping sequences. Any nucleic acid mismatching due to decreased hybridization specificity associated with shorter overlaps may be reduced by the use of one or more post-assembly fidelity-optimization techniques (e.g., using a sliding clamp technique as described herein). This also allows for greater flexibility in the design of the starting nucleic acids. This also may reduce the cost of the assembly reaction since shorter starting nucleic acids may be used to produce the same target nucleic acid.
In some embodiments, other aspects of the assembly reaction may be modified to reduce cost (e.g., using fewer reagents or cheaper reagents) even if such a modification results in an increased error rate, because the increased error rate can be overcome using one or more enrichment techniques of the invention.
In some embodiments, high efficiency enrichment techniques of the invention may be used to allow for a greater number of starting nucleic acids and/or longer starting nucleic acids to be used. In some embodiments, longer nucleic acid products may be assembled before a first round of enrichment is performed. For example, fragments longer than 400 nucleotides (e.g., longer than 500, longer than 750, between about 750 and about 1,000, between about 1,000 and about 2,000, between about 2,000 and about 5,000 nucleotides, or longer) may be assembled prior to fidelity optimization (e.g., using a high efficiency enrichment technique of the invention).
In some embodiments, high efficiency enrichment techniques of the invention may be used in the assembly of long nucleic acids (e.g., longer than 1,000; 2,000; 5,000; 10,000; 20,000; 50,000; 100,000 or more nucleotides).
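As a rough illustration of why enrichment may matter more as assembled fragments get longer, the following sketch assumes a simple model that is not part of the description above: each nucleotide position independently carries an error with probability e, so the expected fraction of error-free molecules of length L is (1 - e)^L. The function name and the per-base error rate are hypothetical and chosen only for illustration.

```python
def error_free_fraction(length_nt: int, per_base_error: float) -> float:
    """Expected fraction of error-free molecules of a given length.

    Assumes (as a simplification) that errors occur independently at each
    position with the same probability `per_base_error`.
    """
    if length_nt < 0:
        raise ValueError("length_nt must be non-negative")
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be between 0 and 1")
    return (1.0 - per_base_error) ** length_nt


# Hypothetical per-base error rate, for illustration only.
HYPOTHETICAL_ERROR_RATE = 1 / 500

for fragment_length in (400, 1000, 2000, 5000):
    fraction = error_free_fraction(fragment_length, HYPOTHETICAL_ERROR_RATE)
    print(f"{fragment_length} nt: expected error-free fraction = {fraction:.4f}")
```

Under this assumed model the error-free fraction falls geometrically with length, which is consistent with the idea that longer pre-enrichment assemblies call for a higher efficiency enrichment step.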
According to aspects of the invention, one or more enrichments may be performed at different stages in an assembly reaction. Different concentrations of mismatch binding protein may be used depending on the stability and activity of the protein, the amount of nucleic acid being assembled, and the reaction conditions being used. A useful amount of protein may be determined by performing parallel enrichment reactions in the presence of different amounts of the protein and determining which concentrations are more effective. Similarly, optimal amounts of ADP and ATP to be used during a sliding clamp enrichment technique may be determined using methods described herein. It should be appreciated that the cost of the protein and other reagents also may be considered when determining the optimal amounts to be used in an enrichment reaction.
In some embodiments, aspects of the invention may be used to isolate or purify heteroduplex-containing nucleic acids. Examples of applications include, but are not limited to, detection and analysis of single nucleotide polymorphism (SNP), and diagnosis of associated disorders or diseases.
Aspects of the invention may include automating one or more acts described herein. For example, an analysis may be automated in order to generate an output automatically. Acts of the invention may be automated using, for example, a computer system.
Aspects of the invention may be used in conjunction with any suitable multiplex nucleic acid assembly procedure involving at least two nucleic acids with complementary regions (e.g., at least one pair of nucleic acids that have complementary 3′ regions). For example, enrichment techniques of the invention may be used in connection with one or more of the multiplex nucleic acid assembly procedures described below.
Multiplex Nucleic Acid Assembly
In aspects of the invention, multiplex nucleic acid assembly relates to the assembly of a plurality of nucleic acids to generate a longer nucleic acid product. In one aspect, multiplex oligonucleotide assembly relates to the assembly of a plurality of oligonucleotides to generate a longer nucleic acid molecule. However, it should be appreciated that other nucleic acids (e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.) may be assembled or included in a multiplex assembly reaction (e.g., along with one or more oligonucleotides) in order to generate an assembled nucleic acid molecule that is longer than any of the single starting nucleic acids (e.g., oligonucleotides) that were added to the assembly reaction. In certain embodiments, one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions (e.g., separate multiplex oligonucleotide assembly reactions) may be combined and assembled to form a further nucleic acid that is longer than any of the input nucleic acid fragments. In certain embodiments, one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions (e.g., separate multiplex oligonucleotide assembly reactions) may be combined with one or more additional nucleic acids (e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.) and assembled to form a further nucleic acid that is longer than any of the input nucleic acids.
In aspects of the invention, one or more multiplex assembly reactions may be used to generate target nucleic acids having predetermined sequences. In one aspect, a target nucleic acid may have a sequence of a naturally occurring gene and/or other naturally occurring nucleic acid (e.g., a naturally occurring coding sequence, regulatory sequence, non-coding sequence, chromosomal structural sequence such as a telomere or centromere sequence, etc., any fragment thereof or any combination of two or more thereof). In another aspect, a target nucleic acid may have a sequence that is not naturally-occurring. In one embodiment, a target nucleic acid may be designed to have a sequence that differs from a natural sequence at one or more positions. In other embodiments, a target nucleic acid may be designed to have an entirely novel sequence. However, it should be appreciated that target nucleic acids may include one or more naturally occurring sequences, non-naturally occurring sequences, or combinations thereof.
In one aspect of the invention, multiplex assembly may be used to generate libraries of nucleic acids having different sequences. In some embodiments, a library may contain nucleic acids having random sequences. In certain embodiments, a predetermined target nucleic acid may be designed and assembled to include one or more random sequences at one or more predetermined positions.
In certain embodiments, a target nucleic acid may include a functional sequence (e.g., a protein binding sequence, a regulatory sequence, a sequence encoding a functional protein, etc., or any combination thereof). However, some embodiments of a target nucleic acid may lack a specific functional sequence (e.g., a target nucleic acid may include only non-functional fragments or variants of a protein binding sequence, regulatory sequence, or protein encoding sequence, or any other non-functional naturally-occurring or synthetic sequence, or any non-functional combination thereof). Certain target nucleic acids may include both functional and non-functional sequences. These and other aspects of target nucleic acids and their uses are described in more detail herein.
A target nucleic acid may be assembled in a single multiplex assembly reaction (e.g., a single oligonucleotide assembly reaction). However, a target nucleic acid also may be assembled from a plurality of nucleic acid fragments, each of which may have been generated in a separate multiplex oligonucleotide assembly reaction. It should be appreciated that one or more nucleic acid fragments generated via multiplex oligonucleotide assembly also may be combined with one or more nucleic acid molecules obtained from another source (e.g., a restriction fragment, a nucleic acid amplification product, etc.) to form a target nucleic acid. In some embodiments, a target nucleic acid that is assembled in a first reaction may be used as an input nucleic acid fragment for a subsequent assembly reaction to produce a larger target nucleic acid.
Accordingly, different strategies may be used to produce a target nucleic acid having a predetermined sequence. For example, different starting nucleic acids (e.g., different sets of predetermined nucleic acids) may be assembled to produce the same predetermined target nucleic acid sequence. Also, predetermined nucleic acid fragments may be assembled using one or more different in vitro and/or in vivo techniques. For example, nucleic acids (e.g., overlapping nucleic acid fragments) may be assembled in an in vitro reaction using an enzyme (e.g., a ligase and/or a polymerase) or a chemical reaction (e.g., a chemical ligation) or in vivo (e.g., assembled in a host cell after transfection into the host cell), or a combination thereof. Similarly, each nucleic acid fragment that is used to make a target nucleic acid may be assembled from different sets of oligonucleotides. Also, a nucleic acid fragment may be assembled using an in vitro or an in vivo technique (e.g., an in vitro or in vivo polymerase, recombinase, and/or ligase based assembly process). In addition, different in vitro assembly reactions may be used to produce a nucleic acid fragment. For example, an in vitro oligonucleotide assembly reaction may involve one or more polymerases, ligases, other suitable enzymes, chemical reactions, or any combination thereof.
Multiplex Oligonucleotide Assembly
A predetermined nucleic acid fragment may be assembled from a plurality of different starting nucleic acids (e.g., oligonucleotides) in a multiplex assembly reaction (e.g., a multiplex enzyme-mediated reaction, a multiplex chemical assembly reaction, or a combination thereof). Certain aspects of multiplex nucleic acid assembly reactions are illustrated by the following description of certain embodiments of multiplex oligonucleotide assembly reactions. It should be appreciated that the description of the assembly reactions in the context of oligonucleotides is not intended to be limiting. The assembly reactions described herein may be performed using starting nucleic acids obtained from one or more different sources (e.g., synthetic or natural polynucleotides, nucleic acid amplification products, nucleic acid degradation products, oligonucleotides, etc.). The starting nucleic acids may be referred to as assembly nucleic acids (e.g., assembly oligonucleotides). As used herein, an assembly nucleic acid has a sequence that is designed to be incorporated into the nucleic acid product generated during the assembly process. However, it should be appreciated that the description of the assembly reactions in the context of single-stranded nucleic acids is not intended to be limiting. In some embodiments, one or more of the starting nucleic acids illustrated in the figures and described herein may be provided as double stranded nucleic acids. Accordingly, it should be appreciated that where the figures and description illustrate the assembly of single-stranded nucleic acids, the presence of one or more complementary nucleic acids is contemplated. Accordingly, one or more double-stranded complementary nucleic acids may be included in a reaction that is described herein in the context of a single-stranded assembly nucleic acid. 
However, in some embodiments the presence of one or more complementary nucleic acids may interfere with an assembly reaction by competing for hybridization with one of the input assembly nucleic acids. Accordingly, in some embodiments an assembly reaction may involve only single-stranded assembly nucleic acids (i.e., the assembly nucleic acids may be provided in a single-stranded form without their complementary strand) as described or illustrated herein. However, in certain embodiments the presence of one or more complementary nucleic acids may have no or little effect on the assembly reaction. In some embodiments, complementary nucleic acid(s) may be incorporated during one or more steps of an assembly. In yet further embodiments, assembly nucleic acids and their complementary strands may be assembled under the same assembly conditions via parallel assembly reactions in the same reaction mixture. In certain embodiments, a nucleic acid product resulting from the assembly of a plurality of starting nucleic acids may be identical to the nucleic acid product that results from the assembly of nucleic acids that are complementary to the starting nucleic acids (e.g., in some embodiments where the assembly steps result in the production of a double-stranded nucleic acid product). As used herein, an oligonucleotide may be a nucleic acid molecule comprising at least two covalently bonded nucleotide residues. In some embodiments, an oligonucleotide may be between 10 and 1,000 nucleotides long. For example, an oligonucleotide may be between 10 and 500 nucleotides long, or between 500 and 1,000 nucleotides long. In some embodiments, an oligonucleotide may be between about 20 and about 100 nucleotides long (e.g., from about 30 to 90, 40 to 85, 50 to 80, 60 to 75, or about 65 or about 70 nucleotides long), between about 100 and about 200, between about 200 and about 300 nucleotides, between about 300 and about 400, or between about 400 and about 500 nucleotides long. 
However, shorter or longer oligonucleotides may be used. An oligonucleotide may be a single-stranded nucleic acid. However, in some embodiments a double-stranded oligonucleotide may be used as described herein. In certain embodiments, an oligonucleotide may be chemically synthesized as described in more detail below.
In some embodiments, an input nucleic acid (e.g., oligonucleotide) may be amplified before use. The resulting product may be double-stranded. In some embodiments, one of the strands of a double-stranded nucleic acid may be removed before use so that only a predetermined single strand is added to an assembly reaction.
In certain embodiments, each oligonucleotide may be designed to have a sequence that is identical to a different portion of the sequence of a predetermined target nucleic acid that is to be assembled. Accordingly, in some embodiments each oligonucleotide may have a sequence that is identical to a portion of one of the two strands of a double-stranded target nucleic acid. For clarity, the two complementary strands of a double stranded nucleic acid are referred to herein as the positive (P) and negative (N) strands. This designation is not intended to imply that the strands are sense and anti-sense strands of a coding sequence. They refer only to the two complementary strands of a nucleic acid (e.g., a target nucleic acid, an intermediate nucleic acid fragment, etc.) regardless of the sequence or function of the nucleic acid. Accordingly, in some embodiments a P strand may be a sense strand of a coding sequence, whereas in other embodiments a P strand may be an anti-sense strand of a coding sequence. According to the invention, a target nucleic acid may be either the P strand, the N strand, or a double-stranded nucleic acid comprising both the P and N strands.
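The relationship between the P and N strands can be sketched in a few lines of Python. This is an illustrative sketch only: the helper name `reverse_complement` and the example sequence are hypothetical, and only unambiguous DNA bases (A, C, G, T) are assumed.

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, written 5' to 3'."""
    unexpected = set(seq) - set("ACGTacgt")
    if unexpected:
        raise ValueError(f"unsupported characters: {sorted(unexpected)}")
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical P strand, written 5' to 3'.
p_strand = "ATGGCCTTAG"
# The N strand is the complementary strand, also written 5' to 3'.
n_strand = reverse_complement(p_strand)
print("P:", p_strand)
print("N:", n_strand)
```

Because the designation is symmetric, taking the reverse complement of the N strand returns the P strand, regardless of which strand (if either) is a sense strand.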
It should be appreciated that different oligonucleotides may be designed to have different lengths. In some embodiments, one or more different oligonucleotides may have overlapping sequence regions (e.g., overlapping 5′ regions or overlapping 3′ regions). Overlapping sequence regions may be identical (i.e., corresponding to the same strand of the nucleic acid fragment) or complementary (i.e., corresponding to complementary strands of the nucleic acid fragment). The plurality of oligonucleotides may include one or more oligonucleotide pairs with overlapping identical sequence regions, one or more oligonucleotide pairs with overlapping complementary sequence regions, or a combination thereof. Overlapping sequences may be of any suitable length. For example, overlapping sequences may encompass the entire length of one or more nucleic acids used in an assembly reaction. Overlapping sequences may be between about 5 and about 500 nucleotides long (e.g., between about 10 and 100, between about 10 and 75, between about 10 and 50, about 20, about 25, about 30, about 35, about 40, about 45, about 50, etc.). However, shorter, longer or intermediate overlapping lengths may be used. It should be appreciated that overlaps between different input nucleic acids used in an assembly reaction may have different lengths.
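One possible way to check whether two oligonucleotides share a complementary 3′ overlap, and how long it is, is sketched below. The function names and the 18-nucleotide example target are hypothetical; the sketch assumes exact Watson-Crick complementarity and ignores mismatches, melting temperature, and secondary structure.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_3prime_overlap(oligo_a: str, oligo_b: str, min_len: int = 1) -> int:
    """Length of the longest complementary overlap between the 3' ends.

    Both oligonucleotides are written 5' to 3'. Returns 0 if no exact
    complementary 3' overlap of at least `min_len` nucleotides exists.
    """
    longest_possible = min(len(oligo_a), len(oligo_b))
    for k in range(longest_possible, max(min_len, 1) - 1, -1):
        # The last k bases of oligo_a must pair with the last k bases of oligo_b.
        if oligo_a[-k:] == reverse_complement(oligo_b[-k:]):
            return k
    return 0


# Hypothetical target (P strand) and two oligonucleotides taken from opposite strands.
target_p = "ATGGCCTTAGGATCCGTA"
oligo_p = target_p[:12]                      # identical to a portion of the P strand
oligo_n = reverse_complement(target_p[6:])   # identical to a portion of the N strand
print(longest_3prime_overlap(oligo_p, oligo_n))
```

A pair with a nonzero result could, in principle, anneal through its 3′ regions and be extended by a polymerase; a pair returning 0 has no exact complementary 3′ overlap under these assumptions.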
In a multiplex oligonucleotide assembly reaction designed to generate a predetermined nucleic acid fragment, the combined sequences of the different oligonucleotides in the reaction may span the sequence of the entire nucleic acid fragment on either the positive strand, the negative strand, both strands, or a combination of portions of the positive strand and portions of the negative strand. The plurality of different oligonucleotides may provide either positive sequences, negative sequences, or a combination of both positive and negative sequences corresponding to the entire sequence of the nucleic acid fragment to be assembled. In some embodiments, the plurality of oligonucleotides may include one or more oligonucleotides having sequences identical to one or more portions of the positive sequence, and one or more oligonucleotides having sequences that are identical to one or more portions of the negative sequence of the nucleic acid fragment. One or more pairs of different oligonucleotides may include sequences that are identical to overlapping portions of the predetermined nucleic acid fragment sequence as described herein (e.g., overlapping sequence portions from the same or from complementary strands of the nucleic acid fragment). In some embodiments, the plurality of oligonucleotides includes a set of oligonucleotides having sequences that combine to span the entire positive sequence and a set of oligonucleotides having sequences that combine to span the entire negative sequence of the predetermined nucleic acid fragment. However, in certain embodiments, the plurality of oligonucleotides may include one or more oligonucleotides with sequences that are identical to sequence portions on one strand (either the positive or negative strand) of the nucleic acid fragment, but no oligonucleotides with sequences that are complementary to those sequence portions.
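As one illustration of a set of oligonucleotides that combines portions of the positive strand and portions of the negative strand, the sketch below tiles a target with fixed-length windows and takes alternate windows from the P and N strands. The tiling scheme, the function names, and the example lengths are assumptions made for illustration and are not prescribed by the description; other designs (e.g., variable lengths or full coverage of both strands) are equally possible.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_alternating_oligos(target_p: str, oligo_len: int, overlap: int):
    """Tile `target_p` with overlapping windows alternating between strands.

    Returns a list of (strand, start, end, sequence) tuples, where start and
    end are 0-based half-open coordinates on the P strand and each sequence
    is written 5' to 3'.
    """
    if not target_p:
        raise ValueError("target_p must not be empty")
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        end = min(start + oligo_len, len(target_p))
        window = target_p[start:end]
        strand = "P" if index % 2 == 0 else "N"
        sequence = window if strand == "P" else reverse_complement(window)
        oligos.append((strand, start, end, sequence))
        if end == len(target_p):
            break
        start += step
        index += 1
    return oligos


# Hypothetical 18-nucleotide target, tiled with 8-nucleotide oligos overlapping by 4.
for oligo in tile_alternating_oligos("ATGGCCTTAGGATCCGTA", oligo_len=8, overlap=4):
    print(oligo)
```

In this scheme each neighboring pair comes from opposite strands, so neighbors overlap through complementary sequence regions, and the last oligonucleotide may be shorter than the others.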
In one embodiment, a plurality of oligonucleotides includes only oligonucleotides having sequences identical to portions of the positive sequence of the predetermined nucleic acid fragment. In one embodiment, a plurality of oligonucleotides includes only oligonucleotides having sequences identical to portions of the negative sequence of the predetermined nucleic acid fragment. These oligonucleotides may be assembled by sequential ligation or in an extension-based reaction (e.g., if an oligonucleotide having a 3′ region that is complementary to one of the plurality of oligonucleotides is added to the reaction).
In one aspect, a nucleic acid fragment may be assembled in a polymerase-mediated assembly reaction from a plurality of oligonucleotides that are combined and extended in one or more rounds of polymerase-mediated extensions. In another aspect, a nucleic acid fragment may be assembled in a ligase-mediated reaction from a plurality of oligonucleotides that are combined and ligated in one or more rounds of ligase-mediated ligations. In another aspect, a nucleic acid fragment may be assembled in a non-enzymatic reaction (e.g., a chemical reaction) from a plurality of oligonucleotides that are combined and assembled in one or more rounds of non-enzymatic reactions. In some embodiments, a nucleic acid fragment may be assembled using a combination of polymerase, ligase, and/or non-enzymatic reactions. For example, both polymerase(s) and ligase(s) may be included in an assembly reaction mixture. Accordingly, a nucleic acid may be assembled via coupled amplification and ligation or ligation during amplification. The resulting nucleic acid fragment from each assembly technique may have a sequence that includes the sequences of each of the plurality of assembly oligonucleotides that were used as described herein. These assembly reactions may be referred to as primerless assemblies, since the target nucleic acid is generated by assembling the input oligonucleotides rather than being generated in an amplification reaction where the oligonucleotides act as amplification primers to amplify a pre-existing template nucleic acid molecule corresponding to the target nucleic acid.
Polymerase-based assembly techniques may involve one or more suitable polymerase enzymes that can catalyze a template-based extension of a nucleic acid in a 5′ to 3′ direction in the presence of suitable nucleotides and an annealed template. A polymerase may be thermostable. A polymerase may be obtained from recombinant or natural sources. In some embodiments, a thermostable polymerase from a thermophilic organism may be used. In some embodiments, a polymerase may include a 3′→5′ exonuclease/proofreading activity. In some embodiments, a polymerase may have no, or little, proofreading activity (e.g., a polymerase may be a recombinant variant of a natural polymerase that has been modified to reduce its proofreading activity). Examples of thermostable DNA polymerases include, but are not limited to: Taq (a heat-stable DNA polymerase from the bacterium Thermus aquaticus); Pfu (a thermophilic DNA polymerase with a 3′→5′ exonuclease/proofreading activity from Pyrococcus furiosus, available from, for example, Promega); VentR® DNA Polymerase and VentR® (exo-) DNA Polymerase (thermophilic DNA polymerases with or without a 3′→5′ exonuclease/proofreading activity from Thermococcus litoralis; also known as Tli polymerase); Deep VentR® DNA Polymerase and Deep VentR® (exo-) DNA Polymerase (thermophilic DNA polymerases with or without a 3′→5′ exonuclease/proofreading activity from Pyrococcus species GB-D; available from New England Biolabs); KOD HiFi (a recombinant Thermococcus kodakaraensis KOD1 DNA polymerase with a 3′→5′ exonuclease/proofreading activity, available from Novagen); BIO-X-ACT (a mix of polymerases that possesses 5′→3′ DNA polymerase activity and 3′→5′ proofreading activity); Klenow Fragment (an N-terminal truncation of E. coli DNA Polymerase I which retains polymerase activity, but has lost the 5′→3′ exonuclease activity, available from, for example, Promega and NEB); Sequenase™ (T7 DNA polymerase deficient in 3′→5′ exonuclease activity); Phi29 (bacteriophage Phi29 DNA polymerase, which may be used for rolling circle amplification, for example, in a TempliPhi™ DNA Sequencing Template Amplification Kit, available from Amersham Biosciences); TopoTaq™ (a hybrid polymerase that combines hyperstable DNA binding domains and the DNA unlinking activity of Methanopyrus topoisomerase, with no exonuclease activity, available from Fidelity Systems); TopoTaq HiFi, which incorporates a proofreading domain with exonuclease activity; Phusion™ (a Pyrococcus-like enzyme with a processivity-enhancing domain, available from New England Biolabs); any other suitable DNA polymerase, or any combination of two or more thereof.
Ligase-based assembly techniques may involve one or more suitable ligase enzymes that can catalyze the covalent linking of adjacent 3′ and 5′ nucleic acid termini (e.g., a 5′ phosphate and a 3′ hydroxyl of nucleic acid(s) annealed on a complementary template nucleic acid such that the 3′ terminus is immediately adjacent to the 5′ terminus). Accordingly, a ligase may catalyze a ligation reaction between the 5′ phosphate of a first nucleic acid and the 3′ hydroxyl of a second nucleic acid if the first and second nucleic acids are annealed next to each other on a template nucleic acid. A ligase may be obtained from recombinant or natural sources. A ligase may be a heat-stable ligase. In some embodiments, a thermostable ligase from a thermophilic organism may be used. Examples of thermostable DNA ligases include, but are not limited to: Tth DNA ligase (from Thermus thermophilus, available from, for example, Eurogentec and GeneCraft); Pfu DNA ligase (a hyperthermophilic ligase from Pyrococcus furiosus); Taq ligase (from Thermus aquaticus); any other suitable heat-stable ligase; or any combination thereof. In some embodiments, one or more lower temperature ligases may be used (e.g., T4 DNA ligase). A lower temperature ligase may be useful for shorter overhangs (e.g., about 3, about 4, about 5, or about 6 base overhangs) that may not be stable at higher temperatures.
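The adjacency condition for template-directed ligation can be expressed as a simple check. The sketch below is an illustrative model only: the `AnnealedOligo` record, its coordinate convention, and the `can_ligate` function are hypothetical, and the model ignores mismatches at the junction, temperature, and enzyme-specific requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnealedOligo:
    """An oligonucleotide annealed to a template (hypothetical model).

    Coordinates are 0-based and half-open, counted along the strand being
    ligated in its 5' to 3' direction.
    """
    start: int
    end: int
    has_5p_phosphate: bool


def can_ligate(upstream: AnnealedOligo, downstream: AnnealedOligo) -> bool:
    """True if the upstream 3' hydroxyl abuts a phosphorylated downstream 5' end."""
    termini_adjacent = upstream.end == downstream.start
    return termini_adjacent and downstream.has_5p_phosphate


first = AnnealedOligo(start=0, end=20, has_5p_phosphate=False)
second = AnnealedOligo(start=20, end=40, has_5p_phosphate=True)
gapped = AnnealedOligo(start=22, end=40, has_5p_phosphate=True)

print(can_ligate(first, second))  # adjacent termini with a 5' phosphate
print(can_ligate(first, gapped))  # a gap separates the termini
```

A gap between the termini, or a missing 5′ phosphate on the downstream oligonucleotide, makes the junction non-ligatable in this simplified model; a gap could instead be filled by a polymerase in a combined polymerase and ligase reaction.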
Non-enzymatic techniques can be used to ligate nucleic acids. For example, a 5′-end (e.g., the 5′ phosphate group) and a 3′-end (e.g., the 3′ hydroxyl) of one or more nucleic acids may be covalently linked together without using enzymes (e.g., without using a ligase). In some embodiments, non-enzymatic techniques may offer certain advantages over enzyme-based ligations. For example, non-enzymatic techniques may have a high tolerance of non-natural nucleotide analogues in nucleic acid substrates, may be used to ligate short nucleic acid substrates, may be used to ligate RNA substrates, and/or may be cheaper and/or more suited to certain automated (e.g., high throughput) applications.
Non-enzymatic ligation may involve a chemical ligation. In some embodiments, nucleic acid termini of two or more different nucleic acids may be chemically ligated. In some embodiments, nucleic acid termini of a single nucleic acid may be chemically ligated (e.g., to circularize the nucleic acid). It should be appreciated that both strands at a first double-stranded nucleic acid terminus may be chemically ligated to both strands at a second double-stranded nucleic acid terminus. However, in some embodiments only one strand of a first nucleic acid terminus may be chemically ligated to a single strand of a second nucleic acid terminus. For example, the 5′ end of one strand of a first nucleic acid terminus may be ligated to the 3′ end of one strand of a second nucleic acid terminus without the ends of the complementary strands being chemically ligated.
Accordingly, a chemical ligation may be used to form a covalent linkage between a 5′ terminus of a first nucleic acid end and a 3′ terminus of a second nucleic acid end, wherein the first and second nucleic acid ends may be ends of a single nucleic acid or ends of separate nucleic acids. In one aspect, chemical ligation may involve at least one nucleic acid substrate having a modified end (e.g., a modified 5′ and/or 3′ terminus) including one or more chemically reactive moieties that facilitate or promote linkage formation. In some embodiments, chemical ligation occurs when one or more nucleic acid termini are brought together in close proximity (e.g., when the termini are brought together due to annealing between complementary nucleic acid sequences). Accordingly, annealing between complementary 3′ or 5′ overhangs (e.g., overhangs generated by restriction enzyme cleavage of a double-stranded nucleic acid) or between any combination of complementary nucleic acids that results in a 3′ terminus being brought into close proximity with a 5′ terminus (e.g., the 3′ and 5′ termini are adjacent to each other when the nucleic acids are annealed to a complementary template nucleic acid) may promote a template-directed chemical ligation. Examples of chemical reactions may include, but are not limited to, condensation, reduction, and/or photo-chemical ligation reactions. It should be appreciated that in some embodiments chemical ligation can be used to produce naturally-occurring phosphodiester internucleotide linkages, non-naturally-occurring phosphamide pyrophosphate internucleotide linkages, and/or other non-naturally-occurring internucleotide linkages.
In some embodiments, the process of chemical ligation may involve one or more coupling agents to catalyze the ligation reaction. A coupling agent may promote a ligation reaction between reactive groups in adjacent nucleic acids (e.g., between a 5′-reactive moiety and a 3′-reactive moiety at adjacent sites along a complementary template). In some embodiments, a coupling agent may be a reducing reagent (e.g., ferricyanide), a condensing reagent (e.g., cyanoimidazole, cyanogen bromide, carbodiimide, etc.), or irradiation (e.g., UV irradiation for photo-ligation).
In some embodiments, a chemical ligation may be an autoligation reaction that does not involve a separate coupling agent. In autoligation, the presence of a reactive group on one or more nucleic acids may be sufficient to catalyze a chemical ligation between nucleic acid termini without the addition of a coupling agent (see, for example, Xu Y & Kool E T, 1997, Tetrahedron Lett. 38:5595-8). Non-limiting examples of these reagent-free ligation reactions may involve nucleophilic displacements of sulfur on bromoacetyl, tosyl, or iodo-nucleoside groups (see, for example, Xu Y et al., 2001, Nat Biotech 19:148-52). Nucleic acids containing reactive groups suitable for autoligation can be prepared directly on automated synthesizers (see, for example, Xu Y & Kool E T, 1999, Nuc. Acids Res. 27:875-81). In some embodiments, a phosphorothioate at a 3′ terminus may react with a leaving group (such as tosylate or iodide) on a thymidine at an adjacent 5′ terminus. In some embodiments, two nucleic acid strands bound at adjacent sites on a complementary target strand may undergo autoligation by displacement of a 5′-end iodide moiety (or tosylate) with a 3′-end sulfur moiety. Accordingly, in some embodiments the product of an autoligation may include a non-naturally-occurring internucleotide linkage (e.g., a single oxygen atom may be replaced with a sulfur atom in the ligated product).
In some embodiments, a synthetic nucleic acid duplex can be assembled via chemical ligation in a one step reaction involving simultaneous chemical ligation of nucleic acids on both strands of the duplex. For example, a mixture of 5′-phosphorylated oligonucleotides corresponding to both strands of a target nucleic acid may be chemically ligated by a) exposure to heat (e.g., to 97° C.) and slow cooling to form a complex of annealed oligonucleotides, and b) exposure to cyanogen bromide or any other suitable coupling agent under conditions sufficient to chemically ligate adjacent 3′ and 5′ ends in the nucleic acid complex.
In some embodiments, a synthetic nucleic acid duplex can be assembled via chemical ligation in a two step reaction involving separate chemical ligations for the complementary strands of the duplex. For example, each strand of a target nucleic acid may be ligated in a separate reaction containing phosphorylated oligonucleotides corresponding to the strand that is to be ligated and non-phosphorylated oligonucleotides corresponding to the complementary strand. The non-phosphorylated oligonucleotides may serve as a template for the phosphorylated oligonucleotides during a chemical ligation (e.g. using cyanogen bromide). The resulting single-stranded ligated nucleic acid may be purified and annealed to a complementary ligated single-stranded nucleic acid to form the target duplex nucleic acid (see, for example, Shabarova ZA et al., 1991, Nuc. Acids Res. 19:4247-51).
Aspects of the invention may be used to enhance different types of nucleic acid assembly reactions (e.g., multiplex nucleic acid assembly reactions). Aspects of the invention may be used in combination with one or more assembly reactions described in, for example, Carr et al., 2004, Nucleic Acids Research, Vol. 32, No 20, e162 (9 pages); Richmond et al., 2004, Nucleic Acids Research, Vol. 32, No 17, pp. 5011-5018; Caruthers et al., 1972, J. Mol. Biol. 72, 475-492; Hecker et al., 1998, Biotechniques 24:256-260; Kodumal et al., 2004, PNAS Vol. 101, No. 44, pp. 15573-15578; Tian et al., 2004, Nature, Vol. 432, pp. 1050-1054; and U.S. Pat. Nos. 6,008,031 and 5,922,539, the disclosures of which are incorporated herein by reference. Certain embodiments of multiplex nucleic acid assembly reactions for generating a predetermined nucleic acid fragment are illustrated with reference to
When assembling a nucleic acid fragment using a polymerase, a single cycle of polymerase extension extends oligonucleotide pairs with annealed 3′ regions. Accordingly, if a plurality of oligonucleotides were annealed to form an annealed complex such as the one illustrated in
In one embodiment,
Assembly of a predetermined nucleic acid fragment from the plurality of oligonucleotides shown in
It should be appreciated that other configurations of oligonucleotides may be used to assemble a nucleic acid via two or more cycles of polymerase-based extension. In many configurations, at least one pair of oligonucleotides have complementary 3′ end regions.
As with other assembly reactions described herein, support-bound ligase reactions (e.g., those illustrated in
As illustrated herein, different oligonucleotide assembly reactions may be used to assemble a plurality of overlapping oligonucleotides (with overlaps that are either 5′/5′, 3′/3′, 5′/3′, complementary, non-complementary, or a combination thereof). Many of these reactions include at least one pair of oligonucleotides (the pair including one oligonucleotide from a first group or P group of oligonucleotides and one oligonucleotide from a second group or N group of oligonucleotides) that have overlapping complementary 3′ regions. However, in some embodiments, a predetermined nucleic acid may be assembled from non-overlapping oligonucleotides using blunt-ended ligation reactions. In some embodiments, the order of assembly of the non-overlapping oligonucleotides may be biased by selective phosphorylation of different 5′ ends. In some embodiments, size purification may be used to select for the correct order of assembly. In some embodiments, the correct order of assembly may be promoted by sequentially adding appropriate oligonucleotide substrates into the reaction (e.g., the ligation reaction).
In order to obtain a full-length nucleic acid fragment from a multiplex oligonucleotide assembly reaction, a purification step may be used to remove starting oligonucleotides and/or incompletely assembled fragments. In some embodiments, a purification step may involve chromatography, electrophoresis, or other physical size separation technique. In certain embodiments, a purification step may involve amplifying the full length product. For example, a pair of amplification primers (e.g., PCR primers) that correspond to the predetermined 5′ and 3′ ends of the nucleic acid fragment being assembled will preferentially amplify full length product in an exponential fashion. It should be appreciated that smaller assembled products may be amplified if they contain the predetermined 5′ and 3′ ends. However, such smaller-than-expected products containing the predetermined 5′ and 3′ ends should only be generated if an error occurred during assembly (e.g., resulting in the deletion or omission of one or more regions of the target nucleic acid) and may be removed by size fractionation of the amplified product. Accordingly, a preparation containing a relatively high amount of full length product may be obtained directly by amplifying the product of an assembly reaction using primers that correspond to the predetermined 5′ and 3′ ends. In some embodiments, additional purification (e.g., size selection) techniques may be used to obtain a more purified preparation of amplified full-length nucleic acid fragment.
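The selection logic described above can be illustrated with a minimal sketch. It assumes that assembly products are represented as plain strings written 5′ to 3′ on one strand, that a product is amplifiable only if it carries both predetermined end sequences, and that size fractionation keeps only products of the expected length. The function names and the example sequences are hypothetical and are used for illustration only.

```python
def has_predetermined_ends(product, five_end, three_end):
    """Return True if a product carries both predetermined end sequences.

    Simplification: both ends are written on the same strand, so the real
    3' primer would be the reverse complement of `three_end`.
    """
    return product.startswith(five_end) and product.endswith(three_end)


def select_full_length(products, five_end, three_end, expected_length):
    """Model amplification with end primers followed by size fractionation."""
    # Step 1: only products with both predetermined ends amplify exponentially.
    amplified = [p for p in products
                 if has_predetermined_ends(p, five_end, three_end)]
    # Step 2: size fractionation removes shorter products (e.g., deletions).
    return [p for p in amplified if len(p) == expected_length]


if __name__ == "__main__":
    five_end, three_end = "AAGG", "CCTT"           # hypothetical end sequences
    full = five_end + "ACGTACGT" + three_end        # intended product
    deletion = five_end + "ACGT" + three_end        # internal region omitted
    incomplete = five_end + "ACGTACGT"              # lacks the 3' end
    pool = [full, deletion, incomplete, "ACGT"]
    print(select_full_length(pool, five_end, three_end, len(full)))
```

In this sketch the deletion product passes the amplification step because it still carries both ends, and it is removed only by the size step, which mirrors the reasoning given above for using size fractionation after amplification.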
When designing a plurality of oligonucleotides to assemble a predetermined nucleic acid fragment, the sequence of the predetermined fragment will be provided by the oligonucleotides as described herein. However, the oligonucleotides may contain additional sequence information that may be removed during assembly or may be provided to assist in subsequent manipulations of the assembled nucleic acid fragment. Examples of additional sequences include, but are not limited to, primer recognition sequences for amplification (e.g., PCR primer recognition sequences), restriction enzyme recognition sequences, recombination sequences, other binding or recognition sequences, labeled sequences, etc. In some embodiments, one or more of the 5′-most oligonucleotides, one or more of the 3′-most oligonucleotides, or any combination thereof, may contain one or more additional sequences. In some embodiments, the additional sequence information may be contained in two or more adjacent oligonucleotides on either strand of the predetermined nucleic acid sequence. Accordingly, an assembled nucleic acid fragment may contain additional sequences that may be used to connect the assembled fragment to one or more additional nucleic acid fragments (e.g., one or more other assembled fragments, fragments obtained from other sources, vectors, etc.) via ligation, recombination, polymerase-mediated assembly, etc. In some embodiments, purification may involve cloning one or more assembled nucleic acid fragments. The cloned product may be screened (e.g., sequenced, analyzed for an insert of the expected size, etc.).
In some embodiments, a nucleic acid fragment assembled from a plurality of oligonucleotides may be combined with one or more additional nucleic acid fragments using a polymerase-based and/or a ligase-based extension reaction similar to those described herein for oligonucleotide assembly. Accordingly, one or more overlapping nucleic acid fragments may be combined and assembled to produce a larger nucleic acid fragment as described herein. In certain embodiments, double-stranded overlapping oligonucleotide fragments may be combined. However, single-stranded fragments, or combinations of single-stranded and double-stranded fragments may be combined as described herein. A nucleic acid fragment assembled from a plurality of oligonucleotides may be of any length depending on the number and length of the oligonucleotides used in the assembly reaction. For example, a nucleic acid fragment (either single-stranded or double-stranded) assembled from a plurality of oligonucleotides may be between 50 and 1,000 nucleotides long (for example, about 70 nucleotides long, between 100 and 500 nucleotides long, between 200 and 400 nucleotides long, about 200 nucleotides long, about 300 nucleotides long, about 400 nucleotides long, etc.). One or more such nucleic acid fragments (e.g., with overlapping 3′ and/or 5′ ends) may be assembled to form a larger nucleic acid fragment (single-stranded or double-stranded) as described herein.
A full length product assembled from smaller nucleic acid fragments also may be isolated or purified as described herein (e.g., using a size selection, cloning, selective binding or other suitable purification procedure). In addition, any assembled nucleic acid fragment (e.g., full-length nucleic acid fragment) described herein may be amplified (prior to, as part of, or after, a purification procedure) using appropriate 5′ and 3′ amplification primers.
Synthetic Oligonucleotides:
It should be appreciated that the terms P Group and N Group oligonucleotides are used herein for clarity purposes only, and to illustrate several embodiments of multiplex oligonucleotide assembly. The Group P and Group N oligonucleotides described herein are interchangeable, and may be referred to as first and second groups of oligonucleotides corresponding to sequences on complementary strands of a target nucleic acid fragment.
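As an illustration of two oligonucleotide groups corresponding to complementary strands, the following minimal sketch splits a target sequence into a first (P) group taken from one strand and a second (N) group taken from the complementary strand. The half-oligonucleotide offset between the groups, the fixed oligonucleotide length, and all function names are assumptions made for this example. They represent one hypothetical layout and not a configuration prescribed herein.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile(target, oligo_len, offset=0):
    """Cut `target` into consecutive segments of `oligo_len`, starting at `offset`."""
    return [target[i:i + oligo_len] for i in range(offset, len(target), oligo_len)]


def design_groups(target, oligo_len=40):
    """Return (p_group, n_group) for a target written 5' to 3'.

    P oligonucleotides tile the given strand. N oligonucleotides tile the
    complementary strand, shifted by half an oligonucleotide so that each
    N oligonucleotide spans a junction between adjacent P oligonucleotides.
    In this simple layout the first half-oligonucleotide of the target has
    no N-group partner.
    """
    p_group = tile(target, oligo_len)
    n_group = [reverse_complement(segment)
               for segment in tile(target, oligo_len, offset=oligo_len // 2)]
    return p_group, n_group


if __name__ == "__main__":
    target = "ACGT" * 25                      # hypothetical 100-nucleotide target
    p_group, n_group = design_groups(target, oligo_len=40)
    print(len(p_group), "P oligonucleotides;", len(n_group), "N oligonucleotides")
```

Because the two groups are interchangeable, swapping the roles of the strands (i.e., passing the reverse complement of the target) yields an equivalent design.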
Oligonucleotides may be synthesized using any suitable technique. For example, oligonucleotides may be synthesized on a column or other support (e.g., a chip). Examples of chip-based synthesis techniques include techniques used in synthesis devices or methods available from Combimatrix, Agilent, Affymetrix, or other sources. A synthetic oligonucleotide may be of any suitable size, for example between 10 and 1,000 nucleotides long (e.g., between 10 and 200, 200 and 500, 500 and 1,000 nucleotides long, or any combination thereof). An assembly reaction may include a plurality of oligonucleotides, each of which independently may be between 10 and 200 nucleotides in length (e.g., between 20 and 150, between 30 and 100, 30 to 90, 30-80, 30-70, 30-60, 35-55, 40-50, or any intermediate number of nucleotides). However, one or more shorter or longer oligonucleotides may be used in certain embodiments.
Oligonucleotides may be provided as single-stranded synthetic products. However, in some embodiments, oligonucleotides may be provided as double-stranded preparations including an annealed complementary strand. Oligonucleotides may be molecules of DNA, RNA, PNA, or any combination thereof. A double-stranded oligonucleotide may be produced by amplifying a single-stranded synthetic oligonucleotide or other suitable template (e.g., a sequence in a nucleic acid preparation such as a nucleic acid vector or genomic nucleic acid). Accordingly, a plurality of oligonucleotides designed to have the sequence features described herein may be provided as a plurality of single-stranded oligonucleotides having those features, or also may be provided along with complementary oligonucleotides.
In some embodiments, an oligonucleotide may be amplified using an appropriate primer pair with one primer corresponding to each end of the oligonucleotide (e.g., one that is complementary to the 3′ end of the oligonucleotide and one that is identical to the 5′ end of the oligonucleotide). In some embodiments, an oligonucleotide may be designed to contain a central assembly sequence (designed to be incorporated into the target nucleic acid) flanked by a 5′ amplification sequence (e.g., a 5′ universal sequence) and a 3′ amplification sequence (e.g., a 3′ universal sequence). Amplification primers (e.g., between 10 and 50 nucleotides long, between 15 and 45 nucleotides long, about 25 nucleotides long, etc.) corresponding to the flanking amplification sequences may be used to amplify the oligonucleotide (e.g., one primer may be complementary to the 3′ amplification sequence and one primer may have the same sequence as the 5′ amplification sequence). The amplification sequences then may be removed from the amplified oligonucleotide using any suitable technique to produce an oligonucleotide that contains only the assembly sequence.
In some embodiments, a plurality of different oligonucleotides (e.g., about 5, 10, 50, 100, or more) with different central assembly sequences may have identical 5′ amplification sequences and identical 3′ amplification sequences. These oligonucleotides can all be amplified in the same reaction using the same amplification primers.
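The relationship between a central assembly sequence, its flanking amplification sequences, and the shared primer pair can be sketched as follows. This is a minimal illustration that assumes uppercase DNA strings written 5′ to 3′. The flank sequences and function names are hypothetical, and removal of the flanks is modeled as simple string trimming rather than any particular enzymatic technique.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def add_flanks(assembly_seq, five_flank, three_flank):
    """Build the synthesized oligonucleotide: 5' flank + assembly + 3' flank."""
    return five_flank + assembly_seq + three_flank


def universal_primers(five_flank, three_flank):
    """Return the primer pair shared by every oligonucleotide with these flanks.

    One primer has the same sequence as the 5' amplification sequence and
    the other is complementary to the 3' amplification sequence.
    """
    return five_flank, reverse_complement(three_flank)


def strip_flanks(oligo, five_flank, three_flank):
    """Recover the assembly sequence from an amplified oligonucleotide."""
    if not (oligo.startswith(five_flank) and oligo.endswith(three_flank)):
        raise ValueError("oligonucleotide does not carry the expected flanks")
    return oligo[len(five_flank):len(oligo) - len(three_flank)]


if __name__ == "__main__":
    five_flank, three_flank = "GGAA", "CTCA"      # hypothetical universal flanks
    assembly_seqs = ["ATGCATGC", "TTGACCAA"]      # different central sequences
    oligos = [add_flanks(s, five_flank, three_flank) for s in assembly_seqs]
    print(universal_primers(five_flank, three_flank))
    print([strip_flanks(o, five_flank, three_flank) for o in oligos])
```

Because the primers depend only on the flanks, the same pair applies to every oligonucleotide in the pool regardless of its central assembly sequence.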
A preparation of an oligonucleotide designed to have a certain sequence may include oligonucleotide molecules having the designed sequence in addition to oligonucleotide molecules that contain errors (e.g., that differ from the designed sequence at least at one position). A sequence error may include one or more nucleotide deletions, additions, substitutions (e.g., transversion or transition), inversions, duplications, or any combination of two or more thereof. Oligonucleotide errors may be generated during oligonucleotide synthesis. Different synthetic techniques may be prone to different error profiles and frequencies. In some embodiments, error rates may vary from 1/10 to 1/200 errors per base depending on the synthesis protocol that is used. However, in some embodiments lower error rates may be achieved. Also, the types of errors may depend on the synthetic techniques that are used. For example, in some embodiments chip-based oligonucleotide synthesis may result in relatively more deletions than column-based synthetic techniques.
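To give a sense of how a per-base error rate translates into the fraction of error-free molecules in a preparation, the following minimal sketch computes (1 − e)^n for a per-base error rate e and an oligonucleotide length n. It assumes that errors occur independently and at the same rate at every position, which is a simplification, and the function name is hypothetical.

```python
def error_free_fraction(errors_per_base, length):
    """Expected fraction of molecules with no errors.

    Assumes independent errors at a uniform per-base rate, so the
    probability that all `length` positions are correct is (1 - e) ** n.
    """
    if not 0.0 <= errors_per_base <= 1.0:
        raise ValueError("errors_per_base must be between 0 and 1")
    if length < 0:
        raise ValueError("length must be non-negative")
    return (1.0 - errors_per_base) ** length


if __name__ == "__main__":
    # Error rates at the two ends of the range mentioned above, 50-mer oligonucleotide.
    for rate in (1 / 10, 1 / 200):
        fraction = error_free_fraction(rate, 50)
        print(f"rate={rate:.4f} errors/base -> error-free fraction {fraction:.4f}")
```

Under this model, the error-free fraction falls off exponentially with length, which is one reason error removal may become more important as oligonucleotides or assembled fragments get longer.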
In some embodiments, one or more oligonucleotide preparations may be processed to remove (or reduce the frequency of) error-containing oligonucleotides. In some embodiments, a hybridization technique may be used wherein an oligonucleotide preparation is hybridized under stringent conditions one or more times to an immobilized oligonucleotide preparation designed to have a complementary sequence. Oligonucleotides that do not bind may be removed in order to selectively or specifically remove oligonucleotides that contain errors that would destabilize hybridization under the conditions used. It should be appreciated that this processing may not remove all error-containing oligonucleotides since many have only one or two sequence errors and may still bind to the immobilized oligonucleotides with sufficient affinity for a fraction of them to remain bound through this selection processing procedure.
In some embodiments of the invention, a sliding clamp technique may be used for enriching error-free oligonucleotides after hybridization of oligonucleotides that are designed to be complementary, provided that the ends are “blocked” to inhibit dissociation of the clamped form of MutS from any heteroduplexes that are present.
In some embodiments, a nucleic acid binding protein or recombinase (e.g., RecA) may be included in one or more of the oligonucleotide processing steps to improve the selection of error-free oligonucleotides. For example, by preferentially promoting the hybridization of oligonucleotides that are completely complementary with the immobilized oligonucleotides, the amount of error-containing oligonucleotides that are bound may be reduced. As a result, this oligonucleotide processing procedure may remove more error-containing oligonucleotides and generate an oligonucleotide preparation that has a lower error frequency (e.g., with an error rate of less than 1/50, less than 1/100, less than 1/200, less than 1/300, less than 1/400, less than 1/500, less than 1/1,000, or less than 1/2,000 errors per base).
A plurality of oligonucleotides used in an assembly reaction may contain preparations of synthetic oligonucleotides, single-stranded oligonucleotides, double-stranded oligonucleotides, amplification products, oligonucleotides that are processed to remove (or reduce the frequency of) error-containing variants, etc., or any combination of two or more thereof.
In some aspects, a synthetic oligonucleotide may be amplified prior to use. Either strand of a double-stranded amplification product may be used as an assembly oligonucleotide and added to an assembly reaction as described herein. A synthetic oligonucleotide may be amplified using a pair of amplification primers (e.g., a first primer that hybridizes to the 3′ region of the oligonucleotide and a second primer that hybridizes to the 3′ region of the complement of the oligonucleotide). The oligonucleotide may be synthesized on a support such as a chip (e.g., using an ink-jet-based synthesis technology). In some embodiments, the oligonucleotide may be amplified while it is still attached to the support. In some embodiments, the oligonucleotide may be removed or cleaved from the support prior to amplification. The two strands of a double-stranded amplification product may be separated and isolated using any suitable technique. In some embodiments, the two strands may be differentially labeled (e.g., using one or more different molecular weight, affinity, fluorescent, electrostatic, magnetic, and/or other suitable tags). The different labels may be used to purify and/or isolate one or both strands. In some embodiments, biotin may be used as a purification tag. In some embodiments, the strand that is to be used for assembly may be directly purified (e.g., using an affinity or other suitable tag). In some embodiments, the complementary strand is removed (e.g., using an affinity or other suitable tag) and the remaining strand is used for assembly.
In some embodiments, a synthetic oligonucleotide may include a central assembly sequence flanked by 5′ and 3′ amplification sequences. The central assembly sequence is designed for incorporation into an assembled nucleic acid. The flanking sequences are designed for amplification and are not intended to be incorporated into the assembled nucleic acid. The flanking amplification sequences may be used as universal primer sequences to amplify a plurality of different assembly oligonucleotides that share the same amplification sequences but have different central assembly sequences. In some embodiments, the flanking sequences are removed after amplification to produce an oligonucleotide that contains only the assembly sequence.
In some embodiments, one of the two amplification primers may be biotinylated. The nucleic acid strand that incorporates this biotinylated primer during amplification can be affinity purified using streptavidin (e.g., bound to a bead, column, or other surface). In some embodiments, the amplification primers also may be designed to include certain sequence features that can be used to remove the primer regions after amplification in order to produce a single-stranded assembly oligonucleotide that includes the assembly sequence without the flanking amplification sequences.
In some embodiments, the non-biotinylated strand may be used for assembly. The assembly oligonucleotide may be purified by removing the biotinylated complementary strand. In some embodiments, the amplification sequences may be removed if the non-biotinylated primer includes a dU at its 3′ end, and if the amplification sequence recognized by (i.e., complementary to) the biotinylated primer includes at most three of the four nucleotides and the fourth nucleotide is present in the assembly sequence at (or adjacent to) the junction between the amplification sequence and the assembly sequence. After amplification, the double-stranded product is incubated with T4 DNA polymerase (or other polymerase having a suitable editing activity) in the presence of the fourth nucleotide (without any of the nucleotides that are present in the amplification sequence recognized by the biotinylated primer) under appropriate reaction conditions. Under these conditions, the 3′ nucleotides are progressively removed through to the nucleotide that is not present in the amplification sequence (referred to as the fourth nucleotide above). As a result, the amplification sequence that is recognized by the biotinylated primer is removed. The biotinylated strand is then removed. The remaining non-biotinylated strand is then treated with uracil-DNA glycosylase (UDG) to remove the non-biotinylated primer sequence. This technique generates a single-stranded assembly oligonucleotide without the flanking amplification sequences. It should be appreciated that this technique may be used to process a single amplified oligonucleotide preparation or a plurality of different amplified oligonucleotides in a single reaction if they share the same amplification sequence features described above.
In some embodiments, the biotinylated strand may be used for assembly. The assembly oligonucleotide may be obtained directly by isolating the biotinylated strand. In some embodiments, the amplification sequences may be removed if the biotinylated primer includes a dU at its 3′ end, and if the amplification sequence recognized by (i.e., complementary to) the non-biotinylated primer includes at most three of the four nucleotides and the fourth nucleotide is present in the assembly sequence at (or adjacent to) the junction between the amplification sequence and the assembly sequence. After amplification, the double-stranded product is incubated with T4 DNA polymerase (or other polymerase having a suitable editing activity) in the presence of the fourth nucleotide (without any of the nucleotides that are present in the amplification sequence recognized by the non-biotinylated primer) under appropriate reaction conditions. Under these conditions, the 3′ nucleotides are progressively removed through to the nucleotide that is not present in the amplification sequence (referred to as the fourth nucleotide above). As a result, the amplification sequence that is recognized by the non-biotinylated primer is removed. The biotinylated strand is then isolated (and the non-biotinylated strand is removed). The isolated biotinylated strand is then treated with UDG to remove the biotinylated primer sequence. This technique generates a single-stranded assembly oligonucleotide without the flanking amplification sequences. It should be appreciated that this technique may be used to process a single amplified oligonucleotide preparation or a plurality of different amplified oligonucleotides in a single reaction if they share the same amplification sequence features described above.
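The sequence constraint that both variants above rely on can be checked computationally. The following minimal sketch assumes that the strand retained for assembly is written 5′ to 3′ as assembly sequence followed by the 3′ amplification sequence, that the "fourth nucleotide" must be the last base of the assembly sequence (the "at the junction" case only), and that polymerase editing can be modeled as trimming 3′ bases until the fourth nucleotide is reached. The function names and example sequences are hypothetical, and the model ignores reaction kinetics.

```python
def missing_nucleotides(seq):
    """Return the set of nucleotides (A, C, G, T) absent from `seq`."""
    return set("ACGT") - set(seq.upper())


def fourth_nucleotide(assembly_seq, three_prime_amp_seq):
    """Return the stop nucleotide if the design satisfies the constraint, else None.

    The 3' amplification sequence must use at most three of the four
    nucleotides, and the base at the junction (last base of the assembly
    sequence) must be a nucleotide that the amplification sequence lacks.
    """
    junction_base = assembly_seq[-1].upper()
    if junction_base in missing_nucleotides(three_prime_amp_seq):
        return junction_base
    return None


def simulate_chewback(strand, stop_base):
    """Trim 3' bases until the 3'-terminal base equals `stop_base`.

    Simplified model of 3'-to-5' editing in the presence of only the
    fourth nucleotide: removal halts where that nucleotide can be restored.
    """
    end = len(strand)
    while end > 0 and strand[end - 1].upper() != stop_base:
        end -= 1
    return strand[:end]


if __name__ == "__main__":
    assembly_seq = "ATGCCG"        # hypothetical assembly sequence ending in G
    amp_seq = "ATTACTCA"           # hypothetical 3' amplification sequence without G
    stop = fourth_nucleotide(assembly_seq, amp_seq)
    print(stop, simulate_chewback(assembly_seq + amp_seq, stop))
```

When the constraint holds, the simulated trimming returns exactly the assembly sequence. An amplification sequence that contains all four nucleotides yields no stop nucleotide, indicating that the design would need to be revised for this removal technique.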
It should be appreciated that the biotinylated primer may be designed to anneal to either the synthetic oligonucleotide or to its complement for the amplification and purification reactions described above. Similarly, the non-biotinylated primer may be designed to anneal to either strand provided it anneals to the strand that is complementary to the strand recognized by the biotinylated primer.
In certain embodiments, it may be helpful to include one or more modified oligonucleotides in an assembly reaction. An oligonucleotide may be modified by incorporating a modified base (e.g., a nucleotide analog) during synthesis, by modifying the oligonucleotide after synthesis, or any combination thereof. Examples of modifications include, but are not limited to, one or more of the following: universal bases such as nitroindoles, dP and dK, inosine, uracil; halogenated bases such as BrdU; fluorescent labeled bases; non-radioactive labels such as biotin (as a derivative of dT) and digoxigenin (DIG); 2,4-Dinitrophenyl (DNP); radioactive nucleotides; post-coupling modification such as dR-NH2 (deoxyribose-NH2); Acridine (6-chloro-2-methoxyacridine); and spacer phosphoramidites which are used during synthesis to add a spacer ‘arm’ into the sequence, such as C3, C8 (octanediol), C9, C12, HEG (hexaethylene glycol) and C18.
It should be appreciated that one or more nucleic acid binding proteins or recombinases are preferably not included in a post-assembly fidelity optimization technique (e.g., a screening technique using a MutS or MutS homolog), because the optimization procedure involves removing error-containing nucleic acids via the production and removal of heteroduplexes. Accordingly, any nucleic acid binding proteins or recombinases (e.g., RecA) that were included in the assembly steps are preferably removed (e.g., by inactivation, column purification or other suitable technique) after assembly and prior to fidelity optimization.
Applications:
Aspects of the invention may be useful for a range of applications involving the production and/or use of synthetic nucleic acids. As described herein, the invention provides methods for producing synthetic nucleic acids with increased fidelity and/or for reducing the cost and/or time of synthetic assembly reactions. The resulting assembled nucleic acids may be amplified in vitro (e.g., using PCR, LCR, or any suitable amplification technique), amplified in vivo (e.g., via cloning into a suitable vector), isolated and/or purified. An assembled nucleic acid (alone or cloned into a vector) may be transformed into a host cell (e.g., a prokaryotic, eukaryotic, insect, mammalian, or other host cell). In some embodiments, the host cell may be used to propagate the nucleic acid. In certain embodiments, the nucleic acid may be integrated into the genome of the host cell. In some embodiments, the nucleic acid may replace a corresponding nucleic acid region on the genome of the cell (e.g., via homologous recombination). Accordingly, nucleic acids may be used to produce recombinant organisms. In some embodiments, a target nucleic acid may be an entire genome or large fragments of a genome that are used to replace all or part of the genome of a host organism. Recombinant organisms also may be used for a variety of research, industrial, agricultural, and/or medical applications.
Many of the techniques described herein can be used together, applying enrichment steps at one or more points to produce long nucleic acid molecules. Correct sequence enrichment techniques of the invention can be applied to double-stranded nucleic acids of any size. For example, enrichment techniques using sliding clamp configurations of mismatch binding proteins may be used with oligonucleotide duplexes, nucleic acid fragments of less than 100 to more than 10,000 base pairs in length (e.g., 100 mers to 500 mers, 500 mers to 1,000 mers, 1,000 mers to 5,000 mers, 5,000 mers to 10,000 mers, etc.). In some embodiments, methods described herein may be used during the assembly of large nucleic acid molecules (for example, larger than 5,000 nucleotides in length, e.g., longer than about 10,000, longer than about 25,000, longer than about 50,000, longer than about 75,000, longer than about 100,000 nucleotides, etc.). In an exemplary embodiment, methods described herein may be used during the assembly of an entire genome (or a large fragment thereof, e.g., about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more) of an organism (e.g., of a viral, bacterial, yeast, or other prokaryotic or eukaryotic organism), optionally incorporating specific modifications into the sequence at one or more desired locations.
Any of the nucleic acid products (e.g., including nucleic acids that are amplified, cloned, purified, isolated, etc.) may be packaged in any suitable format (e.g., in a stable buffer, lyophilized, etc.) for storage and/or shipping (e.g., for shipping to a distribution center or to a customer). Similarly, any of the host cells (e.g., cells transformed with a vector or having a modified genome) may be prepared in a suitable buffer for storage and/or transport (e.g., for distribution to a customer). In some embodiments, cells may be frozen. However, other stable cell preparations also may be used.
Host cells may be grown and expanded in culture. Host cells may be used for expressing one or more RNAs or polypeptides of interest (e.g., therapeutic, industrial, agricultural, and/or medical proteins). The expressed polypeptides may be natural polypeptides or non-natural polypeptides. The polypeptides may be isolated or purified for subsequent use.
Accordingly, nucleic acid molecules generated using methods of the invention can be incorporated into a vector. The vector may be a cloning vector or an expression vector. In some embodiments, the vector may be a viral vector. A viral vector may comprise nucleic acid sequences capable of infecting target cells. Similarly, in some embodiments, a prokaryotic expression vector operably linked to an appropriate promoter system can be used to transform target cells. In other embodiments, a eukaryotic vector operably linked to an appropriate promoter system can be used to transfect target cells or tissues.
Transcription and/or translation of the constructs described herein may be carried out in vitro (i.e., using cell-free systems) or in vivo (i.e., expressed in cells). In some embodiments, cell lysates may be prepared. In certain embodiments, expressed RNAs or polypeptides may be isolated or purified. Nucleic acids of the invention also may be used to add detection and/or purification tags to expressed polypeptides or fragments thereof. Examples of polypeptide-based fusions/tags include, but are not limited to, hexa-histidine (His6), Myc, and HA, and other polypeptides with utility, such as GFP, GST, MBP, chitin and the like. In some embodiments, polypeptides may comprise one or more unnatural amino acid residue(s).
In some embodiments, antibodies can be made against polypeptides or fragment(s) thereof encoded by one or more synthetic nucleic acids.
In certain embodiments, synthetic nucleic acids may be provided as libraries for screening in research and development (e.g., to identify potential therapeutic proteins or peptides, to identify potential protein targets for drug development, etc.).
In some embodiments, a synthetic nucleic acid may be used as a therapeutic (e.g., for gene therapy, or for gene regulation). For example, a synthetic nucleic acid may be administered to a patient in an amount sufficient to express a therapeutic amount of a protein. In other embodiments, a synthetic nucleic acid may be administered to a patient in an amount sufficient to regulate (e.g., down-regulate) the expression of a gene.
It should be appreciated that different acts or embodiments described herein may be performed independently and may be performed at different locations in the United States or outside the United States. For example, each of the acts of receiving an order for a target nucleic acid, analyzing a target nucleic acid sequence, designing one or more starting nucleic acids (e.g., oligonucleotides), synthesizing starting nucleic acid(s), purifying starting nucleic acid(s), assembling starting nucleic acid(s), isolating assembled nucleic acid(s), confirming the sequence of assembled nucleic acid(s), manipulating assembled nucleic acid(s) (e.g., amplifying, cloning, inserting into a host genome, etc.), and any other acts or any parts of these acts may be performed independently either at one location or at different sites within the United States or outside the United States. In some embodiments, an assembly procedure may involve a combination of acts that are performed at one site (in the United States or outside the United States) and acts that are performed at one or more remote sites (within the United States or outside the United States).
Automated Applications:
Aspects of the invention may include automating one or more acts described herein. For example, a sequence analysis may be automated in order to generate a synthesis strategy automatically. The synthesis strategy may include i) the design of the starting nucleic acids that are to be assembled into the target nucleic acid, ii) the choice of the assembly technique(s) to be used, iii) the number of rounds of assembly and error screening or sequencing steps to include, and/or iv) decisions relating to subsequent processing of an assembled target nucleic acid. Similarly, one or more steps of an assembly reaction may be automated using one or more automated sample handling devices (e.g., one or more automated liquid or fluid handling devices). For example, the synthesis and optional selection of starting nucleic acids (e.g., oligonucleotides) may be automated using a nucleic acid synthesizer and automated procedures. Automated devices and procedures may be used to mix reaction reagents, including one or more of the following: starting nucleic acids, buffers, enzymes (e.g., one or more ligases and/or polymerases), nucleotides, nucleic acid binding proteins or recombinases, salts, and any other suitable agents such as stabilizing agents. Automated devices and procedures also may be used to control the reaction conditions. For example, an automated thermal cycler may be used to control reaction temperatures and any temperature cycles that may be used. Similarly, subsequent purification and analysis of assembled nucleic acid products may be automated. For example, fidelity optimization steps (e.g., a MutS error screening procedure) may be automated using appropriate sample processing devices and associated protocols. Sequencing also may be automated using a sequencing device and automated sequencing protocols. Additional steps (e.g., amplification, cloning, etc.) also may be automated using one or more appropriate devices and related protocols.
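By way of illustration only, an automatically generated synthesis strategy might be represented as a simple record produced from a target sequence. The following minimal sketch assumes a hypothetical planner that cuts the target into fragments of a maximum length, uses one assembly round for a single fragment and two rounds otherwise, and schedules one error screening step per round. The class, function, field names, and default values are assumptions for this example and do not describe any particular automated system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SynthesisStrategy:
    """Hypothetical record of the choices that make up a synthesis strategy."""
    fragment_ranges: List[Tuple[int, int]]   # (start, end) of each starting fragment
    assembly_technique: str                  # e.g., "polymerase" or "ligase"
    assembly_rounds: int                     # number of rounds of assembly
    error_screening_steps: int               # number of screening or sequencing steps


def plan_synthesis(target, max_fragment_len=300, technique="polymerase"):
    """Derive a strategy from a target sequence using simple illustrative rules."""
    if not target:
        raise ValueError("target sequence must not be empty")
    # Cut the target into consecutive fragments of at most max_fragment_len.
    ranges = [(start, min(start + max_fragment_len, len(target)))
              for start in range(0, len(target), max_fragment_len)]
    # Illustrative rule: one round builds a single fragment; a second round
    # joins multiple fragments into the full-length target.
    rounds = 1 if len(ranges) == 1 else 2
    return SynthesisStrategy(ranges, technique, rounds, error_screening_steps=rounds)


if __name__ == "__main__":
    strategy = plan_synthesis("ACGT" * 175)   # hypothetical 700-nucleotide target
    print(strategy)
```

A record of this kind could then be passed to the software that controls sample handling devices, although the interface between planning and device control is outside the scope of this sketch.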
It should be appreciated that one or more of the devices or device components described herein may be combined in a system (e.g., a robotic system). Assembly reaction mixtures (e.g., liquid reaction samples) may be transferred from one component of the system to another using automated devices and procedures (e.g., robotic manipulation and/or transfer of samples and/or sample containers, including automated pipetting devices, etc.). The system and any components thereof may be controlled by a control system.
Accordingly, acts of the invention may be automated using, for example, a computer system (e.g., a computer controlled system). A computer system on which aspects of the invention can be implemented may include a computer for any type of processing (e.g., sequence analysis and/or automated device control as described herein). However, it should be appreciated that certain processing steps may be provided by one or more of the automated devices that are part of the assembly system. In some embodiments, a computer system may include two or more computers. For example, one computer may be coupled, via a network, to a second computer. One computer may perform sequence analysis. The second computer may control one or more of the automated synthesis and assembly devices in the system. In other aspects, additional computers may be included in the network to control one or more of the analysis or processing acts. Each computer may include a memory and processor. The computers can take any form, as the aspects of the present invention are not limited to being implemented on any particular computer platform. Similarly, the network can take any form, including a private network or a public network (e.g., the Internet). Display devices can be associated with one or more of the devices and computers. Alternatively, or in addition, a display device may be located at a remote site and connected for displaying the output of an analysis in accordance with the invention. Connections between the different components of the system may be via wire, wireless transmission, satellite transmission, any other suitable transmission, or any combination of two or more of the above.
In accordance with one embodiment of the present invention for use on a computer system, it is contemplated that sequence information (e.g., a target sequence, a processed analysis of the target sequence, etc.) can be obtained and then sent over a public network, such as the Internet, to a remote location to be processed by a computer to produce any of the various types of outputs discussed herein (e.g., in connection with oligonucleotide design). However, it should be appreciated that the aspects of the present invention described herein are not limited in that respect, and that numerous other configurations are possible. For example, all of the analysis and processing described herein can alternatively be implemented on a computer that is attached locally to a device, an assembly system, or one or more components of an assembly system. As a further alternative, as opposed to transmitting sequence information (e.g., a target sequence, a processed analysis of the target sequence, etc.) over a communication medium (e.g., the network), the information can be loaded onto a computer readable medium that can then be physically transported to another computer for processing in the manners described herein. In another embodiment, a combination of two or more transmission/delivery techniques may be used. It also should be appreciated that computer implementable programs for performing a sequence analysis or controlling one or more of the devices, systems, or system components described herein also may be transmitted via a network or loaded onto a computer readable medium as described herein. Accordingly, aspects of the invention may involve performing one or more steps within the United States and additional steps outside the United States.
In some embodiments, sequence information (e.g., a customer order) may be received at one location (e.g., in one country) and sent to a remote location (e.g., in the same country or in a different country) for processing (e.g., for sequence analysis to determine a synthesis strategy and/or design oligonucleotides). In certain embodiments, a portion of the sequence analysis may be performed at one site (e.g., in one country) and another portion at another site (e.g., in the same country or in another country). In some embodiments, different steps in the sequence analysis may be performed at multiple sites (e.g., all in one country or in several different countries). The results of a sequence analysis then may be sent to a further site for synthesis. However, in some embodiments, different synthesis and quality control steps may be performed at more than one site (e.g., within one country or in two or more countries). An assembled nucleic acid then may be shipped to a further site (e.g., either to a central shipping center or directly to a client).
Each of the different aspects, embodiments, or acts of the present invention described herein can be independently automated and implemented in any of numerous ways. For example, each aspect, embodiment, or act can be independently implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. It should be appreciated that any component or collection of components that perform the functions described above can be generically considered as one or more controllers that control the above-discussed functions. The one or more controllers can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware (e.g., one or more processors) that is programmed using microcode or software to perform the functions recited above.
In this respect, it should be appreciated that one implementation of the embodiments of the present invention comprises at least one computer-readable medium (e.g., a computer memory, a floppy disk, a compact disk, a tape, etc.) encoded with a computer program (i.e., a plurality of instructions), which, when executed on a processor, performs one or more of the above-discussed functions of the present invention. The computer-readable medium can be transportable such that the program stored thereon can be loaded onto any computer system resource to implement one or more functions of the present invention discussed herein. In addition, it should be appreciated that the reference to a computer program which, when executed, performs the above-discussed functions, is not limited to an application program running on a host computer. Rather, the term computer program is used herein in a generic sense to reference any type of computer code (e.g., software or microcode) that can be employed to program a processor to implement the above-discussed aspects of the present invention.
It should be appreciated that in accordance with several embodiments of the present invention wherein processes are implemented in a computer readable medium, the computer implemented processes may, during the course of their execution, receive input manually (e.g., from a user).
Accordingly, overall system-level control of the assembly devices or components described herein may be performed by a system controller which may provide control signals to the associated nucleic acid synthesizers, liquid handling devices, thermal cyclers, sequencing devices, associated robotic components, as well as other suitable systems for performing the desired input/output or other control functions. Thus, the system controller along with any device controllers together form a controller that controls the operation of a nucleic acid assembly system. The controller may include a general purpose data processing system, which can be a general purpose computer, or network of general purpose computers, and other associated devices, including communications devices, modems, and/or other circuitry or components necessary to perform the desired input/output or other functions. The controller can also be implemented, at least in part, as a single special purpose integrated circuit (e.g., ASIC) or an array of ASICs, each having a main or central processor section for overall, system-level control, and separate sections dedicated to performing various different specific computations, functions and other processes under the control of the central processor section. The controller can also be implemented using a plurality of separate dedicated programmable integrated or other electronic circuits or devices, e.g., hard wired electronic or logic circuits such as discrete element circuits or programmable logic devices. The controller can also include any other components or devices, such as user input/output devices (monitors, displays, printers, a keyboard, a user pointing device, touch screen, or other user interface, etc.), data storage devices, drive motors, linkages, valve controllers, robotic devices, vacuum and other pumps, pressure sensors, detectors, power supplies, pulse sources, communication devices or other electronic circuitry or components, and so on. 
The controller also may control operation of other portions of a system, such as automated client order processing, quality control, packaging, shipping, billing, etc., to perform other suitable functions known in the art but not described in detail herein.
Business Applications:
Aspects of the invention may be useful to streamline nucleic acid assembly reactions. Accordingly, aspects of the invention relate to marketing methods, compositions, kits, devices, and systems for increasing nucleic acid assembly throughput involving correct sequence enrichment using sliding clamp techniques described herein.
Aspects of the invention may be useful for reducing the time and/or cost of production, commercialization, and/or development of synthetic nucleic acids, and/or related compositions. Accordingly, aspects of the invention relate to business methods that involve collaboratively (e.g., with a partner) or independently marketing one or more methods, kits, compositions, devices, or systems for analyzing and/or assembling synthetic nucleic acids as described herein. For example, certain embodiments of the invention may involve marketing a procedure and/or associated devices or systems involving correct sequence enrichment using sliding clamp techniques described herein. In some embodiments, synthetic nucleic acids, libraries of synthetic nucleic acids, host cells containing synthetic nucleic acids, expressed polypeptides or proteins, etc., also may be marketed.
Marketing may involve providing information and/or samples relating to methods, kits, compositions, devices, and/or systems described herein. Potential customers or partners may be, for example, companies in the pharmaceutical, biotechnology and agricultural industries, as well as academic centers and government research organizations or institutes. Business applications also may involve generating revenue through sales and/or licenses of methods, kits, compositions, devices, and/or systems of the invention.
Gene assembly via a 2-step PCR method: In step (1), a primerless assembly of oligonucleotides is performed and in step (2) an assembled nucleic acid fragment is amplified in a primer-based amplification.
A 993 base long promoter>EGFP construct was assembled from 50-mer abutting oligonucleotides using a 2-step PCR assembly.
Mixed oligonucleotide pools were prepared as follows: 36 overlapping 50-mer oligonucleotides and two 5′ terminal 59-mers were separated into 4 pools, each corresponding to overlapping 200-300 nucleotide segments of the final construct. The total oligonucleotide concentration in each pool was 5 μM.
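The division of a construct into overlapping sub-segments can be illustrated with a short calculation. The following Python sketch is a hypothetical example only: the function name and the 25-nucleotide overlap are assumptions, and the actual segment boundaries and overlaps used for the pools described above are not stated here. The sketch shows one way a 993-nucleotide construct could be split into four equal-length overlapping segments that fall within the 200-300 nucleotide range.

```python
import math
from typing import List, Tuple


def overlapping_segments(total_len: int, n_segments: int,
                         overlap: int) -> List[Tuple[int, int]]:
    """Split [0, total_len) into n_segments windows sharing `overlap` bases.

    Returns half-open (start, end) coordinates. The last window is clamped
    to total_len, so it may be slightly shorter than the others.
    """
    if n_segments < 1 or overlap < 0:
        raise ValueError("n_segments must be >= 1 and overlap must be >= 0")
    # Choose a segment length L so that n*L - (n-1)*overlap covers total_len.
    seg_len = math.ceil((total_len + (n_segments - 1) * overlap) / n_segments)
    step = seg_len - overlap
    if step <= 0:
        raise ValueError("overlap is too large for the requested segments")
    return [(i * step, min(i * step + seg_len, total_len))
            for i in range(n_segments)]


# Assumed overlap of 25 nt (hypothetical; not taken from the example).
segments = overlapping_segments(total_len=993, n_segments=4, overlap=25)
for start, end in segments:
    print(f"segment {start}-{end} ({end - start} nt)")
```

With these assumed inputs each segment is 267 nucleotides long, which is consistent with, but not derived from, the 200-300 nucleotide segments described above.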
A primerless PCR extension reaction was used to stitch (assemble) overlapping oligonucleotides in each pool. The PCR extension reaction mixture was as follows:
Assembly was achieved by cycling this mixture through several rounds of denaturing, annealing, and extension reactions as follows:
The resulting product was exposed to amplification conditions to amplify the desired nucleic acid fragments (sub-segments of 200-300 nucleotides). The following PCR mix was used:
The following PCR cycle conditions were used:
The amplified sub-segments were assembled using another round of primerless PCR as follows. A diluted amplification product was prepared for each sub-segment by diluting each amplified sub-segment PCR product 1:10 (4 μl mix+36 μl dH2O). This diluted mix was used as follows:
The following PCR cycle conditions were used:
start 2 min. 95° C.
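The 1:10 dilution of each amplified sub-segment described above (4 μl of PCR product plus 36 μl of dH2O) follows from simple volume arithmetic. The following Python sketch reproduces that arithmetic; the function name is hypothetical and is used for illustration only.

```python
from typing import Tuple


def dilution_volumes(final_volume_ul: float,
                     dilution_factor: float) -> Tuple[float, float]:
    """Return (sample_ul, diluent_ul) for a 1:dilution_factor dilution."""
    if final_volume_ul <= 0 or dilution_factor < 1:
        raise ValueError("final volume must be > 0 and dilution factor >= 1")
    sample_ul = final_volume_ul / dilution_factor
    diluent_ul = final_volume_ul - sample_ul
    return sample_ul, diluent_ul


# 1:10 dilution in a 40 ul final volume, as in the example above.
sample_ul, diluent_ul = dilution_volumes(final_volume_ul=40.0, dilution_factor=10.0)
print(f"{sample_ul} ul PCR product + {diluent_ul} ul dH2O")
```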
The full-length 993 nucleotide long promoter>EGFP was amplified in the following PCR mix:
The following PCR cycle conditions were used:
Method:
A method was developed for separating circular heteroduplex DNA from circular, homoduplex DNA using a MutS sliding clamp. Briefly, assembled constructs were heteroduplexed, cloned into a vector, and treated with MutS under conditions that favor the formation of a sliding clamp. Treated DNA was filtered through a nitrocellulose membrane to separate clamp-bound, error-containing heteroduplex from clamp-free, error-free homoduplex. The filtrate was then directly transformed into a bacterial host. Error rates were calculated by sequence analysis of resulting transformants.
The protocol involved assembling a full-length construct (˜1 kb) using the methods of Example 1. The resulting amplified PCR fragments were purified (Qiagen PCR purification kit) and heteroduplex formation was performed using ˜725 ng DNA in 20 μl STE buffer (10 mM Tris, 50 mM NaCl, 1 mM EDTA) at 95° C. for 5 minutes followed by a slow cool to room temperature. The products were purified (Qiagen PCR purification kit) and circularized by cloning into a vector (Invitrogen's BP Clonase enzyme and a pDONR221 vector were used, with ˜300 ng insert and 300 ng of pDONR221 in a 20 μl reaction). The cloned products were purified (Qiagen PCR purification kit) and treated with MutS in sliding clamp (SC) buffer. The reaction was filtered through a nitrocellulose membrane and bacteria were transformed with a portion of the filtrate.
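For orientation, the mass of DNA used in the heteroduplex formation step can be converted to an approximate molar amount. The following Python sketch is illustrative only: it assumes an average mass of about 650 g/mol per base pair of double-stranded DNA (a common approximation, not a value stated above) and treats the construct as 1,000 base pairs; the function names are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one dsDNA base pair


def dsdna_pmol(mass_ng: float, length_bp: int) -> float:
    """Approximate picomoles of dsDNA from its mass (ng) and length (bp)."""
    # ng -> g is 1e-9 and mol -> pmol is 1e12, giving a net factor of 1e3.
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS_G_PER_MOL)


def concentration_nM(amount_pmol: float, volume_ul: float) -> float:
    """Concentration in nM; 1 pmol/ul equals 1 uM, i.e. 1000 nM."""
    return amount_pmol / volume_ul * 1000.0


# ~725 ng of a ~1 kb construct in 20 ul of STE buffer.
amount = dsdna_pmol(mass_ng=725.0, length_bp=1000)
conc = concentration_nM(amount, volume_ul=20.0)
print(f"approx. {amount:.2f} pmol, approx. {conc:.0f} nM")
```

Under these assumptions the heteroduplex formation reaction contains on the order of 1 pmol of construct, i.e. a concentration in the tens of nanomolar.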
The MutS sliding clamp treatment was performed using the following protocol. A pool of polynucleotides comprising homoduplex and heteroduplex nucleic acids was subjected to a reaction containing the following reaction components:
where the final concentration [1×] for the SC (Sliding Clamp) buffer is: 75 mM NaCl, 10 mM MgCl2, 1 mM DTT, and 10 mM Tris (pH 7.5). The reaction mixture was initiated in the presence of ADP as indicated at 60° C. After ˜5 minutes, 2.5 μl 10 mM ATP (**) was added to the reaction mixture, and the incubation continued for an additional ˜25 minutes.
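The final ATP concentration after the addition described above depends on the total reaction volume, which is not restated here. The following Python sketch shows the calculation under the assumption of a 25 μl total volume (the final volume reported for the standard MutS treatment elsewhere in this example); the function name and that volume assumption are illustrative only.

```python
def final_concentration_mM(stock_mM: float, added_ul: float,
                           total_ul: float) -> float:
    """C_final = C_stock * V_added / V_total (total volume after addition)."""
    if total_ul <= 0 or added_ul < 0 or added_ul > total_ul:
        raise ValueError("volumes must satisfy 0 <= added_ul <= total_ul, total_ul > 0")
    return stock_mM * added_ul / total_ul


ASSUMED_TOTAL_UL = 25.0  # assumption; the actual reaction volume is not given here

# 2.5 ul of 10 mM ATP added to the reaction mixture.
atp_final_mM = final_concentration_mM(stock_mM=10.0, added_ul=2.5,
                                      total_ul=ASSUMED_TOTAL_UL)
print(f"final ATP: {atp_final_mM} mM (assuming {ASSUMED_TOTAL_UL} ul total)")
```

If the total volume differs from the assumed 25 μl, the final ATP concentration scales inversely with that volume.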
As controls, the following conditions were included: (i) heteroduplexed DNA that was not treated with MutS, (ii) MutS treatment in STM buffer (P. Carr et al., NAR 32, e162), (iii) MutS treatment in SC buffer without ADP or ATP, and (iv) MutS treatment in SC buffer with ADP only.
Previous experiments demonstrated that addition of ATP (final 1 mM) to standard MutS treatment (12 μg MutS protein in STM buffer at a final volume of 25 μl) had little effect on colony forming units (CFU) upon transformation as compared to MutS treatment without ATP.
Separation of MutS-Bound and MutS-Free Polynucleotides:
Following the MutS treatment, the samples were filtered through a nitrocellulose membrane. This step allowed separation of MutS-bound, membrane-trapped polynucleotides from free polynucleotides that filtered through the membrane. Thus, mismatch-containing polynucleotides were selectively removed from base-matched polynucleotides.
Transformation of Bacteria and Cloning:
The resulting filtrate (5 μl), substantially free of mismatch errors, was used to transform the DH10B strain of E. coli (Invitrogen) and for subsequent cloning of the plasmid having a correct target sequence. These techniques and methods are well known to those of ordinary skill in the art. The number of colonies obtained for the different reactions is shown in
The results are shown in the following table.
In conclusion, the use of the MutS sliding clamp as a means of separating homoduplex-enriched polynucleotide from heteroduplex-enriched polynucleotide resulted in significant reduction of errors, as assessed by subsequent cloning/sequencing.
The present invention provides, among other things, methods for assembling large polynucleotide constructs and organisms having increased genomic stability. While specific embodiments of the subject invention have been discussed, the above specification is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this specification. The full scope of the invention should be determined by reference to the claims, along with their full scope of equivalents, and the specification, along with such variations.
All publications, patents and sequence database entries mentioned herein, including those items listed below, are hereby incorporated by reference in their entirety as if each individual publication or patent was specifically and individually indicated to be incorporated by reference. In case of conflict, the present application, including any definitions herein, will control.