The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Sep. 21, 2020, is named “DNWR-007_001WO_SeqList.txt” and is about 73.3 KB in size.
Over the last decade there has been an increase in demand for synthetic DNA molecules, which are used in a range of molecular biology applications. This increase has, in part, been driven by advances in DNA sequencing technology. However, while there have been significant developments in DNA sequencing technology, DNA synthesis technology has not progressed at a comparable pace and consequently the state-of-the-art technology does not satisfy the current market needs. The present disclosure provides compositions and methods for template-free double-stranded geometric DNA synthesis that provides a solution to the unmet need in the art for the production of long, error-free, inexpensive DNA sequences having the superior accuracy and speed of synthesis demonstrated by the compositions and methods of the present disclosure.
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule and an at least second partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the at least second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer triplet, wherein the 4-mer triplet comprises three 4-mer sequences, which yield a single fragment with at least 90% purity upon ligation of the first partially double-stranded nucleic acid molecule and the at least second partially double-stranded nucleic acid molecule, and wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise a different 4-mer sequence. The 4-mer triplet can be selected from the 4-mer triplets recited in Table 1.
The present disclosure provides methods of producing a target nucleic acid molecule, the methods comprising: a) hybridizing the first and the at least second partially double-stranded nucleic acid molecules of the preceding compositions by hybridizing the second 5′ overhang of the first partially double-stranded nucleic acid molecule and the third 5′ overhang of the at least second partially double-stranded nucleic acid molecule; and b) ligating the hybridized first partially double-stranded nucleic acid molecule and the at least second partially double-stranded nucleic acid molecule, thereby producing the target nucleic acid molecule. In some aspects, ligating comprises contacting the hybridized first and at least second partially double-stranded nucleic acid molecules and a ligase.
In some aspects, at least one of the first 5′ overhang, the second 5′ overhang, the third 5′ overhang and the fourth 5′ overhang can be 4 nucleotides in length. In some aspects, the first 5′ overhang, the second 5′ overhang, the third 5′ overhang and the fourth 5′ overhang can each be 4 nucleotides in length.
In some aspects, the first and the at least second double-stranded nucleic acid molecules can comprise RNA, XNA, DNA or a combination thereof. In some aspects, the first and the at least second double-stranded nucleic acid molecules can comprise DNA.
In some aspects, at least one of the first double-stranded nucleic acid molecule and the at least second double-stranded nucleic acid molecule can comprise at least one modified nucleic acid.
In some aspects, at least one of the first double-stranded nucleic acid molecule and the at least second double-stranded nucleic acid molecule can be at least about 15 nucleotides in length. In some aspects, at least one of the first double-stranded nucleic acid molecule and the at least second double-stranded nucleic acid molecule can comprise a double-stranded portion that is at least 30 bp in length, or at least 250 bp in length.
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule, a second partially double-stranded nucleic acid molecule, a third partially double-stranded nucleic acid molecule and an at least fourth partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the third partially double-stranded nucleic acid molecule comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the at least fourth partially double-stranded nucleic acid molecule comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the fourth 5′ overhang and the fifth 5′ overhang are complementary to each other, wherein the sixth 5′ overhang and the seventh 5′ overhang are complementary to each other, wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer quintuplet, wherein the 4-mer quintuplet comprises five 4-mer sequences, which yield a single fragment with at least 90% purity upon ligation of the first, second, third and at least fourth partially double-stranded nucleic acid molecules, and wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise a different 4-mer sequence. The 4-mer quintuplet can be selected from the 4-mer quintuplets recited in Table 2.
The present disclosure provides methods of producing a target nucleic acid molecule, the methods comprising: a) hybridizing the first and the second partially double-stranded nucleic acid fragments of the preceding compositions by hybridizing the second 5′ overhang of the first partially double-stranded nucleic acid fragment and the third 5′ overhang of the second partially double-stranded nucleic acid fragment; b) ligating the hybridized first partially double-stranded nucleic acid fragment and the second partially double-stranded nucleic acid fragment to produce a first ligation product; c) hybridizing the third and the at least fourth partially double-stranded nucleic acid fragments of the preceding compositions by hybridizing the sixth 5′ overhang of the third partially double-stranded nucleic acid fragment and the seventh 5′ overhang of the at least fourth partially double-stranded nucleic acid fragment; d) ligating the hybridized third partially double-stranded nucleic acid fragment and the at least fourth partially double-stranded nucleic acid fragment to produce a second ligation product; e) hybridizing the first ligation product from step (b) and the second ligation product of step (d) by hybridizing the fourth 5′ overhang and the fifth 5′ overhang; and f) ligating the hybridized first ligation product and second ligation product, thereby producing the target nucleic acid molecule. In some aspects, ligating can comprise contacting the hybridized molecules and a ligase.
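By way of non-limiting illustration only, the ordered ligations of steps (a)-(f) can be sketched in Python as follows. The sketch is not part of the disclosed methods. The `Fragment` class, the `make_fragment` and `ligate` helpers, the duplex sequences and the five 4-mers are hypothetical placeholders (the 4-mers are not taken from Table 2). Both strands are written 5′ to 3′, and hybridization is modeled only as perfect Watson-Crick complementarity between 4-nucleotide 5′ overhangs.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")
OVERHANG_LEN = 4  # assumption: every 5' overhang is 4 nucleotides long


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Fragment:
    top: str     # 5'->3'; starts with the left 5' overhang
    bottom: str  # 5'->3'; starts with the right 5' overhang


def make_fragment(left_overhang: str, duplex: str, right_overhang: str) -> Fragment:
    """Build a fragment from its two 5' overhangs and the top strand of its duplex."""
    return Fragment(left_overhang + duplex, right_overhang + revcomp(duplex))


def ligate(left: Fragment, right: Fragment) -> Fragment:
    """Join two fragments whose facing 5' overhangs are perfectly complementary."""
    if left.bottom[:OVERHANG_LEN] != revcomp(right.top[:OVERHANG_LEN]):
        raise ValueError("facing 5' overhangs are not complementary")
    return Fragment(left.top + right.top, right.bottom + left.bottom)


# Hypothetical placeholder 4-mers for the first, third, fifth, seventh
# and eighth 5' overhangs (not a quintuplet from Table 2).
first, third, fifth, seventh, eighth = "AAAA", "CACC", "CGCC", "ACTA", "GATT"

f1 = make_fragment(first, "GATTACA", revcomp(third))   # second overhang pairs with third
f2 = make_fragment(third, "TTGCA", revcomp(fifth))     # fourth overhang pairs with fifth
f3 = make_fragment(fifth, "AGGTC", revcomp(seventh))   # sixth overhang pairs with seventh
f4 = make_fragment(seventh, "CCATG", eighth)

product_1 = ligate(f1, f2)              # steps (a)-(b)
product_2 = ligate(f3, f4)              # steps (c)-(d)
target = ligate(product_1, product_2)   # steps (e)-(f)
print(target.top)
```

In this sketch, an attempt to ligate non-adjacent fragments, such as `ligate(f1, f3)`, raises an error because the facing overhangs are not complementary.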
In some aspects, at least one of the first 5′ overhang, the second 5′ overhang, the third 5′ overhang, the fourth 5′ overhang, the fifth 5′ overhang, the sixth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang can be 4 nucleotides in length. In some aspects, the first 5′ overhang, the second 5′ overhang, the third 5′ overhang, the fourth 5′ overhang, the fifth 5′ overhang, the sixth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang can each be 4 nucleotides in length.
In some aspects, the first, the second, the third and the at least fourth partially double-stranded nucleic acid molecules can comprise RNA, XNA, DNA or a combination thereof. In some aspects, the first, the second, the third and the at least fourth partially double-stranded nucleic acid molecules comprise DNA.
In some aspects, at least one of the first partially double-stranded nucleic acid molecule, the second partially double-stranded nucleic acid molecule, the third partially double-stranded nucleic acid molecule and the fourth partially double-stranded nucleic acid molecule can comprise at least one modified nucleic acid.
In some aspects, at least one of the first partially double-stranded nucleic acid molecule, the second partially double-stranded nucleic acid molecule, the third partially double-stranded nucleic acid molecule and the at least fourth partially double-stranded nucleic acid molecule can be at least about 15 nucleotides in length. In some aspects, at least one of the first partially double-stranded nucleic acid molecule, the second partially double-stranded nucleic acid molecule, the third partially double-stranded nucleic acid molecule and the at least fourth partially double-stranded nucleic acid molecule can comprise a double-stranded portion that is at least 20 bp in length, or at least 250 bp in length.
The present disclosure provides methods of synthesizing a target double-stranded nucleic acid molecule comprising a target nucleic acid sequence, the methods comprising: a) determining an assembly map of the desired double-stranded nucleic acid molecule, wherein the assembly map divides the target double-stranded nucleic acid molecule into a plurality of double-stranded nucleic acid fragments, wherein the double-stranded nucleic acid fragments comprise at least two 5′ overhangs, wherein nucleic acid fragments that are adjacent within the target nucleic acid sequence comprise 5′ overhangs that are complementary, wherein the 5′ overhangs of at least one pair of nucleic acid fragments that are adjacent within the target nucleic acid sequence each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer triplet, wherein the 4-mer triplet comprises three 4-mer sequences, which yield a single fragment with at least 90% purity upon ligation of the at least one pair of adjacent nucleic acid fragments; b) providing the double-stranded nucleic acid fragments determined in step (a); c) hybridizing a first pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; d) ligating the hybridized nucleic acid fragments from step (c) to form a double-stranded nucleic acid fragment; e) hybridizing a second pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; f) ligating the hybridized nucleic acid fragments from step (e) to form a double-stranded nucleic acid fragment, such that the double-stranded nucleic acid fragment is adjacent within the target nucleic acid sequence to the double-stranded nucleic acid formed in step (d); and g) repeating steps (c)-(f) using the ligation products such that the target double-stranded nucleic acid molecule is synthesized.
In some aspects, the 4-mer triplet can be selected from the 4-mer triplets recited in Table 1.
The present disclosure provides methods of synthesizing a target double-stranded nucleic acid molecule comprising a target nucleic acid sequence, the methods comprising: a) determining an assembly map of the desired double-stranded nucleic acid molecule, wherein the assembly map divides the target double-stranded nucleic acid molecule into a plurality of double-stranded nucleic acid fragments, wherein the double-stranded nucleic acid fragments comprise at least two 5′ overhangs, wherein nucleic acid fragments that are adjacent within the target nucleic acid sequence comprise 5′ overhangs that are complementary, wherein the 5′ overhangs of at least one set of four nucleic acid fragments that are adjacent within the target nucleic acid sequence each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer quintuplet, wherein the 4-mer quintuplet comprises five 4-mer sequences, which yield a single fragment with at least 90% purity upon ligation of the at least one set of four nucleic acid fragments; b) providing the double-stranded nucleic acid fragments determined in step (a); c) hybridizing a first pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; d) ligating the hybridized nucleic acid fragments from step (c) to form a double-stranded nucleic acid fragment; e) hybridizing a second pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; f) ligating the hybridized nucleic acid fragments from step (e) to form a double-stranded nucleic acid fragment, such that the double-stranded nucleic acid fragment is adjacent within the target nucleic acid sequence to the double-stranded nucleic acid formed in step (d); and g) repeating steps (c)-(f) using the ligation products such that the target double-stranded nucleic acid molecule is synthesized. 
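By way of non-limiting illustration only, an assembly map and the repeated pairwise ligations of steps (c)-(g) can be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the present disclosure: the function names are hypothetical, the cut positions are supplied by the user rather than chosen so that the junction 4-mers fall within a preferred triplet or quintuplet, all adjacent pairs are ligated in parallel in each round, and the overhangs or hairpins at the termini of the target are omitted.

```python
def split_at(target: str, cut_positions: list[int]) -> list[str]:
    """Top-strand pieces of the target between consecutive cut positions."""
    bounds = [0, *sorted(cut_positions), len(target)]
    return [target[start:end] for start, end in zip(bounds, bounds[1:])]


def junction_4mers(target: str, cut_positions: list[int], n: int = 4) -> list[str]:
    """The n-mer that begins each internal piece, i.e. its top-strand 5' overhang."""
    return [target[pos:pos + n] for pos in sorted(cut_positions)]


def pairwise_schedule(n_fragments: int) -> list[list[tuple]]:
    """Rounds of adjacent-pair ligations.

    Each fragment or ligation product is described by the (first, last)
    indices of the original fragments it spans.
    """
    spans = [(i, i) for i in range(n_fragments)]
    rounds = []
    while len(spans) > 1:
        pairs = [(spans[i], spans[i + 1]) for i in range(0, len(spans) - 1, 2)]
        merged = [(left[0], right[1]) for left, right in pairs]
        if len(spans) % 2:  # an unpaired product waits for the next round
            merged.append(spans[-1])
        rounds.append(pairs)
        spans = merged
    return rounds


# Hypothetical 34-nucleotide target and cut positions.
target = "GATTACA" + "CACC" + "TTGCA" + "CGCC" + "AGGTC" + "ACTA" + "CCATG"
cuts = [7, 16, 25]

print(split_at(target, cuts))
print(junction_4mers(target, cuts))
for number, pairs in enumerate(pairwise_schedule(4), start=1):
    print(f"round {number}: {pairs}")
```

Under the parallel-pairing assumption, the number of fragments is roughly halved in each round, so four fragments are joined in two rounds and one hundred fragments in seven rounds.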
In some aspects, the 4-mer quintuplet can be selected from the 4-mer quintuplets recited in Table 2.
In some aspects, an assembly map can divide the target double-stranded nucleic acid molecule into at least 4 double-stranded nucleic acid fragments, or at least 50 double-stranded nucleic acid fragments, or at least 100 double-stranded nucleic acid fragments.
In some aspects, the target double-stranded nucleic acid molecule can be at least 1000 nucleotides in length, or at least 2000 nucleotides in length, or at least 3000 nucleotides in length.
In some aspects, the target double-stranded nucleic acid can comprise at least one homopolymeric sequence. A homopolymeric sequence can be 10 nucleotides in length. In some aspects, a target double-stranded nucleic acid molecule can have a GC content that is at least about 50%.
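By way of non-limiting illustration only, the GC content of a target sequence and the length of its longest homopolymeric sequence can be computed as sketched below in Python. The function names and the example sequences are hypothetical and are not taken from the present disclosure.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> tuple[str, int]:
    """Base and length of the longest run of a single repeated base.

    When several runs share the maximum length, the first one is returned.
    """
    best_base, best_len = "", 0
    for base, run in groupby(seq.upper()):
        run_len = len(list(run))
        if run_len > best_len:
            best_base, best_len = base, run_len
    return best_base, best_len


example = "ACG" + "A" * 10 + "TC"  # hypothetical sequence with a 10-nucleotide run
print(gc_content("GGCCAATT"))
print(longest_homopolymer(example))
```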
In some aspects of the preceding methods, at least one of the double-stranded nucleic acid fragments that corresponds to at least one of the termini of the target double-stranded nucleic acid molecule comprises a hairpin sequence.
In some aspects, the preceding methods can further comprise after step (g): h) incubating the ligation products with at least one exonuclease. In some aspects, a hairpin sequence can comprise at least one deoxyuridine base. In some aspects, a hairpin sequence can comprise at least one restriction endonuclease site.
In some aspects, the preceding methods can further comprise: i) removing the at least one exonuclease; and j) incubating the products of the exonuclease incubation with at least one enzyme that cleaves the at least one deoxyuridine base, thereby cleaving the hairpin sequence.
In some aspects, the preceding methods can further comprise: i) removing the at least one exonuclease; and j) incubating the products of the exonuclease incubation with at least one enzyme that cleaves the at least one restriction endonuclease site, thereby cleaving the hairpin sequence.
In some aspects of the preceding methods, a synthesized target double-stranded nucleic acid molecule can have a purity of at least 80% or at least 90%.
Any of the above aspects can be combined with any other aspect.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. In the Specification, the singular forms also include the plural unless the context clearly dictates otherwise; as examples, the terms “a,” “an,” and “the” are understood to be singular or plural and the term “or” is understood to be inclusive. By way of example, “an element” means one or more elements. Throughout the specification the word “comprise,” or variations such as “comprises” or “comprising,” will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. About can be understood as within 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, or 0.01% of the stated value. Unless otherwise clear from the context, all numerical values provided herein are modified by the term “about.”
Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. The references cited herein are not admitted to be prior art to the claimed invention. In the case of conflict, the present Specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be limiting. Other features and advantages of the disclosure will be apparent from the following detailed description and claim.
The above and further features will be more clearly appreciated from the following detailed description when taken in conjunction with the accompanying drawings.
The present disclosure provides a DNA assembly methodology entitled “double-stranded geometric synthesis (gSynth)” and compositions related thereto for the synthesis of long, arbitrary double-stranded nucleic acid sequences. In a double-stranded gSynth assembly reaction, the target sequence (i.e. the sequence that is to be synthesized) is computationally broken into sets of adjacent, double-stranded nucleic acid fragments. These adjacent double-stranded nucleic acid fragments are then ligated together in one-pair-at-a-time ligation reactions in a systematic assembly method. These fragments possess 3′ and/or 5′ overhanging single-stranded N-mer sites, with three key properties. 1) The N-mer sites are not self-hybridizing or self-reactive in ligation reactions. 2) The N-mer site at one end of the fragment does not cross-hybridize or cross-react with the N-mer site at the other end. Finally, 3) there is one N-mer site on each fragment of an adjacent pair of fragments that will hybridize and ligate with the adjacent fragment in a ligation reaction, leading to a new, longer double-stranded fragment. The present disclosure provides preferred N-mer sites that facilitate more efficient and accurate ligation reactions, thereby allowing the double-stranded gSynth methods of the present disclosure to be used to synthesize nucleic acid sequences of unprecedented lengths that are not achievable using existing nucleic acid assembly and synthesis techniques. The double-stranded fragments of the present disclosure can be generated using conventional phosphoramidite chemical synthesis, single-stranded geometric synthesis (WO2019140353A1), or conventional molecular cloning, for example from a restriction digest of a plasmid.
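By way of non-limiting illustration only, the three key properties of the N-mer sites can be screened computationally as sketched below in Python. The sketch assumes that all overhangs are written 5′ to 3′ and that hybridization requires perfect Watson-Crick complementarity. It therefore does not capture mismatch ligation or other side reactions, and it is not a substitute for the experimentally determined preferred N-mer sites. The function names and the example overhangs are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_self_complementary(site: str) -> bool:
    """True if a site can pair with another copy of itself (violates property 1)."""
    return site == reverse_complement(site)


def can_hybridize(site_a: str, site_b: str) -> bool:
    """True if two sites, both written 5'->3', are perfectly complementary."""
    return site_a == reverse_complement(site_b)


def check_adjacent_pair(frag_a: tuple[str, str], frag_b: tuple[str, str]) -> dict[str, bool]:
    """Screen two adjacent fragments, each given as (left_site, right_site).

    frag_a is assumed to lie to the left of frag_b in the target sequence.
    """
    sites = [*frag_a, *frag_b]
    return {
        # Property 1: no site pairs with itself.
        "no_self_hybridizing_site": not any(is_self_complementary(s) for s in sites),
        # Property 2: the two ends of one fragment do not pair with each other.
        "no_cross_hybridizing_ends": not can_hybridize(*frag_a) and not can_hybridize(*frag_b),
        # Property 3: the facing sites of the two fragments pair with each other.
        "junction_hybridizes": can_hybridize(frag_a[1], frag_b[0]),
    }


# Hypothetical overhangs: the right site of the first fragment (GGTG) is the
# reverse complement of the left site of the second fragment (CACC).
print(check_adjacent_pair(("AAAA", "GGTG"), ("CACC", "CGCC")))
```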
Compositions of the Present Disclosure
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule and an at least second partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the at least second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer triplet, wherein the 4-mer triplet comprises three 4-mer sequences, which yield a single fragment with at least 90% purity upon ligation of the first partially double-stranded nucleic acid molecule and the at least second partially double-stranded nucleic acid molecule, and wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise a different 4-mer sequence. A schematic of a representative composition of the present disclosure is shown in
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule and an at least second partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the at least second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer triplet, wherein the 4-mer triplet comprises three 4-mer sequences, which yield a single fragment upon ligation of the first partially double-stranded nucleic acid molecule and the at least second partially double-stranded nucleic acid molecule, and wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise a different 4-mer sequence. A schematic of a representative composition of the present disclosure is shown in
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule and an at least second partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the at least second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer triplet, wherein the 4-mer triplet is selected from the 4-mer triplets recited in Table 1, and wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise a different 4-mer sequence. A schematic of a representative composition of the present disclosure is shown in
As used herein, the term “4-mer” refers to a nucleic acid sequence consisting of 4 nucleotides.
As used herein the term “4-mer triplet” refers to a set of three distinct 4-mer sequences. These three distinct 4-mer sequences together provide superior and unexpected results in that when the three sequences, or complements thereof, are used in the 5′ overhangs of a pair of partially double-stranded nucleic acid molecules, the pair of partially double-stranded nucleic acid molecules can be ligated together with high efficiency and/or high fidelity.
In some aspects, the three distinct 4-mer sequences, or complements thereof, of a 4-mer triplet, when used in the 5′ overhangs of a pair of partially double-stranded nucleic acid molecules, can allow for the ligation of the partially double-stranded nucleic acid molecules such that the resulting ligation product has a purity of at least 80%, or at least 90%, or at least 95%, or at least 99%. In some aspects, the three distinct 4-mer sequences, or complements thereof, of a 4-mer triplet, when used in the 5′ overhangs of a pair of partially double-stranded nucleic acid molecules, can allow for the ligation of the partially double-stranded nucleic acid molecules such that the resulting ligation product has a purity of at least 90%. In some aspects, purity refers to the percentage of the total ligation products that were formed as part of a ligation reaction (or multiple rounds of ligation reactions) that correspond to the correct/desired ligation product. Thus, in a non-limiting example, the three distinct 4-mer sequences, or complements thereof, of a 4-mer triplet, when used in the 5′ overhangs of a pair of partially double-stranded nucleic acid molecules, can allow for the ligation of the partially double-stranded nucleic acid molecules such that when a ligation reaction comprising a plurality of the pair of partially double-stranded nucleic acid molecules is performed, 90% of the resulting ligation products correspond to the correct/desired ligation product.
As used herein, purity refers to the percentage of the total ligation products that were formed as part of a single ligation reaction, or multiple rounds of ligation reactions, that correspond to the correct/desired ligation product. Without wishing to be bound by theory, the methods of the present disclosure comprising the ligation of nucleic acid molecules can produce a plurality of ligation products, some of which correspond to the correct/desired ligation product, and some of which are undesired (side-reactions, incorrect ligations, etc.). The purity of a ligation product, or a target molecule that is being synthesized, can be expressed as a percentage, which corresponds to the percentage of the total ligation products formed which correspond to the correct/desired ligation product.
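By way of non-limiting illustration only, the purity calculation described above can be sketched in Python as follows. The function name, the product labels and the counts are hypothetical and do not represent experimental results.

```python
def purity_percent(product_counts: dict[str, int], desired: str) -> float:
    """Percentage of all ligation products that are the desired product."""
    total = sum(product_counts.values())
    if total == 0:
        raise ValueError("no ligation products were counted")
    return 100.0 * product_counts.get(desired, 0) / total


# Hypothetical counts of the ligation products observed for fragments A and B.
counts = {"A-B": 90, "A-A": 6, "B-B": 4}
print(purity_percent(counts, desired="A-B"))
```

In this hypothetical example, 90 of the 100 counted ligation products are the desired product A-B, corresponding to a purity of 90%.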
In some aspects, the three distinct 4-mer sequences of a 4-mer triplet can be experimentally determined. In some aspects, the three distinct 4-mer sequences of a 4-mer triplet can be experimentally determined using the methods described in Example 5.
Non-limiting examples of preferred 4-mer triplets are shown in Table 1.
In a non-limiting example of the preceding compositions, wherein the triplet selected from Table 1 is 4-mer triplet #1, the first 5′ overhang can comprise either 4-mer #1 of the triplet (AAAA), 4-mer #2 of the triplet (CACC) or 4-mer #3 of the triplet (CGCC), or the complements thereof. If the first 5′ overhang comprises 4-mer #1 of the triplet (AAAA), then the third 5′ overhang can comprise either 4-mer #2 of the triplet (CACC) or 4-mer #3 of the triplet (CGCC). If the first 5′ overhang comprises 4-mer #1 of the triplet (AAAA) and the third 5′ overhang comprises 4-mer #2 of the triplet (CACC), then the fourth 5′ overhang will comprise 4-mer #3 of the triplet (CGCC). That is, one of the first, third and fourth 5′ overhangs will comprise the 4-mer of the second column of a single row of Table 1, one of the first, third and fourth 5′ overhangs will comprise the 4-mer of the third column of the same row of Table 1, and one of the first, third and fourth 5′ overhangs will comprise the 4-mer of the fourth column of the same row of Table 1, wherein the first, third and fourth 5′ overhangs each comprise a different 4-mer sequence.
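By way of non-limiting illustration only, the possible assignments of the three 4-mers of 4-mer triplet #1 to the first, third and fourth 5′ overhangs can be enumerated as sketched below in Python. The sketch assumes that every overhang is written 5′ to 3′, so that the second 5′ overhang is the reverse complement of the third 5′ overhang. It does not enumerate the alternative in which an overhang comprises the complement of a 4-mer of the triplet. The function names are hypothetical.

```python
from itertools import permutations

COMPLEMENT = str.maketrans("ACGT", "TGCA")
TRIPLET_1 = ("AAAA", "CACC", "CGCC")  # 4-mer triplet #1 as recited above


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def overhang_assignments(triplet: tuple[str, str, str]):
    """Yield every assignment of the triplet to the first, third and fourth overhangs."""
    for first, third, fourth in permutations(triplet):
        yield {
            "first": first,
            "second": reverse_complement(third),  # pairs with the third overhang
            "third": third,
            "fourth": fourth,
        }


for assignment in overhang_assignments(TRIPLET_1):
    print(assignment)
```

Because each of the three 4-mers is used exactly once, the sketch yields six assignments, the first of which corresponds to the example above (first 5′ overhang AAAA, third 5′ overhang CACC, fourth 5′ overhang CGCC).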
In some aspects, a double-stranded nucleic acid fragment or a double-stranded nucleic acid molecule can be a partially double-stranded nucleic acid molecule or a partially double-stranded nucleic acid fragment. As used herein, the terms “partially double-stranded nucleic acid molecule” and “partially double-stranded nucleic acid fragment” refer to a nucleic acid molecule comprised of two polynucleotide strands, wherein at least a portion of the two strands are hybridized (i.e. base-paired) to each other such that the nucleic acid molecule comprises at least one portion that is double-stranded and at least one portion that is single-stranded (i.e. not base-paired with the other strand). In some aspects, only one of the strands has a single-stranded portion. In some aspects, both of the strands have a single-stranded portion. As used herein, the terms “nucleic acid molecule” and “nucleic acid fragment” are used interchangeably.
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule, a second partially double-stranded nucleic acid molecule, a third partially double-stranded nucleic acid molecule and an at least fourth partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the third partially double-stranded nucleic acid molecule comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the at least fourth partially double-stranded nucleic acid molecule comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the fourth 5′ overhang and the fifth 5′ overhang are complementary to each other, wherein the sixth 5′ overhang and the seventh 5′ overhang are complementary to each other, wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer quintuplet, wherein the 4-mer quintuplet comprises five 4-mer sequences, which yield a single fragment with at least 90% purity upon ligation of the first, second, third and at least fourth partially double-stranded nucleic acid molecules, and wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise a different 4-mer sequence.
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule, a second partially double-stranded nucleic acid molecule, a third partially double-stranded nucleic acid molecule and an at least fourth partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the third partially double-stranded nucleic acid molecule comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the at least fourth partially double-stranded nucleic acid molecule comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the fourth 5′ overhang and the fifth 5′ overhang are complementary to each other, wherein the sixth 5′ overhang and the seventh 5′ overhang are complementary to each other, wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer quintuplet, wherein the 4-mer quintuplet comprises five 4-mer sequences, which yield only one fragment upon ligation of the first, second, third and at least fourth partially double-stranded nucleic acid molecules, and wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise a different 4-mer sequence. A schematic of a representative composition of the present disclosure is shown in
The present disclosure provides compositions comprising a first double-stranded nucleic acid fragment, a second double-stranded nucleic acid fragment, a third double-stranded nucleic acid fragment and an at least fourth double-stranded nucleic acid fragment, wherein the first double-stranded nucleic acid fragment comprises a first 5′ overhang and a second 5′ overhang, wherein the second double-stranded nucleic acid fragment comprises a third 5′ overhang and fourth 5′ overhang, wherein the third double-stranded nucleic acid fragment comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the at least fourth double-stranded nucleic acid fragment comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the fourth 5′ overhang and the fifth 5′ overhang are complementary to each other, wherein the sixth 5′ overhang and the seventh 5′ overhang are complementary to each other, wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer quintuplet, wherein the 4-mer quintuplet is selected from the 4-mer quintuplets recited in Table 2, and wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise a different 4-mer sequence. A schematic of a representative composition of the present disclosure is shown in
As used herein, the term “4-mer quintuplet” refers to a set of five distinct 4-mer sequences. These five distinct 4-mer sequences together provide superior and unexpected results in that when the five sequences are used in the 5′ overhangs of a set of four partially double-stranded nucleic acid molecules, the four partially double-stranded nucleic acid molecules can be ligated together in a stepwise assembly reaction with high efficiency and/or high fidelity.
In some aspects, the five distinct 4-mer sequences, or complements thereof, of a 4-mer quintuplet, when used in the 5′ overhangs of a set of four partially double-stranded nucleic acid molecules, can allow for the ligation of the four partially double-stranded nucleic acid molecules such that the resulting ligation product has a purity of at least 80%, or at least 90%, or at least 95%, or at least 99%. In some aspects, the five distinct 4-mer sequences, or complements thereof, of a 4-mer quintuplet, when used in the 5′ overhangs of a set of four partially double-stranded nucleic acid molecules, can allow for the ligation of the four partially double-stranded nucleic acid molecules such that the resulting ligation product has a purity of at least 90%.
In some aspects, purity refers to the percentage of the total ligation products that were formed as part of a ligation reaction (or multiple rounds of ligation reactions) that correspond to the correct/desired ligation product. Thus, in a non-limiting example, the five distinct 4-mer sequences, or complements thereof, of a 4-mer quintuplet, when used in the 5′ overhangs of a set of four partially double-stranded nucleic acid molecules, can allow for the ligation of the four partially double-stranded nucleic acid molecules such that when a ligation reaction (or two or more consecutive rounds of ligation reactions) comprising a plurality of the set of four partially double-stranded nucleic acid molecules is performed, 90% of the resulting ligation products correspond to the correct/desired ligation product.
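The purity definition above can be expressed as a simple ratio. The following is a minimal sketch only; the function name `purity_percent` and the product counts are hypothetical illustrations and are not data from the present disclosure.

```python
def purity_percent(correct_products, total_products):
    """Return the percentage of ligation products that are the desired product."""
    if total_products <= 0:
        raise ValueError("total_products must be positive")
    if not 0 <= correct_products <= total_products:
        raise ValueError("correct_products must lie between 0 and total_products")
    return 100.0 * correct_products / total_products


# Hypothetical counts: 90 of 100 observed ligation products are the desired one.
example_purity = purity_percent(90, 100)

# Compare against the "at least 90%" purity threshold recited above.
meets_90_percent = example_purity >= 90.0
```

Under this reading, a reaction in which 90 of every 100 ligation products are the desired product has a purity of 90%.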
In some aspects, the five distinct 4-mer sequences of a 4-mer quintuplet can be experimentally determined. In some aspects, the five distinct 4-mer sequences of a 4-mer quintuplet can be experimentally determined using the methods described in Example 5.
Non-limiting examples of preferred 4-mer quintuplets are shown in Table 2.
In a non-limiting example of the preceding compositions, wherein the quintuplet selected from Table 2 is 4-mer quintuplet #1, one of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang will comprise 4-mer #1 of the quintuplet (AAAC), or the complement thereof, another one of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang will comprise 4-mer #2 of the quintuplet (AAGG), or the complement thereof, another one of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang will comprise 4-mer #3 of the quintuplet (TGAC), or the complement thereof, another one of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang will comprise 4-mer #4 of the quintuplet (TCGT), or the complement thereof, and another one of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang will comprise 4-mer #5 of the quintuplet (GTAG), or the complement thereof, and each of the overhangs comprises a different 4-mer sequence.
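The assignment described in the preceding example can be sketched as follows. This is an illustrative sketch under stated assumptions: sequences are written 5′ to 3′, “complementary” is read as reverse complementary, and the particular order in which the five 4-mers are assigned to the first, third, fifth, seventh and eighth 5′ overhangs is arbitrary. The function names are hypothetical.

```python
# Assumption: overhang sequences are written 5' to 3', so two overhangs that
# anneal to each other are reverse complements of one another.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# 4-mer quintuplet #1 as recited in the example above.
QUINTUPLET_1 = ("AAAC", "AAGG", "TGAC", "TCGT", "GTAG")


def assign_overhangs(quintuplet):
    """Assign five distinct 4-mers to the eight 5' overhangs of four molecules.

    One arbitrary assignment: the 4-mers go, in order, to the first, third,
    fifth, seventh and eighth overhangs. The second, fourth and sixth
    overhangs are derived as the partners of the third, fifth and seventh.
    """
    if len(quintuplet) != 5 or len(set(quintuplet)) != 5:
        raise ValueError("a quintuplet must contain five distinct sequences")
    if any(len(four_mer) != 4 for four_mer in quintuplet):
        raise ValueError("every sequence in the quintuplet must be a 4-mer")
    first, third, fifth, seventh, eighth = quintuplet
    return {
        "first": first,
        "second": reverse_complement(third),
        "third": third,
        "fourth": reverse_complement(fifth),
        "fifth": fifth,
        "sixth": reverse_complement(seventh),
        "seventh": seventh,
        "eighth": eighth,
    }


overhangs = assign_overhangs(QUINTUPLET_1)
```

In this sketch the second, fourth and sixth 5′ overhangs are not chosen independently; each is fixed by the overhang it must pair with on the neighboring molecule.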
In some aspects, the double-stranded portion of a partially double-stranded nucleic acid molecule can comprise at least about 5 base-pairs (bp), or at least about 6 bp, or at least about 7 bp, or at least about 8 bp, or at least about 9 bp, or at least about 10 bp, or at least about 11 bp, or at least about 12 bp, or at least about 13 bp, or at least about 14 bp, or at least about 15 bp, or at least about 16 bp, or at least about 17 bp, or at least about 18 bp, or at least about 19 bp, or at least about 20 bp, or at least about 21 bp, or at least about 22 bp, or at least about 23 bp, or at least about 24 bp, or at least about 25 bp, or at least about 26 bp, or at least about 27 bp, or at least about 28 bp, or at least about 29 bp, or at least about 30 bp, or at least about 31 bp, or at least about 32 bp, or at least about 33 bp, or at least about 34 bp, or at least about 35 bp, or at least about 36 bp, or at least about 37 bp, or at least about 38 bp, or at least about 39 bp, or at least about 40 bp in length.
In some aspects, the double-stranded portion of a partially double-stranded nucleic acid molecule is about 5 bp to about 40 bp in length. In some aspects, the double-stranded portion of a partially double-stranded nucleic acid molecule is about 10 bp to about 35 bp in length. In some aspects, the double-stranded portion of a partially double-stranded nucleic acid molecule is about 20 bp to about 30 bp in length.
In some aspects, a partially double-stranded nucleic acid molecule can be at least about 5 nucleotides, or at least about 10 nucleotides, or at least about 15 nucleotides, or at least about 20 nucleotides, or at least about 25 nucleotides, or at least about 30 nucleotides, or at least about 35 nucleotides, or at least about 40 nucleotides in length.
As used herein, the term 5′ overhang is used to refer to a single-stranded portion of a partially double-stranded nucleic acid molecule that is located at the 5′ terminus of one of the strands. An illustrative example of 5′ overhangs is shown in
As used herein, the term 3′ overhang is used to refer to a single-stranded portion of a partially double-stranded nucleic acid molecule that is located at the 3′ terminus of one of the strands.
In some aspects of the compositions of the present disclosure, a 5′ overhang can comprise one of the 4-mers of one of the 4-mer triplets recited in Table 1. In some aspects of the compositions of the present disclosure, a 5′ overhang can consist of one of the 4-mers of one of the 4-mer triplets recited in Table 1.
In some aspects of the compositions of the present disclosure, a 5′ overhang can comprise one of the 4-mers of one of the 4-mer quintuplets recited in Table 2. In some aspects of the compositions of the present disclosure, a 5′ overhang can consist of one of the 4-mers of one of the 4-mer quintuplets recited in Table 2.
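The difference between a 5′ overhang that comprises a 4-mer and one that consists of a 4-mer can be illustrated with a short sketch. It reflects only one possible reading, under stated assumptions: the helper names `comprises` and `consists_of` are hypothetical, sequences are written 5′ to 3′, “complement” is taken to mean reverse complement, and the 5-nucleotide overhang shown is an invented example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def comprises(overhang, four_mer):
    """True if the overhang contains the 4-mer, or its reverse complement."""
    return four_mer in overhang or reverse_complement(four_mer) in overhang


def consists_of(overhang, four_mer):
    """True if the overhang is exactly the 4-mer, or its reverse complement."""
    return overhang in (four_mer, reverse_complement(four_mer))


# Hypothetical 5-nucleotide overhang containing the 4-mer AAAC.
longer_overhang = "GAAAC"

contains_aaac = comprises(longer_overhang, "AAAC")    # overhang includes AAAC
is_exactly_aaac = consists_of(longer_overhang, "AAAC")  # overhang is longer than AAAC
```

Under this reading, an overhang longer than four nucleotides can comprise a recited 4-mer without consisting of it.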
In some aspects of the compositions of the present disclosure a 5′ overhang can be about 4 nucleotides in length. In some aspects of the compositions of the present disclosure a 5′ overhang can be at least about 4 nucleotides, or at least about 5 nucleotides, or at least about 6 nucleotides, or at least about 7 nucleotides, or at least about 8 nucleotides, or at least about 9 nucleotides, or at least about 10 nucleotides in length.
In some aspects, a 5′ overhang is no more than 4, or no more than 5, or no more than 6, or no more than 7, or no more than 8, or no more than 9, or no more than 10 nucleotides, or no more than 11 nucleotides, or no more than 12 nucleotides, or no more than 13 nucleotides, or no more than 14 nucleotides, or no more than 15 nucleotides, or no more than 16 nucleotides, or no more than 17 nucleotides, or no more than 18 nucleotides, or no more than 19 nucleotides, or no more than 20 nucleotides in length.
In some aspects of the compositions of the present disclosure a 3′ overhang can be about 4 nucleotides in length. In some aspects of the compositions of the present disclosure a 3′ overhang can be at least about 4 nucleotides, or at least about 5 nucleotides, or at least about 6 nucleotides, or at least about 7 nucleotides, or at least about 8 nucleotides, or at least about 9 nucleotides, or at least about 10 nucleotides in length.
In some aspects, a 3′ overhang is no more than 4, or no more than 5, or no more than 6, or no more than 7, or no more than 8, or no more than 9, or no more than 10 nucleotides, or no more than 11 nucleotides, or no more than 12 nucleotides, or no more than 13 nucleotides, or no more than 14 nucleotides, or no more than 15 nucleotides, or no more than 16 nucleotides, or no more than 17 nucleotides, or no more than 18 nucleotides, or no more than 19 nucleotides, or no more than 20 nucleotides in length.
In some aspects of the compositions of the present disclosure, a 5′ overhang can comprise at least one of the nucleic acid sequences, or the complement thereof, selected from AAAA, CACC, CGCC, AAAA, GACA, CGCC, AAAC, AAGG, TGAC, AAAC, TGAC, GTAG, AACG, CACC, CGCC, AACG, GACA, CGCC, ACCG, CCGA, GAGG, ACGC, CGTT, CTGG, ACGC, CTGG, CGCA, AGCC, CACC, GCAA, AGCC, GACA, GCAA, AGCC, GCAA, TCCC, AGTT, TGAT, TGTG, ATCC, ACCG, GAGG, ATCC, ATGC, AAGG, ATCC, TACC, ACCG, ATGT, TTGA, GGTC, CAAC, TGAT, TGAC, CAAC, TTTT, TGAT, CGAG, AACA, AGTT, CGAG, AGTT, TGTG, CGGT, TTGC, ATCC, CGTT, CTGG, CGCA, CGTT, CTGG, GGAA, CGTT, GGAA, CGCA, CTGC, TCTT, ACGA, CTGC, TGTC, ACGA, CTGG, GGAA, CGCA, GAGG, ACGC, CGTT, GAGG, ACGC, CTGG, GAGG, CCCA, TGGC, GAGG, CGTT, CGCA, GAGG, CGTT, GGAA, GAGG, CTGG, CGCA, GAGG, TGGC, TCAC, GCAA, ACTG, TCCC, GCAA, TGGC, TCCC, GCTC, ATGG, CGGT, GCTC, CGGT, ATCC, GGAA, ATCC, AAGG, GGAA, GTTT, ATCC, GTAG, CTGC, ACGA, GTAG, TCTG, CTGC, GTAG, TGCT, CTGC, GTTT, ATGA, ATGT, GTTT, ATGT, GGTC, TCAC, AAAA, CGCC, TCAC, AACG, CGCC, TCAC, TCTG, AACG, TCAC, TGCT, AAAA, TCAC, TGCT, AACG, TGAC, TCGT, GTAG, TGAT, ATGT, TGAC, TGGC, CTCC and TCAC.
In some aspects of the compositions of the present disclosure, a 5′ overhang can consist of at least one of the sequences, or the complement thereof, selected from AAAA, CACC, CGCC, AAAA, GACA, CGCC, AAAC, AAGG, TGAC, AAAC, TGAC, GTAG, AACG, CACC, CGCC, AACG, GACA, CGCC, ACCG, CCGA, GAGG, ACGC, CGTT, CTGG, ACGC, CTGG, CGCA, AGCC, CACC, GCAA, AGCC, GACA, GCAA, AGCC, GCAA, TCCC, AGTT, TGAT, TGTG, ATCC, ACCG, GAGG, ATCC, ATGC, AAGG, ATCC, TACC, ACCG, ATGT, TTGA, GGTC, CAAC, TGAT, TGAC, CAAC, TTTT, TGAT, CGAG, AACA, AGTT, CGAG, AGTT, TGTG, CGGT, TTGC, ATCC, CGTT, CTGG, CGCA, CGTT, CTGG, GGAA, CGTT, GGAA, CGCA, CTGC, TCTT, ACGA, CTGC, TGTC, ACGA, CTGG, GGAA, CGCA, GAGG, ACGC, CGTT, GAGG, ACGC, CTGG, GAGG, CCCA, TGGC, GAGG, CGTT, CGCA, GAGG, CGTT, GGAA, GAGG, CTGG, CGCA, GAGG, TGGC, TCAC, GCAA, ACTG, TCCC, GCAA, TGGC, TCCC, GCTC, ATGG, CGGT, GCTC, CGGT, ATCC, GGAA, ATCC, AAGG, GGAA, GTTT, ATCC, GTAG, CTGC, ACGA, GTAG, TCTG, CTGC, GTAG, TGCT, CTGC, GTTT, ATGA, ATGT, GTTT, ATGT, GGTC, TCAC, AAAA, CGCC, TCAC, AACG, CGCC, TCAC, TCTG, AACG, TCAC, TGCT, AAAA, TCAC, TGCT, AACG, TGAC, TCGT, GTAG, TGAT, ATGT, TGAC, TGGC, CTCC and TCAC.
In some aspects of the compositions of the present disclosure, a 5′ overhang can comprise at least one of the nucleic acid sequences, or the complement thereof, selected from AAAC, AAGG, TGAC, TCGT, GTAG, ACGC, CGTT, CTGG, GGAA, CGCA, AGCC, CACC, GCAA, ACTG, TCCC, AGCC, CACC, GCAA, TGGC, TCCC, AGCC, GACA, GCAA, ACTG, TCCC, AGCC, GACA, GCAA, TGGC, TCCC, ATCC, TACC, ACCG, CCGA, GAGG, CAAC, TTTT, TGAT, ATGT, TGAC, CGAG, AACA, AGTT, TGAT, TGTG, GAGG, ACGC, CGTT, CTGG, CGCA, GAGG, ACGC, CGTT, CTGG, GGAA, GAGG, ACGC, CGTT, GGAA, CGCA, GAGG, ACGC, CTGG, GGAA, CGCA, GAGG, CCCA, TGGC, CTCC, TCAC, GCTC, ATGG, CGGT, TTGC, ATCC, GGAA, GTTT, ATCC, ATGC, AAGG, GTAG, TCTG, CTGC, TCTT, ACGA, GTAG, TCTG, CTGC, TGTC, ACGA, GTAG, TGCT, CTGC, TCTT, ACGA, GTAG, TGCT, CTGC, TGTC, ACGA, GTTT, ATGA, ATGT, TTGA, GGTC, TCAC, TCTG, AACG, CACC, CGCC, TCAC, TCTG, AACG, GACA, CGCC, TCAC, TGCT, AAAA, CACC, CGCC, TCAC, TGCT, AAAA, GACA, CGCC, TCAC, TGCT, AACG, CACC, CGCC, TCAC, TGCT, AACG, GACA, and CGCC.
In some aspects of the compositions of the present disclosure, a 5′ overhang can consist of at least one of the sequences, or the complement thereof, selected from AAAC, AAGG, TGAC, TCGT, GTAG, ACGC, CGTT, CTGG, GGAA, CGCA, AGCC, CACC, GCAA, ACTG, TCCC, AGCC, CACC, GCAA, TGGC, TCCC, AGCC, GACA, GCAA, ACTG, TCCC, AGCC, GACA, GCAA, TGGC, TCCC, ATCC, TACC, ACCG, CCGA, GAGG, CAAC, TTTT, TGAT, ATGT, TGAC, CGAG, AACA, AGTT, TGAT, TGTG, GAGG, ACGC, CGTT, CTGG, CGCA, GAGG, ACGC, CGTT, CTGG, GGAA, GAGG, ACGC, CGTT, GGAA, CGCA, GAGG, ACGC, CTGG, GGAA, CGCA, GAGG, CCCA, TGGC, CTCC, TCAC, GCTC, ATGG, CGGT, TTGC, ATCC, GGAA, GTTT, ATCC, ATGC, AAGG, GTAG, TCTG, CTGC, TCTT, ACGA, GTAG, TCTG, CTGC, TGTC, ACGA, GTAG, TGCT, CTGC, TCTT, ACGA, GTAG, TGCT, CTGC, TGTC, ACGA, GTTT, ATGA, ATGT, TTGA, GGTC, TCAC, TCTG, AACG, CACC, CGCC, TCAC, TCTG, AACG, GACA, CGCC, TCAC, TGCT, AAAA, CACC, CGCC, TCAC, TGCT, AAAA, GACA, CGCC, TCAC, TGCT, AACG, CACC, CGCC, TCAC, TGCT, AACG, GACA, and CGCC.
In some aspects of the compositions of the present disclosure, a 3′ overhang can comprise at least one of the nucleic acid sequences, or the complement thereof, selected from AAAA, CACC, CGCC, AAAA, GACA, CGCC, AAAC, AAGG, TGAC, AAAC, TGAC, GTAG, AACG, CACC, CGCC, AACG, GACA, CGCC, ACCG, CCGA, GAGG, ACGC, CGTT, CTGG, ACGC, CTGG, CGCA, AGCC, CACC, GCAA, AGCC, GACA, GCAA, AGCC, GCAA, TCCC, AGTT, TGAT, TGTG, ATCC, ACCG, GAGG, ATCC, ATGC, AAGG, ATCC, TACC, ACCG, ATGT, TTGA, GGTC, CAAC, TGAT, TGAC, CAAC, TTTT, TGAT, CGAG, AACA, AGTT, CGAG, AGTT, TGTG, CGGT, TTGC, ATCC, CGTT, CTGG, CGCA, CGTT, CTGG, GGAA, CGTT, GGAA, CGCA, CTGC, TCTT, ACGA, CTGC, TGTC, ACGA, CTGG, GGAA, CGCA, GAGG, ACGC, CGTT, GAGG, ACGC, CTGG, GAGG, CCCA, TGGC, GAGG, CGTT, CGCA, GAGG, CGTT, GGAA, GAGG, CTGG, CGCA, GAGG, TGGC, TCAC, GCAA, ACTG, TCCC, GCAA, TGGC, TCCC, GCTC, ATGG, CGGT, GCTC, CGGT, ATCC, GGAA, ATCC, AAGG, GGAA, GTTT, ATCC, GTAG, CTGC, ACGA, GTAG, TCTG, CTGC, GTAG, TGCT, CTGC, GTTT, ATGA, ATGT, GTTT, ATGT, GGTC, TCAC, AAAA, CGCC, TCAC, AACG, CGCC, TCAC, TCTG, AACG, TCAC, TGCT, AAAA, TCAC, TGCT, AACG, TGAC, TCGT, GTAG, TGAT, ATGT, TGAC, TGGC, CTCC and TCAC.
In some aspects of the compositions of the present disclosure, a 3′ overhang can consist of at least one of the sequences, or the complement thereof, selected from AAAA, CACC, CGCC, AAAA, GACA, CGCC, AAAC, AAGG, TGAC, AAAC, TGAC, GTAG, AACG, CACC, CGCC, AACG, GACA, CGCC, ACCG, CCGA, GAGG, ACGC, CGTT, CTGG, ACGC, CTGG, CGCA, AGCC, CACC, GCAA, AGCC, GACA, GCAA, AGCC, GCAA, TCCC, AGTT, TGAT, TGTG, ATCC, ACCG, GAGG, ATCC, ATGC, AAGG, ATCC, TACC, ACCG, ATGT, TTGA, GGTC, CAAC, TGAT, TGAC, CAAC, TTTT, TGAT, CGAG, AACA, AGTT, CGAG, AGTT, TGTG, CGGT, TTGC, ATCC, CGTT, CTGG, CGCA, CGTT, CTGG, GGAA, CGTT, GGAA, CGCA, CTGC, TCTT, ACGA, CTGC, TGTC, ACGA, CTGG, GGAA, CGCA, GAGG, ACGC, CGTT, GAGG, ACGC, CTGG, GAGG, CCCA, TGGC, GAGG, CGTT, CGCA, GAGG, CGTT, GGAA, GAGG, CTGG, CGCA, GAGG, TGGC, TCAC, GCAA, ACTG, TCCC, GCAA, TGGC, TCCC, GCTC, ATGG, CGGT, GCTC, CGGT, ATCC, GGAA, ATCC, AAGG, GGAA, GTTT, ATCC, GTAG, CTGC, ACGA, GTAG, TCTG, CTGC, GTAG, TGCT, CTGC, GTTT, ATGA, ATGT, GTTT, ATGT, GGTC, TCAC, AAAA, CGCC, TCAC, AACG, CGCC, TCAC, TCTG, AACG, TCAC, TGCT, AAAA, TCAC, TGCT, AACG, TGAC, TCGT, GTAG, TGAT, ATGT, TGAC, TGGC, CTCC and TCAC.
In some aspects of the compositions of the present disclosure, a 3′ overhang can comprise at least one of the nucleic acid sequences, or the complement thereof, selected from AAAC, AAGG, TGAC, TCGT, GTAG, ACGC, CGTT, CTGG, GGAA, CGCA, AGCC, CACC, GCAA, ACTG, TCCC, AGCC, CACC, GCAA, TGGC, TCCC, AGCC, GACA, GCAA, ACTG, TCCC, AGCC, GACA, GCAA, TGGC, TCCC, ATCC, TACC, ACCG, CCGA, GAGG, CAAC, TTTT, TGAT, ATGT, TGAC, CGAG, AACA, AGTT, TGAT, TGTG, GAGG, ACGC, CGTT, CTGG, CGCA, GAGG, ACGC, CGTT, CTGG, GGAA, GAGG, ACGC, CGTT, GGAA, CGCA, GAGG, ACGC, CTGG, GGAA, CGCA, GAGG, CCCA, TGGC, CTCC, TCAC, GCTC, ATGG, CGGT, TTGC, ATCC, GGAA, GTTT, ATCC, ATGC, AAGG, GTAG, TCTG, CTGC, TCTT, ACGA, GTAG, TCTG, CTGC, TGTC, ACGA, GTAG, TGCT, CTGC, TCTT, ACGA, GTAG, TGCT, CTGC, TGTC, ACGA, GTTT, ATGA, ATGT, TTGA, GGTC, TCAC, TCTG, AACG, CACC, CGCC, TCAC, TCTG, AACG, GACA, CGCC, TCAC, TGCT, AAAA, CACC, CGCC, TCAC, TGCT, AAAA, GACA, CGCC, TCAC, TGCT, AACG, CACC, CGCC, TCAC, TGCT, AACG, GACA, and CGCC.
In some aspects of the compositions of the present disclosure, a 3′ overhang can consist of at least one of the sequences, or the complement thereof, selected from AAAC, AAGG, TGAC, TCGT, GTAG, ACGC, CGTT, CTGG, GGAA, CGCA, AGCC, CACC, GCAA, ACTG, TCCC, AGCC, CACC, GCAA, TGGC, TCCC, AGCC, GACA, GCAA, ACTG, TCCC, AGCC, GACA, GCAA, TGGC, TCCC, ATCC, TACC, ACCG, CCGA, GAGG, CAAC, TTTT, TGAT, ATGT, TGAC, CGAG, AACA, AGTT, TGAT, TGTG, GAGG, ACGC, CGTT, CTGG, CGCA, GAGG, ACGC, CGTT, CTGG, GGAA, GAGG, ACGC, CGTT, GGAA, CGCA, GAGG, ACGC, CTGG, GGAA, CGCA, GAGG, CCCA, TGGC, CTCC, TCAC, GCTC, ATGG, CGGT, TTGC, ATCC, GGAA, GTTT, ATCC, ATGC, AAGG, GTAG, TCTG, CTGC, TCTT, ACGA, GTAG, TCTG, CTGC, TGTC, ACGA, GTAG, TGCT, CTGC, TCTT, ACGA, GTAG, TGCT, CTGC, TGTC, ACGA, GTTT, ATGA, ATGT, TTGA, GGTC, TCAC, TCTG, AACG, CACC, CGCC, TCAC, TCTG, AACG, GACA, CGCC, TCAC, TGCT, AAAA, CACC, CGCC, TCAC, TGCT, AAAA, GACA, CGCC, TCAC, TGCT, AACG, CACC, CGCC, TCAC, TGCT, AACG, GACA, and CGCC.
In some aspects of the compositions of the present disclosure, any description and/or characteristic of a 5′ overhang can be applied to a 3′ overhang.
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule and an at least second partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the at least second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3, and wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise a different 4-mer sequence. In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 3. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3.
The present disclosure provides compositions comprising a first double-stranded nucleic acid fragment, a second double-stranded nucleic acid fragment, a third double-stranded nucleic acid fragment and an at least fourth double-stranded nucleic acid fragment, wherein the first double-stranded nucleic acid fragment comprises a first 5′ overhang and a second 5′ overhang, wherein the second double-stranded nucleic acid fragment comprises a third 5′ overhang and fourth 5′ overhang, wherein the third double-stranded nucleic acid fragment comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the at least fourth double-stranded nucleic acid fragment comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the fourth 5′ overhang and the fifth 5′ overhang are complementary to each other, wherein the sixth 5′ overhang and the seventh 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3, and wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise a different 4-mer sequence.
In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 3. In some aspects of the preceding compositions, at least three of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 3. In some aspects of the preceding compositions, at least four of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 3. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3.
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule and an at least second partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the at least second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 4, and wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise a different 4-mer sequence. In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 4. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 4.
The present disclosure provides compositions comprising a first double-stranded nucleic acid fragment, a second double-stranded nucleic acid fragment, a third double-stranded nucleic acid fragment and an at least fourth double-stranded nucleic acid fragment, wherein the first double-stranded nucleic acid fragment comprises a first 5′ overhang and a second 5′ overhang, wherein the second double-stranded nucleic acid fragment comprises a third 5′ overhang and fourth 5′ overhang, wherein the third double-stranded nucleic acid fragment comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the at least fourth double-stranded nucleic acid fragment comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the fourth 5′ overhang and the fifth 5′ overhang are complementary to each other, wherein the sixth 5′ overhang and the seventh 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 4, and wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise a different 4-mer sequence.
In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 4. In some aspects of the preceding compositions, at least three of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 4. In some aspects of the preceding compositions, at least four of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 4. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 4.
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule and an at least second partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the at least second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 5, and wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise a different 4-mer sequence. In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 5. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 5.
The present disclosure provides compositions comprising a first double-stranded nucleic acid fragment, a second double-stranded nucleic acid fragment, a third double-stranded nucleic acid fragment and an at least fourth double-stranded nucleic acid fragment, wherein the first double-stranded nucleic acid fragment comprises a first 5′ overhang and a second 5′ overhang, wherein the second double-stranded nucleic acid fragment comprises a third 5′ overhang and fourth 5′ overhang, wherein the third double-stranded nucleic acid fragment comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the at least fourth double-stranded nucleic acid fragment comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the fourth 5′ overhang and the fifth 5′ overhang are complementary to each other, wherein the sixth 5′ overhang and the seventh 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 5, and wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise a different 4-mer sequence.
In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 5. In some aspects of the preceding compositions, at least three of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 5. In some aspects of the preceding compositions, at least four of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 5. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 5.
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule and an at least second partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the at least second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 4, and wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise a different 4-mer sequence. In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 4. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 4.
The present disclosure provides compositions comprising a first double-stranded nucleic acid fragment, a second double-stranded nucleic acid fragment, a third double-stranded nucleic acid fragment and an at least fourth double-stranded nucleic acid fragment, wherein the first double-stranded nucleic acid fragment comprises a first 5′ overhang and a second 5′ overhang, wherein the second double-stranded nucleic acid fragment comprises a third 5′ overhang and fourth 5′ overhang, wherein the third double-stranded nucleic acid fragment comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the at least fourth double-stranded nucleic acid fragment comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the fourth 5′ overhang and the fifth 5′ overhang are complementary to each other, wherein the sixth 5′ overhang and the seventh 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 4, and wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise a different 4-mer sequence.
In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 4. In some aspects of the preceding compositions, at least three of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 4. In some aspects of the preceding compositions, at least four of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 4. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 4.
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule and an at least second partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the at least second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 5, and wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise a different 4-mer sequence. In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 5. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 5.
The present disclosure provides compositions comprising a first double-stranded nucleic acid fragment, a second double-stranded nucleic acid fragment, a third double-stranded nucleic acid fragment and an at least fourth double-stranded nucleic acid fragment, wherein the first double-stranded nucleic acid fragment comprises a first 5′ overhang and a second 5′ overhang, wherein the second double-stranded nucleic acid fragment comprises a third 5′ overhang and fourth 5′ overhang, wherein the third double-stranded nucleic acid fragment comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the at least fourth double-stranded nucleic acid fragment comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the fourth 5′ overhang and the fifth 5′ overhang are complementary to each other, wherein the sixth 5′ overhang and the seventh 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 5, and wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise a different 4-mer sequence.
In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 5. In some aspects of the preceding compositions, at least three of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 5. In some aspects of the preceding compositions, at least four of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 5. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3 and Table 5.
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule and an at least second partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the at least second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 4 and Table 5, and wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise a different 4-mer sequence. In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 4 and Table 5. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 4 and Table 5.
The present disclosure provides compositions comprising a first double-stranded nucleic acid fragment, a second double-stranded nucleic acid fragment, a third double-stranded nucleic acid fragment and an at least fourth double-stranded nucleic acid fragment, wherein the first double-stranded nucleic acid fragment comprises a first 5′ overhang and a second 5′ overhang, wherein the second double-stranded nucleic acid fragment comprises a third 5′ overhang and fourth 5′ overhang, wherein the third double-stranded nucleic acid fragment comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the at least fourth double-stranded nucleic acid fragment comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the fourth 5′ overhang and the fifth 5′ overhang are complementary to each other, wherein the sixth 5′ overhang and the seventh 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 4 and Table 5, and wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise a different 4-mer sequence.
In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 4 and Table 5. In some aspects of the preceding compositions, at least three of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 4 and Table 5. In some aspects of the preceding compositions, at least four of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 4 and Table 5. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 4 and Table 5.
The present disclosure provides compositions comprising a first partially double-stranded nucleic acid molecule and an at least second partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang, wherein the at least second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and fourth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3, Table 4 and Table 5, and wherein the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprise a different 4-mer sequence. In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3, Table 4 and Table 5. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang and the fourth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3, Table 4 and Table 5.
The present disclosure provides compositions comprising a first double-stranded nucleic acid fragment, a second double-stranded nucleic acid fragment, a third double-stranded nucleic acid fragment and an at least fourth double-stranded nucleic acid fragment, wherein the first double-stranded nucleic acid fragment comprises a first 5′ overhang and a second 5′ overhang, wherein the second double-stranded nucleic acid fragment comprises a third 5′ overhang and fourth 5′ overhang, wherein the third double-stranded nucleic acid fragment comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the at least fourth double-stranded nucleic acid fragment comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the second 5′ overhang and third 5′ overhang are complementary to each other, wherein the fourth 5′ overhang and the fifth 5′ overhang are complementary to each other, wherein the sixth 5′ overhang and the seventh 5′ overhang are complementary to each other, wherein at least one of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3, Table 4 and Table 5, and wherein the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprise a different 4-mer sequence.
In some aspects of the preceding compositions, at least two of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3, Table 4 and Table 5. In some aspects of the preceding compositions, at least three of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3, Table 4 and Table 5. In some aspects of the preceding compositions, at least four of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3, Table 4 and Table 5. In some aspects of the preceding compositions, each of the first 5′ overhang, the third 5′ overhang, the fifth 5′ overhang, the seventh 5′ overhang and the eighth 5′ overhang comprises one of the 4-mer sequences, or complement thereof, recited in Table 3, Table 4 and Table 5.
In some aspects of the present disclosure, a partially double-stranded nucleic acid molecule can comprise DNA, RNA, XNA or any combination of DNA, RNA and XNA. As used herein, the term “XNA” is used to refer to xeno nucleic acids. As would be appreciated by the skilled artisan, xeno nucleic acids are synthetic nucleic acid analogues comprising a different sugar backbone than the natural nucleic acids DNA and RNA. XNAs can include, but are not limited to, 1,5-anhydrohexitol nucleic acid (HNA), Cyclohexene nucleic acid (CeNA), Threose nucleic acid (TNA), Glycol nucleic acid (GNA), Locked nucleic acid (LNA), Peptide nucleic acid (PNA) and FANA (Fluoro Arabino nucleic acid).
In some aspects, a partially double-stranded nucleic acid molecule can comprise at least one modified nucleic acid. In some aspects, a modified nucleic acid can comprise methylated cytidine. In some aspects, a modified nucleic acid can comprise 5mC (5-methylcytosine), 5hmC (5-hydroxymethylcytosine), 5fC (5-formylcytosine), 3mA (3-methyladenine), 5-fU (5-formyluridine), 5-hmU (5-hydroxymethyluridine), 5-hoU (5-hydroxyuridine), 7mG (7-methylguanine), 8oxoG (8-oxo-7,8-dihydroguanine), AP (apurinic/apyrimidinic sites), CPDs (Cyclobutane pyrimidine dimers), dI (deoxyinosine), dR5P (deoxyribose 5′-phosphate), dU (deoxyuridine), dX (deoxyxanthosine), PA (3′-phospho-α,β-unsaturated aldehyde), rN (ribonucleotides), Tg (Thymine Glycol), TT (TT dimer) and/or Mismatches including AP:A (apurinic/apyrimidinic site base paired with adenine), DHT:A (5,6-dihydrothymine base paired with an adenine), 5-hmU:A (5-hydroxymethyluracil base paired with an adenine), 5-hmU:G (5-hydroxymethyluracil base paired with a guanine), I:T (inosine base paired with a thymine), 6-MeA:T (6-methyladenine base paired with a thymine), 8-OG:C (8-oxoguanine base paired with a cytosine), 8-OG:G (8-oxoguanine base paired with a guanine), U:A (uridine base paired with an adenine) or U:G (uridine base paired with a guanine) or any combination thereof.
In some aspects, a partially double-stranded nucleic acid molecule can comprise at least one non-hybridized sequence, at least one non-symmetrical element, at least one hairpin, at least one G-quadruplex, at least one I-motif, at least one hemi-modified site, at least one CpG or any combination thereof. The at least one non-hybridized sequence, at least one non-symmetrical element, at least one hairpin, at least one G-quadruplex, at least one I-motif, at least one hemi-modified site, at least one CpG or any combination thereof can be used to introduce at least one or at least two unique molecular identifier (UMI) regions.
In some aspects of the compositions of the present disclosure, a partially double-stranded nucleic acid molecule can be attached to at least one solid support. In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid in the plurality is attached to at least one bead.
In some aspects of the compositions of the present disclosure, a partially double-stranded nucleic acid molecule can comprise at least one hairpin sequence. A hairpin sequence can comprise at least one deoxyuridine base. A hairpin sequence can comprise at least one restriction endonuclease site. The restriction endonuclease site can be a Type IIS restriction endonuclease site.
The present disclosure provides compositions comprising a plurality of partially double-stranded nucleic acid molecules, wherein the plurality comprises at least two distinct species of partially double-stranded nucleic acid molecules, wherein the partially double-stranded nucleic acid molecules comprise a first 5′ overhang and a second 5′ overhang, wherein the first 5′ overhang of one species of partially double-stranded nucleic acid molecules is complementary to only one other 5′ overhang present in the plurality of double-stranded nucleic acid molecules, and wherein the other 5′ overhang is present on a different species of partially double-stranded nucleic acid molecules, and wherein no 5′ overhang in the plurality of partially double-stranded nucleic acid molecules is self-complementary.
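By way of non-limiting illustration, the overhang constraints recited in the preceding paragraph can be checked computationally. The following Python sketch is an assumption-laden example and not part of the disclosed method: the function names (reverse_complement, check_overhang_set), the dictionary layout and the use of an empty string to denote a blunt end are all hypothetical. Two 5′ overhangs are treated as complementary when one is the reverse complement of the other, with both written 5′ to 3′. An overhang with no partner in the plurality (for example, a terminal end of the final product) is assumed to be acceptable.

```python
# Hypothetical helper names; only the standard library is used.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def check_overhang_set(species):
    """Check a plurality of molecules against the recited overhang constraints.

    `species` maps a species name to a (first_overhang, second_overhang) pair,
    each written 5'->3'. An empty string stands for a blunt end (assumption).
    Returns a list of problem descriptions; an empty list means no violation
    was found.
    """
    problems = []
    # Flatten into (species name, overhang) entries, skipping blunt ends.
    entries = [(name, oh.upper())
               for name, pair in species.items()
               for oh in pair if oh]

    for index, (name, overhang) in enumerate(entries):
        rc = reverse_complement(overhang)
        # A self-complementary overhang equals its own reverse complement.
        if overhang == rc:
            problems.append(f"{name}: overhang {overhang} is self-complementary")
        # Collect every other overhang in the plurality that could pair with it.
        partners = [other for j, (other, oh) in enumerate(entries)
                    if j != index and oh == rc]
        if len(partners) > 1:
            problems.append(
                f"{name}: overhang {overhang} has more than one complementary partner")
        if name in partners:
            problems.append(
                f"{name}: overhang {overhang} pairs with an overhang on the same species")
    return problems


example = {
    "frag1": ("", "ACTG"),      # blunt left end, right overhang ACTG
    "frag2": ("CAGT", "GGAT"),  # CAGT pairs with ACTG, GGAT pairs with ATCC
    "frag3": ("ATCC", ""),      # blunt right end
}
print(check_overhang_set(example))
```

A set that passes this check is only a candidate; the sketch says nothing about ligation purity, which the disclosure addresses through the 4-mer sequences recited in the Tables.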
The present disclosure provides compositions comprising a plurality of partially double-stranded nucleic acid molecules, wherein the plurality comprises at least two distinct species of partially double-stranded nucleic acid molecules, wherein the partially double-stranded nucleic acid molecules comprise a first 3′ overhang and a second 3′ overhang, wherein the first 3′ overhang of one species of partially double-stranded nucleic acid molecules is complementary to only one other 3′ overhang present in the plurality of double-stranded nucleic acid molecules, and wherein the other 3′ overhang is present on a different species of partially double-stranded nucleic acid molecules, and wherein no 3′ overhang in the plurality of partially double-stranded nucleic acid molecules is self-complementary.
In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid molecule in the plurality does not comprise a first 5′ overhang and instead comprises a blunt end and the second 5′ overhang.
In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid molecule in the plurality does not comprise a second 5′ overhang and instead comprises a blunt end and the first 5′ overhang.
In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid molecule in the plurality does not comprise a first 3′ overhang and instead comprises a blunt end and the second 3′ overhang.
In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid molecule in the plurality does not comprise a second 3′ overhang and instead comprises a blunt end and the first 3′ overhang.
In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid in the plurality can comprise at least one modified nucleic acid. The at least one modified nucleic acid can comprise methylated cytidine. The at least one modified nucleic acid can comprise 5mC (5-methylcytosine), 5hmC (5-hydroxymethylcytosine), 5fC (5-formylcytosine), 3mA (3-methyladenine), 5-fU (5-formyluridine), 5-hmU (5-hydroxymethyluridine), 5-hoU (5-hydroxyuridine), 7mG (7-methylguanine), 8oxoG (8-oxo-7,8-dihydroguanine), AP (apurinic/apyrimidinic sites), CPDs (Cyclobutane pyrimidine dimers), dI (deoxyinosine), dR5P (deoxyribose 5′-phosphate), dU (deoxyuridine), dX (deoxyxanthosine), PA (3′-phospho-α,β-unsaturated aldehyde), rN (ribonucleotides), Tg (Thymine Glycol), TT (TT dimer) and/or Mismatches including AP:A (apurinic/apyrimidinic site base paired with adenine), DHT:A (5,6-dihydrothymine base paired with an adenine), 5-hmU:A (5-hydroxymethyluracil base paired with an adenine), 5-hmU:G (5-hydroxymethyluracil base paired with a guanine), I:T (inosine base paired with a thymine), 6-MeA:T (6-methyladenine base paired with a thymine), 8-OG:C (8-oxoguanine base paired with a cytosine), 8-OG:G (8-oxoguanine base paired with a guanine), U:A (uridine base paired with an adenine) or U:G (uridine base paired with a guanine) or any combination thereof.
In some aspects of the methods of the present disclosure, at least one partially double-stranded nucleic acid in the plurality can comprise at least one non-hybridized sequence, at least one non-symmetrical element, at least one hairpin, at least one G-quadruplex, at least one I-motif, at least one hemi-modified site, at least one CpG or any combination thereof. The at least one non-hybridized sequence, at least one non-symmetrical element, at least one hairpin, at least one G-quadruplex, at least one I-motif, at least one hemi-modified site, at least one CpG or any combination thereof can be used to introduce at least one or at least two unique molecular identifier (UMI) regions.
In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid in the plurality can comprise at least one nucleotide substitution, deletion, insertion or any combination thereof that causes at least one amino acid codon variation, deletion, insertion or any combination thereof as compared to a wildtype or reference sequence.
In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid in the plurality can be attached to at least one solid support. In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid in the plurality is attached to at least one bead.
In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid in the plurality can comprise at least one hairpin sequence. A hairpin sequence can comprise at least one deoxyuridine base. A hairpin sequence can comprise at least one restriction endonuclease site. The restriction endonuclease site can be a Type IIS restriction endonuclease site.
In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid in the plurality can comprise RNA, DNA, XNA, at least one modified nucleic acid, at least one peptide or any combination thereof.
In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid in the plurality can be obtained from any source.
In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid in the plurality is obtained from at least one endonuclease digestion reaction of native DNA, at least one PCR reaction, at least one Recombinase Polymerase Amplification (RPA) reaction, at least one reverse transcription reaction, at least one single-stranded geometric synthesis reaction or any combination thereof. In some aspects of the compositions of the present disclosure, at least one partially double-stranded nucleic acid in the plurality can be obtained from chemical synthesis of oligonucleotides.
In some aspects of any method or composition of the present disclosure, a 5′ overhang can comprise at least about 1, or at least about 2, or at least about 3, or at least about 4, or at least about 5, or at least about 6, or at least about 7, or at least about 8, or at least about 9, or at least about 10, or at least about 11, or at least about 12, or at least about 13, or at least about 14, or at least about 15, or at least about 16, or at least about 17, or at least about 18, or at least about 19, or at least about 20, or at least about 21, or at least about 22, or at least about 23, or at least about 24, or at least about 25, or at least about 26, or at least about 27, or at least about 28, or at least about 29, or at least about 30, or at least about 35, or at least about 40, or at least about 45, or at least about 50 nucleotides.
In some aspects of any method or composition of the present disclosure, a 5′ overhang can consist of at least about 1, or at least about 2, or at least about 3, or at least about 4, or at least about 5, or at least about 6, or at least about 7, or at least about 8, or at least about 9, or at least about 10, or at least about 11, or at least about 12, or at least about 13, or at least about 14, or at least about 15, or at least about 16, or at least about 17, or at least about 18, or at least about 19, or at least about 20, or at least about 21, or at least about 22, or at least about 23, or at least about 24, or at least about 25, or at least about 26, or at least about 27, or at least about 28, or at least about 29, or at least about 30, or at least about 35, or at least about 40, or at least about 45, or at least about 50 nucleotides.
In some aspects of any method or composition of the present disclosure, a 3′ overhang can comprise at least about 1, or at least about 2, or at least about 3, or at least about 4, or at least about 5, or at least about 6, or at least about 7, or at least about 8, or at least about 9, or at least about 10, or at least about 11, or at least about 12, or at least about 13, or at least about 14, or at least about 15, or at least about 16, or at least about 17, or at least about 18, or at least about 19, or at least about 20, or at least about 21, or at least about 22, or at least about 23, or at least about 24, or at least about 25, or at least about 26, or at least about 27, or at least about 28, or at least about 29, or at least about 30, or at least about 35, or at least about 40, or at least about 45, or at least about 50 nucleotides.
In some aspects of any method or composition of the present disclosure, a 3′ overhang can consist of at least about 1, or at least about 2, or at least about 3, or at least about 4, or at least about 5, or at least about 6, or at least about 7, or at least about 8, or at least about 9, or at least about 10, or at least about 11, or at least about 12, or at least about 13, or at least about 14, or at least about 15, or at least about 16, or at least about 17, or at least about 18, or at least about 19, or at least about 20, or at least about 21, or at least about 22, or at least about 23, or at least about 24, or at least about 25, or at least about 26, or at least about 27, or at least about 28, or at least about 29, or at least about 30, or at least about 35, or at least about 40, or at least about 45, or at least about 50 nucleotides.
Methods of the Present Disclosure
The methods of the present disclosure can comprise the use of any of the compositions described herein. As used herein in the methods of the present disclosure, a double-stranded nucleic acid fragment or a double-stranded nucleic acid molecule can be a partially double-stranded nucleic acid fragment or a partially double-stranded nucleic acid molecule.
The present disclosure provides methods of synthesizing a target double-stranded nucleic acid molecule comprising a target nucleic acid sequence, the methods comprising: a) determining an assembly map of the desired double-stranded nucleic acid molecule, wherein the assembly map divides the target double-stranded nucleic acid molecule into a plurality of double-stranded nucleic acid fragments, wherein the double-stranded nucleic acid fragments comprise at least two 5′ overhangs, wherein nucleic acid fragments that are adjacent within the target nucleic acid sequence comprise 5′ overhangs that are complementary, wherein the 5′ overhangs of at least one pair of nucleic acid fragments that are adjacent within the target nucleic acid sequence each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer triplet, wherein the 4-mer triplet comprises three 4-mer sequences, which yield a single fragment with at least 90% purity upon ligation of the at least one pair of adjacent nucleic acid fragments; b) providing the double-stranded nucleic acid fragments determined in step (a); c) hybridizing a first pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; d) ligating the hybridized nucleic acid fragments from step (c) to form a double-stranded nucleic acid fragment; e) hybridizing a second pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; f) ligating the hybridized nucleic acid fragments from step (e) to form a double-stranded nucleic acid fragment, such that the double-stranded nucleic acid fragment is adjacent within the target nucleic acid sequence to the double-stranded nucleic acid formed in step (d); g) repeating steps (c)-(f) using the ligation products such that the target double-stranded nucleic acid molecule is synthesized.
The present disclosure provides methods of synthesizing a target double-stranded nucleic acid molecule comprising a target nucleic acid sequence, the methods comprising: a) determining an assembly map of the desired double-stranded nucleic acid molecule, wherein the assembly map divides the target double-stranded nucleic acid molecule into a plurality of double-stranded nucleic acid fragments, wherein the double-stranded nucleic acid fragments comprise at least two 5′ overhangs, wherein nucleic acid fragments that are adjacent within the target nucleic acid sequence comprise 5′ overhangs that are complementary, wherein the 5′ overhangs of at least one pair of nucleic acid fragments that are adjacent within the target nucleic acid sequence each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer triplet, wherein the 4-mer triplet comprises three 4-mer sequences, which yield only one fragment upon ligation of the at least one pair of adjacent nucleic acid fragments; b) providing the double-stranded nucleic acid fragments determined in step (a); c) hybridizing a first pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; d) ligating the hybridized nucleic acid fragments from step (c) to form a double-stranded nucleic acid fragment; e) hybridizing a second pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; f) ligating the hybridized nucleic acid fragments from step (e) to form a double-stranded nucleic acid fragment, such that the double-stranded nucleic acid fragment is adjacent within the target nucleic acid sequence to the double-stranded nucleic acid formed in step (d); g) repeating steps (c)-(f) using the ligation products such that the target double-stranded nucleic acid molecule is synthesized. In some aspects, the 4-mer triplet can be selected from the 4-mer triplets recited in Table 1.
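As a non-limiting illustration of step (a), the following Python sketch shows one way an assembly map could be represented in software. It is a simplified example under stated assumptions and is not the disclosed procedure: the names Fragment and build_assembly_map, the choice of junction positions by the caller, the 4-nucleotide default overhang length and the blunt outer ends of the first and last fragments are all hypothetical. Selecting junctions whose 4-mers appear in the recited Tables is outside the scope of the sketch.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


@dataclass
class Fragment:
    top: str             # top strand, 5'->3'
    bottom: str          # bottom strand, 5'->3'
    left_overhang: str   # 5' overhang carried by the top strand ("" if blunt)
    right_overhang: str  # 5' overhang carried by the bottom strand ("" if blunt)


def build_assembly_map(target, junctions, overhang_len=4):
    """Divide `target` (top strand, 5'->3') at the given junction positions.

    Each junction position p marks the start of an overhang region
    target[p:p + overhang_len] shared by the two adjacent fragments.
    """
    target = target.upper()
    bounds = [0] + list(junctions) + [len(target)]
    # Every fragment must be long enough to hold its own overhang region.
    if any(b - a < overhang_len for a, b in zip(bounds, bounds[1:])):
        raise ValueError("junctions must be increasing and at least overhang_len apart")

    fragments = []
    last_index = len(bounds) - 2
    for i, (start, end) in enumerate(zip(bounds, bounds[1:])):
        first, last = i == 0, i == last_index
        top = target[start:end]
        # The bottom strand is shifted right by overhang_len at internal junctions.
        bottom_start = start if first else start + overhang_len
        bottom_end = end if last else end + overhang_len
        bottom = reverse_complement(target[bottom_start:bottom_end])
        left = "" if first else target[start:start + overhang_len]
        right = "" if last else reverse_complement(target[end:end + overhang_len])
        fragments.append(Fragment(top, bottom, left, right))
    return fragments


target = "AAAACCCCGGGGTTTTAGGTAGCA"
fragments = build_assembly_map(target, junctions=[8, 16])
for fragment in fragments:
    print(fragment)
```

In this representation, the right overhang of each fragment is the reverse complement of the left overhang of the next fragment, which mirrors the requirement that adjacent fragments comprise complementary 5′ overhangs.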
The present disclosure provides methods of synthesizing a target double-stranded nucleic acid molecule comprising a target nucleic acid sequence, the methods comprising: a) determining an assembly map of the desired double-stranded nucleic acid molecule, wherein the assembly map divides the target double-stranded nucleic acid molecule into a plurality of double-stranded nucleic acid fragments, wherein the double-stranded nucleic acid fragments comprise at least two 5′ overhangs, wherein nucleic acid fragments that are adjacent within the target nucleic acid sequence comprise 5′ overhangs that are complementary, wherein the 5′ overhangs of at least one pair of nucleic acid fragments that are adjacent within the target nucleic acid sequence each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer triplet, wherein the 4-mer triplet is selected from the 4-mer triplets recited in Table 1; b) providing the double-stranded nucleic acid fragments determined in step (a); c) hybridizing a first pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; d) ligating the hybridized nucleic acid fragments from step (c) to form a double-stranded nucleic acid fragment; e) hybridizing a second pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; f) ligating the hybridized nucleic acid fragments from step (e) to form a double-stranded nucleic acid fragment, such that the double-stranded nucleic acid fragment is adjacent within the target nucleic acid sequence to the double-stranded nucleic acid formed in step (d); and g) repeating steps (c)-(f) using the ligation products such that the target double-stranded nucleic acid molecule is synthesized.
The present disclosure provides methods of synthesizing a target double-stranded nucleic acid molecule comprising a target nucleic acid sequence, the methods comprising: a) determining an assembly map of the desired double-stranded nucleic acid molecule, wherein the assembly map divides the target double-stranded nucleic acid molecule into a plurality of double-stranded nucleic acid fragments, wherein the double-stranded nucleic acid fragments comprise at least two 5′ overhangs, wherein nucleic acid fragments that are adjacent within the target nucleic acid sequence comprise 5′ overhangs that are complementary, wherein at least one 5′ overhang of at least one pair of nucleic acid fragments that are adjacent within the target nucleic acid sequence comprises at least one 4-mer, or complement thereof, recited in Table 3, Table 4, Table 5 or any combination thereof; b) providing the double-stranded nucleic acid fragments determined in step (a); c) hybridizing a first pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; d) ligating the hybridized nucleic acid fragments from step (c) to form a double-stranded nucleic acid fragment; e) hybridizing a second pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; f) ligating the hybridized nucleic acid fragments from step (e) to form a double-stranded nucleic acid fragment, such that the double-stranded nucleic acid fragment is adjacent within the target nucleic acid sequence to the double-stranded nucleic acid formed in step (d); g) repeating steps (c)-(f) using the ligation products such that the target double-stranded nucleic acid molecule is synthesized.
The present disclosure provides methods of synthesizing a target double-stranded nucleic acid molecule comprising a target nucleic acid sequence, the methods comprising: a) determining an assembly map of the desired double-stranded nucleic acid molecule, wherein the assembly map divides the target double-stranded nucleic acid molecule into a plurality of double-stranded nucleic acid fragments, wherein the double-stranded nucleic acid fragments comprise at least two 5′ overhangs, wherein nucleic acid fragments that are adjacent within the target nucleic acid sequence comprise 5′ overhangs that are complementary, wherein the 5′ overhangs of at least one set of four nucleic acid fragments that are adjacent within the target nucleic acid sequence each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer quintuplet, wherein the 4-mer quintuplet comprises five 4-mer sequences, which yield a single fragment with at least 90% purity upon ligation of the at least one set of four nucleic acid fragments; b) providing the double-stranded nucleic acid fragments determined in step (a); c) hybridizing a first pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; d) ligating the hybridized nucleic acid fragments from step (c) to form a double-stranded nucleic acid fragment; e) hybridizing a second pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; f) ligating the hybridized nucleic acid fragments from step (e) to form a double-stranded nucleic acid fragment, such that the double-stranded nucleic acid fragment is adjacent within the target nucleic acid sequence to the double-stranded nucleic acid formed in step (d); and g) repeating steps (c)-(f) using the ligation products such that the target double-stranded nucleic acid molecule is synthesized.
The present disclosure provides methods of synthesizing a target double-stranded nucleic acid molecule comprising a target nucleic acid sequence, the methods comprising: a) determining an assembly map of the desired double-stranded nucleic acid molecule, wherein the assembly map divides the target double-stranded nucleic acid molecule into a plurality of double-stranded nucleic acid fragments, wherein the double-stranded nucleic acid fragments comprise at least two 5′ overhangs, wherein nucleic acid fragments that are adjacent within the target nucleic acid sequence comprise 5′ overhangs that are complementary, wherein the 5′ overhangs of at least one set of four nucleic acid fragments that are adjacent within the target nucleic acid sequence each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer quintuplet, wherein the 4-mer quintuplet comprises five 4-mer sequences, which yield only one fragment upon ligation of the at least one set of four nucleic acid fragments; b) providing the double-stranded nucleic acid fragments determined in step (a); c) hybridizing a first pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; d) ligating the hybridized nucleic acid fragments from step (c) to form a double-stranded nucleic acid fragment; e) hybridizing a second pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; f) ligating the hybridized nucleic acid fragments from step (e) to form a double-stranded nucleic acid fragment, such that the double-stranded nucleic acid fragment is adjacent within the target nucleic acid sequence to the double-stranded nucleic acid formed in step (d); and g) repeating steps (c)-(f) using the ligation products such that the target double-stranded nucleic acid molecule is synthesized. 
In some aspects, the 4-mer quintuplet can be selected from the 4-mer quintuplets recited in Table 2.
The present disclosure provides methods of synthesizing a target double-stranded nucleic acid molecule comprising a target nucleic acid sequence, the methods comprising: a) determining an assembly map of the desired double-stranded nucleic acid molecule, wherein the assembly map divides the target double-stranded nucleic acid molecule into a plurality of double-stranded nucleic acid fragments, wherein the double-stranded nucleic acid fragments comprise at least two 5′ overhangs, wherein nucleic acid fragments that are adjacent within the target nucleic acid sequence comprise 5′ overhangs that are complementary, wherein the 5′ overhangs of at least one set of four nucleic acid fragments that are adjacent within the target nucleic acid sequence each comprise one of the 4-mer sequences, or complement thereof, of a 4-mer quintuplet, wherein the 4-mer quintuplet is selected from the 4-mer quintuplets recited in Table 2; b) providing the double-stranded nucleic acid fragments determined in step (a); c) hybridizing a first pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; d) ligating the hybridized nucleic acid fragments from step (c) to form a double-stranded nucleic acid fragment; e) hybridizing a second pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; f) ligating the hybridized nucleic acid fragments from step (e) to form a double-stranded nucleic acid fragment, such that the double-stranded nucleic acid fragment is adjacent within the target nucleic acid sequence to the double-stranded nucleic acid formed in step (d); and g) repeating steps (c)-(f) using the ligation products such that the target double-stranded nucleic acid molecule is synthesized.
The present disclosure provides methods of synthesizing a target double-stranded nucleic acid molecule comprising a target nucleic acid sequence, the methods comprising: a) determining an assembly map of the desired double-stranded nucleic acid molecule, wherein the assembly map divides the target double-stranded nucleic acid molecule into a plurality of double-stranded nucleic acid fragments, wherein the double-stranded nucleic acid fragments comprise at least two 5′ overhangs, wherein nucleic acid fragments that are adjacent within the target nucleic acid sequence comprise 5′ overhangs that are complementary, wherein at least one 5′ overhang comprises at least one 4-mer, or complement thereof, recited in Table 3, Table 4, Table 5 or any combination thereof; b) providing the double-stranded nucleic acid fragments determined in step (a); c) hybridizing a first pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; d) ligating the hybridized nucleic acid fragments from step (c) to form a double-stranded nucleic acid fragment; e) hybridizing a second pair of double-stranded nucleic acid fragments that are adjacent within the target nucleic acid via their complementary 5′ overhangs; f) ligating the hybridized nucleic acid fragments from step (e) to form a double-stranded nucleic acid fragment, such that the double-stranded nucleic acid fragment is adjacent within the target nucleic acid sequence to the double-stranded nucleic acid formed in step (d); and g) repeating steps (c)-(f) using the ligation products such that the target double-stranded nucleic acid molecule is synthesized. In some aspects, the 4-mer quintuplet can be selected from the 4-mer quintuplets recited in Table 2.
In some aspects of the methods of the present disclosure, the assembly map divides the target double-stranded nucleic acid molecule into at least 4 double-stranded nucleic acid fragments. In some aspects of the methods of the present disclosure, the assembly map divides the target double-stranded nucleic acid molecule into at least about 10, or at least about 20, or at least about 30, or at least about 40, or at least about 50, or at least about 60, or at least about 70, or at least about 80, or at least about 90, or at least about 100, or at least about 110, or at least about 120, or at least about 130, or at least about 140, or at least about 150, or at least about 160, or at least about 170, or at least about 180, or at least about 200, or at least about 225, or at least about 250, or at least about 275, or at least about 300 double-stranded nucleic acid fragments.
In some aspects, the target double-stranded nucleic acid is at least about 100, or at least about 200, or at least about 300, or at least about 400, or at least about 500, or at least about 600, or at least about 700, or at least about 800, or at least about 900, or at least about 1000, or at least about 1100, or at least about 1200, or at least about 1300, or at least about 1400, or at least about 1500, or at least about 1600, or at least about 1700, or at least about 1800, or at least about 1900, or at least about 2000, or at least about 2100, or at least about 2200, or at least about 2300, or at least about 2400, or at least about 2500, or at least about 2600, or at least about 2700, or at least about 2800, or at least about 2900, or at least about 3000, or at least about 3500, or at least about 4000, or at least about 5000, or at least about 6000, or at least about 7000, or at least about 8000, or at least about 9000, or at least about 10000 nucleotides (base pairs) in length.
In some aspects, the target double-stranded nucleic acid molecule can comprise at least one homopolymeric sequence. As used herein, the term homopolymeric sequence is used to refer to any type of repeating nucleic acid sequence, including, but not limited to, repeats of single nucleotides or repeats of small motifs. In some aspects, a homopolymeric sequence can be at least about 10 nucleotides, or at least about 20 nucleotides, or at least about 30 nucleotides, or at least about 40 nucleotides, or at least about 50 nucleotides, or at least about 60 nucleotides, or at least about 70 nucleotides, or at least about 80 nucleotides, or at least about 90 nucleotides, or at least about 100 nucleotides in length.
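As an illustration of the definition above, the following Python sketch flags repeats of single nucleotides or of small motifs in a sequence. The maximum motif size of 3 and the minimum run length of 10 nucleotides are illustrative assumptions, and `find_repeats` is a hypothetical helper rather than part of the disclosed methods. Only whole copies of a motif are counted.

```python
import re


def find_repeats(seq: str, max_motif: int = 3, min_length: int = 10) -> list:
    """Return (start, motif, run_length) for tandem repeats in `seq`.

    A repeat is a motif of 1..max_motif nucleotides that occurs at least
    twice in a row; only runs of at least `min_length` nucleotides are kept.
    """
    # Lazy quantifier: prefer the shortest motif that explains the run.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1+" % max_motif)
    hits = []
    for match in pattern.finditer(seq.upper()):
        run_length = match.end() - match.start()
        if run_length >= min_length:
            hits.append((match.start(), match.group(1), run_length))
    return hits


if __name__ == "__main__":
    print(find_repeats("CAAAAAAAAAAAAG"))  # single-nucleotide repeat
    print(find_repeats("GGATATATATATCC"))  # dinucleotide repeat
```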
In some aspects, the target double-stranded nucleic acid molecule can have a GC content of at least about 50%.
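For reference, GC content is conventionally the percentage of G and C bases in a sequence. A minimal Python sketch follows; the function name and the example sequences are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    for s in ("ATGC", "GGCCGGAT"):
        print(s, gc_content(s), gc_content(s) >= 50.0)
```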
In some aspects of the preceding methods, at least one of the double-stranded nucleic acid fragments that corresponds to at least one terminus of the target double-stranded nucleic acid molecule can comprise a blunt end. As used herein, the term blunt end is used to refer to the end of a double-stranded nucleic acid molecule that does not have a single-stranded overhang.
In some aspects of the preceding methods, at least one of the double-stranded nucleic acid fragments that corresponds to at least one terminus of the target double-stranded nucleic acid molecule can comprise a hairpin sequence. In some aspects, the hairpin sequence can comprise at least one deoxyuridine base. In some aspects, the hairpin sequence can comprise at least one restriction endonuclease site.
In some aspects of the preceding methods, the method can further comprise after step (g): h) incubating the ligation products with at least one exonuclease. In aspects wherein a hairpin sequence comprises at least one deoxyuridine base, the method can further comprise after step (h): i) removing the at least one exonuclease; and j) incubating the products of the exonuclease incubation with at least one enzyme that cleaves a deoxyuridine base, thereby cleaving the hairpin sequence. In aspects wherein a hairpin sequence comprises at least one restriction endonuclease site, the method can further comprise after step (h): i) removing the at least one exonuclease; and j) incubating the products of the exonuclease incubation with at least one enzyme that cleaves the at least one restriction endonuclease site, thereby cleaving the hairpin sequence. In some aspects, an enzyme that cleaves a deoxyuridine base can be the USER (NEB) enzyme.
In some aspects of the methods of the present disclosure, ligation can comprise the use of a ligase. Any ligase known in the art may be used. Preferably, the ligase is T7 DNA ligase or HiFi Taq DNA Ligase.
In some aspects of the methods of the present disclosure, the synthesized target double-stranded nucleic acid molecule has a purity of at least about 80%, or at least about 85%, or at least about 90%, or at least about 95%, or at least about 99%.
In some aspects, the purity of a synthesized target double-stranded nucleic acid molecule refers to the percentage of the total ligation products that were formed as part of a single ligation reaction, or multiple rounds of ligation reactions, that correspond to the correct/desired ligation product. Without wishing to be bound by theory, the methods of the present disclosure comprising the ligation of nucleic acid molecules can produce a plurality of ligation products, some of which correspond to the correct/desired ligation product, and some of which are undesired (side-reactions, incorrect ligations, etc.). The purity of a ligation product, or a target molecule that is being synthesized, can be expressed as a percentage, which corresponds to the percentage of the total ligation products formed which correspond to the correct/desired ligation product.
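The purity definition above amounts to a simple ratio. The sketch below assumes that the relative amount of each ligation product has already been measured by some means. The product labels and amounts are invented for illustration and are not data from the disclosure.

```python
def purity_percent(product_amounts: dict, desired: str) -> float:
    """Percentage of total ligation products that is the desired product.

    `product_amounts` maps a product label to its measured amount in any
    consistent unit (for example, band intensity or molecule count).
    """
    total = sum(product_amounts.values())
    if total <= 0:
        raise ValueError("total amount of ligation products must be positive")
    return 100.0 * product_amounts.get(desired, 0) / total


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    amounts = {"desired": 90, "incorrect_ligation": 7, "unligated": 3}
    print(purity_percent(amounts, "desired"))
```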
The present disclosure provides methods of producing at least one target nucleic acid molecule, the methods comprising: (a) providing a first partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 5′ overhang and a second 5′ overhang; (b) providing a second partially double-stranded nucleic acid molecule, wherein the second partially double-stranded nucleic acid molecule comprises a third 5′ overhang and a fourth 5′ overhang, wherein the second 5′ overhang is complementary to the third 5′ overhang; (c) hybridizing the second 5′ overhang and the third 5′ overhang; (d) ligating the first partially double-stranded nucleic acid molecule and the second partially double-stranded nucleic acid molecule to produce a first ligated fragment, wherein the first ligated fragment comprises the first 5′ overhang and the fourth 5′ overhang; (e) providing a third partially double-stranded nucleic acid molecule, wherein the third partially double-stranded nucleic acid molecule comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the fourth 5′ overhang is complementary to the fifth 5′ overhang; (f) providing at least a fourth partially double-stranded nucleic acid molecule, wherein the at least fourth partially double-stranded nucleic acid molecule comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the sixth 5′ overhang is complementary to the seventh 5′ overhang; (g) hybridizing the sixth 5′ overhang and the seventh 5′ overhang; (h) ligating the third partially double-stranded nucleic acid molecule and the at least fourth partially double-stranded nucleic acid molecule to produce an at least second ligated fragment, wherein the at least second ligated fragment comprises the fifth 5′ overhang and the eighth 5′ overhang; (i) hybridizing the fourth 5′ overhang present in the first ligated fragment to the fifth 5′ overhang located in the at least second ligated fragment; and (j) ligating the first ligated fragment and the at least second ligated fragment to produce an at least third ligated fragment, wherein the at least third ligated fragment comprises the first 5′ overhang and the eighth 5′ overhang.
In some aspects, the preceding methods can further comprise: (k) providing an at least fifth partially double-stranded nucleic acid molecule, wherein the at least fifth partially double-stranded nucleic acid molecule comprises a ninth 5′ overhang and a tenth 5′ overhang, wherein the ninth 5′ overhang is complementary to the eighth 5′ overhang; (l) hybridizing the ninth 5′ overhang and the eighth 5′ overhang; and (m) ligating the at least third ligated fragment and the at least fifth partially double-stranded nucleic acid molecule to produce an at least fourth ligated fragment, wherein the at least fourth ligated fragment comprises the first 5′ overhang and the tenth 5′ overhang.
In some aspects, the preceding methods can further comprise: (k) providing a fifth partially double-stranded nucleic acid molecule, wherein the fifth partially double-stranded nucleic acid molecule comprises a ninth 5′ overhang and a tenth 5′ overhang, wherein the ninth 5′ overhang is complementary to the eighth 5′ overhang; (l) providing an at least sixth partially double-stranded nucleic acid molecule, wherein the at least sixth partially double-stranded nucleic acid molecule comprises an eleventh 5′ overhang and a twelfth 5′ overhang, wherein the tenth 5′ overhang is complementary to the eleventh 5′ overhang; (m) hybridizing the tenth 5′ overhang and the eleventh 5′ overhang; (n) ligating the fifth partially double-stranded nucleic acid molecule and the at least sixth partially double-stranded nucleic acid molecule to produce a fourth ligated fragment, wherein the fourth ligated fragment comprises the ninth 5′ overhang and the twelfth 5′ overhang; (o) hybridizing the eighth 5′ overhang and the ninth 5′ overhang; and (p) ligating the at least third ligated fragment to the fourth ligated fragment to produce an at least fifth ligated fragment, wherein the at least fifth ligated fragment comprises the first 5′ overhang and the twelfth 5′ overhang.
In some aspects, the preceding methods can further comprise: (k) providing a fifth partially double-stranded nucleic acid molecule, wherein the fifth partially double-stranded nucleic acid molecule comprises a ninth 5′ overhang and a tenth 5′ overhang, wherein the ninth 5′ overhang is complementary to the eighth 5′ overhang; (l) providing a sixth partially double-stranded nucleic acid molecule, wherein the sixth partially double-stranded nucleic acid molecule comprises an eleventh 5′ overhang and a twelfth 5′ overhang, wherein the tenth 5′ overhang is complementary to the eleventh 5′ overhang; (m) hybridizing the tenth 5′ overhang and the eleventh 5′ overhang; (n) ligating the fifth partially double-stranded nucleic acid molecule and the sixth partially double-stranded nucleic acid molecule to produce a fourth ligated fragment, wherein the fourth ligated fragment comprises the ninth 5′ overhang and the twelfth 5′ overhang; (o) providing a seventh partially double-stranded nucleic acid molecule, wherein the seventh partially double-stranded nucleic acid molecule comprises a thirteenth 5′ overhang and a fourteenth 5′ overhang, wherein the thirteenth 5′ overhang is complementary to the twelfth 5′ overhang; (p) providing an at least eighth partially double-stranded nucleic acid molecule, wherein the at least eighth partially double-stranded nucleic acid molecule comprises a fifteenth 5′ overhang and a sixteenth 5′ overhang, wherein the fourteenth 5′ overhang is complementary to the fifteenth 5′ overhang; (q) hybridizing the fourteenth 5′ overhang and the fifteenth 5′ overhang; (r) ligating the seventh partially double-stranded nucleic acid molecule and the at least eighth partially double-stranded nucleic acid molecule to produce an at least fifth ligated fragment, wherein the at least fifth ligated fragment comprises the thirteenth 5′ overhang and the sixteenth 5′ overhang; (s) hybridizing the twelfth 5′ overhang and the thirteenth 5′ overhang; (t) ligating the fourth ligated fragment and the at least fifth ligated fragment to produce an at least sixth ligated fragment, wherein the at least sixth ligated fragment comprises the ninth 5′ overhang and the sixteenth 5′ overhang; (u) hybridizing the eighth 5′ overhang to the ninth 5′ overhang; and (v) ligating the at least sixth ligated fragment and the at least third ligated fragment to produce an at least seventh ligated fragment, wherein the at least seventh ligated fragment comprises the first 5′ overhang and the sixteenth 5′ overhang.
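The overhang bookkeeping in the preceding steps can be followed with a small Python sketch. It is an illustrative model only: each molecule is reduced to the ordinal numbers of its two terminal overhangs (first, second, and so on), and the overhang numbered 2n is assumed to be complementary to the overhang numbered 2n+1, as in the steps above. The function names are hypothetical, and the sketch joins neighbors strictly left to right in rounds, which may differ from the order in which the steps are recited.

```python
def ligate(left: tuple, right: tuple) -> tuple:
    """Join two fragments given as (left_overhang_no, right_overhang_no).

    Assumes overhang 2n pairs with overhang 2n + 1 (second with third,
    fourth with fifth, and so on). The product keeps the outer overhangs.
    """
    if left[1] % 2 != 0 or right[0] != left[1] + 1:
        raise ValueError(f"overhangs {left[1]} and {right[0]} are not complementary")
    return (left[0], right[1])


def geometric_assembly(fragments: list) -> tuple:
    """Ligate neighbors pairwise, round by round, until one fragment remains.

    Returns (final_fragment, number_of_rounds).
    """
    if not fragments:
        raise ValueError("need at least one fragment")
    rounds = 0
    while len(fragments) > 1:
        joined = [ligate(fragments[i], fragments[i + 1])
                  for i in range(0, len(fragments) - 1, 2)]
        if len(fragments) % 2:  # an unpaired fragment waits for the next round
            joined.append(fragments[-1])
        fragments = joined
        rounds += 1
    return fragments[0], rounds


if __name__ == "__main__":
    four = [(1, 2), (3, 4), (5, 6), (7, 8)]
    eight = [(2 * i + 1, 2 * i + 2) for i in range(8)]
    print(geometric_assembly(four))   # outer overhangs: first and eighth
    print(geometric_assembly(eight))  # outer overhangs: first and sixteenth
```

In this model, four starting molecules yield a product carrying the first and eighth overhangs after two rounds, and eight starting molecules yield a product carrying the first and sixteenth overhangs after three rounds, consistent with the ligated fragments recited above.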
The present disclosure provides methods of producing at least one target nucleic acid molecule, the methods comprising: (a) providing a first partially double-stranded nucleic acid molecule, wherein the first partially double-stranded nucleic acid molecule comprises a first 3′ overhang and a second 3′ overhang; (b) providing a second partially double-stranded nucleic acid molecule, wherein the second partially double-stranded nucleic acid molecule comprises a third 3′ overhang and a fourth 3′ overhang, wherein the second 3′ overhang is complementary to the third 3′ overhang; (c) hybridizing the second 3′ overhang and the third 3′ overhang; (d) ligating the first partially double-stranded nucleic acid molecule and the second partially double-stranded nucleic acid molecule to produce a first ligated fragment, wherein the first ligated fragment comprises the first 3′ overhang and the fourth 3′ overhang; (e) providing a third partially double-stranded nucleic acid molecule, wherein the third partially double-stranded nucleic acid molecule comprises a fifth 3′ overhang and a sixth 3′ overhang, wherein the fourth 3′ overhang is complementary to the fifth 3′ overhang; (f) providing at least a fourth partially double-stranded nucleic acid molecule, wherein the at least fourth partially double-stranded nucleic acid molecule comprises a seventh 3′ overhang and an eighth 3′ overhang, wherein the sixth 3′ overhang is complementary to the seventh 3′ overhang; (g) hybridizing the sixth 3′ overhang and the seventh 3′ overhang; (h) ligating the third partially double-stranded nucleic acid molecule and the at least fourth partially double-stranded nucleic acid molecule to produce an at least second ligated fragment, wherein the at least second ligated fragment comprises the fifth 3′ overhang and the eighth 3′ overhang; (i) hybridizing the fourth 3′ overhang present in the first ligated fragment to the fifth 3′ overhang located in the at least second ligated fragment; and (j) ligating the first ligated fragment and the at least second ligated fragment to produce an at least third ligated fragment, wherein the at least third ligated fragment comprises the first 3′ overhang and the eighth 3′ overhang.
In some aspects, the preceding methods can further comprise: (k) providing an at least fifth partially double-stranded nucleic acid molecule, wherein the at least fifth partially double-stranded nucleic acid molecule comprises a ninth 3′ overhang and a tenth 3′ overhang, wherein the ninth 3′ overhang is complementary to the eighth 3′ overhang; (l) hybridizing the ninth 3′ overhang and the eighth 3′ overhang; and (m) ligating the at least third ligated fragment and the at least fifth partially double-stranded nucleic acid molecule to produce an at least fourth ligated fragment, wherein the at least fourth ligated fragment comprises the first 3′ overhang and the tenth 3′ overhang.
In some aspects, the preceding methods can further comprise: (k) providing a fifth partially double-stranded nucleic acid molecule, wherein the fifth partially double-stranded nucleic acid molecule comprises a ninth 3′ overhang and a tenth 3′ overhang, wherein the ninth 3′ overhang is complementary to the eighth 3′ overhang; (l) providing an at least sixth partially double-stranded nucleic acid molecule, wherein the at least sixth partially double-stranded nucleic acid molecule comprises an eleventh 3′ overhang and a twelfth 3′ overhang, wherein the tenth 3′ overhang is complementary to the eleventh 3′ overhang; (m) hybridizing the tenth 3′ overhang and the eleventh 3′ overhang; (n) ligating the fifth partially double-stranded nucleic acid molecule and the at least sixth partially double-stranded nucleic acid molecule to produce a fourth ligated fragment, wherein the fourth ligated fragment comprises the ninth 3′ overhang and the twelfth 3′ overhang; (o) hybridizing the eighth 3′ overhang and the ninth 3′ overhang; and (p) ligating the at least third ligated fragment to the fourth ligated fragment to produce an at least fifth ligated fragment, wherein the at least fifth ligated fragment comprises the first 3′ overhang and the twelfth 3′ overhang.
In some aspects, the preceding methods can further comprise: (k) providing a fifth partially double-stranded nucleic acid molecule, wherein the fifth partially double-stranded nucleic acid molecule comprises a ninth 3′ overhang and a tenth 3′ overhang, wherein the ninth 3′ overhang is complementary to the eighth 3′ overhang; (l) providing a sixth partially double-stranded nucleic acid molecule, wherein the sixth partially double-stranded nucleic acid molecule comprises an eleventh 3′ overhang and a twelfth 3′ overhang, wherein the tenth 3′ overhang is complementary to the eleventh 3′ overhang; (m) hybridizing the tenth 3′ overhang and the eleventh 3′ overhang; (n) ligating the fifth partially double-stranded nucleic acid molecule and the sixth partially double-stranded nucleic acid molecule to produce a fourth ligated fragment, wherein the fourth ligated fragment comprises the ninth 3′ overhang and the twelfth 3′ overhang; (o) providing a seventh partially double-stranded nucleic acid molecule, wherein the seventh partially double-stranded nucleic acid molecule comprises a thirteenth 3′ overhang and a fourteenth 3′ overhang, wherein the thirteenth 3′ overhang is complementary to the twelfth 3′ overhang; (p) providing an at least eighth partially double-stranded nucleic acid molecule, wherein the at least eighth partially double-stranded nucleic acid molecule comprises a fifteenth 3′ overhang and a sixteenth 3′ overhang, wherein the fourteenth 3′ overhang is complementary to the fifteenth 3′ overhang; (q) hybridizing the fourteenth 3′ overhang and the fifteenth 3′ overhang; (r) ligating the seventh partially double-stranded nucleic acid molecule and the at least eighth partially double-stranded nucleic acid molecule to produce an at least fifth ligated fragment, wherein the at least fifth ligated fragment comprises the thirteenth 3′ overhang and the sixteenth 3′ overhang; (s) hybridizing the twelfth 3′ overhang and the thirteenth 3′ overhang; (t) ligating the fourth ligated fragment and the at least fifth ligated fragment to produce an at least sixth ligated fragment, wherein the at least sixth ligated fragment comprises the ninth 3′ overhang and the sixteenth 3′ overhang; (u) hybridizing the eighth 3′ overhang to the ninth 3′ overhang; and (v) ligating the at least sixth ligated fragment and the at least third ligated fragment to produce an at least seventh ligated fragment, wherein the at least seventh ligated fragment comprises the first 3′ overhang and the sixteenth 3′ overhang.
In some aspects of the methods of the present disclosure, the first 5′ overhang, the fourth 5′ overhang, the fifth 5′ overhang, the eighth 5′ overhang, the ninth 5′ overhang, the tenth 5′ overhang, the twelfth 5′ overhang, the thirteenth 5′ overhang, the sixteenth 5′ overhang or any combination thereof can comprise a hairpin sequence.
In some aspects of the methods of the present disclosure, the first 3′ overhang, the fourth 3′ overhang, the fifth 3′ overhang, the eighth 3′ overhang, the ninth 3′ overhang, the tenth 3′ overhang, the twelfth 3′ overhang, the thirteenth 3′ overhang, the sixteenth 3′ overhang or any combination thereof can comprise a hairpin sequence.
In some aspects of the methods of the present disclosure, a hairpin sequence can comprise at least one deoxyuridine base. In some aspects of the methods of the present disclosure, a hairpin sequence can comprise at least one restriction endonuclease site. The restriction endonuclease site can be a Type II S restriction endonuclease site.
In some aspects, the preceding methods can further comprise after step (d), incubating the reaction of step (d) with at least one exonuclease. In some aspects, the preceding methods can further comprise after step (h), incubating the reaction of step (h) with at least one exonuclease. In some aspects, the preceding methods can further comprise after step (j), incubating the reaction of step (j) with at least one exonuclease. In some aspects, the preceding methods can further comprise after step (m), incubating the reaction of step (m) with at least one exonuclease. In some aspects, the preceding methods can further comprise after step (n), incubating the reaction of step (n) with at least one exonuclease. In some aspects, the preceding methods can further comprise after step (p), incubating the reaction of step (p) with at least one exonuclease. In some aspects, the preceding methods can further comprise after step (r), incubating the reaction of step (r) with at least one exonuclease. In some aspects, the preceding methods can further comprise after step (t), incubating the reaction of step (t) with at least one exonuclease. In some aspects, the preceding methods can further comprise after step (v), incubating the reaction of step (v) with at least one exonuclease. In some aspects of the methods of the present disclosure, a ligation reaction can be followed by an incubation of the ligation reaction components with at least one exonuclease.
In some aspects of the methods of the present disclosure, an incubation with at least one exonuclease results in the digestion of any nucleic acid fragment not capped at both ends with a hairpin sequence.
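The selection logic of the exonuclease step can be sketched in Python as a simple filter. This is a toy model that assumes a product survives only if both of its ends are capped with a hairpin sequence. The `Product` class, its field names and the example products are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """A ligation product described only by whether each end has a hairpin."""
    name: str
    left_hairpin: bool
    right_hairpin: bool


def survives_exonuclease(product: Product) -> bool:
    """A product is protected only when both ends are hairpin-capped."""
    return product.left_hairpin and product.right_hairpin


def exonuclease_cleanup(products: list) -> list:
    """Return the products that would remain after exonuclease incubation."""
    return [p for p in products if survives_exonuclease(p)]


if __name__ == "__main__":
    mixture = [
        Product("full_length", True, True),
        Product("missing_right_end", True, False),
        Product("internal_fragment", False, False),
    ]
    print([p.name for p in exonuclease_cleanup(mixture)])
```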
In some aspects, the methods of the present disclosure can further comprise after incubation with the at least one exonuclease: removing the at least one exonuclease; and contacting the product of the exonuclease incubation with at least one enzyme that cleaves at deoxyuridine, thereby removing the hairpin sequence.
In some aspects, the methods of the present disclosure can further comprise after incubation with the at least one exonuclease: removing the at least one exonuclease; and contacting the product of the exonuclease incubation with at least one endonuclease that cleaves the at least one restriction endonuclease site in the hairpin sequence, thereby removing the hairpin sequence.
In some aspects of the preceding methods, the first partially double-stranded nucleic acid molecule does not comprise the first 5′ overhang and instead comprises a blunt end and the second 5′ overhang. In some aspects of the preceding methods, the fourth partially double-stranded nucleic acid molecule does not comprise the eighth 5′ overhang and instead comprises a blunt end and the seventh 5′ overhang. In some aspects of the preceding methods, the at least fifth partially double-stranded nucleic acid molecule does not comprise the tenth 5′ overhang and instead comprises a blunt end and the ninth 5′ overhang. In some aspects of the preceding methods, the at least sixth partially double-stranded nucleic acid molecule does not comprise the twelfth 5′ overhang and instead comprises a blunt end and the eleventh 5′ overhang. In some aspects of the preceding methods, the at least eighth partially double-stranded nucleic acid molecule does not comprise the sixteenth 5′ overhang and instead comprises a blunt end and the fifteenth 5′ overhang.
In some aspects of the preceding methods, the first partially double-stranded nucleic acid molecule does not comprise the first 3′ overhang and instead comprises a blunt end and the second 3′ overhang. In some aspects of the preceding methods, the fourth partially double-stranded nucleic acid molecule does not comprise the eighth 3′ overhang and instead comprises a blunt end and the seventh 3′ overhang. In some aspects of the preceding methods, the at least fifth partially double-stranded nucleic acid molecule does not comprise the tenth 3′ overhang and instead comprises a blunt end and the ninth 3′ overhang. In some aspects of the preceding methods, the at least sixth partially double-stranded nucleic acid molecule does not comprise the twelfth 3′ overhang and instead comprises a blunt end and the eleventh 3′ overhang. In some aspects of the preceding methods, the at least eighth partially double-stranded nucleic acid molecule does not comprise the sixteenth 3′ overhang and instead comprises a blunt end and the fifteenth 3′ overhang.
The present disclosure provides methods of producing at least one target nucleic acid molecule, the methods comprising: (a) providing at least one template double-stranded nucleic acid molecule comprising a first template strand and a second template strand; (b) amplifying a first portion of the at least one template double-stranded nucleic acid molecule using a first primer molecule that hybridizes to a first region on the second template strand and a second primer molecule that hybridizes to a second region on the first template strand to produce at least one first double-stranded nucleic acid fragment; (c) amplifying a second portion of the at least one template double-stranded nucleic acid molecule using a third primer molecule that hybridizes to a third region on the second template strand and a fourth primer molecule that hybridizes to a fourth region on the first template strand to produce at least one second double-stranded nucleic acid fragment; (d) amplifying a third portion of the at least one template double-stranded nucleic acid molecule using a fifth primer molecule that hybridizes to a fifth region on the second template strand and a sixth primer molecule that hybridizes to a sixth region on the first template strand to produce at least one third double-stranded nucleic acid fragment; (e) contacting the at least one first double-stranded nucleic acid fragment with a restriction enzyme to form a first 3′ overhang; (f) contacting the at least one second double-stranded nucleic acid fragment with a restriction enzyme to form a second 3′ overhang and a third 3′ overhang, wherein the second 3′ overhang is complementary to the first 3′ overhang; (g) contacting the at least one third double-stranded nucleic acid fragment with a restriction enzyme to form a fourth 3′ overhang, wherein the fourth 3′ overhang is complementary to the third 3′ overhang; (h) hybridizing the first 3′ overhang to the second 3′ overhang; (i) hybridizing the third 3′ overhang to the fourth 3′ overhang; (j) ligating the at least one first double-stranded nucleic acid fragment and the at least one second double-stranded nucleic acid fragment; and (k) ligating the at least one second double-stranded nucleic acid fragment and the at least one third double-stranded nucleic acid fragment, thereby producing the at least one target nucleic acid molecule.
The present disclosure provides a method of producing at least one target nucleic acid molecule, the method comprising: (a) providing at least one template double-stranded nucleic acid molecule comprising a first template strand and a second template strand; (b) amplifying a first portion of the at least one template double-stranded nucleic acid molecule using a first primer molecule that hybridizes to a first region on the second template strand and a second primer molecule that hybridizes to a second region on the first template strand to produce at least one first double-stranded nucleic acid fragment; (c) amplifying a second portion of the at least one template double-stranded nucleic acid molecule using a third primer molecule that hybridizes to a third region on the second template strand and a fourth primer molecule that hybridizes to a fourth region on the first template strand to produce at least one second double-stranded nucleic acid fragment; (d) amplifying a third portion of the at least one template double-stranded nucleic acid molecule using a fifth primer molecule that hybridizes to a fifth region on the second template strand and a sixth primer molecule that hybridizes to a sixth region on the first template strand to produce at least one third double-stranded nucleic acid fragment; (e) contacting the at least one first double-stranded nucleic acid fragment with a restriction enzyme to form a first 3′ overhang; (f) contacting the at least one second double-stranded nucleic acid fragment with a restriction enzyme to form a second 3′ overhang and a third 3′ overhang; (g) contacting the at least one third double-stranded nucleic acid fragment with a restriction enzyme to form a fourth 3′ overhang, wherein the fourth 3′ overhang is complementary to the third 3′ overhang; (h) providing at least one fourth double-stranded nucleic acid fragment, wherein the at least one fourth double-stranded nucleic acid fragment comprises a fifth 3′ overhang and a sixth 3′ overhang, wherein the fifth 3′ overhang is complementary to the first 3′ overhang and the sixth 3′ overhang is complementary to the second 3′ overhang; (i) hybridizing the fifth 3′ overhang and the first 3′ overhang; (j) hybridizing the sixth 3′ overhang and the second 3′ overhang; (k) hybridizing the third 3′ overhang to the fourth 3′ overhang; (l) ligating the at least one first double-stranded nucleic acid fragment and the at least one fourth double-stranded nucleic acid fragment; (m) ligating the at least one fourth double-stranded nucleic acid fragment and the at least one second double-stranded nucleic acid fragment; and (n) ligating the at least one second double-stranded nucleic acid fragment and the at least one third double-stranded nucleic acid fragment, thereby producing the at least one target nucleic acid molecule.
The present disclosure provides a method of producing at least one target nucleic acid molecule, the method comprising: (a) providing at least one template double-stranded nucleic acid molecule comprising a first template strand and a second template strand; (b) amplifying a first portion of the at least one template double-stranded nucleic acid molecule using a first primer molecule that hybridizes to a first region on the second template strand and a second primer molecule that hybridizes to a second region on the first template strand to produce at least one first double-stranded nucleic acid fragment; (c) amplifying a second portion of the at least one template double-stranded nucleic acid molecule using a third primer molecule that hybridizes to a third region on the second template strand and a fourth primer molecule that hybridizes to a fourth region on the first template strand to produce at least one second double-stranded nucleic acid fragment; (d) amplifying a third portion of the at least one template double-stranded nucleic acid molecule using a fifth primer molecule that hybridizes to a fifth region on the second template strand and a sixth primer molecule that hybridizes to a sixth region on the first template strand to produce at least one third double-stranded nucleic acid fragment; (e) contacting the at least one first double-stranded nucleic acid fragment with a restriction enzyme to form a first 3′ overhang; (f) contacting the at least one second double-stranded nucleic acid fragment with a restriction enzyme to form a second 3′ overhang and a third 3′ overhang, wherein the second 3′ overhang is complementary to the first 3′ overhang; (g) contacting the at least one third double-stranded nucleic acid fragment with a restriction enzyme to form a fourth 3′ overhang; (h) providing at least one fourth double-stranded nucleic acid fragment, wherein the at least one fourth double-stranded nucleic acid fragment comprises a fifth 3′ overhang and a sixth 3′ overhang, wherein the fifth 3′ overhang is complementary to the first 3′ overhang and the sixth 3′ overhang is complementary to the second 3′ overhang; (i) providing at least one fifth double-stranded nucleic acid fragment, wherein the at least one fifth double-stranded nucleic acid fragment comprises a seventh 3′ overhang and an eighth 3′ overhang, wherein the seventh 3′ overhang is complementary to the third 3′ overhang and the eighth 3′ overhang is complementary to the fourth 3′ overhang; (j) hybridizing the fifth 3′ overhang and the first 3′ overhang; (k) hybridizing the sixth 3′ overhang and the second 3′ overhang; (l) hybridizing the seventh 3′ overhang and the third 3′ overhang; (m) hybridizing the eighth 3′ overhang and the fourth 3′ overhang; (n) ligating the at least one first double-stranded nucleic acid fragment and the at least one fourth double-stranded nucleic acid fragment; (o) ligating the at least one fourth double-stranded nucleic acid fragment and the at least one second double-stranded nucleic acid fragment; (p) ligating the at least one second double-stranded nucleic acid fragment and the at least one fifth double-stranded nucleic acid fragment; and (q) ligating the at least one fifth double-stranded nucleic acid fragment and the at least one third double-stranded nucleic acid fragment, thereby producing the at least one target nucleic acid molecule.
The present disclosure provides a method of producing at least one target nucleic acid molecule, the method comprising: (a) providing at least one template double-stranded nucleic acid molecule comprising a first template strand and a second template strand; (b) amplifying a first portion of the at least one template double-stranded nucleic acid molecule using a first primer molecule that hybridizes to a first region on the second template strand and a second primer molecule that hybridizes to a second region on the first template strand to produce at least one first double-stranded nucleic acid fragment; (c) amplifying a second portion of the at least one template double-stranded nucleic acid molecule using a third primer molecule that hybridizes to a third region on the second template strand and a fourth primer molecule that hybridizes to a fourth region on the first template strand to produce at least one second double-stranded nucleic acid fragment; (d) amplifying a third portion of the at least one template double-stranded nucleic acid molecule using a fifth primer molecule that hybridizes to a fifth region on the second template strand and a sixth primer molecule that hybridizes to a sixth region on the first template strand to produce at least one third double-stranded nucleic acid fragment; (e) contacting the at least one first double-stranded nucleic acid fragment with a restriction enzyme to form a first 5′ overhang; (f) contacting the at least one second double-stranded nucleic acid fragment with a restriction enzyme to form a second 5′ overhang and a third 5′ overhang, wherein the second 5′ overhang is complementary to the first 5′ overhang; (g) contacting the at least one third double-stranded nucleic acid fragment with a restriction enzyme to form a fourth 5′ overhang, wherein the fourth 5′ overhang is complementary to the third 5′ overhang; (h) hybridizing the first 5′ overhang to the second 5′ overhang; (i) hybridizing the third 5′ overhang to the fourth 5′ overhang; (j) ligating the at least one first double-stranded nucleic acid fragment and the at least one second double-stranded nucleic acid fragment; and (k) ligating the at least one second double-stranded nucleic acid fragment and the at least one third double-stranded nucleic acid fragment, thereby producing the at least one target nucleic acid molecule.
The present disclosure provides a method of producing at least one target nucleic acid molecule, the method comprising: (a) providing at least one template double-stranded nucleic acid molecule comprising a first template strand and a second template strand; (b) amplifying a first portion of the at least one template double-stranded nucleic acid molecule using a first primer molecule that hybridizes to a first region on the second template strand and a second primer molecule that hybridizes to a second region on the first template strand to produce at least one first double-stranded nucleic acid fragment; (c) amplifying a second portion of the at least one template double-stranded nucleic acid molecule using a third primer molecule that hybridizes to a third region on the second template strand and a fourth primer molecule that hybridizes to a fourth region on the first template strand to produce at least one second double-stranded nucleic acid fragment; (d) amplifying a third portion of the at least one template double-stranded nucleic acid molecule using a fifth primer molecule that hybridizes to a fifth region on the second template strand and a sixth primer molecule that hybridizes to a sixth region on the first template strand to produce at least one third double-stranded nucleic acid fragment; (e) contacting the at least one first double-stranded nucleic acid fragment with a restriction enzyme to form a first 5′ overhang; (f) contacting the at least one second double-stranded nucleic acid fragment with a restriction enzyme to form a second 5′ overhang and a third 5′ overhang; (g) contacting the at least one third double-stranded nucleic acid fragment with a restriction enzyme to form a fourth 5′ overhang, wherein the fourth 5′ overhang is complementary to the third 5′ overhang; (h) providing at least one fourth double-stranded nucleic acid fragment, wherein the at least one fourth double-stranded nucleic acid fragment comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the fifth 5′ overhang is complementary to the first 5′ overhang and the sixth 5′ overhang is complementary to the second 5′ overhang; (i) hybridizing the fifth 5′ overhang and the first 5′ overhang; (j) hybridizing the sixth 5′ overhang and the second 5′ overhang; (k) hybridizing the third 5′ overhang to the fourth 5′ overhang; (l) ligating the at least one first double-stranded nucleic acid fragment and the at least one fourth double-stranded nucleic acid fragment; (m) ligating the at least one fourth double-stranded nucleic acid fragment and the at least one second double-stranded nucleic acid fragment; and (n) ligating the at least one second double-stranded nucleic acid fragment and the at least one third double-stranded nucleic acid fragment, thereby producing the at least one target nucleic acid molecule.
The present disclosure provides a method of producing at least one target nucleic acid molecule, the method comprising: (a) providing at least one template double-stranded nucleic acid molecule comprising a first template strand and a second template strand; (b) amplifying a first portion of the at least one template double-stranded nucleic acid molecule using a first primer molecule that hybridizes to a first region on the second template strand and a second primer molecule that hybridizes to a second region on the first template strand to produce at least one first double-stranded nucleic acid fragment; (c) amplifying a second portion of the at least one template double-stranded nucleic acid molecule using a third primer molecule that hybridizes to a third region on the second template strand and a fourth primer molecule that hybridizes to a fourth region on the first template strand to produce at least one second double-stranded nucleic acid fragment; (d) amplifying a third portion of the at least one template double-stranded nucleic acid molecule using a fifth primer molecule that hybridizes to a fifth region on the second template strand and a sixth primer molecule that hybridizes to a sixth region on the first template strand to produce at least one third double-stranded nucleic acid fragment; (e) contacting the at least one first double-stranded nucleic acid fragment with a restriction enzyme to form a first 5′ overhang; (f) contacting the at least one second double-stranded nucleic acid fragment with a restriction enzyme to form a second 5′ overhang and a third 5′ overhang, wherein the second 5′ overhang is complementary to the first 5′ overhang; (g) contacting the at least one third double-stranded nucleic acid fragment with a restriction enzyme to form a fourth 5′ overhang; (h) providing at least one fourth double-stranded nucleic acid fragment, wherein the at least one fourth double-stranded nucleic acid fragment comprises a fifth 5′ overhang and a sixth 5′ overhang, wherein the fifth 5′ overhang is complementary to the first 5′ overhang and the sixth 5′ overhang is complementary to the second 5′ overhang; (i) providing at least one fifth double-stranded nucleic acid fragment, wherein the at least one fifth double-stranded nucleic acid fragment comprises a seventh 5′ overhang and an eighth 5′ overhang, wherein the seventh 5′ overhang is complementary to the third 5′ overhang and the eighth 5′ overhang is complementary to the fourth 5′ overhang; (j) hybridizing the fifth 5′ overhang and the first 5′ overhang; (k) hybridizing the sixth 5′ overhang and the second 5′ overhang; (l) hybridizing the seventh 5′ overhang and the third 5′ overhang; (m) hybridizing the eighth 5′ overhang and the fourth 5′ overhang; (n) ligating the at least one first double-stranded nucleic acid fragment and the at least one fourth double-stranded nucleic acid fragment; (o) ligating the at least one fourth double-stranded nucleic acid fragment and the at least one second double-stranded nucleic acid fragment; (p) ligating the at least one second double-stranded nucleic acid fragment and the at least one fifth double-stranded nucleic acid fragment; and (q) ligating the at least one fifth double-stranded nucleic acid fragment and the at least one third double-stranded nucleic acid fragment, thereby producing the at least one target nucleic acid molecule.
In some aspects of the preceding methods, the second region on the first template strand and the third region on the second template strand can be at least partially complementary. In some aspects of the preceding methods, the fourth region on the first template strand and the fifth region on the second template strand can be at least partially complementary.
In some aspects, the present disclosure provides methods comprising: a) generating an assembly map comprising fragment designs, wherein each fragment possesses 3′ and/or 5′ overhangs, and wherein the 3′ or 5′ overhangs are selected from a set of N-mer sites known not to inappropriately cross-hybridize or inappropriately ligate and also known to ligate efficiently with target N-mer sites on adjacent oligonucleotide pairs; b) contacting two fragments at a time in a ligation reaction leading to a larger new fragment; c) contacting a fragment with a blunt-ended fragment (i.e., a fragment with only one overhanging single-stranded N-mer); or d) contacting a fragment with a nucleic acid hairpin with a complementary overhanging single-stranded N-mer.
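The pairwise ligation scheme of steps a) and b) can be illustrated with a minimal Python sketch. It assumes each fragment is described only by its left and right overhangs, each written 5′ to 3′ on its own strand, so that two overhangs can hybridize when one is the reverse complement of the other. The fragment names, the 4-mer sequences and the helper functions are hypothetical and chosen for illustration only; they are not taken from Table 1.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def can_ligate(left, right):
    """True if the right overhang of `left` can pair with the left overhang of `right`."""
    return left["right"] == reverse_complement(right["left"])

def join(left, right):
    """Model a ligation product: it keeps only the two outer overhangs."""
    return {"name": left["name"] + "+" + right["name"],
            "left": left["left"], "right": right["right"]}

def assemble(fragments):
    """Ligate adjacent fragments two at a time, round by round."""
    rounds = 0
    while len(fragments) > 1:
        merged = []
        for i in range(0, len(fragments) - 1, 2):
            a, b = fragments[i], fragments[i + 1]
            if not can_ligate(a, b):
                raise ValueError(f"overhangs of {a['name']} and {b['name']} do not match")
            merged.append(join(a, b))
        if len(fragments) % 2:          # an unpaired fragment waits for the next round
            merged.append(fragments[-1])
        fragments = merged
        rounds += 1
    return fragments[0], rounds

# Hypothetical assembly map: four fragments, terminal ends without overhangs (None).
fragments = [
    {"name": "F1", "left": None,   "right": "AGTC"},
    {"name": "F2", "left": "GACT", "right": "CCAT"},
    {"name": "F3", "left": "ATGG", "right": "TTGA"},
    {"name": "F4", "left": "TCAA", "right": None},
]
product_fragment, rounds = assemble(fragments)
print(product_fragment["name"], rounds)
```

Because each round roughly halves the number of fragments, the number of ligation rounds in this sketch grows with the base-2 logarithm of the fragment count rather than linearly.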
In some aspects of the preceding methods, at least one nucleic acid molecule or at least one fragment can comprise at least one modified nucleic acid. The at least one modified nucleic acid can comprise methylated cytidine. The at least one modified nucleic acid can comprise 5mC (5-methylcytosine), 5hmC (5-hydroxymethylcytosine), 5fC (5-formylcytosine), 3mA (3-methyladenine), 5-fU (5-formyluridine), 5-hmU (5-hydroxymethyluridine), 5-hoU (5-hydroxyuridine), 7mG (7-methylguanine), 8oxoG (8-oxo-7,8-dihydroguanine), AP (apurinic/apyrimidinic sites), CPDs (cyclobutane pyrimidine dimers), dI (deoxyinosine), dR5P (deoxyribose 5′-phosphate), dU (deoxyuridine), dX (deoxyxanthosine), PA (3′-phospho-α,β-unsaturated aldehyde), rN (ribonucleotides), Tg (thymine glycol), TT (TT dimer) and/or mismatches including AP:A (apurinic/apyrimidinic site base paired with adenine), DHT:A (5,6-dihydrothymine base paired with an adenine), 5-hmU:A (5-hydroxymethyluracil base paired with an adenine), 5-hmU:G (5-hydroxymethyluracil base paired with a guanine), I:T (inosine base paired with a thymine), 6-MeA:T (6-methyladenine base paired with a thymine), 8-OG:C (8-oxoguanine base paired with a cytosine), 8-OG:G (8-oxoguanine base paired with a guanine), U:A (uridine base paired with an adenine) or U:G (uridine base paired with a guanine) or any combination thereof.
In some aspects of the preceding methods, at least one nucleic acid molecule or at least one fragment can comprise at least one non-hybridized sequence, at least one non-symmetrical element, at least one hairpin, at least one G-quadruplex, at least one I-motif, at least one hemi-modified site, at least one CpG or any combination thereof. The at least one non-hybridized sequence, at least one non-symmetrical element, at least one hairpin, at least one G-quadruplex, at least one I-motif, at least one hemi-modified site, at least one CpG or any combination thereof can be used to introduce at least one or at least two unique molecular identifier (UMI) regions. The at least one or at least two UMI regions can lead to increased diversity.
In some aspects of the preceding methods, the at least one target nucleic acid molecule is a plurality of target nucleic acid molecules. Thus, in some aspects, the products of the preceding methods are a plurality of target nucleic acid molecules. A plurality of target nucleic acids can comprise at least about 1, or at least about 2, or at least about 3, or at least about 4, or at least about 5, or at least about 6, or at least about 7, or at least about 8, or at least about 9, or at least about 10, or at least about 1.0×10², or at least about 1.0×10³, or at least about 1.0×10⁴, or at least about 1.0×10⁵, or at least about 1.0×10⁶, or at least about 1.0×10⁷, or at least about 1.0×10⁸, or at least about 1.0×10⁹, or at least about 1.0×10¹⁰, or at least about 1.0×10¹¹, or at least about 1.0×10¹², or at least about 1.0×10¹³, or at least about 1.0×10¹⁴, or at least about 1.0×10¹⁵, or at least about 1.0×10¹⁶, or at least about 1.0×10¹⁷, or at least about 1.0×10¹⁸, or at least about 1.0×10¹⁹, or at least about 1.0×10²⁰, or at least about 1.0×10²⁵, or at least about 1.0×10³⁰, or at least about 1.0×10³⁵, or at least about 1.0×10⁴⁰, or at least about 1.0×10¹⁰⁰ distinct target nucleic acid species, wherein each distinct target nucleic acid species comprises a different nucleic acid sequence. In some aspects, each nucleic acid species is present in the plurality in approximately the same amount.
In some aspects of the preceding methods, at least one target nucleic acid can comprise at least one nucleotide substitution, deletion, insertion or any combination thereof that causes at least one amino acid codon variation, deletion, insertion or any combination thereof as compared to a wildtype or reference sequence. The distribution of variant substitutions, insertions, deletions or any combination thereof can be approximately even at multiple distal sites.
In some aspects, the products of the methods of the present disclosure can be used for the screening and/or selection of proteins and/or peptides. In some aspects, the products of the methods of the present disclosure can be used for screening and/or selection of at least one protein fusion, at least one protein-peptide fusion and/or at least one peptide-peptide fusion. In some aspects, the products of the methods of the present disclosure can be used for the screening and/or selection of differentially methylated promoters, gene bodies, untranslated regions (UTRs) or any combination thereof. In some aspects, the products of the methods of the present disclosure can be used for the screening and/or selection of aptamers, siRNAs, PCR primers, sequencing adapters or any combination thereof. Screening and/or selection can be performed using a cell-based assay. Screening and/or selection can be performed using a cell-free assay.
In some aspects, the products of the methods of the present disclosure can be used for barcoding or unique molecular identifiers (UMIs). The barcoding or unique molecular identifiers can be used in single cell sequencing.
In some aspects, the products of the methods of the present disclosure can comprise sequences and/or modifications for the attachment of proteins onto a nucleic acid sequence.
In some aspects, the methods of the present disclosure can be performed on at least one solid support. In some aspects, the methods of the present disclosure can be performed on at least one bead. In some aspects, the products of the methods of the present disclosure can be attached to at least one solid support. In some aspects, the products of the methods of the present disclosure can be attached to at least one bead. In some aspects, the products of the methods of the present disclosure can be attached to at least one bead such that the bead is attached to only nucleic acid molecules comprising the same sequence.
In some aspects of the methods of the present disclosure, the first partially double-stranded nucleic acid molecule, the second partially double-stranded nucleic acid molecule, the third partially double-stranded nucleic acid molecule, the fourth partially double-stranded nucleic acid molecule, the fifth partially double-stranded nucleic acid molecule, the sixth partially double-stranded nucleic acid molecule, the seventh partially double-stranded nucleic acid molecule, the eighth partially double-stranded nucleic acid molecule or any combination thereof can comprise RNA, DNA, XNA, at least one modified nucleic acid, at least one peptide or any combination thereof.
In some aspects of the methods of the present disclosure, the first partially double-stranded nucleic acid molecule, the second partially double-stranded nucleic acid molecule, the third partially double-stranded nucleic acid molecule, the fourth partially double-stranded nucleic acid molecule, the fifth partially double-stranded nucleic acid molecule, the sixth partially double-stranded nucleic acid molecule, the seventh partially double-stranded nucleic acid molecule, the eighth partially double-stranded nucleic acid molecule or any combination thereof can be obtained from any source.
In some aspects of the methods of the present disclosure, the first partially double-stranded nucleic acid molecule, the second partially double-stranded nucleic acid molecule, the third partially double-stranded nucleic acid molecule, the fourth partially double-stranded nucleic acid molecule, the fifth partially double-stranded nucleic acid molecule, the sixth partially double-stranded nucleic acid molecule, the seventh partially double-stranded nucleic acid molecule, the eighth partially double-stranded nucleic acid molecule or any combination thereof can be obtained from at least one endonuclease digestion reaction of native DNA, at least one PCR reaction, at least one Recombinase Polymerase Amplification (RPA) reaction, at least one reverse transcription reaction, at least one single-stranded geometric synthesis reaction or any combination thereof.
In some aspects of the methods of the present disclosure, the first partially double-stranded nucleic acid molecule, the second partially double-stranded nucleic acid molecule, the third partially double-stranded nucleic acid molecule, the fourth partially double-stranded nucleic acid molecule, the fifth partially double-stranded nucleic acid molecule, the sixth partially double-stranded nucleic acid molecule, the seventh partially double-stranded nucleic acid molecule, the eighth partially double-stranded nucleic acid molecule or any combination thereof can be obtained from chemical synthesis of oligonucleotides.
In some aspects of the methods of the present disclosure, the first primer, the second primer, the third primer, the fourth primer, the fifth primer, the sixth primer or any combination thereof can comprise at least one modified nucleic acid. The at least one modified nucleic acid can comprise methylated cytidine. The at least one modified nucleic acid can comprise 5mC (5-methylcytosine), 5hmC (5-hydroxymethylcytosine), 5fC (5-formylcytosine), 3mA (3-methyladenine), 5-fU (5-formyluridine), 5-hmU (5-hydroxymethyluridine), 5-hoU (5-hydroxyuridine), 7mG (7-methylguanine), 8oxoG (8-oxo-7,8-dihydroguanine), AP (apurinic/apyrimidinic sites), CPDs (cyclobutane pyrimidine dimers), dI (deoxyinosine), dR5P (deoxyribose 5′-phosphate), dU (deoxyuridine), dX (deoxyxanthosine), PA (3′-phospho-α,β-unsaturated aldehyde), rN (ribonucleotides), Tg (thymine glycol), TT (TT dimer) and/or mismatches including AP:A (apurinic/apyrimidinic site base paired with adenine), DHT:A (5,6-dihydrothymine base paired with an adenine), 5-hmU:A (5-hydroxymethyluracil base paired with an adenine), 5-hmU:G (5-hydroxymethyluracil base paired with a guanine), I:T (inosine base paired with a thymine), 6-MeA:T (6-methyladenine base paired with a thymine), 8-OG:C (8-oxoguanine base paired with a cytosine), 8-OG:G (8-oxoguanine base paired with a guanine), U:A (uridine base paired with an adenine) or U:G (uridine base paired with a guanine) or any combination thereof.
In some aspects of the methods of the present disclosure, the first primer, the second primer, the third primer, the fourth primer, the fifth primer, the sixth primer or any combination thereof can comprise at least one nucleotide substitution, deletion, insertion or any combination thereof that causes at least one amino acid codon variation, deletion, insertion or any combination thereof as compared to a wildtype or reference sequence.
In some aspects of the methods of the present disclosure, a restriction enzyme can be an MspJI family restriction enzyme. The restriction enzyme can be MspJI, FspEI, LpnPI, AspBHI, RlaI, SgrTI or any combination thereof.
In some aspects of the methods of the present disclosure, the at least one fourth double-stranded nucleic acid fragment, the at least one fifth double-stranded nucleic acid fragment or any combination thereof can comprise at least one nucleotide substitution, deletion, insertion or any combination thereof that causes at least one amino acid codon variation, deletion, insertion or any combination thereof as compared to a wildtype or reference sequence.
In this example, the results of double-stranded gSynth assembly reactions, as described herein, were compared to the results of the existing, alternative method of hybridization and elongation (HAE) using DNA polymerase. HAE is similar to polymerase cycling assembly (PCA) reactions but does not use PCR amplification. A variety of programmed sequences were synthesized using both double-stranded gSynth and HAE. These programmed sequences included sequences that had a GC content ranging from 10% to 90% along the length of the sequence, from 20% to 80% along the length of the sequence, from 30% to 70% along the length of the sequence, from 40% to 60% along the length of the sequence, and sequences that were 50% GC along the entire length of the sequence.
Briefly, the double-stranded gSynth reactions were performed as follows: each of the largely double-stranded pairs of fragments was first resuspended at 10 μM in annealing buffer (10 mM Tris-HCl, 50 mM NaCl). The solution was then heated to 95° C. for 30 seconds on a PCR machine, then allowed to cool to room temperature. After annealing, in the first ligation reaction, adjacent fragments are combined (2.5 μL each of a 10 μM solution). For fragments lacking a 5′ PO4, the ligation reaction also includes Polynucleotide Kinase (PNK). Thus, in one embodiment, the complete ligation reaction includes: 5 μL oligos (2.5 μL pair A, 2.5 μL pair B) + 6 μL of 2× Buffer + 0.5 μL PNK + 0.5 μL T7 DNA ligase. In these reactions 1× Buffer is: 66 mM Tris-HCl, 10 mM MgCl2, 1 mM ATP, 1 mM DTT, 7.5% Polyethylene glycol (PEG 6000), pH 7.6 @ 25° C. Reactions are held at 25° C. Each subsequent ligation reaction between adjacent fragments is performed by combining all of the reaction volumes of each of the two fragments together.
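The reaction composition above can be checked with a short calculation. The following Python sketch uses only the volumes and stock concentrations stated in this example; the variable names are illustrative, and the computed concentration of each annealed pair is derived arithmetic rather than a value reported in the disclosure.

```python
# Component volumes of one first-round ligation reaction, in microliters.
components_ul = {
    "oligo pair A (10 uM)": 2.5,
    "oligo pair B (10 uM)": 2.5,
    "2x Buffer": 6.0,
    "PNK": 0.5,
    "T7 DNA ligase": 0.5,
}

total_ul = sum(components_ul.values())

# Dilution of the 2x buffer stock into the total reaction volume.
buffer_fold = 2.0 * components_ul["2x Buffer"] / total_ul

# Final concentration of each annealed pair, starting from a 10 uM stock.
pair_final_um = 10.0 * components_ul["oligo pair A (10 uM)"] / total_ul

print(f"total volume: {total_ul} uL")
print(f"buffer strength: {buffer_fold}x")
print(f"each annealed pair: {pair_final_um:.2f} uM")
```

The 6 μL of 2× Buffer in a 12 μL total volume gives the 1× Buffer composition listed above.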
The products of the assembly reactions were analyzed via gel-separation and the results of the analysis are shown in
In this example, double-stranded gSynth assembly reactions were performed as described herein, where the double-stranded nucleic acid fragments corresponding to the two termini of the sequence to be synthesized were capped with hairpin structures. The products of the gSynth assembly reaction were then analyzed by gel separation before and after treatment with a T7 exonuclease. Without wishing to be bound by theory, any nucleic acid that is not capped at both ends by a hairpin should be digested by the exonuclease. That is, only desired, full-length assembly products should be present after digestion with T7 exonuclease. The results of the gel separation analysis are shown in
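The selection logic of this example can be sketched in a few lines of Python. The sketch assumes an idealized assembly of n fragments in which only the two terminal fragments carry hairpin caps, and in which the exonuclease digests every product that exposes at least one uncapped end. The fragment count and function names are hypothetical; real digestion efficiency is not modeled.

```python
def contiguous_products(n_fragments):
    """All contiguous runs of fragments, as (first_index, last_index) pairs."""
    return [(start, end)
            for start in range(n_fragments)
            for end in range(start, n_fragments)]

def survives_exonuclease(product, n_fragments):
    """A product is protected only if it contains both hairpin-capped termini."""
    start, end = product
    return start == 0 and end == n_fragments - 1

n = 8  # hypothetical number of fragments in the assembly
products = contiguous_products(n)
survivors = [p for p in products if survives_exonuclease(p, n)]
print(len(products), "possible products;", len(survivors), "protected:", survivors)
```

Under these assumptions, every partial assembly product has at least one uncapped end, so only the full-length product remains after digestion.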
The following examples further describe an application of the double-stranded gSynth methods of the present disclosure entitled variant geometric synthesis (V-gSynth), a modular DNA manipulation method for generating gene variant libraries by insertion, deletion and/or substitution of codons.
To demonstrate the utility of V-gSynth, a substitution-based bead display library of functional GFP variants was constructed. These functional GFP variants exhibited altered spectral properties. Extending this proof of concept, a large variant library containing InDels, with up to 12 codon insertions and an estimated 3.2×10¹⁴ protein-coding variants, was constructed. Sequencing analysis demonstrated an even codon distribution and an extraordinarily high diversity of variants that greatly exceeds previous work.
To generate diverse gene variant libraries that included insertions and deletions (InDels), a PCR approach was used that involved the preparation of methylated fragments from the gene of interest, with primers containing the modified nucleobase 5-methylcytosine, as shown in
After preparation of the methylated fragments, FspEI was used to create four-nucleotide overhangs that can be assembled, via ligation, into the required gene, as shown in
To create non-synonymous mutations, codon changes can be incorporated into the oligos used to amplify the different fragments, as shown in
Initial V-gSynth experiments consisted of assembling four IVTT templates. The motivation to generate IVTT templates was based on the GFP variants p.Y66W, p.T203Y and p.[Y66W; T203Y] described by Sawano et al. (Sawano, A. “Directed Evolution of Green Fluorescent Protein by a New Versatile PCR Strategy for Site-Directed and Semi-Random Mutagenesis.” Nucleic Acids Research, vol. 28, no. 16, 2000, doi:10.1093/nar/28.16.e78.). Using two excitation wavelengths, 488 nm and 440 nm, the four IVTT templates should be distinguishable by their 488/440 nm ratio, where the ratio of p.T203Y>wild-type>p.[Y66W; T203Y]>p.Y66W. Each of the wild-type, p.Y66W, p.T203Y and p.[Y66W; T203Y] IVTT templates was assembled from three FspEI-digested methylated fragments as shown in FIGS. 5A and 5B, as well as
Next, the wild-type, p.Y66W, p.T203Y and p.[Y66W; T203Y] IVTT templates were evaluated within the PURExpress system. The expression of the wild-type, p.Y66W, p.T203Y and p.[Y66W; T203Y] IVTT templates yielded emGFP variants of the same size and with comparable expression levels to the original pRSET/emGFP plasmid (
A monoclonal, on-bead p.[Y66X; T203X] IVTT library was constructed (along with on-bead wild-type, p.Y66W, p.T203Y and p.[Y66W; T203Y] IVTT templates as controls). A combination of sixteen individual variants at position Y66 (Y66N, Y66T, Y66S, Y66I, Y66H, Y66P, Y66R, Y66L, Y66D, Y66A, Y66G, Y66V, Y66, Y66S, Y66C and Y66F) and a further sixteen individual variants at position T203 (T203N, T203, T203S, T203I, T203H, T203P, T203R, T203L, T203D, T203A, T203G, T203V, T203S, T203C and T203F) constitute the 256 members of the p.[Y66X; T203X] IVTT library (
Assembly of the on-bead p.[Y66X; T203X] IVTT library was first confirmed by NGS. Sequencing of the p.[Y66X; T203X] IVTT library generated raw fastq files containing 34,980 paired-end reads, of which 31,283 (89.4%) were the desired in-frame reads. These reads contained the Adapter 1a sequence (nucleotide position 202 to 214) and the in-frame nucleotides ACC (nucleotide position 196 to 198, codon position T65) in Read 1, as well as the Adapter 2a sequence (nucleotide position 609 to 597) and the in-frame nucleotides CTG (nucleotide position 615 to 613, codon position Q204) in Read 2, as shown in
Overall, the sequence variations introduced at positions Y66X and T203X create the degenerate nucleotide sequence N1N2C3. Median A, C, G and T values for nucleotide N1 were 27.1±3.7%, 22.7±0.9%, 24.2±5.2% and 25.9±0.6%; for nucleotide N2 were 27.6±5.1%, 21.4±2.4%, 24.8±2.2% and 26.2±5.3%; and for nucleotide C3 were 1.8±1.2%, 97.9±1.5%, 0.1±0.0% and 0.2±0.3%, respectively (
Once the on-bead p.[Y66X; T203X] library was confirmed, fluorescent imaging was performed. A single monoclonal bead from the p.[Y66X; T203X] library was encapsulated within a single droplet of the IVTT reaction mix. The single droplet will fluoresce according to the monoclonal variant present within the monoclonal DNA on the bead. Individual beads were placed within an emulsion using 2.0% PicoSurf-1 in HFE7500 (v/v). Three images of each emulsion were captured at 488 nm and 440 nm excitation along with the brightfield image, with the 488/440 ratio being overlaid onto the brightfield image for individual droplets containing single beads (
The 488/440 ratio for the monoclonal variant library indicates that individual droplets from the library had different spectral properties, which are consistent with the wild-type, p.Y66W, p.T203Y and p.[Y66W; T203Y] controls. Furthermore, there was an increase in droplets which contain a single bead yet did not fluoresce at either 440 or 488 nm, as many of the variants introduced eliminate the fluorescence of that particular GFP variant (
This example demonstrates the production of a hugely diverse InDel library using the double-stranded geometric synthesis methods of the present disclosure. The InDel library was an amalgamation of forty-nine InDel combinations. Initially, the three codons T65_Y66_G67 were deleted during the preparation of the Methylated InDel Fragments 1 and 2, while codons S202_T203_Q204 were deleted between Methylated InDel Fragments 2 and 3. FspEI digestion removed a further 12/16 nucleotides from the 5-methylcytosine, leaving four-nucleotide overhangs suitable for T7 DNA ligase (
Sequencing of the InDel Library generated raw fastq files containing 221,610,757 paired-end reads, of which 188,805 (0.09%) aligned to the wildtype emGFP sequence and were removed. Wildtype reads were detected as described for the p.[Y66X; T203X] IVTT library (see above); pairs of reads which had the wildtype sequence in either Read 1 (77,033), Read 2 (92,814) or both Read 1 and Read 2 (18,958) were discarded, leaving 221,421,952 paired-end reads. Following the removal of the wildtype sequences, 192,331,072 (86.8%) in-frame reads were kept, as they contained the desired adapter 1b sequence GTG CAG TGC TTC G (nucleotide position 205 to 217) and sequence TGG (base position 193 to 195, codon position L64) in Read 1, as well as adapter 2b sequence (base position 606 to 594) and the sequence GGA (base position 616 to 618, codon position S205) in Read 2 (
The population of each InDel combination was directly related to the initial InDel Duplex concentration (and therefore diversity) within the two InDel Duplex Pools. The largest population of reads, 87.7% (expected, 87.9%), belonged to the most diverse combination, p.[T65_G67delins(X)6; S202_Q204delins(X)6]. In comparison, the least diverse combination, p.[T65_G67delTYG; S202_Q204delSTQ], contained 0.0% (expected, 0.0%) of the reads. As described for the on-bead p.[Y66X; T203X] IVTT library (see above), the sequence variations introduced at positions T65_Y66_G67 and S202_T203_Q204 were created by the degenerate nucleotide sequence N1N2C3. Median A, C, G and T values for nucleotide N1 were 23.5±1.5%, 28.3±2.3%, 25.8±3.0% and 22.5±1.3%; for nucleotide N2 were 23.3±1.5%, 27.6±2.5%, 26.8±2.5% and 22.3±1.4%; and for nucleotide C3 were 0.5±0.3%, 99.2±0.6%, 0.2±0.4% and 0.1±0.1%, respectively, with the median codon value being 5.9±1.7%. Overall, the % GC for N1, N2 and N3 was 54.1±2.3%, 54.4±2.4% and 99.4±0.4%.
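The expected read fractions quoted above can be reproduced with a short calculation. The following is a minimal sketch, not the analysis pipeline used in this example. It assumes that every unique sequence is equally represented, so that the expected share of each InDel combination is proportional to its diversity, and that each inserted NNC codon contributes sixteen possibilities. The function name and parameters are illustrative only.

```python
from itertools import product

CODON_CHOICES = 16   # assumption: sixteen NNC codons per inserted position
MAX_INSERTED = 6     # each InDel Duplex Pool inserts 0 to 6 codons


def expected_fractions(choices=CODON_CHOICES, max_inserted=MAX_INSERTED):
    """Map (codons inserted at site 1, codons inserted at site 2) to the
    expected fraction of the library, assuming equal representation of
    every unique sequence."""
    sizes = {
        (i, j): choices ** (i + j)
        for i, j in product(range(max_inserted + 1), repeat=2)
    }
    total = sum(sizes.values())
    return {combo: size / total for combo, size in sizes.items()}


fractions = expected_fractions()
print(len(fractions))                    # number of InDel combinations
print(f"{fractions[(6, 6)]:.1%}")        # most diverse combination
print(f"{fractions[(0, 0)]:.1%}")        # deletion-only combination
```

Under these assumptions the sketch yields forty-nine combinations, with the six-plus-six codon insertion accounting for about 87.9% of the library and the deletion-only combination rounding to 0.0%.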
Discussion of Examples 2A-2D
Creating accurate and well-balanced sequence diversity, whether in the form of substitution, insertion and/or deletion, is the keystone for many methodologies involving the use of variant libraries, none more so than directed evolution. Variant library quality within directed evolution defines the library size and library diversity, and therefore influences any screening strategy and its scale. Ultimately, the variant library quality determines the potential success (or failure) of any given directed evolution undertaking.
V-gSynth, which leverages the double-stranded geometric synthesis methods of the present disclosure, is a highly capable, flexible and user-friendly methodology which can introduce substitutions, insertions and/or deletions simultaneously at multiple distal sites. Hugely diverse variant libraries can be produced within a single working day, while only requiring commercially available enzymes and reagents combined with the most basic of molecular biology resources. Furthermore, due to the automation-friendly nature of V-gSynth, the methodology can be parallelized and scaled as required.
IVTT templates were generated using V-gSynth; however, due to the inherent flexibility of the four-nucleotide overhangs generated by FspEI, V-gSynth is compatible with any cloning strategy. Likewise, while only the coding region of a single gene within a single plasmid was targeted, nothing prevents the targeted assembly of variants from multiple genes and from multiple sources. Furthermore, as the assembly of V-gSynth monoclonal beads is PCR-free, DNA, RNA and modified nucleic acids (including nucleobase, sugar and/or backbone modifications) can be incorporated into the monoclonal variant bead library, extending the scope of V-gSynth from protein evolution into other areas such as aptamers and SELEX.
The one-pot, single-step assembly approach of V-gSynth is capable of generating huge diversity while maintaining an even distribution of that diversity, as exemplified by the assembly of the InDel library, which is generated through the combination of 49 unique InDel combinations. An unprecedented ~85% of all sequences generated within the InDel library were unique sequences with the potential to produce a desired, in-frame, full-length protein variant. Many of the out-of-frame reads within the InDel library will originate from the synthetic oligos used for the InDel duplexes, in particular from the N−1 error associated with phosphoramidite synthesis. By employing a more faithful oligo synthesis method and/or further purification of the synthetic oligos (such as PAGE or HPLC), these N−1 errors can be greatly reduced.
Furthermore, the slight increase in the % GC (within the InDel library) of N1 (54.1±2.3%) and N2 (54.4±2.4%) above the ideal 50% for nucleotides N1 and N2 within the degenerate N1N2C3 sequence may be due to the melting temperatures of the InDel oligo duplexes. Duplexes with a higher % GC, and therefore (on average) a higher melting temperature, will have had a greater representation within the InDel Duplex Pools. Optimisation of the InDel duplex sequence, along with the annealing conditions, should bring the % GC of the N1 and N2 nucleotides more in line with the ideal 50%. The consequence of gaining an even % GC would be seen at the protein level: for example, within the InDel library the codon for P (nucleotide sequence CCC) had the highest representation (9.0±1.1%), while the codon for F (nucleotide sequence TTC) was represented the least (5.3±0.5%). An ideal % GC would allow for a more even codon representation, regardless of the nucleotide sequence.
During the application of V-gSynth, we successfully generated a monoclonal, on-bead IVTT library which contained 256 nucleotide and 225 codon variations, along with an InDel library with an estimated ~3.2×10¹⁴ nucleotide and ~1.5×10¹⁴ codon variations.
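The library sizes summarized above follow from simple counting. The sketch below is illustrative only and rests on two assumptions drawn from the text: each degenerate NNC codon offers sixteen nucleotide-level options but only fifteen distinct protein-level options (serine being encoded twice), and each of the two sites in the InDel library receives between zero and six inserted codons. The helper name `library_size` is hypothetical.

```python
def library_size(options_per_codon, max_inserted=6, sites=2):
    """Count variants when each site independently receives 0..max_inserted
    degenerate codons, each with `options_per_codon` possibilities."""
    per_site = sum(options_per_codon ** n for n in range(max_inserted + 1))
    return per_site ** sites


# Substitution library: one degenerate codon at each of two positions
substitution_nucleotide = 16 * 16
substitution_codon = 15 * 15

# InDel library: 0 to 6 degenerate codons at each of two positions
indel_nucleotide = library_size(16)
indel_codon = library_size(15)

print(substitution_nucleotide, substitution_codon)
print(f"{indel_nucleotide:.1e}", f"{indel_codon:.1e}")
```

With these inputs the counts come to 256 and 225 for the substitution library, and to roughly 3.2×10¹⁴ and 1.5×10¹⁴ for the InDel library.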
Methods for Examples 2A-2D
Reagents
Unless otherwise stated, all enzymes, buffers, dNTPs, rNTPs and the GeneJET Gel Extraction kit were supplied by New England Biolabs (NEB; Ipswich, Mass., USA) and all oligonucleotides were supplied by Integrated DNA Technologies (IDT; Coralville, Iowa, USA). Other reagents were Dibenzocyclooctyne (DBCO) Magnetic Beads (Jena Bioscience; Jena, Germany), PicoSurf-1 (Sphere Fluidics; Cambridge, UK), HFE7500 oil (Fluorochem; Hadfield, UK), and nuclease-free water, pRSET/emGFP and the Qubit high sensitivity dsDNA kit (ThermoFisher; Waltham, Mass., USA). Solid Phase Reversible Immobilization (SPRI) beads were made as previously described (Rohland, N., and D. Reich. “Cost-Effective, High-Throughput DNA Sequencing Libraries for Multiplexed Target Capture.” Genome Research, vol. 22, no. 5, 2012, pp. 939-946., doi:10.1101/gr.128124.111.).
emGFP Reference, Nucleotide and Codon Variations Nomenclature
Nucleotide and codon numbering of emGFP are from the consensus sequence of eGFP (Tsien, Roger Y. “The Green Fluorescent Protein.” Annual Review of Biochemistry, vol. 67, no. 1, 1998, pp. 509-544., doi:10.1146/annurev.biochem.67.1.509.). The nomenclature used throughout this disclosure to describe the nucleotide and codon variations is based upon recommendations by Stylianos Antonarakis and Johan den Dunnen (Dunnen, Johan T. Den, and Stylianos E. Antonarakis. “Mutation Nomenclature Extensions and Suggestions to Describe Complex Mutations: A Discussion.” Human Mutation, vol. 15, no. 1, 2000, pp. 7-12., doi:10.1002/(sici)1098-1004(200001)15:13.0.co;2-n; Dunnen, Johan T. Den, et al. “HGVS Recommendations for the Description of Sequence Variants: 2016 Update.” Human Mutation, vol. 37, no. 6, 2016, pp. 564-569., doi:10.1002/humu.22981.).
Methylated Primers
All methylated primers for the generation of the Methylated Fragments contained the recognition site GCCATGCTGTCXAGGNNNNNNNN↓NNNN↑ (SEQ ID NO: 1), where X is 5-methylcytosine and N is either A, C, G or T. The recognition site used in our methylated primers is compatible with the MspJI, FspEI and LpnPI restriction enzymes.
Generation of the Wild-Type, p.Y66W, p.T203Y and p.[Y66W; T203Y] IVTT Templates
The V-gSynth methodology consists of three simple steps (
A. Preparation of Methylated Fragments
Methylated Fragments 1-Y66, 1-Y66W, 2, 3-T203 and 3-T203Y were prepared in 1×Q5 Reaction Buffer, 1×Q5 High GC Enhancer, 0.5 μM each forward and reverse primer, 0.2 mM each dNTP, 1 ng pRSET/EmGFP vector and 0.02 U/μL Q5 DNA Polymerase. Thermocycling Conditions were 30 s at 98° C., followed by 30 cycles of 10 s at 98° C., 15 s at 65° C. and 45 s at 72° C., with a final step of 2 min at 72° C. The Methylated Fragments were purified using SPRI beads, eluted in water, quantified by Qubit and used directly within FspEI digestions.
B. FspEI Digestion of Methylated Fragments
FspEI digestion reactions contained 1× CutSmart buffer, 1× Enzyme Activator, 0.01 Units/μL FspEI and 100 to 1000 ng of a Methylated Fragment (prepared as described above) and were incubated at 37° C. for 30 min. The Digested Fragments were purified using SPRI beads, eluted in water and used directly within T7 Ligase Assemblies.
C. T7 DNA Ligase Assembly of Digested Fragments
Assembly of the IVTT templates consisted of an equimolar mix (100 to 1000 ng total DNA) of the Digested Fragments 1-Y66, 2 and 3-T203 (wild-type), 1-Y66W, 2 and 3-T203 (p.Y66W), 1-Y66, 2 and 3-T203Y (p.T203Y) and 1-Y66W, 2 and 3-T203Y (p.[Y66W; T203Y]) in 1×T7 DNA Ligase Reaction Buffer with 150 Units/μL of T7 DNA Ligase and incubated at 25° C. for 60 min. Assembled IVTT templates were used directly (without purification) within IVTT reactions or amplified for sequencing.
Generation of On-Bead Wild-Type, Y66W, T203Y and [Y66W; T203Y] IVTT Templates
The on-bead wild-type, p.Y66W, p.T203Y and p.[Y66W; T203Y] IVTT templates were prepared as described above with the exception that Primer 6 was replaced with Primer 6-azide. Once the IVTT templates had been assembled, they were covalently attached to DBCO beads by click chemistry (Klob_2001; Best_2009; Jewett_2010): an equal volume of DBCO beads (1 mg/mL) in 6 mM Tris-HCl (pH 7.4), 1.2 M NaCl, 0.6 mM EDTA, 0.006% Tween and 40% DMSO was added to each individual assembly reaction, followed by incubation for 2 hr at room temperature. The individual on-bead, assembled IVTT templates were washed four times with 1×PBS/0.01% Tween, before being stored in 1×PBS/0.01% Tween (1 mg/mL) at 4° C., ready for use as controls within the emulsion-based IVTT reactions (see below).
Generation of On-Bead [Y66X; T203X] IVTT Library
To generate the 256 members of the on-bead p.[Y66X; T203X] IVTT library, thirty-three Methylated Fragments were prepared. Sixteen of the Methylated Fragments were variations on Methylated Fragment 1 and carried the sixteen codons Y66N, Y66T, Y66S, Y66I, Y66H, Y66P, Y66R, Y66L, Y66D, Y66A, Y66G, Y66V, Y66, Y66S, Y66C and Y66F (simplified to Methylated Fragment 1-Y66X). A further sixteen Methylated Fragments were variations on Methylated Fragment 3 and carried the sixteen codons T203N, T203, T203S, T203I, T203H, T203P, T203R, T203L, T203D, T203A, T203G, T203V, T203S, T203C and T203F (simplified to Methylated Fragment 3-T203X). Finally, Methylated Fragment 2 was identical throughout the 256 variants.
The combination of the sixteen codons at positions p.Y66X and p.T203X is equivalent to the nucleotide substitution c.[199_201>NNC; 700_702>NNC]. The p.Y66S codon substitution occurs twice because the nucleotide substitutions c.199_201>AGC and c.199_201>TCC are equivalent at the protein level. Similarly, the p.T203S codon substitution occurs twice, as the nucleotide substitutions c.700_702>AGC and c.700_702>TCC are equivalent at the protein level. FspEI digestion and T7 DNA ligase assemblies were carried out as described above, using Primer 6-azide throughout to covalently attach the p.[Y66X; T203X] library to DBCO beads. Once each of the 256 variants was individually attached to the DBCO beads, the beads were pooled and stored in 1×PBS/0.01% Tween (1 mg/mL) at 4° C., with the on-bead p.[Y66X; T203X] library being used either for the preparation of an NGS library or for fluorescent imaging (see below).
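The redundancy of the NNC codon set can be checked by enumeration. The sketch below is illustrative and is not part of the described method. It lists the sixteen NNC codons, translates them with the standard genetic code, and reports which amino acids are encoded more than once. The variable names are assumptions made for this illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code restricted to the sixteen codons ending in C
NNC_CODE = {
    "AAC": "N", "ACC": "T", "AGC": "S", "ATC": "I",
    "CAC": "H", "CCC": "P", "CGC": "R", "CTC": "L",
    "GAC": "D", "GCC": "A", "GGC": "G", "GTC": "V",
    "TAC": "Y", "TCC": "S", "TGC": "C", "TTC": "F",
}

# Enumerate N1 N2 C3 with N1 and N2 drawn from A, C, G and T
codons = ["".join(pair) + "C" for pair in product("ACGT", repeat=2)]
amino_acids = [NNC_CODE[codon] for codon in codons]

counts = Counter(amino_acids)
duplicated = {
    aa: [codon for codon in codons if NNC_CODE[codon] == aa]
    for aa, n in counts.items()
    if n > 1
}

print(len(codons), len(counts))   # codons versus distinct amino acids
print(duplicated)                 # amino acids encoded by more than one codon
```

The enumeration gives sixteen codons encoding fifteen distinct amino acids, with serine encoded by both AGC and TCC, and no stop codons. Two degenerate positions therefore give 16×16=256 nucleotide-level variants but 15×15=225 protein-level variants.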
In-Vitro Transcription and Translation (IVTT) Reactions
In-vitro transcription and translation (IVTT) reactions used the PUREexpress system and contained 10 μL of component A, 7.5 μL component B, 250 ng of template with the reactions being adjusted to a final volume of 25 μL with nuclease-free water. IVTT reactions were incubated at 37° C. for 4 hours before running on an SDS-PAGE. Emulsion based IVTT reactions contained 10 μL of component A, 7.5 μL component B, 1 μL template beads (1 mg/mL), with the reactions being adjusted to a final volume of 25 μL with nuclease-free water. The aqueous phase was mixed with 100 μL of an oil phase containing 2.0% PicoSurf-1 in HFE7500 (v/v). The emulsion was created by vortexing for 3 min at 0.3/4 of the maximal vortex speed, followed by incubation of the emulsions at 37° C. for 4 hours before imaging.
Fluorescence Imaging
Sawano et al. demonstrated that the four GFP variants, wild-type, p.Y66W, p.T203Y and p.[Y66W; T203Y], can be distinguished using the ratio of the fluorescence at two excitation wavelengths, where p.T203Y>wild-type>p.[Y66W; T203Y]>p.Y66W, therefore making the four GFP variants distinguishable within a mixture (Sawano 2000). This approach was used to image the emulsions, using 488 and 440 nm as the two excitation wavelengths, on an Olympus FV1000 fluorescent microscope.
InDel Duplexes
InDel Duplex Pool 1 contained the seven duplexes T65_G67delTYG, T65_G67delins(X)1, T65_G67delins(X)2, T65_G67delins(X)3, T65_G67delins(X)4, T65_G67delins(X)5 and T65_G67delins(X)6, while InDel Duplex Pool 2 contained the seven duplexes S202_Q204delSTQ, S202_Q204delins(X)1, S202_Q204delins(X)2, S202_Q204delins(X)3, S202_Q204delins(X)4, S202_Q204delins(X)5 and S202_Q204delins(X)6 (
Generation of the InDel Library, Containing the Forty-Nine InDel Combinations
The highly diverse InDel library can be described as a combination of forty-nine libraries. The smallest library, p.[T65_G67delTYG; S202_Q204delSTQ], contains only one member, in which codons T65_Y66_G67 and S202_T203_Q204 are deleted. Library p.[T65_G67delins(X)6; S202_Q204delins(X)6] is the largest library, containing ~2.8×10¹⁴ members, as codons T65_Y66_G67 and S202_T203_Q204 were deleted and twelve degenerate codons inserted: six degenerate codons at the position of T65_Y66_G67 and a further six degenerate codons at the position of S202_T203_Q204 (
The methylated primers used to generate the Methylated InDel Fragments 1, 2 and 3 were designed to delete codons T65_Y66_G67 between Methylated InDel Fragments 1 and 2, while also deleting codons S202_T203_Q204 between Methylated InDel Fragments 2 and 3. FspEI digestion of Methylated InDel Fragments 1, 2 and 3 then removes a further 12/16 nucleotides from the 5-methylcytosine and generates the Digested InDel Fragments 1, 2 and 3. Once codons T65_Y66_G67 have been deleted, seven InDel duplexes (InDel Duplex Pool 1) are used to insert a series of 0 to 6 consecutive and degenerate codons. A further seven InDel duplexes (InDel Duplex Pool 2) are used to insert a second series of 0 to 6 consecutive and degenerate codons at the deleted S202_T203_Q204 codons (
Sequencing and Data Analysis
Sanger sequencing was performed by Eurofins Genomics (Koln, Germany). Samples were prepared by PCR using Q5 DNA Polymerase, purified using SPRI beads, eluted in water and then quantified by Qubit. Sanger sequencing samples were prepared according to the manufacturer's instructions before shipping. NGS library QC and sequencing were performed by the Cambridge Genome Centre (Cambridge, UK) on an Illumina NextSeq using a NextSeq 500/550 High Output Kit v2.5 (150 Cycles). NGS Libraries were prepared by PCR using Q5 DNA Polymerase to add sequencing primers and individual barcodes. NGS Libraries were isolated on an agarose gel and purified using the GeneJET Gel Extraction kit. The NextSeq FASTQ files were quality filtered and trimmed using cutadapt with custom Adapter 1a, Adapter 1b, Adapter 2a and Adapter 2b sequences (
To compare the geometric synthesis methods of the present disclosure to standard and widely used phosphoramidite synthesis methods, the geometric synthesis methods of the present disclosure and phosphoramidite synthesis methods were used to synthesize a series of 300 nucleotide-long target nucleic acid molecules. Different target nucleic acid molecules were designed with different characteristics to determine the impact that the target nucleic acid sequence has on the efficiency and accuracy of both the geometric synthesis methods of the present disclosure and standard phosphoramidite methods.
The products synthesized by both methods were analyzed using next-generation sequencing methods. The analysis of the next-generation sequencing data was performed by sampling 100,000 quality-trimmed, paired-end reads for each synthesized target nucleic acid and mapping these reads to the desired reference sequences. Overlapping regions from the paired-end reads were removed before synthesis accuracy was determined.
As shown in
Phosphoramidite synthesis and the geometric synthesis methods of the present disclosure were used to synthesize a target nucleic acid that had a GC content that increased from 40% to 60% along the length of the target as measured using a sliding window of 50 nucleotides (herein referred to as a 40%→60% GC target). As shown in Table 6 and
Phosphoramidite synthesis and the geometric synthesis methods of the present disclosure were used to synthesize a target nucleic acid that contained 10-nucleotide long T and C homopolymeric regions (herein referred to as a T & C homopolymer target). As shown in Table 6 and
Phosphoramidite synthesis and the geometric synthesis methods of the present disclosure were used to synthesize a target nucleic acid that contained six variable nucleotides N1 to N6 at specific locations within the target sequence (herein referred to as a N1 to N6 target). As shown in Table 6 and
Summary of Examples 3A-3C
As shown in Table 6, on average, 99.7±0.1% of the products of the geometric synthesis methods of the present disclosure aligned to their reference (target) sequences as compared to only 96.4±1.7% of the phosphoramidite HAE products. Furthermore, 85.3±3.4% of the geometric synthesis products were the correct full-length, while only 22.7±8.9% of the phosphoramidite HAE products were full-length. The yields of 85.3% and 22.7% full-length product indicated a coupling efficiency of >99.9% and 99.5% for geometric synthesis and phosphoramidite synthesis, respectively, indicating that the analysis was robust. Thus, these results indicate that the geometric synthesis methods of the present disclosure are superior to the standard phosphoramidite synthesis methods.
Phosphoramidite synthesis and the geometric synthesis methods of the present disclosure were used to synthesize a target nucleic acid that had a GC content that increased from 10% to 90% along the length of the target as measured using a sliding window of 50 nucleotides (herein referred to as a 10%→90% GC target), as shown in
In Tables 8-10, the Number of Reads refers to the number of quality-trimmed paired-end reads (Trim Galore), which were then used in the alignments; the Alignment % refers to the percent of concordantly aligned sequences from quality-trimmed paired-end reads (Bowtie 2); the Average Overlap (bp) refers to the number of base pairs (bp) on average which overlapped (and were removed) from the aligned paired-end reads (clipOverlap); the Full-length Reads % refers to the percent of aligned reads with the target size of 300 nucleotides; and the Coupling Efficiency % refers to the equivalent nucleotide coupling efficiency based on the yield of full-length sequences, where yield = (coupling efficiency)^(length−1).
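The relationship yield = (coupling efficiency)^(length−1) can be inverted to recover an equivalent coupling efficiency from an observed full-length yield. The following is a minimal sketch of that arithmetic, assuming a 300-nucleotide target as used in these examples. The function names are illustrative, and the inputs are the average full-length yields reported in the summary of Examples 3A-3C.

```python
def coupling_efficiency(full_length_fraction, length=300):
    """Equivalent per-step coupling efficiency implied by a full-length yield,
    from yield = efficiency ** (length - 1)."""
    return full_length_fraction ** (1.0 / (length - 1))


def full_length_yield(efficiency, length=300):
    """Expected fraction of full-length product for a given coupling efficiency."""
    return efficiency ** (length - 1)


geometric = coupling_efficiency(0.853)        # 85.3% full-length product
phosphoramidite = coupling_efficiency(0.227)  # 22.7% full-length product

print(f"{geometric:.4%}")
print(f"{phosphoramidite:.1%}")
```

Under this relationship, an 85.3% full-length yield corresponds to an equivalent coupling efficiency above 99.9%, and a 22.7% yield corresponds to about 99.5%, consistent with the values stated in the summary.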
Phosphoramidite synthesis and the geometric synthesis methods of the present disclosure were used to synthesize a target nucleic acid that had a GC content that increased from 20% to 80% along the length of the target as measured using a sliding window of 50 nucleotides (herein referred to as a 20%→80% GC target), as shown in
Phosphoramidite synthesis and the geometric synthesis methods of the present disclosure were used to synthesize a target nucleic acid that had a GC content that increased from 30% to 70% along the length of the target as measured using a sliding window of 50 nucleotides (herein referred to as a 30%→70% GC target), as shown in
Phosphoramidite synthesis and the geometric synthesis methods of the present disclosure were used to synthesize a target nucleic acid that had a GC content that increased from 40% to 60% along the length of the target as measured using a sliding window of 50 nucleotides (herein referred to as a 40%→60% GC target), as shown in
Phosphoramidite synthesis and the geometric synthesis methods of the present disclosure were used to synthesize a target nucleic acid that had a GC content that remained at 50% along the length of the target as measured using a sliding window of 50 nucleotides (herein referred to as a 50%→50% GC target), as shown in
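The GC gradients of the targets above are described in terms of a 50-nucleotide sliding window. The sketch below shows one way such a profile could be computed. It is illustrative only, since the disclosure does not specify the software used, and the toy sequence is an assumption made for demonstration rather than one of the actual targets.

```python
def gc_profile(sequence, window=50):
    """Return the % GC of every window of `window` nucleotides,
    sliding one nucleotide at a time along the sequence."""
    sequence = sequence.upper()
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the window")
    profile = []
    for start in range(len(sequence) - window + 1):
        chunk = sequence[start:start + window]
        gc_count = sum(base in "GC" for base in chunk)
        profile.append(100.0 * gc_count / window)
    return profile


# Toy 100-nucleotide sequence (hypothetical): AT-only half, then GC-only half
toy_target = "AT" * 25 + "GC" * 25
profile = gc_profile(toy_target)

print(len(profile))                         # number of windows
print(profile[0], profile[25], profile[-1])  # first, middle and last windows
```

For the toy sequence the profile rises from 0% GC in the first window, through 50% in the middle window, to 100% in the last window. A real target would be assessed in the same way against its intended start and end GC percentages.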
Phosphoramidite synthesis and the geometric synthesis methods of the present disclosure were used to synthesize a target nucleic acid that contained six variable nucleotides N1 to N6 and had a GC content of about 50% along the length of the target as measured using a sliding window of 50 nucleotides (herein referred to as a N1 to N6 target), as shown in
Phosphoramidite synthesis and the geometric synthesis methods of the present disclosure were used to synthesize a target nucleic acid that contained 10-nucleotide long A and G homopolymeric regions and had a GC content of about 50% along the length of the target as measured using a sliding window of 50 nucleotides (herein referred to as an A & G homopolymer target), as shown in
Phosphoramidite synthesis and the geometric synthesis methods of the present disclosure were used to synthesize a target nucleic acid that contained 10-nucleotide long T and C homopolymeric regions and had a GC content of about 50% along the length of the target as measured using a sliding window of 50 nucleotides (herein referred to as a T & C homopolymer target), as shown in
Phosphoramidite synthesis and the geometric synthesis methods of the present disclosure were used to synthesize a target nucleic acid that contained repetitious sequences and had a GC content of about 50% along the length of the target as measured using a sliding window of 50 nucleotides (herein referred to as a T & C homopolymer target), as shown in
The following is an example describing the use of the double-stranded geometric synthesis methods of the present disclosure to synthesize an entire 2.7 kb plasmid de novo.
The double-stranded geometric synthesis methods of the present disclosure were used to synthesize de novo the plasmid pUC19, which is a high-copy number plasmid used in bacteria. The pUC19 plasmid included an ampicillin resistance gene and a multiple cloning site that spans the LacZ gene, permitting the screening of bacteria that contain the pUC19 plasmid using blue-white screening to identify plasmids that contain DNA within the multiple cloning site. As part of the double-stranded synthesis, a coding sequence encoding the amino acids CAMENA was added.
The results described in this example demonstrate that, unlike existing DNA assembly methods, the double-stranded geometric synthesis methods of the present disclosure can be used to generate gene-length DNA fragments, including those as long as 2.7 kb, with high fidelity and high purity.
The following example describes the use of highly specific 4-mer overhangs in the double-stranded geometric synthesis methods of the present disclosure, and the ability of these 4-mer overhangs to ensure high fidelity of individual ligation reactions within the entire assembly reaction. By ensuring high fidelity of the individual ligation reactions, these specific 4-mer overhangs allow the geometric synthesis methods of the present disclosure to be used to make DNA molecules whose lengths are comparable to the lengths of human genes, which is not currently feasible using existing DNA assembly methodologies or existing phosphoramidite synthesis methodologies.
To determine the optimal 4-mer overhangs for use in the methods of the present disclosure, double-stranded geometric synthesis assemblies of the pUC19 plasmid were analyzed.
To generate the pUC19 plasmid using the double-stranded geometric synthesis methods of the present disclosure, the pUC19 plasmid was divided into double-stranded fragments, each comprising two 5′ overhangs. Each 5′ overhang comprised a four-nucleotide “4-mer” sequence. The 4-mer sequences of the overhangs were selected to exclude self-recognizing sites such as ACGT.
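A self-recognizing site such as ACGT is one that equals its own reverse complement, so an overhang carrying it can anneal to another copy of itself. The sketch below shows how such sites could be screened out of the candidate 4-mers. It is a minimal illustration under that interpretation of “self-recognizing”, not the selection procedure actually used, and the function names are assumptions.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def is_self_complementary(overhang):
    """True when an overhang could anneal to another copy of itself."""
    return overhang == reverse_complement(overhang)


all_4mers = ["".join(bases) for bases in product("ACGT", repeat=4)]
self_recognizing = [m for m in all_4mers if is_self_complementary(m)]
candidates = [m for m in all_4mers if not is_self_complementary(m)]

print(len(all_4mers), len(self_recognizing), len(candidates))
print("ACGT" in self_recognizing)
```

Of the 256 possible 4-mers, 16 are their own reverse complement (ACGT among them), which leaves 240 candidates before any further fidelity-based selection.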
During the assembly process, the results of each ligation were analyzed using agarose gels to score the outcome of each sub-assembly after two sequential rounds of ligation. Thus, each experiment considers four double-stranded nucleic acid fragments, A, B, C and D, which are initially ligated to form AB and CD, and then are ligated to form the new ABCD fragment (
To determine the best 4-mers for use in the 5′ overhangs of the fragments being ligated, the outcomes of 247 different experiments were analyzed, covering 170 different 4-mer sites. The 4-mers were analyzed in sets of 5, called “4-mer quintuplets”, as each set of two-round ligations (as shown in
The results of the analysis showed that of the 247 different experiments analyzed, 123 of the 4-mer quintuplets resulted in a “Good” outcome. Further analysis showed that these 123 experiments comprised 58 unique 4-mer quintuplets. Of these 58 unique 4-mer quintuplets, only 27 of the 4-mer quintuplets exhibited only “Good” outcomes (see Table 11).
That is, in all of the experiments that used one of these 27 4-mer quintuplets, each of the experiments resulted in the generation of the proper product. These 27 4-mer quintuplets are shown in Table 2. The remaining 31 unique 4-mer quintuplets (58−27=31) exhibited either “short”, “single”, “double” or “concatemer” outcomes when tested in other experiments.
In addition to combinations of sites that produce a positive outcome, the data also reveal unsatisfactory site combinations. Analysis of the four not-‘Good’ outcome classes shows, firstly, that there are proportionally fewer negative outcome combinations. For the ‘Short’ outcome we found that of 16 original experiments, 12 were unique and 6 were ‘Short’-only; three of the remaining 6 had ‘Good’ matches as quintuplets and the final 3 had many ‘Good’ matches as triplets. For the ‘Single’ outcome we found that of the 22 original experiments, 20 were unique and 5 were ‘Single’-only. For the ‘Double’ outcome we found that of the 32 original experiments, 25 were unique and 7 were ‘Double’-only. Finally, for the ‘Concatemer’ outcome, which is the most abundant negative outcome, there were 38 unique experiments of the 54 total and there were 22 ‘Concatemer’-only quintuplets (see Tables 12-15).
In the two rounds of ligation that are shown in
Thus, the results of this example demonstrate that sets of double-stranded nucleic acid fragments comprising 5′ overhangs that comprise the 4-mer quintuplets listed in Table 2 or the 4-mer triplets listed in Table 1 display unexpected and superior properties, in that they can be used in highly efficient and highly accurate ligation reactions within a double-stranded geometric assembly reaction.
The following example describes the derivation of optimal 4-mers for use in the geometric synthesis methods of the present disclosure.
To determine 4-mers that demonstrate increased fidelity (i.e. the percentage of the time the 4-mer correctly hybridizes and ligates to another nucleic acid molecule comprising the complementary 4-mer as opposed to a fragment comprising a mismatched 4-mer) and yield (i.e. the frequency of ligation events) in ligation reactions in the geometric synthesis methods of the present disclosure, the large-scale ligation data presented in Potapov et al. (“Comprehensive Profiling of Four Base Overhang Ligation Fidelity by T4 DNA Ligase and Application to DNA assembly,” ACS Synthetic Biology, 2018, 7, 11, 2665-2674) was further analyzed. As shown in Table 16, for 256 different 4-mers tested, the number of ligation events was analyzed to determine how many of these ligation events were matched (i.e. to a fragment with a complementary 4-mer overhang; ‘Total Matched Ligations Observed’) and how many mismatched ligation events were observed (i.e. to a fragment with a non-complementary 4-mer overhang; ‘Total Mismatch Ligations Observed’). A fidelity percentage was then determined by dividing the ‘Total Matched Ligations Observed’ by the ‘Total Ligations Observed’ (matched+mismatched). Additionally, for each of the 4-mers, the top three non-complementary 4-mers that the 4-mer mismatched with were determined, along with the percentage of the mismatches that corresponded to each of the top three 4-mer mismatches (see Table 17).
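The fidelity calculation and the ranking of mismatch partners described above can be sketched as follows. This is an illustrative sketch only. The function names are hypothetical, and the ligation counts in the usage example are invented for demonstration and are not taken from Table 16, Table 17 or Potapov et al.

```python
def fidelity_percent(matched, mismatched):
    """Fidelity = matched ligations / (matched + mismatched ligations), as a %."""
    total = matched + mismatched
    if total == 0:
        raise ValueError("no ligation events observed")
    return 100.0 * matched / total


def top_mismatches(mismatch_counts, n=3):
    """Return the n most frequent mismatch partners of a 4-mer together with
    the percentage of all mismatched ligations that each accounts for."""
    total = sum(mismatch_counts.values())
    if total == 0:
        return []
    ranked = sorted(mismatch_counts.items(), key=lambda item: item[1], reverse=True)
    return [(partner, 100.0 * count / total) for partner, count in ranked[:n]]


# Hypothetical counts for a single 4-mer (illustration only, not real data)
example_matched = 990
example_mismatches = {"GTTC": 6, "ATTT": 3, "GTTG": 1}

print(fidelity_percent(example_matched, sum(example_mismatches.values())))
print(top_mismatches(example_mismatches))
```

With these made-up counts the 4-mer would score a fidelity of 99.0%, with its most frequent mismatch partner accounting for 60% of mismatched ligations.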
The 4-mers that demonstrated high fidelity and/or yield were further selected for use in the methods of the present disclosure. These 4-mers are presented in Tables 3, 4 and 5.
This application claims priority to and the benefit of U.S. Provisional Application No. 62/902,729, filed Sep. 19, 2019, and U.S. Provisional Application No. 62/923,920, filed Oct. 21, 2019. The contents of each of the aforementioned patent applications are incorporated herein by reference in their entireties.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2020/051838 | 9/21/2020 | WO |
Number | Date | Country | |
---|---|---|---|
62902729 | Sep 2019 | US | |
62923920 | Oct 2019 | US |