Gene synthesis is a broadly enabling technology for life science research and health care. While the cost of DNA sequencing has dropped by five orders of magnitude in the past decade, DNA synthesis remains expensive for many applications. Although DNA microarrays have decreased the cost of oligonucleotide synthesis, the use of array-synthesized oligos in practice is limited by short synthesis lengths, high synthesis error rates, low yield, and the challenges of assembling long constructs from complex pools.
Recognized herein is a need for a cheap, controlled, high-quality, and high-throughput way to assemble or synthesize a pool of long polynucleotides from relatively short oligo fragments. The pool of polynucleotides of interest can be assembled or synthesized from various fragments in a same mixture without non-specific linkages. The present disclosure provides compositions and methods for assembling or synthesizing polynucleotides of interest using a large number (e.g., hundreds, thousands or more) of designed connector sequences (also referred to as Zips in this disclosure). The polynucleotides of interest assembled or synthesized herein can be any sequences of interest. The polynucleotides of interest assembled or synthesized herein can be a functional genetic element, including but not limited to a gene or a protein-coding sequence.
In an aspect, the present disclosure provides a method of synthesizing a plurality of n different polynucleotides (where n is equal to or greater than 2) in a mixture, wherein
In some embodiments, generating the third mixture of at least n polynucleotides comprises specifically linking, for each i, the ZipAi sequence and the ZipBi sequence. In some embodiments, generating the fifth mixture of at least n polynucleotides comprises specifically linking, for each i, the ZipCi sequence and the ZipABi sequence.
In another aspect, the present disclosure provides a method of synthesizing a plurality of n different polynucleotides (where n is equal to or greater than 2) in a mixture, wherein
In another aspect, the present disclosure provides a method of synthesizing a plurality of n different polynucleotides (where n is equal to or greater than 2) in a mixture, wherein
In some embodiments, the SeqAi sequence, the SeqBi sequence and the SeqCi sequence of the Seqi sequence are linked seamlessly (e.g., without any intervening sequences). In some embodiments, the Seqi sequence comprises the SeqAi sequence and the SeqBi sequence without an intervening sequence in between the SeqAi sequence and the SeqBi sequence.
In some embodiments, the Seqi sequence comprises the SeqBi sequence and the SeqCi sequence without an intervening sequence in between the SeqBi sequence and the SeqCi sequence. In some embodiments, the Seqi sequence with an intervening sequence in between the SeqAi sequence and the SeqBi sequence or the SeqBi sequence and the SeqCi sequence is not a functional genetic element. In some embodiments, the Seqi sequence comprises the SeqAi sequence, the SeqBi sequence and the SeqCi sequence sequentially from the 5′ end to the 3′ end.
In some embodiments, for each i, the ZipAi sequence and the ZipBi sequence are connector sequences for specifically linking the SeqAi sequence and the SeqBi sequence. In some embodiments, for each i, the ZipCi sequence and the ZipABi sequence are connector sequences for specifically linking the SeqCi sequence and a sequence comprising the SeqAi sequence and the SeqBi sequence. In some embodiments, for each i, the ZipAi sequence and the ZipBi sequence are a same nucleic acid sequence. In some embodiments, for each i, the ZipAi sequence and the ZipBi sequence are complementary. In some embodiments, for each i, the ZipAi sequence and the ZipBi sequence are different nucleic acid sequences. In some embodiments, for each i, the ZipABi sequence is a same nucleic acid sequence as the ZipAi sequence or the ZipBi sequence. In some embodiments, for each i, the ZipABi sequence, the ZipAi sequence, and the ZipBi sequence are a same nucleic acid sequence. In some embodiments, for each i, the ZipABi sequence is a different nucleic acid sequence from the ZipAi sequence or the ZipBi sequence.
In some embodiments, the ZipABi sequence, the ZipAi sequence, and the ZipBi sequence are different nucleic acid sequences. In some embodiments, the ZipCi sequence and the ZipABi sequence are a same nucleic acid sequence. In some embodiments, the ZipCi sequence and the ZipABi sequence are complementary. In some embodiments, the ZipCi sequence and the ZipABi sequence are different nucleic acid sequences. In some embodiments, for each i, the ZipAi sequence, the ZipBi sequence, the ZipABi sequence, or the ZipCi sequence is from 5 nucleotides to 200 nucleotides in length. In some embodiments, for each i, the SeqAi sequence, the SeqBi sequence, or the SeqCi sequence is from 5 nucleotides to 5,000 nucleotides in length.
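The role of the connector sequences can be illustrated with a minimal Python sketch. It assumes the embodiment in which, for each i, the ZipAi sequence and the ZipBi sequence are complementary, and it models hybridization as an exact reverse-complement match. The function names and all sequences below are hypothetical and are chosen only for illustration; they are not sequences from this disclosure.

```python
# Sketch: pairing fragments in one pool by complementary connector (Zip) sequences.
# Assumption: hybridization is modeled as an exact reverse-complement match.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def pair_by_zip(first_mixture, second_mixture):
    """Pair each (SeqA, ZipA) with the SeqB whose ZipB is complementary to ZipA.

    first_mixture:  list of (seq_a, zip_a) tuples
    second_mixture: list of (zip_b, seq_b) tuples
    Raises ValueError if a ZipA has zero or several partners, i.e. the
    connectors would not link the fragments specifically.
    """
    partners_by_zip = {}
    for zip_b, seq_b in second_mixture:
        partners_by_zip.setdefault(reverse_complement(zip_b), []).append(seq_b)

    pairs = []
    for seq_a, zip_a in first_mixture:
        partners = partners_by_zip.get(zip_a, [])
        if len(partners) != 1:
            raise ValueError(f"connector {zip_a} has {len(partners)} partners")
        pairs.append((seq_a, partners[0]))
    return pairs


# Hypothetical pool with n = 2; the second mixture is listed in a different
# order to show that pairing depends on the connectors, not on position.
first = [("ATGGCC", "GATTACAG"), ("ATGTTT", "CCTGAAGT")]
second = [
    (reverse_complement("CCTGAAGT"), "GGATCC"),
    (reverse_complement("GATTACAG"), "AAGCTT"),
]
pairs = pair_by_zip(first, second)
```

In this sketch a connector shared by two different fragments raises an error, which mirrors the requirement that connector sequences differ between the ith and jth polynucleotides when i≠j.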
In some embodiments, generating the third mixture of at least n polynucleotides comprises linking, for each i, the ZipAi sequence and the ZipBi sequence. In some embodiments, for each i, the ZipAi sequence hybridizes to the ZipBi sequence. In some embodiments, the method further comprises extending a free 3′ end of the ZipAi sequence or the ZipBi sequence using the ith polynucleotide from the first mixture or the second mixture as a template to generate the ith polynucleotide comprising the SeqAi sequence, the SeqBi sequence and the ZipABi sequence. In some embodiments, for each i, the ith polynucleotide of the third mixture further comprises an Operator sequence that is a primer binding site. In some embodiments, the Operator sequence is a same sequence among the third mixture of at least n polynucleotides. In some embodiments, the method further comprises removing the Operator sequence. In some embodiments, removing comprises using an enzyme to degrade the Operator sequence.
In some embodiments, the method further comprises circularizing the ith polynucleotide comprising the SeqAi sequence, the SeqBi sequence and the ZipABi sequence to generate a circularized polynucleotide. In some embodiments, circularizing the ith polynucleotide comprising the SeqAi sequence, the SeqBi sequence and the ZipABi sequence comprises circularizing the ith polynucleotide by a ligase. In some embodiments, the method further comprises linearizing the circularized polynucleotide. In some embodiments, linearizing the circularized polynucleotide comprises cutting the circularized polynucleotide or amplifying the circularized polynucleotide using polymerase chain reaction (PCR). In some embodiments, the circularized polynucleotide is linearized such that the ZipABi sequence is not flanked by the SeqAi sequence and the SeqBi sequence. In some embodiments, the method further comprises exposing the ZipABi sequence on a terminus of the ith polynucleotide comprising the SeqAi sequence, the SeqBi sequence and the ZipABi sequence. In some embodiments, the ZipABi sequence is not flanked by the SeqAi sequence and the SeqBi sequence. In some embodiments, the ZipABi sequence is at a terminus of the ith polynucleotide comprising the SeqAi sequence, the SeqBi sequence and the ZipABi sequence.
In some embodiments, generating the fifth mixture of at least n polynucleotides comprises linking, for each i, the ZipCi sequence and the ZipABi sequence. In some embodiments, linking comprises hybridizing the ZipCi sequence and the ZipABi sequence. In some embodiments, the method further comprises repeating operations above for the third mixture of n polynucleotides and the fourth mixture of n polynucleotides, thereby generating the fifth mixture of n polynucleotides. In some embodiments, the method further comprises removing the ZipCi sequence and the ZipABi sequence, thereby generating the ith polynucleotide comprising the Seqi sequence comprising the SeqAi sequence, the SeqBi sequence and the SeqCi sequence.
In some embodiments, the method further comprises, prior to (a) or (b), providing a pool of polynucleotides comprising the at least n polynucleotides of the first mixture, the at least n polynucleotides of the second mixture, and/or the at least n polynucleotides of the fourth mixture. In some embodiments, the method further comprises amplifying the at least n polynucleotides of the first mixture, the at least n polynucleotides of the second mixture, and/or the at least n polynucleotides of the fourth mixture from the pool to generate double-stranded polynucleotides. In some embodiments, only the at least n polynucleotides of the first mixture, the at least n polynucleotides of the second mixture, or the at least n polynucleotides of the fourth mixture are amplified from the pool. In some embodiments, the method further comprises removing an Operator sequence from the double-stranded polynucleotides, wherein the Operator sequence is a primer binding site. In some embodiments, the method further comprises degrading one strand of the double-stranded polynucleotides to generate the at least n polynucleotides of the first mixture, the at least n polynucleotides of the second mixture, and/or the at least n polynucleotides of the fourth mixture.
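The effect of circularizing and then linearizing can be sketched at the level of domain names. The Python example below is illustrative only: it assumes that the linked polynucleotide has the domain order SeqB, ZipAB, SeqA from 5′ end to 3′ end (an order this disclosure describes for an intermediate product in one aspect), and it treats circularization followed by opening at a chosen boundary as a rotation of that order. The function names are hypothetical.

```python
# Sketch: circularization followed by linearization, modeled as a rotation
# of the domain order. Domain names stand in for actual sequences.

def to_notation(domains):
    """Render a 5'-to-3' list of domain names in a [X|Y|Z} style notation."""
    return "[" + "|".join(domains) + "}"


def relinearize(circular_domains, five_prime_domain):
    """Open a circular molecule so that `five_prime_domain` is at the 5' end.

    The circle is represented by the linear order it was ligated from;
    joining its 3' end to its 5' end makes the last and first domains adjacent.
    """
    start = circular_domains.index(five_prime_domain)
    return circular_domains[start:] + circular_domains[:start]


# Assumed domain order of the linked polynucleotide before circularization.
linked = ["SeqB", "ZipAB", "SeqA"]

# Ligation joins the 3' end of SeqA to the 5' end of SeqB; opening the circle
# between ZipAB and SeqA then places ZipAB at the 3' terminus.
reopened = relinearize(linked, "SeqA")

print(to_notation(linked))    # [SeqB|ZipAB|SeqA}
print(to_notation(reopened))  # [SeqA|SeqB|ZipAB}
```

After the rotation, SeqA and SeqB are adjacent with no domain between them and ZipAB sits at a terminus, which is the arrangement in which the connector is no longer flanked by the two sequences.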
In another aspect, the present disclosure provides a method of synthesizing a plurality of n different polynucleotides (where n is equal to or greater than 2) in a same mixture, wherein
In some embodiments, in the second subpopulation, the ZipBi sequence is a ZipB1i sequence, and the ith polynucleotide further comprises a ZipB2i sequence. In some embodiments, the SeqBi sequence is located in between the ZipB1i sequence and the ZipB2i sequence. In some embodiments, the ZipB1i sequence is located in between the SeqBi sequence and the ZipB2i sequence. In some embodiments, the ZipB2i sequence is located in between the SeqBi sequence and the ZipB1i sequence.
In some embodiments, the SeqAi sequence, the SeqBi sequence and the SeqCi sequence of the Seqi sequence are linked seamlessly. In some embodiments, the Seqi sequence comprises the SeqAi sequence and the SeqBi sequence without an intervening sequence in between the SeqAi sequence and the SeqBi sequence. In some embodiments, the Seqi sequence comprises the SeqBi sequence and the SeqCi sequence without an intervening sequence in between the SeqBi sequence and the SeqCi sequence. In some embodiments, the Seqi sequence comprises the SeqAi sequence, the SeqBi sequence and the SeqCi sequence sequentially from the 5′ end to the 3′ end.
In some embodiments, the ZipAi sequence and the ZipBi sequence are connector sequences for specifically linking the SeqAi sequence and the SeqBi sequence. In some embodiments, the ZipB2i sequence and the ZipCi sequence are connector sequences for specifically linking the SeqBi sequence and the SeqCi sequence. In some embodiments, for each i, the ZipAi sequence and the ZipBi sequence are a same nucleic acid sequence. In some embodiments, for each i, the ZipAi sequence and the ZipBi sequence are complementary. In some embodiments, for each i, the ZipAi sequence and the ZipBi sequence are different nucleic acid sequences. In some embodiments, for each i, the ZipB2i sequence and the ZipCi sequence are a same nucleic acid sequence. In some embodiments, for each i, the ZipB2i sequence and the ZipCi sequence are complementary. In some embodiments, for each i, the ZipB2i sequence and the ZipCi sequence are different nucleic acid sequences. In some embodiments, for each i, the ZipAi sequence, the ZipBi sequence, the ZipB1i sequence, the ZipB2i sequence, or the ZipCi sequence is from 5 nucleotides to 200 nucleotides in length. In some embodiments, for each i, the SeqAi sequence, the SeqBi sequence, or the SeqCi sequence is from 5 nucleotides to 5,000 nucleotides in length.
In some embodiments, the method further comprises specifically linking (i) the ZipAi sequence and the ZipB1i sequence, and/or (ii) the ZipB2i sequence and the ZipCi sequence. In some embodiments, linking comprises hybridizing (i) the ZipAi sequence and the ZipB1i sequence, and/or (ii) the ZipB2i sequence and the ZipCi sequence.
In some embodiments, the method further comprises generating a plurality of intermediate products, wherein an ith intermediate product of the plurality comprises the SeqAi sequence, the ZipAi sequence (or the ZipB1i sequence), the SeqBi sequence, the ZipCi sequence (or the ZipB2i sequence) and the SeqCi sequence.
In some embodiments, the ith intermediate product of the plurality comprises the SeqAi sequence, the ZipAi sequence (or the ZipB1i sequence), the SeqBi sequence, the ZipCi sequence (or the ZipB2i sequence) and the SeqCi sequence sequentially from 5′ end to 3′ end.
In some embodiments, the method further comprises removing the ZipAi sequence (or the ZipB1i sequence) and the ZipCi sequence (or the ZipB2i sequence), thereby generating the Seqi sequence comprising the SeqAi sequence, the SeqBi sequence and the SeqCi sequence without any intervening sequence. In some embodiments, removing comprises using a DNA tweezer. In some embodiments, using the DNA tweezer comprises degrading one strand of the ZipAi sequence or the ZipCi sequence region, and using a staple strand to hybridize with regions flanking the ZipAi sequence or the ZipCi sequence on the complementary strand to bring the SeqAi sequence, the SeqBi sequence and the SeqCi sequence region in close proximity for ligation.
In some embodiments, concatenation of the SeqAi sequence and the SeqBi sequence without an intervening sequence is a functional genetic element. In some embodiments, concatenation of the SeqAi sequence, the SeqBi sequence and the SeqCi sequence without any intervening sequence is a functional genetic element. In some embodiments, the functional genetic element is a gene, a protein-coding sequence, a promoter, an internal ribosome entry site, a ribozyme, an aptamer, a nucleic acid sequence that is capable of being specifically bound by a transposase, a guide ribonucleic acid (gRNA) for CRISPR/Cas9 based gene editing, a primer-extension gRNA for prime editing, or any combination thereof. In some embodiments, the functional genetic element does not comprise a sequence that is identical to the ZipAi sequence, the ZipBi sequence, the ZipABi sequence or the ZipCi sequence. In some embodiments, for each i, the SeqAi sequence, the SeqBi sequence, and/or the SeqCi sequence are uniquely or specifically linked. In some embodiments, a plurality of at least 2, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 500, 1,000, 5,000 or more polynucleotides are synthesized. In some embodiments, each polynucleotide of the plurality synthesized is from about 15 to about 15,000 nucleotides in length.
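Removal of the connector sequences and the condition that the functional genetic element does not comprise a connector sequence can be illustrated with a short Python sketch. The sequences and function names below are hypothetical. Checking the reverse complement of each connector in addition to the connector itself is an added assumption of this sketch, not a requirement stated in this disclosure.

```python
# Sketch: drop connector (Zip) domains from an intermediate product and check
# that the resulting seamless sequence contains no connector sequence.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def remove_connectors(intermediate):
    """Concatenate the non-Zip domains of a 5'-to-3' list of (name, sequence)."""
    return "".join(seq for name, seq in intermediate if not name.startswith("Zip"))


def connector_hits(element, connectors):
    """Return the connectors found in `element`, on either strand (assumption)."""
    return [
        zip_seq
        for zip_seq in connectors
        if zip_seq in element or reverse_complement(zip_seq) in element
    ]


# Hypothetical ith intermediate product, listed from 5' end to 3' end.
intermediate = [
    ("SeqA", "ATGGCC"),
    ("ZipA", "GATTACAG"),
    ("SeqB", "AAGCTT"),
    ("ZipC", "CCTGAAGT"),
    ("SeqC", "GGATCCTAA"),
]
connectors = [seq for name, seq in intermediate if name.startswith("Zip")]

seq_i = remove_connectors(intermediate)
print(seq_i)                              # ATGGCCAAGCTTGGATCCTAA
print(connector_hits(seq_i, connectors))  # []
```

An empty result from `connector_hits` indicates that, under the assumptions above, no connector would find an unintended site inside the assembled element.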
In another aspect, the present disclosure provides a method of synthesizing a plurality of n different polynucleotides (where n is equal to or greater than 2) in a mixture, wherein
In some embodiments, for each i, the ZipAi sequence and the ZipBi sequence are connector sequences for specifically linking the SeqAi sequence and the SeqBi sequence. In some embodiments, generating the third mixture in (c) comprises specifically linking, for each i, the ZipAi sequence and the ZipBi sequence. In some embodiments, linking comprises hybridizing, for each i, the ZipAi sequence and the ZipBi sequence. In some embodiments, the method further comprises extending a free 3′ end of the ZipAi sequence or the ZipBi sequence using the ith polynucleotide from the first mixture or the second mixture as a template to generate the ith polynucleotide comprising the SeqAi sequence, the SeqBi sequence and the ZipABi sequence. In some embodiments, the method further comprises generating an intermediate product comprising the SeqBi sequence, the ZipABi sequence and the SeqAi sequence sequentially from 5′ end to 3′ end.
In some embodiments, the method further comprises contacting the third mixture with a fourth mixture of at least n polynucleotides, wherein an ith polynucleotide of the fourth mixture comprises a SeqCi sequence and a ZipCi sequence, and wherein the ZipCi sequence is different from a ZipCj sequence when i≠j.
In some embodiments, the method further comprises generating a fifth mixture of at least n polynucleotides, wherein an ith polynucleotide comprises the Seqi sequence comprising the SeqAi sequence and the SeqBi sequence and further comprising the SeqCi sequence.
In some embodiments, the SeqBi sequence and the SeqCi sequence are linked without an intervening sequence.
In another aspect, the present disclosure provides a method of synthesizing a plurality of n different polynucleotides (where n is equal to or greater than 2) in a mixture, wherein
In some embodiments, for each i, the ZipAi sequence and the ZipBi sequence are a same nucleic acid sequence. In some embodiments, for each i, the ZipAi sequence and the ZipBi sequence are complementary. In some embodiments, for each i, the ZipABi sequence is a same nucleic acid sequence as the ZipAi sequence or the ZipBi sequence. In some embodiments, the SeqAi sequence, the SeqBi sequence and the SeqCi sequence are specifically linked without any intervening sequences. In some embodiments, concatenation of the SeqAi sequence, the SeqBi sequence and the SeqCi sequence without any intervening sequence is a functional genetic element.
In another aspect, the present disclosure provides a composition comprising a mixture described herein. In some cases, the composition comprises the first mixture, the second mixture, or the third mixture described herein.
In another aspect, the present disclosure provides a composition for synthesizing a plurality of n different polynucleotides, comprising:
In some embodiments, the first mixture and the second mixture are within a same compartment or a same mixture.
In some embodiments, the ZipAi sequence and the ZipBi sequence are specifically linked. In some embodiments, the ZipAi sequence and the ZipBi sequence are hybridized. In some embodiments, the ZipAi sequence and the ZipBi sequence are a same nucleic acid sequence, complementary nucleic acid sequences, or different nucleic acid sequences.
In some embodiments, the composition further comprises a third mixture of at least n polynucleotides, wherein an ith polynucleotide comprises a SeqCi sequence and a ZipCi sequence, and wherein the ZipCi sequence is different from a ZipCj sequence of a jth polynucleotide when i≠j.
In some embodiments, for each of i, concatenation of the SeqAi sequence, the SeqBi sequence, and the SeqCi sequence without any intervening sequence is a functional genetic element.
In some embodiments, the ZipCi sequence is a connector sequence for linking the SeqCi sequence and a sequence comprising the SeqAi sequence and the SeqBi sequence. In some embodiments, the ZipCi sequence is a same nucleic acid sequence as the ZipAi sequence or the ZipBi sequence. In some embodiments, the ZipCi sequence is a different nucleic acid sequence from the ZipAi sequence or the ZipBi sequence. In some embodiments, the ZipCi sequence, the ZipAi sequence and the ZipBi sequence are a same nucleic acid sequence.
In some embodiments, the first mixture, the second mixture and the third mixture are within a same compartment or a same mixture.
In some embodiments, the functional genetic element comprises a gene, a protein-coding sequence, a promoter, an internal ribosome entry site, a ribozyme, an aptamer, a nucleic acid sequence that is capable of being specifically bound by a transposase, a guide ribonucleic acid (gRNA) for CRISPR/Cas9 based gene editing, and/or a primer-extension gRNA for prime editing.
In another aspect, the present disclosure provides a composition comprising a polynucleotide having at least two double-stranded regions separated by a single-stranded region, wherein the at least two double-stranded regions comprise a first double-stranded region and a second double-stranded region, wherein the single-stranded region hybridizes with a stable strand such that the single-stranded region forms a loop stabilized by the stable strand to bring a 3′ end of the first double-stranded region and a 5′ end of the second double-stranded region to close proximity, and wherein the first double-stranded region and the second double-stranded region are from a same functional genetic element.
In some embodiments, the functional genetic element is a gene, a protein-coding sequence, a promoter, an internal ribosome entry site, a ribozyme, an aptamer, a nucleic acid sequence that is capable of being specifically bound by a transposase, a guide ribonucleic acid (gRNA) for CRISPR/Cas9 based gene editing, or a primer-extension gRNA for prime editing. In some embodiments, the 3′ end of the first double-stranded region and the 5′ end of the second double-stranded region are in close proximity for ligation. In some embodiments, the 3′ end of the first double-stranded region and the 5′ end of the second double-stranded region are joined. In some embodiments, the 3′ end of the first double-stranded region and the 5′ end of the second double-stranded region are ligated. In some embodiments, the single-stranded region comprises, from 5′ to 3′, a first segment and a second segment, and the stable strand comprises, from 5′ to 3′, a third segment and a fourth segment, and wherein the first segment hybridizes with the third segment and the second segment hybridizes with the fourth segment. In some embodiments, the composition further comprises a plurality of polynucleotides, each having at least two double-stranded regions separated by a single-stranded region, wherein the at least two double-stranded regions comprise a first double-stranded region and a second double-stranded region, wherein the single-stranded region hybridizes with a stable strand such that the single-stranded region forms a loop stabilized by the stable strand to bring a 3′ end of the first double-stranded region and a 5′ end of the second double-stranded region to close proximity, and wherein the first double-stranded region and the second double-stranded region are from a same functional genetic element. In some embodiments, each polynucleotide of the plurality is a different functional genetic element.
In some embodiments, the polynucleotide comprises three double-stranded regions separated by two single-stranded regions, each single-stranded region hybridizing with a stable strand. In some embodiments, the three double-stranded regions are from a same functional genetic element.
Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “Figure”, “Fig.”, and “FIGURE” herein) of which:
In this disclosure, the use of the singular includes the plural unless specifically stated otherwise. Also, the use of “or” means “and/or” unless stated otherwise. Similarly, “comprise,” “comprises,” “comprising,” “include,” “includes,” and “including” are not intended to be limiting.
The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, e.g., within 5-fold, or within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value should be assumed.
Whenever the term “at least,” “greater than,” or “greater than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “at least,” “greater than” or “greater than or equal to” applies to each of the numerical values in that series of numerical values. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.
Whenever the term “no more than,” “less than,” or “less than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “no more than,” “less than,” or “less than or equal to” applies to each of the numerical values in that series of numerical values. For example, less than or equal to 3, 2, or 1 is equivalent to less than or equal to 3, less than or equal to 2, or less than or equal to 1.
The terms “polynucleotide”, “nucleic acid” and “oligonucleotide” are used interchangeably in the present disclosure. They can refer to a polymeric form of nucleotides of various length. They may comprise deoxyribonucleotides and/or ribonucleotides, or analogs thereof. A polynucleotide may include one or more nucleotides selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), or variants thereof. A nucleotide can include a nucleoside and at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more phosphate (PO3) groups. A nucleotide can include a nucleobase, a five-carbon sugar (either ribose or deoxyribose), and one or more phosphate groups. A polynucleotide may have any three-dimensional structure and may perform various functions. A polynucleotide can have various configurations, such as linear, circular, stem-loop, and branched. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), circular RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component. Polynucleotides may include one or more nucleotide variants, including nonstandard nucleotide(s), non-natural nucleotide(s), nucleotide analog(s) and/or modified nucleotides.
The term “sequence,” as used herein, refers to the order of nucleotides in a nucleic acid molecule, or the order of amino acid residues of a peptide. A nucleic acid sequence can be a deoxyribonucleic acid (DNA) sequence or ribonucleic acid (RNA) sequence; can be linear, circular or branched; and can be either single-stranded or double-stranded. A sequence can be mutated such that it is different from a reference sequence (e.g., wildtype sequence). A sequence can be of any length, for example, between 2 and 1,000,000 or more amino acids or nucleotides in length (or any integer value there between or there above), e.g., between about 100 and about 10,000 nucleotides or between about 200 and about 500 amino acids or nucleotides. Any given nucleic acid sequence can encompass the sequence information of the given nucleic acid sequence and a reverse complement sequence of the given nucleic acid sequence. In some cases, a DNA sequence can encompass the sequence information of the corresponding RNA sequence that is transcribed from the DNA. The sequence can be alphabetical representation of a polynucleotide or polypeptide molecule. The sequence can be a piece of information that can be used by a computer processor. In some cases, the nucleic acid sequence may be used to refer to the physical nucleic acid molecule itself.
The term “blunt end,” as used herein, refers to an end of a double-stranded nucleic acid molecule wherein substantially all of the nucleotides in the end of one strand of the nucleic acid molecule are base paired with opposing nucleotides in the other strand of the same nucleic acid molecule. A nucleic acid molecule is not blunt ended if it has an end that includes a single-stranded portion of at least one nucleotide in length, referred to herein as an “overhang” or “sticky end.”
The terms “link” and “connect” are used interchangeably in the present disclosure. They refer to physically linking two or more nucleic acid molecules. The two or more nucleic acid molecules may be linked such that the two or more nucleic acid molecules form a continuous nucleic acid molecule. The two or more nucleic acid molecules can be covalently linked or non-covalently linked. Linking may be accomplished in a variety of manners, including formation of hydrogen bonds, ionic and covalent bonds, or van der Waals forces.
Percent (%) sequence identity with respect to a reference nucleic acid sequence (or peptide sequence) is the percentage of nucleotides (or amino acid residues in case of peptide sequence) in a candidate sequence that are identical with the nucleotides (or amino acid residues) in the reference nucleic acid sequence (or peptide sequence), after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, CLUSTALW, ALIGN or Megalign (DNASTAR) software. Those skilled in the art can determine appropriate parameters for aligning sequences, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.
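The percent identity calculation can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by a separate tool and are supplied as equal-length strings in which “-” marks a gap, and it takes the number of residues in the candidate sequence as the denominator, which is one reading of the definition above. The function name and the example sequences are hypothetical; alignment programs may report identity using a different denominator.

```python
# Sketch: percent sequence identity from a pre-computed pairwise alignment.
# Assumptions: inputs are equal-length aligned strings, "-" marks a gap, and
# the denominator is the number of residues in the candidate sequence.

def percent_identity(candidate_aln: str, reference_aln: str) -> float:
    """Return the percentage of candidate residues identical to the reference."""
    if len(candidate_aln) != len(reference_aln):
        raise ValueError("aligned sequences must have the same length")

    # Count alignment columns where both sequences carry the same residue.
    matches = sum(
        1
        for c, r in zip(candidate_aln, reference_aln)
        if c == r and c != "-"
    )
    # Count residues (non-gap positions) in the candidate sequence.
    candidate_residues = sum(1 for c in candidate_aln if c != "-")
    if candidate_residues == 0:
        raise ValueError("candidate sequence has no residues")
    return 100.0 * matches / candidate_residues


# Hypothetical alignment: one gap in the candidate and one mismatch at the end.
identity = percent_identity("ACGT-ACGT", "ACGTTACGA")
print(identity)  # 87.5
```

Here 7 of the 8 candidate residues match the reference, giving 87.5%, which would fall below the 90% threshold used for “substantially the same” in this disclosure.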
The term “substantially the same” and its grammatical equivalents as applied to nucleic acid or amino acid sequences mean that a nucleic acid or amino acid sequence comprises a sequence that has at least 90% sequence identity or more, at least 95%, at least 98% or at least 99%, compared to a reference sequence using the programs described above, e.g., BLAST, using standard parameters. For example, the BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 11, an expectation (E) of 10, M=5, N=−4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a word length (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)).
Domain-level description of sequence: in the present disclosure, the polynucleotide sequence may be described at domain level. Each domain name can correspond to a specific polynucleotide sequence. For example, domain ‘A’ may have a sequence of 5′-TATTCCC-3′, domain ‘B’ may have a sequence of 5′-AGGGAC-3′, and domain ‘C’ may have a sequence of 5′-GGGAAGA-3′. In this case the polynucleotide having a sequence that is the concatenation of domains A, B, and C, can be written as [A|B|C}. The symbol ‘[’ denotes the 5′ end, the symbol ‘}’ denotes the 3′ end, and the symbol ‘|’ separates domain names. An ssDNA or a section of ssDNA having sequence ‘X’ can be referred to as [X}. An asterisk denotes sequence complementarity. For example, domain [X*} is the reverse complement of domain [X}. The notation ds[X} can be used to describe a double-stranded DNA formed by [X} and [X*}. In some cases, especially in situations where it is not necessary to distinguish dsDNA and ssDNA, a dsDNA whose one strand has the sequence [X} may also be loosely referred to as [X}. A single-stranded RNA molecule or segment with the sequence identical to [X} (except replacing T with U) may also be referred to as [X}. Depending on the context, the domain name may refer to an exact sequence or describe a general function of a DNA or domain. For example, [RBS} may be used to describe a ribosome binding site, although the exact sequence for [RBS} may vary. Parentheses can be used to group a concatenation of domains, and the reverse-complement operation (denoted by ‘*’) can be applied to the concatenation by adding the ‘*’ following the closing parenthesis. For example, [(X|Y)*} is the same as [Y*|X*}. A double-stranded DNA formed by two strands [X} and [X*} can be written as [X}:[X*}. A double-stranded segment of a double-stranded DNA can be written in similar manner. For example, a dsDNA formed by [X|Y} and [Y*|X*} can be said to have double-stranded segments [X}:[X*} and [Y}:[Y*}. 
A double-stranded segment [X}:[X*} can also be called “double-stranded DNA [X}” or “dsDNA [X}” without creating ambiguity.
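The domain-level notation can be made concrete with a short sketch that expands a flat domain string into bases, using the example sequences given above for domains A, B and C. The function names and the dictionary layout are illustrative assumptions, and the parser handles only flat notation (no parentheses).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Example domain sequences from the text, written 5' to 3'.
DOMAINS = {"A": "TATTCCC", "B": "AGGGAC", "C": "GGGAAGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (the '*' operation)."""
    return seq.translate(COMPLEMENT)[::-1]


def expand(notation: str, domains: dict) -> str:
    """Expand a flat domain string such as '[A|B*|C}' into bases, 5' to 3'.

    '[' marks the 5' end, '}' marks the 3' end, '|' separates domain names,
    and a trailing '*' on a name means the reverse complement of that domain.
    """
    if not (notation.startswith("[") and notation.endswith("}")):
        raise ValueError("expected '[' at the 5' end and '}' at the 3' end")
    parts = []
    for name in notation[1:-1].split("|"):
        name = name.strip()
        if name.endswith("*"):
            parts.append(reverse_complement(domains[name[:-1]]))
        else:
            parts.append(domains[name])
    return "".join(parts)


if __name__ == "__main__":
    print(expand("[A|B|C}", DOMAINS))
    # The identity [(X|Y)*} = [Y*|X*}, checked here for X = A and Y = B.
    assert reverse_complement(expand("[A|B}", DOMAINS)) == expand("[B*|A*}", DOMAINS)
```

Reversing the order of the domains while complementing each one is what makes the grouped form and the expanded form agree.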
The term “a family of polynucleotides” or “a family of oligos,” as used herein, refers to a collection of polynucleotides that can be treated identically (e.g., subject to the same condition or procedure) in a reaction. A family of polynucleotides can have the same domain organization and only differ in Product Constituents and Zips. For example, in
Product Constituents, Zips and Operators: the oligos (or nucleic acid fragments) used to assemble genes of interest can be designed to contain three types of sequences: Product Constituents, Zips, and Operators. The term “Product Constituent,” as used herein, refers to a sequence that eventually becomes part of the final product. For example, in the process shown in
The term “Zip,” “Zip domain,” or “Zip sequence” refers to a domain used to guide gene-specific assembly of two or more polynucleotides whose sequence can be arbitrarily designed. Zips can be connector sequences. The term “gene-specific,” as used herein, refers to the fact that when multiple genes (e.g., polynucleotides of interest) are assembled in the same homogenous assembly reaction, the assembly of two or more polynucleotides contributing to the same gene (in the correct order and orientation) is wanted, while assembly of two or more polynucleotides contributing to different genes (regardless of whether the order or orientation is correct) is unwanted. Because the Zip-guided assembly can be gene-specific, Zips used to assemble polynucleotides for different genes may be different. For example, in step R7 of
The term “Operator,” as used herein, refers to a domain used to process a family of polynucleotides in the same way. Operators can have sequences that are common to all polynucleotides in the same family. For example, the domains FA and RA in the family 102 (having the sequence of [FA|Zi|Ai|RA}) can be Operators. Operators may serve different roles. A common role of an Operator may be to serve as a primer binding site. For example, the domains FA and RA in the family 102 can serve as primer binding sites to amplify all polynucleotides of the family 102. Operators may also contain restriction sites (e.g., Operators ADSL and ADSR; see Example 2 for details). Operators can also be arbitrarily designed using the same process of Zip design, except that the sequence constraints (e.g., a restriction site may be present at a defined position, or the last base of a domain can be dT) need to be considered and implemented during the generation of the initial random sequences.
The letter “n” (italicized or non-italicized), used in the context of a plurality of at least n polynucleotides, denotes the total number of polynucleotides of interest to be assembled or synthesized using the methods provided herein. In various embodiments, n is an integer equal to or greater than 2. For example, a plurality of at least n polynucleotides can be two or more polynucleotides. If 1000 polynucleotides of interest are synthesized, then n=1000. As used herein, a given polynucleotide of the plurality being synthesized or a given polynucleotide of a mixture (e.g., a pool or a family of polynucleotides) during the synthesis can be referred to as an ith polynucleotide. For example, the ith polynucleotide can be a first polynucleotide (when i=1), a second polynucleotide (when i=2), a third polynucleotide (when i=3) . . . or an nth polynucleotide (when i=n). Sequences or subsequences (e.g., Constituents, Zips or Operators) used to assemble or synthesize the ith polynucleotide can be denoted with “i” (in various cases, as a subscript) following the name of the sequences or subsequences. For example, the Zip sequence of the ith polynucleotide can be denoted as Zipi or Zi. In some cases, another given polynucleotide of the plurality being synthesized or a given polynucleotide of a mixture (e.g., a pool or a family of polynucleotides) during the synthesis can be referred to as a jth polynucleotide. The jth polynucleotide denotes a different polynucleotide from the ith polynucleotide. The Zip sequence of the jth polynucleotide can be different from the Zip sequence of the ith polynucleotide. For example, a mixture can comprise a first polynucleotide comprising Zip1 sequence and a second polynucleotide comprising Zip2 sequence, where Zip1 sequence is different from Zip2 sequence. In other words, if i≠j, then the ith Zip (e.g., Zipi or Zi) and the jth Zip (e.g., Zipj or Zj) have different sequences. 
For any given polynucleotide within a mixture (e.g., a family of polynucleotides), the Zip sequence can be unique and can be different from any other Zip sequence of any other polynucleotide. As used herein, “i” and “j” can be any integer from 1 to n (the total number of polynucleotides of interest to be synthesized). For example, “i” or “j” can be 1, 2, 3, 4, 5, 6, 7, 8, 9 . . . or n.
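The uniqueness condition (Zipi differs from Zipj whenever i≠j) can be checked for a candidate family with a few lines of code. This is a minimal sketch; the function name and the example Zip sequences are hypothetical.

```python
from collections import Counter


def duplicated_zips(zips):
    """Return the Zip sequences that occur more than once in one family.

    An empty result means every polynucleotide in the family carries a
    unique Zip, i.e. Zip_i != Zip_j whenever i != j.
    """
    counts = Counter(zips)
    return sorted(z for z, count in counts.items() if count > 1)


if __name__ == "__main__":
    family_zips = ["ACCTGA", "GTTCAG", "ACCTGA"]  # hypothetical Zips for i = 1, 2, 3
    print(duplicated_zips(family_zips))
```

Exact-match uniqueness is only the condition stated in this paragraph; the sketch does not attempt to score how dissimilar two Zips are.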
The term “Assembly” or “assembly process,” as used herein, refers to a reaction or a series of reactions in which the Product Constituents of two or more polynucleotide molecules are linked to form a continuous (e.g., copiable by a DNA or RNA polymerase) and longer Product Constituent. Each individual reaction used to complete an assembly process can be called an assembly reaction. An assembly process may include a ligation reaction or a primer extension reaction. For example, in assembly reactions R7 through R10 of
Gene synthesis is a broadly enabling technology for life science research and health care. Despite decades of improvement, the current cost of gene synthesis can be prohibitive for many applications where thousands of genes need to be synthesized. As a non-limiting example, to find a better-performing version of an industrial enzyme, one may contemplate testing 10,000 naturally existing candidate enzymes that are homologous to the original enzyme. The coding sequences for the 10,000 candidate enzymes may be found by searching a gene sequence database. However, to test the function of these enzymes, their genes may need to be synthesized first. In 2021, the typical cost of gene synthesis can be about $0.09 per base pair (bp). If the average length of a candidate enzyme gene is 3,000 bp (or 3 kb), a total of 30,000,000 bp of genes may be synthesized, costing $2.7 million. Such cost may be prohibitive in many situations or applications.
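The cost estimate in this example can be reproduced with a short calculation. The function name is illustrative, and the price is held in integer cents only to avoid floating-point rounding.

```python
def synthesis_cost_usd(n_genes: int, avg_length_bp: int, cents_per_bp: int) -> float:
    """Total synthesis cost in US dollars for n_genes of a given average length."""
    total_bp = n_genes * avg_length_bp
    return total_bp * cents_per_bp / 100


if __name__ == "__main__":
    n_genes = 10_000        # candidate enzymes to test
    avg_length_bp = 3_000   # average gene length (3 kb)
    cents_per_bp = 9        # about $0.09 per bp in 2021, per the text

    print(n_genes * avg_length_bp)                                   # total bp
    print(synthesis_cost_usd(n_genes, avg_length_bp, cents_per_bp))  # total cost in USD
```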
One breakthrough in the area of gene synthesis over the past decade or so includes high throughput short oligonucleotide (oligo) pool synthesis, where tens of thousands (or more) of short—in some cases, 50 to 300 nucleotide (nt)—oligos can be synthesized on a microarray, cleaved from the microarray and delivered as a pool (or mixture) of oligos. However, to assemble these oligo pools into thousands of long genes in a controlled, high-throughput, and high-quality manner is still an unsolved challenge. The present disclosure can address this challenge by utilizing a large number (in many cases, hundreds, thousands or more) of designed connector domains or sequences, also referred to as Zips in the present disclosure.
In the existing methods, oligos needed to assemble only one gene are used in one assembly reaction or those oligos are mixed in one compartment. For example, if oligos named A1, B1, C1 and D1 are used to assemble gene 1, and oligos A2, B2, C2 and D2 are used to assemble gene 2, one can typically mix oligos A1, B1, C1 and D1 in one reaction (e.g., overlapping PCR) and mix oligos A2, B2, C2 and D2 in a separate reaction. In other words, in the above situations, oligos A2, B2, C2 and D2 are mixed in a different compartment separate from the reaction containing oligos A1, B1, C1 and D1. If all 8 oligos are mixed in one reaction (or one compartment), one oligo belonging to gene 1 may be inadvertently assembled with an oligo belonging to gene 2, leading to an erroneous product. This error can be called cross-gene misassembly. In this manner, if n genes need to be assembled, at least n assembly reactions (which are in separate compartments) need to be set up. This can be tedious and costly when n is large (e.g., n>100). While methods such as DropSynth exist to generate a large number (e.g., millions) of compartments (e.g., droplets), each one of which undergoes a separate assembly reaction (e.g., overlapping PCR), the length, quality, and concentration uniformity of the assembled genes may not be satisfactory for most applications. This is partly because the size and the contents of the droplets can be hard to precisely control.
In the present disclosure, methods and compositions are provided to assemble n genes with far fewer than n (e.g., fewer than n/10, fewer than n/20, fewer than n/30, fewer than n/40, or fewer than n/50) assembly reactions. Here, each of the assembly reactions may be a homogeneous mixture where any two molecules in the mixture may make contact. In other words, each assembly reaction may happen in one compartment, which may not comprise additional compartments, although these methods in some cases may not preclude creating additional compartments. In the present disclosure, oligos contributing to different genes may be mixed in one homogeneous assembly reaction, where cross-gene misassembly can be minimized or prevented by meticulous sequence and reaction design. In the example given above, all 8 oligos (A1, B1, C1, D1, A2, B2, C2 and D2) can be processed in a certain way and then mixed in one homogeneous reaction to produce gene 1 and gene 2. In fact, oligos needed to assemble as many as 1,000 or more genes can be processed and mixed in one homogeneous assembly reaction to make desired assembly products (e.g.,
In some cases, the overall strategy to reduce the number of assembly reactions can be to manipulate polynucleotides belonging to the same family together (in a series of homogenous reactions), rather than to manipulate polynucleotides belonging to the same gene. For example, if the ith gene requires oligos Ai, Bi, Ci and Di (i=1 to n), all the n polynucleotides Ai (i=1 to n) can be considered one family. Similarly, all the n polynucleotides Bi (i=1 to n) can be considered one family, so on and so forth. Therefore, only four families of polynucleotides may be of concern for assembling the ith gene that requires oligos Ai, Bi, Ci and Di (i=1 to n). A series of 5 to 20 reactions, each containing one or a few families of polynucleotides, may be needed to process each family or several families together. After this series of reactions, all of the n genes can be assembled.
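The difference between handling oligos per gene and handling them per family can be sketched by grouping the same set of oligo labels in two ways. The function names and the label format (fragment letter followed by the gene index, e.g. `A1`) are illustrative assumptions.

```python
def group_by_gene(n: int, fragment_names):
    """One pool per gene: gene i gets its own oligos, e.g. A_i, B_i, C_i, D_i."""
    return {i: [f"{name}{i}" for name in fragment_names] for i in range(1, n + 1)}


def group_by_family(n: int, fragment_names):
    """One pool per family: family 'A' holds A_1 ... A_n, and so on."""
    return {name: [f"{name}{i}" for i in range(1, n + 1)] for name in fragment_names}


if __name__ == "__main__":
    n = 1000
    fragments = ["A", "B", "C", "D"]
    per_gene = group_by_gene(n, fragments)
    per_family = group_by_family(n, fragments)
    # Per-gene handling needs n separate pools; per-family handling needs
    # only as many pools as there are fragment positions.
    print(len(per_gene), len(per_family))
```

The number of family pools depends on the number of fragments per gene rather than on n, which is why the number of reactions can stay small as n grows.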
In various embodiments, the polynucleotide assembly reactions provided in the present disclosure can be carried out in a liquid. The polynucleotide assembly reactions provided herein may not be performed on a solid support or a solid surface. The nucleic acid fragments used to assemble various polynucleotides of interest can be soluble in the assembly reactions and may not be fixed on a solid support or a solid surface.
The present disclosure provides methods for synthesizing or assembling a plurality of different polynucleotides of interest from two or more fragments in a mixture (e.g., a same mixture or a single mixture), in the same compartment, or in a single compartment. The methods provided herein may not require a microarray or chip for nucleic acid synthesis. The methods provided herein may not require separating fragments for assembling each gene into separate compartments (e.g., in emulsions). The plurality of polynucleotides of interest can be synthesized or assembled in bulk in one compartment. The plurality of polynucleotides of interest can be synthesized or assembled in solution. The plurality of polynucleotides of interest can be assembled or synthesized from various fragments in a same mixture without non-specific linkages or cross-gene misassembly. The plurality of polynucleotides can comprise different nucleic acid sequences. In some cases, each polynucleotide of the plurality synthesized or assembled comprises a unique sequence that is different from other sequences in the plurality.
For example, the methods provided herein can be used to synthesize a plurality of n polynucleotides or a plurality of at least n polynucleotides, where n can be an integer that is equal to or greater than 2. In some cases, n denotes the total number of polynucleotides of interest that are synthesized in a mixture. In some cases, the plurality of n different polynucleotides is synthesized in a single compartment or the same mixture. In some cases, the plurality of n polynucleotides synthesized can comprise at least 2, 5, 10, 20, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000 or more different sequences. As described herein, “Seq” can be used to denote the sequence of each polynucleotide of the plurality of n polynucleotides. The Seq sequence can be the sequence of interest or the sequence desired to be synthesized. An ith polynucleotide of the plurality can comprise a Seqi sequence (where i=1 to n). For example, a first polynucleotide of the plurality can comprise a Seq1 sequence, a second polynucleotide of the plurality can comprise a Seq2 sequence, a third polynucleotide of the plurality can comprise a Seq3 sequence . . . and an nth polynucleotide of the plurality can comprise a Seqn sequence. As used herein, “Seq” followed by a letter such as SeqA, SeqB, SeqC . . . or SeqZ can be used to denote nucleic acid fragments that are used to synthesize the polynucleotide of interest containing a Seq sequence. For simplicity, in some cases, a single letter without “Seq” may be used to denote the sequence of interest. For example, A1, A2, A3, A1000, B1, B2, B3, and B1000 in figures described herein can be used to denote the sequences of interest. For each i, the Seqi sequence can be synthesized from two or more fragments including SeqAi, SeqBi, SeqCi . . . and/or SeqZi. 
For example, the Seqi sequence can be synthesized from a sequence containing SeqAi and a sequence containing SeqBi.
As an example shown in
SeqAi sequence, a SeqBi sequence and a SeqCi sequence. In some cases, the synthesized Seqi sequence can comprise a SeqAi sequence, a SeqBi sequence and a SeqCi sequence sequentially from 5′ end to 3′ end.
The methods provided herein can comprise providing a first mixture of at least n polynucleotides 501, where an ith polynucleotide comprises a SeqAi sequence 502 and a ZipAi sequence 503. The first mixture of at least n polynucleotides can be a family of polynucleotides. The SeqAi sequence can be a portion of the Seqi sequence of interest, and the ZipAi sequence can be a connector sequence used for linking the SeqAi sequence with another sequence. Each connector sequence in the first mixture can comprise a unique sequence that is different from other connector sequences in the first mixture. For example, the ZipAi sequence can be different from a ZipAj sequence when i≠j. As used herein, i or j can be an integer from 1 to n (n can be the total number of polynucleotides to be synthesized), which can be used to denote any given polynucleotide of a mixture of polynucleotides. Next, a second mixture of at least n polynucleotides 504 can be provided. In the second mixture, an ith polynucleotide can comprise a SeqBi sequence 505 and a ZipBi sequence 506. The second mixture of at least n polynucleotides can be a family of polynucleotides. The SeqBi sequence can be a portion of the Seqi sequence of interest, and the ZipBi sequence can be a connector sequence used for linking the SeqBi sequence with another sequence. Each connector sequence in the second mixture can comprise a unique sequence that is different from other connector sequences in the second mixture. The ZipBi sequence can be different from a ZipBj sequence when i≠j. Next, the first mixture 501 and the second mixture 504 can be contacted, thereby generating a third mixture of n polynucleotides 507. In the third mixture, an ith polynucleotide comprises a SeqAi sequence, a SeqBi sequence and a ZipABi sequence 508. The ZipABi sequence can be different from a ZipABj sequence when i≠j. The ZipABi sequence may not be flanked by the SeqAi sequence and the SeqBi sequence. 
The ZipABi sequence may be at a terminus of the ith polynucleotide comprising the SeqAi sequence, the SeqBi sequence and the ZipABi sequence. Next, a fourth mixture of at least n polynucleotides 509 can be provided. The fourth mixture of at least n polynucleotides can be a family of polynucleotides. In the fourth mixture, an ith polynucleotide can comprise a SeqCi sequence 511 and a ZipCi sequence 510. The SeqCi sequence can be a portion of the Seqi sequence of interest, and the ZipCi sequence can be a connector sequence used for linking the SeqCi sequence with another sequence. Each connector sequence in the fourth mixture can comprise a unique sequence that is different from other connector sequences in the fourth mixture. The ZipCi sequence can be different from a ZipCj sequence when i≠j. Next, the third mixture 507 and the fourth mixture 509 can be contacted, thereby generating a fifth mixture of n polynucleotides 512, where an ith polynucleotide comprises the Seqi sequence comprising the SeqAi sequence, the SeqBi sequence and the SeqCi sequence. In some cases, a sixth mixture, a seventh mixture or more may be used to add more fragments onto already synthesized polynucleotides to generate the polynucleotides of interest.
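The stepwise pairing of mixtures described above can be sketched as an in-silico bookkeeping exercise. The sketch below is an abstraction under stated assumptions, not a model of the chemistry: each polynucleotide is reduced to a Product Constituent plus one Zip, partner Zips are assumed to be reverse complements of each other, and the joined product is assumed to carry the Zip of its left-hand fragment as its ZipAB. The type name `Fragment`, the function `link`, and all example sequences are hypothetical.

```python
from typing import List, NamedTuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Fragment(NamedTuple):
    seq: str      # Product Constituent (e.g., SeqA_i)
    zip_seq: str  # connector sequence (e.g., ZipA_i)


def rc(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def link(left_family: List[Fragment], right_family: List[Fragment]) -> List[Fragment]:
    """Join each left fragment to the right fragment whose Zip is its reverse complement.

    The product keeps the left Zip as its connector and contains the two
    Product Constituents back to back, with no Zip between them.
    """
    by_zip = {}
    for frag in right_family:
        if frag.zip_seq in by_zip:
            raise ValueError("Zip reused within one family")
        by_zip[frag.zip_seq] = frag
    products = []
    for left in left_family:
        partner = by_zip.get(rc(left.zip_seq))
        if partner is None:
            continue  # no specific partner in this mixture; nothing is assembled
        products.append(Fragment(left.seq + partner.seq, left.zip_seq))
    return products


if __name__ == "__main__":
    z1, z2 = "ACCTGA", "GTTCAG"  # hypothetical Zips for gene 1 and gene 2
    first = [Fragment("ATGAAA", z1), Fragment("ATGCCC", z2)]           # SeqA_i
    second = [Fragment("GGGAAA", rc(z2)), Fragment("CCCGGG", rc(z1))]  # SeqB_i, shuffled
    fourth = [Fragment("TTTTAA", rc(z1)), Fragment("TTTTGA", rc(z2))]  # SeqC_i

    third = link(first, second)  # SeqA_i + SeqB_i, carrying ZipAB_i
    fifth = link(third, fourth)  # SeqA_i + SeqB_i + SeqC_i
    for product in fifth:
        print(product.seq)
```

Because partners are found by Zip rather than by position in the list, the order in which fragments appear in a mixture does not affect which fragments are joined.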
In some cases, the methods can comprise providing a first mixture of at least two polynucleotides comprising a first polynucleotide and a second polynucleotide. The first polynucleotide of the first mixture can comprise a SeqA1 sequence and a ZipA1 sequence and the second polynucleotide of the first mixture can comprise a SeqA2 sequence and a ZipA2 sequence. The ZipA1 sequence can be different from the ZipA2 sequence. In some cases, a second mixture of at least two polynucleotides can be provided. The second mixture can comprise a first polynucleotide and a second polynucleotide. The first polynucleotide of the second mixture can comprise a SeqB1 sequence and a ZipB1 sequence and the second polynucleotide of the second mixture can comprise a SeqB2 sequence and a ZipB2 sequence. The ZipB1 sequence can be different from the ZipB2 sequence. In some cases, a third mixture of at least two polynucleotides can be provided. The third mixture can comprise a first polynucleotide and a second polynucleotide. The first polynucleotide of the third mixture can comprise a SeqC1 sequence and a ZipC1 sequence and the second polynucleotide of the third mixture can comprise a SeqC2 sequence and a ZipC2 sequence. The ZipC1 sequence can be different from the ZipC2 sequence. In some cases, an additional one or more mixtures can be provided. The polynucleotides within each of the mixtures can be mixed to generate final product polynucleotides.
In some cases, the methods provided herein can comprise providing a first mixture of at least n polynucleotides. An ith polynucleotide of the first mixture can comprise a SeqAi sequence and a ZipAi sequence, and the ZipAi sequence can be different from a ZipAj (where j=1 to n) sequence of a jth polynucleotide when i≠j. Next, a second mixture of at least n polynucleotides can be provided. An ith polynucleotide of the second mixture can comprise a SeqBi sequence and a ZipBi sequence, and the ZipBi sequence can be different from a ZipBj sequence of a jth polynucleotide when i≠j. Next, the first mixture and the second mixture can be contacted to generate a third mixture of at least n polynucleotides, wherein an ith polynucleotide comprises a SeqAi sequence, a SeqBi sequence and a ZipABi sequence, and wherein the ZipABi sequence is different from a ZipABj sequence of a jth polynucleotide when i≠j. Next, a fourth mixture of at least n polynucleotides can be provided, wherein an ith polynucleotide comprises a SeqCi sequence and a ZipCi sequence, and wherein the ZipCi sequence is different from a ZipCj sequence of a jth polynucleotide when i≠j. Next, the third mixture and the fourth mixture can be contacted to generate a fifth mixture of at least n polynucleotides, wherein an ith polynucleotide comprises the Seqi sequence comprising the SeqAi sequence, the SeqBi sequence and the SeqCi sequence. In some cases, generating the third mixture of at least n polynucleotides comprises specifically linking, for each i, the ZipAi sequence and the ZipBi sequence. In some cases, generating the fifth mixture of at least n polynucleotides comprises specifically linking, for each i, the ZipCi sequence and the ZipABi sequence.
The SeqAi sequence, the SeqBi sequence and the SeqCi sequence of the Seqi sequence can be linked seamlessly. As used herein, “seamless” used in the context of gene fusion or gene assembly refers to processes that allow two or more nucleic acid fragments to be joined precisely so that no unwanted (or intervening) nucleotides are added at the junctions between the nucleic acid fragments. For example, the Seqi sequence can comprise the SeqAi sequence and the SeqBi sequence without an intervening sequence (e.g., a Zip sequence) in between the SeqAi sequence and the SeqBi sequence. The Seqi sequence can comprise the SeqBi sequence and the SeqCi sequence without an intervening sequence in between the SeqBi sequence and the SeqCi sequence. In some cases, the Seqi sequence with an intervening sequence in between the SeqAi sequence and the SeqBi sequence or the SeqBi sequence and the SeqCi sequence is not a functional genetic element. The Seqi sequence can comprise the SeqAi sequence, the SeqBi sequence and the SeqCi sequence sequentially from the 5′ end to the 3′ end.
For each i, the SeqAi sequence, the SeqBi sequence and the SeqCi sequence of the Seqi sequence can be linked specifically. The Zip sequences used in the methods can be connector sequences for linking one fragment with another fragment in a mixture specifically. As used herein, “Zip” followed by a letter such as ZipA, ZipB, ZipC . . . or ZipZ can be used to denote the connector sequence of a sequence of interest (e.g., a nucleic acid fragment containing corresponding Seq sequence). For simplicity, in some cases, a single letter “Z” may be used to denote the connector sequence. For example, Z1, Z2, Z3, and Z1000 in figures described herein can be used to denote the connector sequences. For example, in some cases, for each i, the ZipAi sequence and the ZipBi sequence can be connector sequences for linking the SeqAi sequence and the SeqBi sequence. Each Zip sequence can be used to specifically link one fragment (e.g., SeqA) to another fragment (e.g., SeqB) such that the synthesized sequence containing SeqA and SeqB is a functional genetic element. For example, the Zip sequence can be used to specifically link a fragment containing SeqA1 to another fragment containing SeqB1 such that the synthesized sequence containing SeqA1 and SeqB1 is a functional genetic element. SeqA1 and SeqB1 can be from the same polynucleotide of interest to be synthesized or the same functional genetic element to be synthesized. The Zip sequence can be used to prevent or minimize misassembly of the fragments. For example, the Zip sequence may not link a fragment containing SeqA1 to another fragment containing SeqB2. The Zip sequences used in a mixture can be re-used in another mixture. For each i, the ZipCi sequence and the ZipABi sequence can be connector sequences for linking the SeqCi sequence and a sequence comprising the SeqAi sequence and the SeqBi sequence. For each i, the ZipAi sequence and the ZipBi sequence can be a same nucleic acid sequence. 
In some cases, the ZipAi sequence and the ZipBi sequence may be substantially the same. For each i, the ZipAi sequence and the ZipBi sequence can be complementary (e.g., fully or partially complementary), or the ZipBi sequence can be a reverse complement of the ZipAi sequence. For each i, the ZipAi sequence and the ZipBi sequence can be different nucleic acid sequences. For each i, the ZipABi sequence can be a same nucleic acid sequence as the ZipAi sequence or the ZipBi sequence. For each i, the ZipABi sequence, the ZipAi sequence, and the ZipBi sequence can be a same nucleic acid sequence (e.g., Zip sequences in
The connector sequence or Zip sequence described herein can be of various lengths. For example, the connector sequence or Zip sequence can be at least about 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 220, 250, 280, 300, 350, 400, 450 or more nucleotides in length. The connector sequence or Zip sequence can be at most about 500, 450, 400, 350, 300, 250, 200, 150, 100, 50, 20 or fewer nucleotides in length. The connector sequence or Zip sequence can be from 2 to 50, from 10 to 60, from 5 to 100, from 10 to 200, from 2 to 100, from 5 to 200, from 5 to 300, or from 5 to 400 nucleotides in length. For example, in some cases, for each i, the ZipAi sequence, the ZipBi sequence, the ZipABi sequence, or the ZipCi sequence can be from 5 nucleotides to 200 nucleotides in length.
The nucleic acid fragments (e.g., the SeqAi sequence, the SeqBi sequence, or the SeqCi sequence, etc.) used to synthesize the polynucleotide of interest can be of various lengths. For example, the nucleic acid fragments can be at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000 or more nucleotides in length. The nucleic acid fragments can be from 5 to 50, from 5 to 100, from 5 to 200, from 5 to 500, from 5 to 1,000, from 5 to 2,000, from 5 to 5,000, from 5 to 10,000, from 5 to 50,000, from 10 to 200, from 10 to 500, from 10 to 1,000, from 10 to 5,000, from 10 to 10,000, from 100 to 1,000, from 200 to 5,000, or from 200 to 10,000 nucleotides in length. For example, in some cases, for each i, the SeqAi sequence, the SeqBi sequence, or the SeqCi sequence is from 5 nucleotides to 5,000 nucleotides in length.
As described herein, the first mixture and the second mixture can be contacted, thereby generating a third mixture of n polynucleotides. Various methods, including hybridization, primer extension and ligation, can be used to generate the third mixture of n polynucleotides from the first mixture and the second mixture. In some cases, generating the third mixture of n polynucleotides comprises linking (e.g., specifically linking), for each i, the ZipAi sequence and the ZipBi sequence. The linking can be specific such that the ZipAi sequence links to the ZipBi sequence but does not link to a ZipBj sequence when i≠j. In some cases, the ZipAi sequence and the ZipBi sequence are the same or complementary. In such cases, linking can comprise hybridizing, for each i, the ZipAi sequence and the ZipBi sequence (e.g., 111 of
As described herein, the third mixture 507 can be contacted with a fourth mixture 509 to generate a fifth mixture of n polynucleotides 512 (e.g.,
In many cases, an Operator domain at the 5′ end or 3′ end of a family of polynucleotides needs to be removed so that a Product Constituent or a Zip can be at the 5′ end or 3′ end. These reactions can be called “adaptor removal reactions” or “Operator removal reactions,” which can ensure the seamless ligation of Product Constituents or can improve the specificity of Zip-based assembly. For example, in R5 of
An alternative method to carry out the adaptor removal reaction can be through the use of deoxyuridine (dU). For example, the last bases of [FB} and [RA*} can be designed to be T. Versions of the [FB} and [RA*} primers in which all the dT bases are replaced with dU bases (hereinafter called ‘dU-laden’ primers) can be used to amplify 112. Following this reaction, the USER enzyme mix (available from New England Biolabs, NEB) can be used to remove the [FB} and [RA*} domains from 112, leaving 3′ overhangs. Next, a 3′-to-5′ ssDNA-specific exonuclease (e.g., Exonuclease I or Exo I) can be used to degrade these 3′ overhangs and form blunt ends.
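The primer-design step of this dU-based approach can be sketched as a simple string operation. The function name and the example primer are hypothetical, dU is written as `U` purely as a text convention, and the check on the last base mirrors the design rule stated above; the enzymatic steps themselves are not modeled.

```python
def du_laden(primer: str) -> str:
    """Return a 'dU-laden' version of a primer: every dT replaced by dU (written 'U').

    The text designs the last (3'-most) base of the Operator domain to be T,
    so this sketch rejects primers that do not end in T.
    """
    primer = primer.upper()
    if not primer.endswith("T"):
        raise ValueError("the last base of the primer is expected to be T")
    return primer.replace("T", "U")


if __name__ == "__main__":
    example_primer = "GACCTGTCAT"  # hypothetical Operator sequence, 5' to 3'
    print(du_laden(example_primer))
```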
An example method can comprise providing a first mixture of at least n polynucleotides, wherein an ith polynucleotide comprises a SeqAi sequence and a ZipAi sequence, and wherein the ZipAi sequence is different from a ZipAj (where j=1 to n) sequence of a jth polynucleotide when i≠j. Next, a second mixture of at least n polynucleotides can be provided, wherein an ith polynucleotide comprises a SeqBi sequence and a ZipBi sequence, and wherein the ZipBi sequence is different from a ZipBj sequence of a jth polynucleotide when i≠j. Next, the first mixture and the second mixture can be contacted to generate a third mixture of at least n polynucleotides, an ith polynucleotide of the third mixture comprising a SeqAi sequence, a SeqBi sequence and a ZipABi sequence, wherein the ZipABi sequence is different from a ZipABj sequence of a jth polynucleotide when i≠j, and wherein, for each i, the ZipAi sequence specifically links to the ZipBi sequence. Next, optionally, within the third mixture, for each i, a free 3′ end of the ZipAi sequence or the ZipBi sequence can be extended using the ith polynucleotide from the first mixture or the second mixture as a template to generate the ith polynucleotide comprising the SeqAi sequence, the SeqBi sequence and the ZipABi sequence. Next, optionally, within the third mixture, a sequence segment from the 3′ and/or 5′ end of the ith polynucleotide can be removed (e.g., adaptor removal reaction). Next, optionally, within the third mixture, the ith polynucleotide comprising the SeqAi sequence, the SeqBi sequence and the ZipABi sequence can be circularized to generate a circularized polynucleotide. Next, optionally, within the third mixture, the circularized polynucleotide can be linearized such that, for each i, the ZipABi sequence is not flanked by the SeqAi sequence and the SeqBi sequence. 
In some cases, a fourth mixture of at least n polynucleotides can be provided, wherein an ith polynucleotide comprises a SeqCi sequence and a ZipCi sequence, and wherein the ZipCi sequence is different from a ZipCj sequence of a jth polynucleotide when i≠j. Next, the third mixture and the fourth mixture can be contacted to generate a fifth mixture of at least n polynucleotides, an ith polynucleotide of the fifth mixture comprising the Seqi sequence comprising the SeqAi sequence, the SeqBi sequence and the SeqCi sequence, wherein, for each i, the ZipCi sequence specifically links to the ZipABi sequence. The methods can be repeated to link additional fragments to synthesize the polynucleotides of interest.
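The specific pairing described above can be illustrated with a minimal in silico sketch. It assumes the case in which, for each i, the ZipBi sequence is the exact reverse complement of the ZipAi sequence, and it models only the outcome (SeqAi joined to SeqBi without an intervening sequence), not the extension, circularization, linearization or adaptor removal steps. The function names and toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def link_mixtures(first, second):
    """Pair fragments from two mixtures by their connector sequences.

    first:  list of (SeqA_i, ZipA_i) tuples
    second: list of (SeqB_i, ZipB_i) tuples
    Returns the seamless concatenations SeqA_i + SeqB_i for every ZipA_i
    whose reverse complement is present as a ZipB in the second mixture.
    """
    seq_b_by_zip = {}
    for seq_b, zip_b in second:
        if zip_b in seq_b_by_zip:
            # Enforces "ZipBi differs from ZipBj when i != j".
            raise ValueError("ZipB sequences must be unique within the mixture")
        seq_b_by_zip[zip_b] = seq_b

    products = []
    for seq_a, zip_a in first:
        partner = seq_b_by_zip.get(reverse_complement(zip_a))
        if partner is not None:
            products.append(seq_a + partner)
    return products


# Toy mixtures (hypothetical sequences); the second mixture is shuffled on
# purpose to show that pairing follows the Zips, not the list order.
first_mixture = [("ATGGCT", "GATTACAGGT"), ("ATGCCA", "CCGTTAGACA")]
second_mixture = [
    ("TAAGGC", reverse_complement("CCGTTAGACA")),
    ("TGACCC", reverse_complement("GATTACAGGT")),
]
third_mixture = link_mixtures(first_mixture, second_mixture)
```

Repeating the same call with the products and a further mixture carrying ZipCi sequences would correspond to the stepwise addition of SeqCi described above.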
The present disclosure, in some other aspects, provides methods of synthesizing a plurality of n polynucleotides from two or more fragments in a single mixture. The method can comprise providing a mixture comprising a first subpopulation of n polynucleotides, a second subpopulation of n polynucleotides, and a third subpopulation of n polynucleotides. The first subpopulation, the second subpopulation, and the third subpopulation can be mixed within a single mixture. In other words, for each polynucleotide of the plurality of polynucleotides to be synthesized, three or more nucleic acid fragments can be assembled in a single mixture without first contacting a first mixture with a second mixture. In the first subpopulation, an ith polynucleotide can comprise a SeqAi sequence and a ZipAi sequence. The ZipAi sequence can be different from a ZipAj sequence when i≠j. In the second subpopulation, an ith polynucleotide can comprise a SeqBi sequence and a ZipBi sequence. The ZipBi sequence can be different from a ZipBj sequence when i≠j. In the third subpopulation, an ith polynucleotide can comprise a SeqCi sequence and a ZipCi sequence. The ZipCi sequence can be different from a ZipCj sequence when i≠j. Next, the first subpopulation, the second subpopulation and the third subpopulation can be contacted within the mixture to generate a plurality of n polynucleotides, where an ith polynucleotide of the plurality comprises the Seqi sequence comprising the SeqAi sequence, the SeqBi sequence and the SeqCi sequence. The SeqAi sequence, the SeqBi sequence and the SeqCi sequence of the Seqi sequence can be linked seamlessly. The Seqi sequence can comprise the SeqAi sequence and the SeqBi sequence without an intervening sequence in between the SeqAi sequence and the SeqBi sequence. The Seqi sequence can comprise the SeqBi sequence and the SeqCi sequence without an intervening sequence in between the SeqBi sequence and the SeqCi sequence.
The Seqi sequence can comprise the SeqAi sequence, the SeqBi sequence and the SeqCi sequence sequentially from the 5′ end to the 3′ end. For example, as shown in
In some cases, in the second subpopulation, the ZipBi sequence can be a ZipB1i sequence, and the ith polynucleotide can further comprise a ZipB2i sequence. The two connector sequences can be located at various positions. The two connector sequences may not flank the sequence of interest. For example, in some cases, the SeqBi sequence can be located in between the ZipB1i sequence and the ZipB2i sequence. In some cases, the ZipB1i sequence can be located in between the SeqBi sequence and the ZipB2i sequence. In some cases, the ZipB2i sequence can be located in between the SeqBi sequence and the ZipB1i sequence.
The ZipAi sequence and the ZipBi sequence can be connector sequences for linking the SeqAi sequence and the SeqBi sequence. The ZipB2i sequence and the ZipCi sequence can be connector sequences for linking the SeqBi sequence and the SeqCi sequence. In some cases, for each i, the ZipAi sequence and the ZipBi sequence are a same nucleic acid sequence. In some cases, for each i, the ZipAi sequence and the ZipBi sequence are complementary. In some cases, for each i, the ZipAi sequence and the ZipBi sequence are different nucleic acid sequences. In some cases, for each i, the ZipB2i sequence and the ZipCi sequence are a same nucleic acid sequence. In some cases, for each i, the ZipB2i sequence and the ZipCi sequence are complementary. In some cases, for each i, the ZipB2i sequence and the ZipCi sequence are different nucleic acid sequences. For each i, the ZipAi sequence, the ZipBi sequence, the ZipB1i sequence, the ZipB2i sequence, or the ZipCi sequence can be of various length, for example, from 5 nucleotides to 200 nucleotides in length. For each i, the SeqAi sequence, the SeqBi sequence, or the SeqCi sequence can be from 5 nucleotides to 5,000 nucleotides in length.
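A minimal sketch of the single-mixture case follows. It assumes the embodiment in which the middle fragment carries two connector sequences (ZipB1i and ZipB2i), in which ZipB1i is the exact reverse complement of ZipAi and ZipB2i is the exact reverse complement of ZipCi. The class name, function names and toy sequences are hypothetical, and only the final seamless product is modeled.

```python
from dataclasses import dataclass

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(_COMP)[::-1]


@dataclass(frozen=True)
class MiddleFragment:
    zip_b1: str  # connector that links to ZipA_i
    seq_b: str   # SeqB_i
    zip_b2: str  # connector that links to ZipC_i


def assemble_one_pot(sub_a, sub_b, sub_c):
    """Assemble SeqA_i + SeqB_i + SeqC_i from three subpopulations.

    sub_a: list of (SeqA_i, ZipA_i)
    sub_b: list of MiddleFragment
    sub_c: list of (ZipC_i, SeqC_i)
    """
    seq_a_by_zip = {zip_a: seq_a for seq_a, zip_a in sub_a}
    seq_c_by_zip = {zip_c: seq_c for zip_c, seq_c in sub_c}
    products = []
    for frag in sub_b:
        seq_a = seq_a_by_zip.get(revcomp(frag.zip_b1))
        seq_c = seq_c_by_zip.get(revcomp(frag.zip_b2))
        # A product forms only when both junctions find their partners.
        if seq_a is not None and seq_c is not None:
            products.append(seq_a + frag.seq_b + seq_c)
    return products


# Toy subpopulations (hypothetical sequences), deliberately out of order.
sub_a = [("AAA", "GATTACA"), ("CCC", "TTGGCCA")]
sub_b = [
    MiddleFragment(revcomp("TTGGCCA"), "GGG", "ACGTTGC"),
    MiddleFragment(revcomp("GATTACA"), "TTT", "CAGTCAG"),
]
sub_c = [(revcomp("CAGTCAG"), "ACA"), (revcomp("ACGTTGC"), "TGT")]
one_pot_products = assemble_one_pot(sub_a, sub_b, sub_c)
```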
The method can further comprise linking (i) the ZipAi sequence and the ZipB1i sequence, and/or (ii) the ZipB2i sequence and the ZipCi sequence. In some cases, linking comprises hybridizing (i) the ZipAi sequence and the ZipB1i sequence, and/or (ii) the ZipB2i sequence and the ZipCi sequence. For example, the ZipAi sequence and the ZipB1i sequence can be complementary, and the ZipB2i sequence and the ZipCi sequence can be complementary. In some cases, a plurality of intermediate products can be generated, where an ith intermediate product of the plurality comprises the SeqAi sequence, the ZipAi sequence (or the ZipB1i sequence), the SeqBi sequence, the ZipCi sequence (or the ZipB2i sequence) and the SeqCi sequence. The ith intermediate product of the plurality can comprise the SeqAi sequence, the ZipAi sequence (or the ZipB1i sequence), the SeqBi sequence, the ZipCi sequence (or the ZipB2i sequence) and the SeqCi sequence sequentially from 5′ end to 3′ end (e.g., 221-227 of
In various embodiments described herein, the plurality of polynucleotides synthesized herein can be a functional genetic element. For example, the concatenation of the SeqAi sequence and the SeqBi sequence without an intervening sequence can be a functional genetic element. In some cases, concatenation of the SeqAi sequence, the SeqBi sequence and the SeqCi sequence without any intervening sequence can be a functional genetic element. The sequence of the functional genetic element can exist naturally in a cell or tissue. The functional genetic element can be a gene, a protein-coding sequence, a promoter, an internal ribosome entry site, a ribozyme, an aptamer, a nucleic acid sequence that is capable of being specifically bound by a transposase, a guide ribonucleic acid (gRNA) for CRISPR/Cas9 based gene editing, a primer-extension gRNA for prime editing, or any combination thereof. It is to be understood that the methods described herein may be used to assemble any genetic element or any polynucleotide of interest. In some cases, the polynucleotide of interest may not be functional or may not be a functional element. The functional genetic element may not comprise a sequence that is identical to any connector sequence or Zip sequence described herein. The connector sequence or Zip sequence can be irrelevant to any polynucleotides of interest synthesized herein. For example, the functional genetic element may not comprise a sequence that is identical to the ZipAi sequence, the ZipBi sequence, the ZipABi sequence or the ZipCi sequence. For each i, the SeqAi sequence, the SeqBi sequence, and/or the SeqCi sequence can be uniquely or specifically linked. In some cases, a plurality of at least 2, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 500, 1,000, 5,000 or more polynucleotides can be synthesized in one mixture.
The polynucleotide synthesized can be of various lengths. For example, a polynucleotide of interest can be at least about 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 50,000, 100,000 or more nucleotides in length. In some cases, each polynucleotide of the plurality synthesized can be from about 15 to about 15,000 nucleotides in length.
The present disclosure, in some aspects, provides methods for synthesizing a plurality of polynucleotides (e.g., a plurality of at least n polynucleotides, where n is an integer equal to or greater than 2), where an ith polynucleotide of the plurality comprises a Seqi sequence (where i=1 to n), and the Seqi sequence comprises a SeqAi sequence and a SeqBi sequence. In the methods provided herein, an ith polynucleotide of the plurality can comprise a ZipABi sequence, a SeqAi sequence and a SeqBi sequence sequentially from 5′ end to 3′ end. For example, the methods provided herein can comprise providing a first mixture of n polynucleotides. In the first mixture, an ith polynucleotide can comprise a SeqAi sequence and a ZipAi sequence. For each i, the ZipAi sequence can be unique within the first mixture. In other words, the ZipAi sequence can be different from a ZipAj sequence when i≠j. For example, a ZipA1 sequence can be different from a ZipA2 sequence within the first mixture. Next, a second mixture of n polynucleotides can be provided. In the second mixture, an ith polynucleotide can comprise a SeqBi sequence and a ZipBi sequence. For each i, the ZipBi sequence can be unique within the second mixture. In other words, the ZipBi sequence can be different from a ZipBj sequence when i≠j. Next, the first mixture and the second mixture can be contacted to generate a third mixture of a plurality of n polynucleotides, where an ith polynucleotide can comprise a ZipABi sequence, a SeqAi sequence and a SeqBi sequence sequentially from 5′ end to 3′ end. In the third mixture, for each i, the ZipABi sequence can be unique. The ZipABi sequence can be different from a ZipABj sequence when i≠j. In the methods provided herein, the SeqAi sequence and the SeqBi sequence can be linked without an intervening sequence (e.g., no Zip sequences or other sequences in between). The SeqAi sequence and the SeqBi sequence can be linked seamlessly. For each i, the ZipAi sequence and the ZipBi sequence can be connector sequences for linking the SeqAi sequence and the SeqBi sequence. As described herein, in some cases, generating the third mixture can comprise linking, for each i, the ZipAi sequence and the ZipBi sequence. In some cases, linking can comprise hybridizing, for each i, the ZipAi sequence and the ZipBi sequence. The methods can further comprise extending a free 3′ end of the ZipAi sequence or the ZipBi sequence using the ith polynucleotide from the first mixture or the second mixture as a template to generate the ith polynucleotide comprising the SeqAi sequence, the SeqBi sequence and the ZipABi sequence. In some cases, an intermediate product comprising the SeqBi sequence, the ZipABi sequence and the SeqAi sequence sequentially from 5′ end to 3′ end (e.g., 111, 112 and 113 of
Any nucleic acid molecule described in the present disclosure can be a double-stranded nucleic acid molecule or single-stranded nucleic acid molecule. In some cases, a nucleic acid molecule may comprise a double-stranded region and a single-stranded region. For example, the nucleic acid molecule having a connector sequence or anti-connector sequence may be a double-stranded nucleic acid molecule having the connector sequence or anti-connector sequence region as a single-stranded region (e.g., an overhang or sticky end). The overhang can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotides long. The overhang can be at 5′ end or 3′ end of a nucleic acid molecule.
Any nucleic acid molecule described herein can comprise one or more modified nucleotides. Examples of modified nucleotides include, but are not limited to, diaminopurine, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, 2,6-diaminopurine and the like. In some cases, nucleotides may include modifications in their phosphate moieties, including modifications to a triphosphate moiety. Non-limiting examples of such modifications include phosphate chains of greater length (e.g., a phosphate chain having 4, 5, 6, 7, 8, 9, 10 or more phosphate moieties) and modifications with thiol moieties (e.g., alpha-thiotriphosphate and beta-thiotriphosphates). Nucleic acid molecules may also be modified at the base moiety (e.g., at one or more atoms that typically are available to form a hydrogen bond with a complementary nucleotide and/or at one or more atoms that are not typically capable of forming a hydrogen bond with a complementary nucleotide), sugar moiety or phosphate backbone.
Nucleic acid molecules may also contain amine-modified groups, such as aminoallyl-dUTP (aa-dUTP) and aminohexylacrylamide-dCTP (aha-dCTP), to allow covalent attachment of amine-reactive moieties, such as N-hydroxysuccinimide esters (NHS). Alternatives to standard DNA base pairs or RNA base pairs in the oligonucleotides of the present disclosure can provide higher density in bits per cubic mm, higher safety (resistance to accidental or purposeful synthesis of natural toxins), easier discrimination in photo-programmed polymerases, or lower secondary structure. Such alternative base pairs can be compatible with natural and mutant polymerases for de novo and/or amplification synthesis.
The present disclosure also provides compositions for synthesizing the polynucleotides of interest. For example, a composition provided herein can comprise any mixture described herein, including the first mixture, the second mixture, and the third mixture described herein. As another example, the composition provided herein can comprise an intermediate product or a mixture of intermediate products generated during the process of synthesizing the final products of interest.
In some cases, provided herein is a composition for synthesizing a plurality of n polynucleotides. The composition can comprise a first mixture of n polynucleotides. An ith polynucleotide of the first mixture can comprise a SeqAi sequence and a ZipAi sequence (where i=1 to n). The ZipAi sequence can be different from a ZipAj sequence when i≠j. The composition can further comprise a second mixture of n polynucleotides. An ith polynucleotide of the second mixture can comprise a SeqBi sequence and a ZipBi sequence (where i=1 to n). The ZipBi sequence can be different from a ZipBj sequence when i≠j. In the composition, for each i, concatenation of the SeqAi sequence and the SeqBi sequence without an intervening sequence can be a functional genetic element. The ZipAi sequence and the ZipBi sequence can be connector sequences for linking the SeqAi sequence and the SeqBi sequence. The ZipAi sequence and the ZipBi sequence can be connector sequences for linking the SeqAi sequence and the SeqBi sequence specifically.
The first mixture and the second mixture can be within a same compartment or a same mixture. The first mixture and the second mixture can be combined or mixed to form a single mixture. The ZipAi sequence and the ZipBi sequence can be linked. The ZipAi sequence and the ZipBi sequence can be hybridized. The ZipAi sequence and the ZipBi sequence can be a same nucleic acid sequence, complementary nucleic acid sequences, or different nucleic acid sequences. The composition can further comprise a third mixture of n polynucleotides, where an ith polynucleotide comprises a SeqCi sequence and a ZipCi sequence and the ZipCi sequence is different from a ZipCj sequence when i≠j. In the compositions, for each i, concatenation of the SeqAi sequence, the SeqBi sequence, and the SeqCi sequence without any intervening sequence can be a functional genetic element. The ZipCi sequence can be a connector sequence for linking the SeqCi sequence and a sequence comprising the SeqAi sequence and the SeqBi sequence. The ZipCi sequence can be a same nucleic acid sequence as the ZipAi sequence or the ZipBi sequence. The ZipCi sequence can be a different nucleic acid sequence from the ZipAi sequence or the ZipBi sequence. The ZipCi sequence, the ZipAi sequence and the ZipBi sequence can be a same nucleic acid sequence. The first mixture, the second mixture and the third mixture can be within a same compartment or a same mixture. The functional genetic element can be a gene, a protein-coding sequence, a promoter, an internal ribosome entry site, a ribozyme, an aptamer, a nucleic acid sequence that is capable of being specifically bound by a transposase, a guide ribonucleic acid (gRNA) for CRISPR/Cas9 based gene editing, or a primer-extension gRNA for prime editing.
The present disclosure, in some aspects, provides a composition comprising a polynucleotide having at least two double-stranded regions separated by a single-stranded region (see e.g., 228 of
A plurality of polynucleotides (e.g., a plurality of at least n polynucleotides, where n is equal to or greater than 2) of interest can be synthesized or assembled by the methods described herein. The plurality of polynucleotides of interest can be functional genetic elements. The functional genetic element can be a gene, a protein-coding sequence, a promoter, an internal ribosome entry site, a ribozyme, an aptamer, a nucleic acid sequence that is capable of being specifically bound by a transposase, a guide ribonucleic acid (gRNA) for CRISPR/Cas9 based gene editing, or a primer-extension gRNA for prime editing. The plurality of polynucleotides of interest can be synthesized or assembled using two or more nucleic acid fragments. The plurality of polynucleotides of interest can be synthesized or assembled using two or more nucleic acid fragments in a same mixture or a single mixture. In some cases, two or more different polynucleotides can be synthesized or assembled together in the same mixture. Each polynucleotide of the plurality of polynucleotides of interest can be synthesized or assembled from two or more nucleic acid fragments, where each nucleic acid fragment can be from a different mixture. When combining two or more different mixtures containing two or more different nucleic acid fragments into a single mixture, various reactions can be performed to generate the synthesized polynucleotides. The plurality of polynucleotides of interest can be a plurality of different mutants or variants of a wild-type polynucleotide. For example, a mixture of 100 different polynucleotides can be synthesized in a same mixture, where each polynucleotide comprises a mutation (e.g., a point mutation, a deletion, an addition, or a modification) of a wild-type sequence or a reference sequence.
As described herein, a given polynucleotide of the plurality being synthesized can be referred to as an ith polynucleotide which may comprise a sequence referred to as “Seqi sequence” (where i=1 to n). For example, the given polynucleotide can be a first polynucleotide comprising a Seq1 sequence, a second polynucleotide comprising a Seq2 sequence, a third polynucleotide comprising a Seq3 sequence . . . or an nth polynucleotide comprising a Seqn sequence. For each given polynucleotide, the Seqi sequence can be specifically synthesized or assembled from two or more nucleic acid fragments. For example, the Seqi sequence can be synthesized or assembled from SeqAi, SeqBi, SeqCi, SeqDi, or more nucleic acid fragments. In some cases, the plurality of nucleic acid fragments containing SeqA sequences (e.g., SeqA1, SeqA2, SeqA3 . . . and/or SeqAn) can be provided in a first mixture. The nucleic acid fragments containing SeqA sequences can be a family of polynucleotides. In some cases, the plurality of nucleic acid fragments containing SeqB sequences (e.g., SeqB1, SeqB2, SeqB3 . . . and/or SeqBn) can be provided in a second mixture. The nucleic acid fragments containing SeqB sequences can be a family of polynucleotides. In some cases, the plurality of nucleic acid fragments containing SeqC sequences (e.g., SeqC1, SeqC2, SeqC3 . . . and/or SeqCn) can be provided in a third mixture. The nucleic acid fragments containing SeqC sequences can be a family of polynucleotides.
A nucleic acid fragment for synthesizing or assembling a polynucleotide of interest described herein can further comprise a connector or a Zip sequence. Within each mixture of nucleic acid fragments, the Zip sequence in a given fragment containing a given SeqA sequence is unique such that a given SeqA sequence is specifically or uniquely linked to another fragment containing a SeqB sequence from another mixture when the two mixtures are combined. For example, in various embodiments, a first mixture of n polynucleotides can be provided, where an ith polynucleotide comprises a SeqAi sequence and a ZipAi sequence, and where the ZipAi sequence is different from a ZipAj sequence when i≠j. As another example, in various embodiments, a second mixture of n polynucleotides can be provided, where an ith polynucleotide comprises a SeqBi sequence and a ZipBi sequence, and where the ZipBi sequence is different from a ZipBj sequence when i≠j. In various embodiments, a SeqAi sequence, a SeqBi sequence, a SeqCi sequence, or more sequences can be specifically linked to form the functional genetic element of interest. In other words, the SeqAi sequence, the SeqBi sequence, the SeqCi sequence, or more sequences can be derived from a same functional genetic element. The nucleic acid fragment described herein can comprise a restriction enzyme recognition site. For example, the restriction enzyme recognition site can be a recognition site for a Type IIS restriction enzyme. Examples of Type IIS restriction enzymes which can be useful in the present disclosure include, but are not limited to, EarI, MnlI, PleI, AlwI, BbsI, BbvI, BcoDI, BsaI, BseRI, BsmAI, BsmBI, BspMI, Esp3I, HgaI, SapI, SfaNI, BsmFI, BsrDI, BtsI, FokI, HphI, MlyI and MboII. In some cases, two or more different restriction enzymes can be used during the nucleic acid construction process. In some cases, a restriction enzyme that creates a 4-bp 5′ overhang (for example, BbsI, BbvI, BcoDI, BsaI, BsmBI, FokI, etc.) can be used. In some cases, a restriction enzyme that creates a blunt end or 3′ overhang (for example, BseRI, BsrDI, BtsI, MlyI, etc.) can be used.
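When a Type IIS enzyme is used, it can be useful to check a designed fragment for occurrences of the recognition site on either strand. The following is a minimal sketch under stated assumptions: the dictionary covers only four of the enzymes listed above, with their commonly documented recognition sequences (BsaI GGTCTC, BsmBI CGTCTC, BbsI GAAGAC, SapI GCTCTTC), and the function name and example fragment are hypothetical.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(_COMP)[::-1]


# Recognition sequences (5'->3') for a small, illustrative subset of enzymes.
TYPE_IIS_SITES = {
    "BsaI": "GGTCTC",
    "BsmBI": "CGTCTC",
    "BbsI": "GAAGAC",
    "SapI": "GCTCTTC",
}


def find_type_iis_sites(seq, sites=TYPE_IIS_SITES):
    """Return {enzyme: [(position, strand), ...]} for every site found.

    Positions are 0-based indices on the given strand; strand '-' means the
    recognition sequence lies on the complementary strand.
    """
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        found = []
        for strand, motif in (("+", site), ("-", revcomp(site))):
            start = seq.find(motif)
            while start != -1:
                found.append((start, strand))
                start = seq.find(motif, start + 1)
        if found:
            hits[enzyme] = sorted(found)
    return hits


# Hypothetical fragment carrying a BsaI site (top strand) and a BsmBI site
# (bottom strand).
fragment = "AAGGTCTCTTTTGAGACGAA"
site_hits = find_type_iis_sites(fragment)
```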
A nucleic acid fragment described herein can be circularized. In some cases, a nucleic acid fragment generated as an intermediate product can be circularized. For example, the nucleic acid fragment can be circularized by joining two ends of the nucleic acid fragment by ligation. The ligation can be blunt end ligation. The ligation can be performed after creating sticky ends using 5′-to-3′ exonuclease (e.g., Gibson Assembly), 3′-to-5′ exonuclease (e.g., sequence and ligase independent cloning or SLIC), or USER enzyme mix (e.g., USER friendly DNA recombination or USERec). Additional examples of circularization methods include, but are not limited to, circular polymerase extension cloning (CPEC) and seamless ligation cloning extract (SLICE) assembly. Alternatively, these two ends can be joined by overlapping PCR. A variety of ligases can be used for ligation, for example, including but not limited to, T4 DNA ligase, T4 RNA ligase, E. coli DNA ligase.
The nucleic acid fragment can be synthesized chemically. For example, the initial mixtures used to synthesize or assemble any polynucleotide of interest can be synthesized chemically. For example, the nucleic acid fragment can be pre-synthesized by chip-based synthesis. In some cases, the nucleic acid fragment synthesized can be equal to or greater than about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, or more nucleotides in length. In some cases, the nucleic acid fragment synthesized can be equal to or less than about 500, 450, 400, 350, 300, 250, 200, 150, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 nucleotides in length. For example, in some cases, the nucleic acid fragments can be at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000 or more nucleotides in length. The nucleic acid fragments can be from 5 to 50, from 5 to 100, from 5 to 200, from 5 to 500, from 5 to 1,000, from 5 to 2,000, from 5 to 5,000, from 5 to 10,000, from 5 to 50,000, from 10 to 200, from 10 to 500, from 10 to 1,000, from 10 to 5,000, from 10 to 10,000, from 100 to 1,000, from 200 to 5,000, or from 200 to 10,000 nucleotides in length. For example, in some cases, for each i, the SeqAi sequence, the SeqBi sequence, or the SeqCi sequence is from 5 nucleotides to 5,000 nucleotides in length.
In various embodiments, the nucleic acid fragment containing the SeqAi sequence, the nucleic acid fragment containing the SeqBi sequence, the nucleic acid fragment containing the SeqCi sequence or more fragments can be pre-synthesized chemically and provided in a single pool. A first mixture of nucleic acid fragments containing the SeqA sequences, a second mixture of nucleic acid fragments containing the SeqB sequences, or the third mixture of nucleic acid fragments containing the SeqC sequences can be prepared from the single pool, for example, by specifically amplifying the fragments containing the SeqA sequences, the SeqB sequences or the SeqC sequences. For example,
As an example, in various embodiments, the methods can further comprise, prior to providing two or more mixtures for polynucleotide assembly, providing a pool of polynucleotides comprising the at least n polynucleotides of the first mixture (e.g., a first family), the at least n polynucleotides of the second mixture (e.g., a second family), and/or the at least n polynucleotides of the fourth mixture (e.g., a third family). Next, the at least n polynucleotides of the first mixture, the at least n polynucleotides of the second mixture, and/or the at least n polynucleotides of the fourth mixture can be amplified from the pool to generate double-stranded polynucleotides. The at least n polynucleotides of the first mixture, the at least n polynucleotides of the second mixture, and/or the at least n polynucleotides of the fourth mixture can be amplified in different reactions using different primers. For example, a pair of primers targeting the primer binding site (e.g., Operator sequence) common to the first family can be used to amplify only the first family of polynucleotides. Next, the Operator sequence (e.g., the primer binding site) can be removed from the double-stranded polynucleotides. Next, one strand of the double-stranded polynucleotides can be removed (e.g., degraded) to generate the at least n polynucleotides of the first mixture, the at least n polynucleotides of the second mixture, and/or the at least n polynucleotides of the fourth mixture.
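The family-specific amplification and subsequent removal of the primer binding sites can be mimicked in silico when designing or checking a pool. The sketch below assumes each pool member is written 5′ to 3′ as a single strand flanked by a forward and a reverse primer binding site that are common to its family and matched exactly. The function name, site sequences and pool contents are hypothetical.

```python
def select_family(pool, fwd_site, rev_site):
    """Return the inserts of the pool members that belong to one family.

    A member belongs to the family when it starts with fwd_site and ends
    with rev_site; both sites are stripped from the returned insert, which
    loosely mirrors amplification followed by Operator removal.
    """
    flank = len(fwd_site) + len(rev_site)
    inserts = []
    for oligo in pool:
        if len(oligo) >= flank and oligo.startswith(fwd_site) and oligo.endswith(rev_site):
            inserts.append(oligo[len(fwd_site):len(oligo) - len(rev_site)])
    return inserts


# Hypothetical pool with two families that use different flanking sites.
pool = [
    "AAAC" + "GGGTTT" + "CCCA",  # first family
    "TTTG" + "ACGT" + "GGGT",    # second family
    "AAAC" + "TTAACC" + "CCCA",  # first family
]
first_family = select_family(pool, "AAAC", "CCCA")
```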
A connector sequence (also referred to as Zip sequence, or Z for short in some cases) can be used to link (or specifically link) one nucleic acid molecule (or nucleic acid fragment) to another nucleic acid molecule (or nucleic acid fragment). The connector sequence of one nucleic acid molecule can hybridize (e.g., form base pair or base pairs) with an anti-connector sequence (e.g., Zip* sequence or Z*) of another nucleic acid molecule. The anti-connector sequence can be complementary (e.g., fully or substantially complementary) with the connector sequence. The anti-connector sequence can be hybridizable with the connector sequence under certain conditions (e.g., temperature, buffer condition, pH, etc.). The anti-connector sequence can be a reverse complement sequence (or complementary sequence) of the connector sequence. When the connector sequence hybridizes with the anti-connector sequence, the base pair(s) formed can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, or more base pairs. The base pairs formed between the connector sequence and the anti-connector sequence can be contiguous or non-contiguous. For example, in the cases where non-contiguous base pairs are formed, there may be unpaired region or regions separating paired regions. If a first nucleic acid molecule comprises a connector sequence, then a complementary sequence of the connector sequence on a second nucleic acid molecule can be referred to as an anti-connector sequence. The connector sequence or Zip sequence (or anti-connector sequence or Zip* sequence) described herein can be of various length. For example, the connector sequence or Zip sequence can be at least about 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 220, 250, 280, 300, 350, 400, 450 or more nucleotides in length. 
The connector sequence or Zip sequence can be at most about 500, 450, 400, 350, 300, 250, 200, 150, 100, 50, 20 or less nucleotides in length. The connector sequence or Zip sequence can be from 2 to 50, from 10 to 60, from 5 to 100, from 10 to 200, from 2 to 100, from 5 to 200, from 5 to 300, or from 5 to 400 nucleotides in length. For another example, the connector sequence (or anti-connector sequence) can be greater than or equal to about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, or more nucleotides in length. The connector sequence (or anti-connector sequence) can be less than or equal to about 300, 250, 200, 150, 100, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3 or 2 nucleotides in length. The connector sequence (or anti-connector sequence) can be at 5′ end or 3′ end of a nucleic acid molecule. The connector sequence (or anti-connector sequence) can also be an internal sequence of a nucleic acid molecule. For example, the connector sequence can be an internal connector sequence and can be exposed at 5′ end or 3′ end by cutting an internal sequence (e.g., a sequence adjacent to the internal connector sequence) of the nucleic acid molecule.
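The relationship between a connector sequence and its anti-connector sequence can be illustrated with a short sketch. It computes the reverse complement of a connector and the longest contiguous run of Watson-Crick base pairs that two sequences can form in antiparallel orientation. It is a simplified model that ignores mismatches, bulges and thermodynamics, and the function names and example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_duplex(connector, other):
    """Longest contiguous run of base pairs between two strands.

    Pairing `connector` with `other` in antiparallel orientation is
    equivalent to finding the longest common substring of `connector`
    and the reverse complement of `other`.
    """
    target = reverse_complement(other)
    best = 0
    prev = [0] * (len(target) + 1)
    for base_a in connector:
        cur = [0] * (len(target) + 1)
        for j, base_b in enumerate(target, start=1):
            if base_a == base_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


zip_seq = "GATTACAGGT"                       # hypothetical connector (Zip)
anti_zip_seq = reverse_complement(zip_seq)   # its anti-connector (Zip*)
full_pairing = longest_duplex(zip_seq, anti_zip_seq)
```

A fully complementary anti-connector pairs along the whole length of the connector, whereas an unrelated sequence would typically yield only a short run.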
The connector sequence or Zip sequence described herein can be irrelevant to any polynucleotides of interest synthesized herein. The connector sequence or Zip sequence described herein can be arbitrary or predesigned sequences. The functional genetic element may not comprise a sequence that is identical to any connector sequence or Zip sequence described herein. After synthesizing or assembling the polynucleotides containing the final sequences of interest, any connector sequences or Zip sequences can be removed from the polynucleotides to generate the final polynucleotides of interest.
In various embodiments, a first mixture (e.g., a first family) of n polynucleotides can be provided, where an ith polynucleotide of the first mixture can comprise a SeqAi sequence and a ZipAi sequence. For each i, the ZipAi sequence can be unique within the mixture. The ZipAi sequence can be different from a ZipAj sequence when i≠j. For example, a ZipA1 sequence can be different from a ZipA2, ZipA3, ZipA4 . . . or ZipA100 sequence (assuming 100 fragments are within the first mixture in order to synthesize 100 polynucleotides of interest as final products). In some cases, a second mixture of n polynucleotides can be provided, where an ith polynucleotide of the second mixture can comprise a SeqBi sequence and a ZipBi sequence and the ZipBi sequence can be different from a ZipBj sequence when i≠j. A SeqA in the first mixture can be specifically linked to a corresponding SeqB in the second mixture. For each i, the ZipAi sequence and the ZipBi sequence can be a same nucleic acid sequence or different nucleic acid sequences. For each i, the ZipAi sequence and the ZipBi sequence can be complementary. For each i, the ZipAi sequence and the ZipBi sequence can hybridize with each other.
The connector sequences can be re-used in each mixture (e.g., a family of polynucleotides). For example, a set of connector sequences in the first mixture can be the same as the set of connector sequences in the second mixture. For example, when contacting the first mixture and the second mixture, a third mixture of polynucleotides may be generated, where an ith polynucleotide comprises a SeqAi sequence, a SeqBi sequence and a ZipABi sequence and the ZipABi sequence is different from a ZipABj sequence when i≠j. In some cases, the ZipABi sequence can be the same as the SeqAi sequence or the SeqBi sequence. In some cases, the ZipABi sequence can be the SeqAi sequence or the SeqBi sequence. In some cases, the ZipABi sequence is the SeqAi sequence or the SeqBi sequence after circularization and linearization to expose the ZipABi sequence at the terminus of a polynucleotide. In some cases, a fourth mixture of polynucleotides can be provided, where an ith polynucleotide can comprise a SeqCi sequence and a ZipCi sequence and the ZipCi sequence is different from a ZipCj sequence when i≠j. The ZipCi sequence can be the same as the ZipABi sequence, which can be the same as the SeqAi sequence or the SeqBi sequence.
Different Zips used in one homogenous assembly reaction may have different lengths or GC contents, but may have similar melting temperatures. In some cases, hundreds to thousands of Zip sequences are used in a homogenous assembly reaction. Designing the Zip sequences may follow rules similar to those for designing PCR primers, such as: all Zips used in one assembly reaction can have similar melting temperatures; a Zip may not form a strong hairpin at 5° C. below the melting temperature; one Zip may not hybridize strongly to another Zip at 5° C. below the melting temperature; and one Zip may not hybridize strongly to the complement of another Zip at 5° C. below the melting temperature.
To generate a set of 1,000 Zips that can be used in the same assembly reaction, one million random 50-mer sequences can be generated first. Next, a desired melting temperature (e.g., 60° C.) can be chosen. Then, for each random sequence, the shortest sub-sequence (starting from the 5′ end) whose melting temperature is above the desired melting temperature can be kept, while the rest of the bases can be removed. The resultant one million sequences (with various lengths) can be called “trimmed random sequences.” Next, the secondary structure of each trimmed random sequence can be evaluated and ranked based on the Gibbs free energy of the minimum free-energy (MFE) structure at 5° C. below the desired melting temperature. The top 10,000 trimmed random sequences, with the highest (e.g., least negative) Gibbs free energy, can be kept. Each of these kept sequences can be called a Zip candidate. If restriction enzymes are used in the assembly reactions, Zip candidate sequences containing such restriction sites may be removed. Next, each Zip candidate can be evaluated based on how strongly it forms a primer dimer with all other Zip candidates and their complements. A penalty score can be assigned if a strong primer dimer is formed. The penalty score can be positively correlated with the strength of the primer dimer. The sum of all penalty scores can be the final penalty score for each Zip. The top 3,000 Zip candidates with the lowest final penalty scores can be kept and called Zip finalists. Then this primer dimer evaluation process can be repeated for the 3,000 finalists to choose the top 1,000 sequences with the lowest final penalty scores, which can be used as Zips. A number of web-based and stand-alone software packages such as Primer3, UNAfold, NUPACK, PrimerROC, Pythia, Multiple Primer Analyzer (Thermo Fisher), and OligoEvaluator (Sigma-Aldrich) can be used to implement this process.
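The selection funnel described above (random sequences, Tm-based trimming, secondary-structure ranking, restriction-site removal, and two rounds of primer-dimer ranking) can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the disclosed implementation: `approx_tm` is a crude GC-content estimate standing in for a nearest-neighbor melting-temperature model, and `hairpin_score` and `dimer_penalty` are hypothetical longest-complementary-run proxies standing in for the free-energy calculations that packages such as NUPACK or Primer3 would provide. The pool sizes in the usage lines are scaled far below the one million, 10,000, 3,000 and 1,000 given in the text so that the sketch runs quickly.

```python
import random

COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]

def approx_tm(seq):
    """Crude GC-based Tm estimate in deg C (assumption, not a nearest-neighbor model)."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)

def trim_to_tm(seq, target_tm, min_len=14):
    """Shortest 5' prefix whose estimated Tm exceeds target_tm, or None if none does."""
    for n in range(min_len, len(seq) + 1):
        if approx_tm(seq[:n]) > target_tm:
            return seq[:n]
    return None

def longest_common_run(a, b):
    """Length of the longest substring shared by a and b."""
    best, prev = 0, [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def hairpin_score(seq):
    """Hypothetical self-structure proxy: longest run also present in the reverse complement."""
    return longest_common_run(seq, revcomp(seq))

def dimer_penalty(a, b, threshold=6):
    """Hypothetical penalty for a pairing with b (run shared with revcomp(b)) or with b* (run shared with b)."""
    runs = (longest_common_run(a, revcomp(b)), longest_common_run(a, b))
    return sum(r - threshold + 1 for r in runs if r >= threshold)

def keep_lowest_penalty(cands, n_keep):
    """Sum each candidate's penalties against all others and keep the n_keep lowest."""
    scored = []
    for i, a in enumerate(cands):
        total = sum(dimer_penalty(a, b) for j, b in enumerate(cands) if j != i)
        scored.append((total, a))
    scored.sort(key=lambda t: t[0])
    return [s for _, s in scored[:n_keep]]

def design_zips(n_random, n_candidates, n_finalists, n_zips,
                target_tm=60.0, forbidden_sites=(), seed=0):
    rng = random.Random(seed)
    pool = ["".join(rng.choice("ACGT") for _ in range(50)) for _ in range(n_random)]
    trimmed = [t for t in (trim_to_tm(s, target_tm) for s in pool) if t]
    trimmed.sort(key=hairpin_score)  # least self-structure first
    cands = [s for s in trimmed[:n_candidates]
             if not any(site in s or site in revcomp(s) for site in forbidden_sites)]
    finalists = keep_lowest_penalty(cands, n_finalists)
    return keep_lowest_penalty(finalists, n_zips)

if __name__ == "__main__":
    for z in design_zips(200, 60, 30, 10, forbidden_sites=("GCAGTG",)):
        print(len(z), round(approx_tm(z), 1), z)
```

Repeating the dimer ranking on the finalists matters because a candidate's penalty depends on which other sequences remain in the pool; once the worst offenders are removed, the relative ranking of the survivors can change.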
This example demonstrates how to create 1,000 DNA fragments (with 1,000 desired sequences) from 3,000 short oligos in two successive 1,000-plex primer-extension reactions. The orthogonality of the primer-extension reactions can be ensured by 1,000 well-designed ˜20-nt-long orthogonal sequences (e.g., Zips). The Zips may not appear in, or be identical to, any consecutive region in the desired sequences. The desired sequences can be denoted as [Ai|Bi|Ci}, where the subscript i can be 1 to 1,000. For each DNA fragment, the sequences Ai, Bi and Ci can be contributed by three different oligos. First, 1,000 Zip sequences can be designed using the criteria and process described herein. These Zips are named Z1 through Z1000, where Zi corresponds to Ai, Bi and Ci. A few more domains, which will serve as primer binding domains at various steps, can also be designed using the same process. These domains can be referred to as Operator domains or Operators. The Zips and Operators may function at different temperatures. For example, Zips may have Tm values around 55° C., whereas Operators may have Tm values around 65° C. The Operators used in this example include FA, RA, FB, RB, FC, RC, W, X and Y.
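The requirement that no Zip appear within any desired sequence can be checked with a simple substring scan on both strands. The sketch below is illustrative only: the function name `zips_clashing_with_products`, the dictionary layout, and the toy sequences are assumptions introduced here, not part of the disclosed design process.

```python
COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]

def zips_clashing_with_products(zips, products):
    """Return (zip_name, product_name) pairs where a Zip, or its reverse
    complement, occurs as a consecutive region of a desired sequence."""
    clashes = []
    for zip_name, zip_seq in zips.items():
        rc = revcomp(zip_seq)
        for prod_name, prod_seq in products.items():
            if zip_seq in prod_seq or rc in prod_seq:
                clashes.append((zip_name, prod_name))
    return clashes

if __name__ == "__main__":
    # Toy, hypothetical sequences for illustration only.
    zips = {"Z1": "ACGTTGCA", "Z2": "GGATCCAA"}
    products = {"A1|B1|C1": "TTTTACGTTGCATTTT", "A2|B2|C2": "CCCCCCCCCCCC"}
    print(zips_clashing_with_products(zips, products))
```

Any Zip reported by such a scan would be replaced by another sequence from the designed set before the oligos are ordered.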
As shown in
The oligo pool 101 can also be amplified with 5′-protected [FB} and dU-laden [RB*} (reaction R4) to form dsDNA pool 108, which can be subjected to an adaptor removal reaction to form the dsDNA pool 109. Further treatment of 109 with T7 exonuclease (reaction R6) generates ssDNA pool 110. The ssDNA pools 107 and 110 can be mixed at 60° C. in typical PCR buffer (e.g., commercial buffer for Q5 DNA polymerase) for 5 to 10 hours so the matching [Zi} and [Zi*} can hybridize (reaction R7). This reaction can be referred to as “Zip-based hybridization” in the present disclosure. Then, a thermophilic DNA polymerase (e.g., Phusion, Q5, or Taq) can be added to the mixture to extend the 3′ ends of each ssDNA (reaction R8) to form dsDNA pool 112 where the matching [Ai} and [Bi} are brought to one molecule. The dsDNA pool 112 can be PCR-amplified again using dU-laden [FB} and [RA*}, and subjected to an adaptor removal reaction (reaction R9) to form dsDNA pool 113. This dsDNA pool can be circularized with a blunt-end DNA ligase, such as T4 DNA ligase (reaction R10), to form circular dsDNA pool 114, where [Ai} and [Bi} are seamlessly connected. In some cases, the dsDNA pool 113 may be too short for circularization to occur at high efficiency. In other words, the stiffness of dsDNA may prevent efficient circularization. In such a situation, the dsDNA pool 113 can be diluted to 1 to 10 pM, denatured, and circularized using an ssDNA ligase such as CircLigase or CircLigase II. In either case, the circularization product can be PCR-amplified with dU-laden [Y} and 5′-protected [W*} (reaction R11) to form dsDNA pool 115. This PCR can be referred to as “inside-out PCR” in the present disclosure. The domains W and Y can be understood as “primer binding sites for inside-out PCR.” The domain [Y}:[Y*} can be removed in an adaptor removal reaction (reaction R12) to form dsDNA pool 116. An ssDNA generation reaction can be set up to degrade the top strand of 116 (reaction R13) to form ssDNA pool 117, as described above.
In parallel to reactions R1 and R4, the oligo pool 101 can be PCR-amplified with 5′-protected [FC} and dU-laden [RC*} (reaction R14) to form dsDNA pool 118. The PCR product can undergo adaptor removal (reaction R15, to form 119), and ssDNA generation (reaction R16) to form ssDNA pool 120. ssDNA pools 117 and 120 can undergo Zip-based hybridization (reaction R17), followed by primer extension (reaction R18) to form dsDNA pool 122, which can further undergo adaptor removal (reaction R19). The resultant dsDNA pool 123 can be circularized (reaction R20, as in R10), to form circular dsDNA or ssDNA pool 124, which can then be PCR-amplified with [X} and [Y*} to form dsDNA pool 125. It can be seen in 125 that the DNA sequences Ai, Bi and Ci are connected without intervening Zip sequences.
This method can be used to further extend the assembly. For example, sequences Di and Ei can both be appended with Zi (similar to the design of 102 and 103) and assembled to form a dsDNA pool containing [Di|Ei} (similar to 115, except that Zi is downstream of [Di|Ei}, achievable by placing the primer-binding sites for inside-out PCR downstream of Zi, instead of upstream as in 112). This dsDNA pool can undergo adaptor removal and ssDNA generation, and can then be used for Zip-based hybridization with the ssDNA pool derived from 125. As a result, [Ai|Bi|Ci} can be assembled with [Di|Ei} to form [Ai|Bi|Ci|Di|Ei}.
In other words, since the assembled dsDNA pools (such as 115 and 125) contain Zips and Operators, they can be further assembled. However, if dsDNA pools without Zip or Operator sequences are desired, they can be easily removed. For example, in the design of oligos 102, an Operator named V can be placed between Zi and Ai. Then dU-laden [V} and [Y*} can be used to amplify 125. The PCR product can then undergo adaptor removal to obtain dsDNA pool containing only [Ai|Bi|Ci} sequences.
The previous Example demonstrates how to assemble Ai and Bi without intervening Zip (e.g., Zi in
As an example (
The sequence of domain ADSR can end with a Nb.BtsI site (GCAGTG). The sequence of domain ADSL can start with the reverse-complement of a Nb.BtsI site (CACTGC). Therefore, Nb.BtsI can be used to treat 209 (reaction R2.3), where 210, 211, and 212 will be nicked to produce 214, 215, and 216, respectively, in mixture 213. This mixture can be heated to ˜75° C. (reaction R2.4), which is above the melting temperature of ZipPi and ZipQi but not high enough to melt other double-stranded domains in 213, for ˜5 min to form mixture 217, which contains 218, 219 and 220 (derived from 214, 215 and 216, respectively) so the ZipPi on 218, ZipPi* on 219, ZipQi on 219, and ZipQi* on 220 are exposed. The melted-off ZipPi* from 213, ZipPi from 215, ZipQi* from 215, and ZipQi from 216 (collectively called “melted-off Zips”) are now shown in 217. Then the temperature can be reduced from ˜75° C. to ˜55° C., a temperature at which Zips can stably hybridize, and held for 5 to 10 hours. During this time, while some melted-off Zips may rehybridize back to 218, 219 and 220, the Zips may also guide 218, 219 and 220 to form larger complexes 221 (reaction R2.5). The nicks can be ligated, and the ligation product can be amplified with [FPL} and a modified version of [FPR} (the modification being that 5′-T*T*T*T*T*TTdUdU is appended to the 5′ end of [FPR}, where * designates phosphorothioate and dU designates deoxyuridine) to form dsDNA pool 222 (reaction R2.6), whose bottom strand is 5′ protected but also contains dU bases close to the 5′ end. This PCR product can undergo an ssDNA generation reaction (reaction R2.7) to form ssDNA pool 223. USER enzyme mixture can be used to cleave the dU nucleotides in 223 to form 5′ unprotected 224 (reaction R2.8).
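The relationship between the two nicking-site orientations stated above (ADSR ending with GCAGTG, ADSL starting with its reverse complement CACTGC) can be verified programmatically when designing the adaptor domains. The following is a small sketch; the function name `nick_sites_oriented` and the example ADSR and ADSL sequences are hypothetical placeholders, since the actual domain sequences are not given here.

```python
COMP = str.maketrans("ACGT", "TGCA")
NICK_SITE = "GCAGTG"  # recognition site quoted in the text

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]

def nick_sites_oriented(adsr, adsl, site=NICK_SITE):
    """True if ADSR ends with the site and ADSL starts with its reverse complement."""
    return adsr.endswith(site) and adsl.startswith(revcomp(site))

if __name__ == "__main__":
    # Placeholder sequences, for illustration only.
    adsr = "ATCGGATTCA" + NICK_SITE
    adsl = revcomp(NICK_SITE) + "TTAGGCAT"
    print(revcomp(NICK_SITE), nick_sites_oriented(adsr, adsl))
```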
Next, 5′-protected [FPL}, dU-laden and 3′-blocked [ADSR} (whose 3′ end is modified with inverted dT), and dU-laden [ADSL} are hybridized onto 224 (reaction R2.9) to form 225. A DNA polymerase without strand-displacement activity, such as PhusionU, can be used to extend each extendable 3′ end (reaction R2.10) to form 226. Then USER enzyme mixture can be used (reaction R2.11) to degrade the [ADSR} and [ADSL}, leaving precise ends at the 3′ end of Pi, 5′ end of Qi, 3′ end of Qi and 5′ end of Ri in 227 (note that the last base of dU-laden [ADSL} is dU). Then a staple strand with the sequence of [ADSL|ADSR} can be hybridized onto 227 at ˜70° C. (reaction R2.12) to bring the 3′ end of Pi and 5′ end of Qi into proximity, and to bring the 3′ end of Qi and 5′ end of Ri into proximity, thus forming 228. Next, T4 DNA ligase can be added to ligate the ends in proximity (reaction R2.13) to form 229.
Next, a mixture of dsDNA-specific and ssDNA-specific 5′-to-3′ exonucleases such as T7 exonuclease and RecJf, respectively, can be used to degrade the bottom strands and the staple strands of 229 (reaction R2.14) to form ssDNA pool 230, which can then be PCR-amplified (reaction R2.15) to form dsDNA pool 231.
It is to be understood that the circularization method and tweezer method can be used in combination. For example, ˜200-nt oligonucleotides can be assembled into ˜1 kb fragments using the circularization method. Then the ˜1 kb fragments can be further assembled into 3-5 kb fragments using the tweezer method.
Example 3: Constructing pools of paired CDR3-J polynucleotides from shorter oligo pools
As described in International Application No. PCT/US2020/026558, Chen and Porter disclosed methods to construct thousands of TCR genes in homogenous solutions from pools of paired CDR3-J polynucleotides. Here, this example shows that the paired CDR3-J polynucleotide pool can be assembled from 4 pools of much shorter oligos, in two levels of Zip-based multiplex assemblies where Zips are reused in the 1st and 2nd level (
Some of the Zip sequences used in this example are:
The other Zips have the same length and similar GC content.
An oligo pool containing the top or bottom strands of 301, 302, 303 and 304 was obtained from a commercial source. Using this oligo pool as a template:
A total of 583 TCRs were intended to be synthesized. Therefore, each of the families 301, 302, 303, and 304 has 583 species (e.g., sequences).
For simplicity the 5′ protections and dU modifications are not shown in
The dsDNAs in the pool 311 were then circularized (step R3.9) as described in Example 1 (R10), to form circular DNA pool 313, which was then PCR amplified using primers 5′-protected [GQ1} and dU-laden [GQ4*} (step R3.11) to form dsDNA pool 315. In a similar series of steps (steps R3.10, R3.12), dsDNA pool 312 was converted to circular DNA pool 314, and then amplified to form linear dsDNA pool 316.
It can be seen that, in the pools 313 and 315, each of C3Ja1i (i=1 to 583), initially carried by dsDNA molecules in 302, is joined with the corresponding C3Ja2i, initially carried by dsDNA molecules in 301, without intervening sequence. Similarly, each of C3Jb1i (i=1 to 583), initially carried by dsDNA molecules in 304, is joined with the corresponding C3Jb2i, initially carried by dsDNA molecules in 303, without intervening sequence.
Next, 315 and 316 were converted to 317 and 318 (through steps R3.13 and R3.14), respectively, through adaptor removal reactions. These dsDNA pools underwent ssDNA generation reactions, Zip-based hybridization, and a reaction analogous to R8 of Example 1 to produce dsDNA pool 319. The dsDNA pool 319 was then used as a pool of paired CDR3-J oligos in downstream reactions to assemble full-length TCRs. Note that [C3Ja1i|C3Ja2i} has the sequence of [ConAi|CDR3Jαi}, and [C3Jb1i|C3Jb2i} has the sequence of [ConBi|CDR3Jβi}. The latter annotations are useful in understanding the downstream reactions (
To assess the efficiency and accuracy of ligating C3Ja1i to the corresponding C3Ja2i (i=1 to 583), [GQ1} and [ACD*} were used to amplify the dsDNA pool 315 (result of R3.11), resulting in [GQ1|C3Ja1i|C3Ja2i|ACD}. Then Illumina sequencing adaptors along with a unique molecule identifier (UMI) were added to flank the dsDNA [GQ1|C3Ja1i|C3Ja2i|ACD} and the resultant DNA library was analyzed by NGS. 538 out of the 583 C3Ja1i (92%) were detected. For each detected C3Ja1i, two values were calculated: “match_mols_freq” and “match_accuracy.” The value “match_mols_freq” of a C3Ja1i (e.g., for a particular i) is defined as the number of UMI-corrected reads matched to the C3Ja1i divided by the number of UMI-corrected reads matched to any C3Ja1. Therefore, it reflects the frequency, or relative concentration, of a [GQ1|C3Ja1i|C3Ja2i|ACD} (for a particular i) in the mixture. To calculate “match_accuracy”, all UMI-corrected reads mapped to a C3Ja1i were grouped and the sequences corresponding to the position of C3Ja2 in those reads were analyzed to determine if the correct C3Ja2 (e.g., C3Ja2i) was ligated to C3Ja1i. The fraction of UMI-corrected reads within this group that mapped to C3Ja2i was calculated and noted as “match_accuracy”. Therefore, it reflects the accuracy of the C3Ja1-C3Ja2 assembly. As can be seen in
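The two per-species metrics defined in the preceding paragraph, match_mols_freq and match_accuracy, can be computed from UMI-corrected reads as sketched below. The sketch assumes, for illustration, that each UMI-corrected read has already been reduced to a pair of indices (the index of the C3Ja1 it matched and the index of the C3Ja2 found at the partner position); this input format and the function names are assumptions, not the analysis code used to produce the reported values.

```python
from collections import Counter

def assembly_metrics(molecules):
    """Compute per-species metrics from UMI-corrected reads.

    molecules: iterable of (a1_index, a2_index) pairs, one per UMI-corrected read.
    Returns {a1_index: {"match_mols_freq": ..., "match_accuracy": ...}}.
    """
    molecules = list(molecules)
    total_by_a1 = Counter(a1 for a1, _ in molecules)
    correct_by_a1 = Counter(a1 for a1, a2 in molecules if a1 == a2)
    n_all = sum(total_by_a1.values())
    return {
        i: {
            # share of all reads that matched this C3Ja1
            "match_mols_freq": count / n_all,
            # share of this C3Ja1's reads carrying the correct partner
            "match_accuracy": correct_by_a1[i] / count,
        }
        for i, count in total_by_a1.items()
    }

def detected_fraction(metrics, n_designed):
    """Fraction of designed species with at least one UMI-corrected read."""
    return len(metrics) / n_designed

if __name__ == "__main__":
    # Toy reads: species 1 seen three times (one mispaired), species 2 once.
    toy = [(1, 1), (1, 1), (1, 2), (2, 2)]
    print(assembly_metrics(toy))
    print(detected_fraction(assembly_metrics(toy), n_designed=4))
```

With the counts reported above, the detected fraction is 538/583, or about 92%.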
Similar analysis was done for pool 316 resulting from the C3Jb1-C3Jb2 assembly. The match_mols_freq and match_accuracy values for each species are shown on
A similar strategy was used to characterize the pool 319, the Zip-guided assembly (reaction R3.15) product of 317 and 318. As shown on
Through steps R4.1 through R4.7, dsDNA pool 410 was prepared. Briefly, an adaptor removal reaction was carried out to remove [GQ1}:[GQ1*} (reaction R4.1) to form pool 402, which further underwent ssDNA generation to produce 403. The pool 404 containing ˜50 TRAV germline sequences, each having a 3′ single-stranded connector sequence (ConA#), was mixed with the pool 403, where each species of 403 was hybridized to the designated species in the pool 404 (reaction R4.3). Primer extension and ligation were carried out to produce pool 405. The [BCD}:[BCD*} domain contained a Type IIS restriction site; cutting of 405 by the corresponding restriction enzyme (reaction R4.4) generated a 4-nt sticky end which was used to ligate a DNA segment (407) containing TRBC1 and a matching 4-nt sticky end (reaction R4.5). The resultant pool 408 was circularized (reaction R4.6) to form circular DNA pool 409, which was then linearized between GQ2 and GQ3 to form pool 410. The pool 410 underwent adaptor removal to remove [GQ3}:[GQ3*} (reaction R4.8 to form 411) and ssDNA generation (reaction R4.9 to form 413), before having each species in the pool 413 ligated to its corresponding TRBV germline sequences in pool 412 (reaction R4.10, analogous to R4.3), forming final product 414. It can be seen that in 410 and 414, each of [C3Ja1i|C3Ja2i} (e.g., [ConAi|CDR3Jαi}) and its corresponding [C3Jb1i|C3Jb2i} (e.g., [ConBi|CDR3Jβi}) are joined without an intervening sequence that contains any Zip or any other variable sequences.
The final product 414 was characterized using an NGS-based method similar to that for 319 described above. The relative concentration and assembly accuracy of each species in 414 are shown in
This example shows how a family of ˜1,000 genes, each containing ˜1.4-kb sequences (of which 1-kb were synthesized from oligo pool using the methods described herein) can be assembled through 3 levels of consecutive Zip-based assemblies, where the Zip sequences were reused at different levels of assembly reactions. An oligonucleotide pool containing 8 families of oligos (901, 902, 903, 904, 905, 906, 907 and 908, see
Among these domains, SegN (N=1 to 8) domains are Product Constituents, which along with Zip have species-specific sequences. All other domains are Operators and have common sequences. For example, [OPAL} on all oligos has the sequence 5′-AACACTGCTGAAGCTCCCAAT-3′, [OPBL} on all oligos has the sequence 5′-TCCCTGTTTGCCATTTCGCAT-3′. Other Operators have similar length and GC content.
First, eight PCRs were set up, each specifically amplifying a family from the initial oligonucleotide pool. Specifically:
As shown in
The pools 903, 904, 905, 906, 907, and 908 prior to adaptor removal reaction are shown on lanes 3-1, 4-1, 5-1, 6-1, 7-1, and 8-1 of
The pools 903, 904, 905, 906, 907, and 908 after adaptor removal reaction are shown on lanes 3-2, 4-2, 5-2, 6-2, 7-2, and 8-2 of
The assembly reactions to form sequences [Seg1|Seg2}, [Seg3|Seg4}, [Seg5|Seg6}, and [Seg7|Seg8} (
As shown in
The PCR products of R9.28 and R9.29 prior to adaptor removal reaction are shown on lanes #5 and #7, respectively, of
The PCR products of R9.28 and R9.29 after adaptor removal reaction are shown on lanes #6 and #8, respectively, of
The assembly reactions to form sequences [Seg1|Seg2|Seg3|Seg4} and [Seg5|Seg6|Seg7|Seg8} (
Next 5′-protected [OPAL} and dU-laden [IPAR*} were used to amplify circular dsDNA pool 914 into linear dsDNA pool 916 (reaction R9.36). [IPAL} and [OPAR*} were used to amplify circular dsDNA pool 913 into linear dsDNA pool 915 (reaction R9.35).
The genes to be synthesized in this example are antibody genes, where each of the Product Constituent [Seg1|Seg2|Seg3|Seg4} encodes an antibody light chain variable region, and each of the Product Constituent [Seg5|Seg6|Seg7|Seg8} encodes an antibody heavy chain variable region. Since the antibody chains for different genes have different lengths, stretches of scrambled filler sequences containing A and T bases (‘AT Filler’, black rounded squares in
[IPAL} and [LRK*} were used to amplify all fragments within 915 that encode a kappa light chain. This product was ligated to kappa light chain constant domain (IGKC of 917,
[IPAL} and [LRL*} were used to amplify all fragments within 915 that encode a lambda light chain. This product was ligated to lambda light chain constant domain (IGLC of 917,
These two products were then mixed to form pool 917. Both light chain constant domains contained, at their 3′ ends, a furin cleavage site, flexible linker, and P2A (FFP2A), followed by a portion of the leader peptide of the heavy chain (LHv, a common sequence for all genes). Next, dU-laden [IPAL} and 5′-protected [LHv*} were used to amplify 917. This product was paired with pool 916, in Zip-based ligation and circularization reactions similar to those described before in this Example (collectively noted as R9.38), to form the final, ˜1.4 kb product (
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
This application claims the benefit of U.S. Provisional Patent Application No. 63/282,845, filed on Nov. 24, 2021, and U.S. Provisional Patent Application No. 63/305,488, filed on Feb. 1, 2022, the entire content of each of which is incorporated herein by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/050685 | 11/22/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63282845 | Nov 2021 | US | |
63305488 | Feb 2022 | US |