The instant application contains a Sequence Listing, which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Oct. 27, 2020, is named RICEP0072WO_ST25.txt and is 14.3 kilobytes in size.
The present invention relates generally to the field of molecular biology. More particularly, it concerns compositions and methods for assembling multiple DNA molecules into a linear concatemer.
Nanopore sequencing (NS) is a method of sequencing where ionic current is passed through a nanopore and DNA sequence is decoded from the change in current as the nucleotides in the DNA molecule pass through the nanopore. There are a number of advantages to NS over short-read Next Generation Sequencing (NGS) like Illumina sequencing. NS allows long fragments of DNA, typically in the 10-50 kb range, to be sequenced, while NGS is limited to 150-300 nt. The sequencing time is greatly reduced compared to NGS (<1 hr for NS, compared to >24 hours to >72 hours for NGS) and sequencing data can be obtained in real time. Additionally, nanopore sequencing devices by Oxford Nanopore Technologies are small (approx. 10 cm×3 cm×3 cm) and no capital costs are required. A major disadvantage of NS relative to NGS is the higher intrinsic error rate (>7%, compared to 0.2% for Illumina NGS). The higher error rate prevents use of NS for directly detecting single nucleotide variations at low variant allele fraction (VAF).
In principle, ultradeep NS followed by background subtraction could allow detection of mutations, including single-nucleotide variants. However, the number of NS reads is limited, and NS of short DNA fragments or amplicons results in lower throughput and lower quality reads due to higher error rates near the ends of DNA. Therefore, it is desirable to assemble short amplicons into longer DNA for NS.
Short DNA can be assembled by blunt end ligation. However, blunt end ligation is inefficient compared to cohesive end ligation, which makes it difficult to assemble long fragments from short 100-300 bp amplicons. Gibson assembly uses the sequential action of three enzymes, an exonuclease, a polymerase, and a ligase, to assemble DNA. The presence of exonuclease can lead to loss of sequence information and the polymerase can introduce errors in the sequence. The requirement for coordinated action of three enzymes also makes the system less robust and less efficient for long assemblies. As such, new methods are needed to assemble short DNA into long fragments for NS.
Provided herein are compositions and methods for assembling short DNA into long fragments by Linear DNA Assembly (LDA) using type IIS restriction enzyme digestion and ligation by DNA ligase. Type IIS restriction enzymes cut outside their recognition site, which allows assembly to occur by ligation even in the presence of the restriction enzyme. Provided are methods and reagents to improve assembly length by LDA. Also provided are methods that combine variant enrichment (e.g., Blocker Displacement Amplification (BDA)) with LDA for low VAF (0.1%) detection on an NS platform, which is a 200-fold improvement over the current VAF detection capability of NS.
In one embodiment, provided herein are aqueous solutions for DNA monomer assembly, the solution comprising: a plurality of double-stranded DNA monomer species, each monomer species comprising, from 5′ to 3′: a type IIS restriction site in the (+) orientation (S1), a designed Left sticky end DNA sequence (1), an insert sequence (A), a second designed Right sticky end DNA sequence (1*), and a type IIS restriction site in the (−) orientation (S1*), wherein at least two different DNA monomers comprise the same Left sticky end DNA sequence, wherein at least two different DNA monomers comprise the same Right sticky end DNA sequence, and wherein the Left sticky end DNA sequence and the Right sticky end DNA sequence are complementary to and can form Watson-Crick base pairs with each other; a type IIS DNA restriction enzyme; a DNA ligase enzyme; and a chemical buffer suitable for the enzymatic functions of the type IIS DNA restriction enzyme and the DNA ligase enzyme.
In some aspects, the solutions further comprise a partially double-stranded DNA seed molecule, the seed molecule comprising, from 5′ to 3′: a single-stranded Left sticky end DNA sequence (1); and a double stranded DNA region devoid of a type IIS restriction site (C). In some aspects, the solutions further comprise a partially double-stranded DNA seed molecule, the seed molecule comprising, from 5′ to 3′: a Left sticky end DNA sequence (1); a double stranded DNA region devoid of a type IIS restriction site (C); and a Left sticky end DNA sequence (1).
In some aspects, the chemical buffer comprises between 20 mM and 150 mM Tris-HCl, between 2 mM and 50 mM MgCl2, between 0 mM and 50 mM DTT, and between 0.1 mM and 10 mM ATP, wherein the buffer exhibits a pH between 5.5 and 9.5 at 25° C. In some aspects, the chemical buffer comprises Tris-HCl at a concentration between 50 mM and 150 mM, between 75 mM and 150 mM, between 100 mM and 150 mM, between 20 mM and 125 mM, between 20 mM and 100 mM, between 20 mM and 75 mM, between 20 mM and 50 mM, or any range derivable therein. In some aspects, the chemical buffer comprises Tris-HCl at a concentration of about 20 mM, 25 mM, 30 mM, 35 mM, 40 mM, 45 mM, 50 mM, 55 mM, 60 mM, 65 mM, 70 mM, 75 mM, 80 mM, 85 mM, 90 mM, 95 mM, 100 mM, 105 mM, 110 mM, 115 mM, 120 mM, 125 mM, 130 mM, 135 mM, 140 mM, 145 mM, or 150 mM. In some aspects, the chemical buffer comprises MgCl2 at a concentration between 2 mM and 50 mM, 5 mM and 50 mM, 10 mM and 50 mM, 15 mM and 50 mM, 20 mM and 50 mM, 25 mM and 50 mM, 30 mM and 50 mM, 2 mM and 45 mM, 2 mM and 40 mM, 2 mM and 35 mM, 2 mM and 30 mM, 2 mM and 25 mM, 10 mM and 40 mM, or any range derivable therein. In some aspects, the chemical buffer comprises MgCl2 at a concentration of about 2 mM, 5 mM, 10 mM, 15 mM, 20 mM, 25 mM, 30 mM, 35 mM, 40 mM, 45 mM, or 50 mM. In some aspects, the chemical buffer comprises DTT at a concentration of between 5 mM and 50 mM, between 10 mM and 50 mM, between 15 mM and 50 mM, between 20 mM and 50 mM, between 5 mM and 40 mM, between 2 mM and 25 mM, any range derivable therein, less than 45 mM, less than 40 mM, less than 35 mM, less than 30 mM, less than 25 mM, less than 20 mM, less than 15 mM, less than 10 mM, or less than 4 mM. In some aspects, the chemical buffer comprises DTT at a concentration of about 0 mM, 1 mM, 5 mM, 10 mM, 15 mM, 20 mM, 25 mM, 30 mM, 35 mM, 40 mM, 45 mM, or 50 mM.
In some aspects, the chemical buffer comprises ATP at a concentration of between 0.1 mM and 9 mM, 0.1 mM and 8 mM, 0.1 mM and 7 mM, 0.1 mM and 6 mM, 0.1 mM and 5 mM, 1 mM and 10 mM, 2 mM and 9 mM, 3 mM and 8 mM, or any range derivable therein. In some aspects, the chemical buffer comprises ATP at a concentration of about 0.1 mM, 1 mM, 2 mM, 3 mM, 4 mM, 5 mM, 6 mM, 7 mM, 8 mM, 9 mM, or 10 mM. In some aspects, the chemical buffer exhibits a pH at 25° C. between 5.5 and 9.5, between 6 and 9.5, between 6.5 and 9.5, between 7 and 9.5, between 7.5 and 9.5, between 8 and 9.5, between 6 and 8, or any range derivable therein. In some aspects, the chemical buffer exhibits a pH at 25° C. of about 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, or 9.5.
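As an illustration of how the outermost buffer ranges above might be checked for a given recipe, the following is a minimal Python sketch. The dictionary keys and the function name `out_of_range` are hypothetical and not part of the specification; the example recipe uses the values given elsewhere herein for a typical one-pot reaction (50 mM Tris-HCl, 10 mM MgCl2, 10 mM DTT, 1 mM ATP, pH 7.5).

```python
# Outermost ranges stated in the text; key names are illustrative only.
BUFFER_RANGES = {
    "tris_hcl_mM": (20.0, 150.0),
    "mgcl2_mM": (2.0, 50.0),
    "dtt_mM": (0.0, 50.0),
    "atp_mM": (0.1, 10.0),
    "pH_25C": (5.5, 9.5),
}


def out_of_range(recipe):
    """Return names of components that are missing or outside the stated ranges."""
    problems = []
    for name, (low, high) in BUFFER_RANGES.items():
        value = recipe.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    return problems


# Typical one-pot buffer described in the specification.
example = {"tris_hcl_mM": 50, "mgcl2_mM": 10, "dtt_mM": 10, "atp_mM": 1, "pH_25C": 7.5}
print(out_of_range(example))  # an empty list means every component is in range
```

A check of this kind only confirms that a recipe lies within the broadest stated ranges; it says nothing about enzyme performance at a particular composition.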
In some aspects, the type IIS DNA restriction enzyme is selected from BsaI, BbsI, BsmBI, BtgZI, Esp3I, and SapI. In some aspects, the S1 and S1* restriction sites correspond to the recognition site of the type IIS DNA restriction enzyme selected. In some aspects, the concentration of the type IIS DNA restriction enzyme is between 0.15 U/μL and 15 U/μL, between 0.25 U/μL and 15 U/μL, between 0.5 U/μL and 15 U/μL, between 1 U/μL and 15 U/μL, between 2 U/μL and 15 U/μL, between 5 U/μL and 15 U/μL, between 0.15 U/μL and 10 U/μL, between 1 U/μL and 10 U/μL, or any range derivable therein. In some aspects, the concentration of the type IIS DNA restriction enzyme is about 0.15, 0.2, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 U/μL.
In some aspects, the DNA ligase enzyme is selected from T4 DNA ligase, T7 DNA ligase, T3 DNA ligase, Taq DNA ligase, and E. coli DNA ligase. In some aspects, the concentration of the DNA ligase is between 5 U/μL and 500 U/μL, between 5 U/μL and 400 U/μL, between 5 U/μL and 300 U/μL, between 5 U/μL and 200 U/μL, between 5 U/μL and 100 U/μL, between 5 U/μL and 50 U/μL, between 50 U/μL and 500 U/μL, between 100 U/μL and 500 U/μL, between 50 U/μL and 300 U/μL, between 50 U/μL and 200 U/μL, or any range derivable therein. In some aspects, the concentration of the DNA ligase is about 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, or 500 U/μL.
In some aspects, the Left sticky end DNA sequence and the Right sticky end DNA sequence each have a length of 2-6 nucleotides (e.g., having a length of 2, 3, 4, 5, or 6 nucleotides). In some aspects, the insert sequence of each monomer has a length between 40 nt and 2,000 nt, between 100 nt and 2,000 nt, between 500 nt and 2,000 nt, between 40 nt and 1,000 nt, between 40 nt and 500 nt, between 40 nt and 100 nt, or any range derivable therein. In some aspects, the insert sequence of each monomer has a length of about 40 nt, 50 nt, 60 nt, 70 nt, 80 nt, 90 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 400 nt, 500 nt, 750 nt, 1,000 nt, 1,250 nt, 1,500 nt, 1,750 nt, or 2,000 nt.
In some aspects, the total concentration of all DNA monomers is between 5 nM and 5 μM, 5 nM and 1 μM, 5 nM and 500 nM, 5 nM and 100 nM, 100 nM and 5 μM, 500 nM and 5 μM, 100 nM and 1 μM, or any range derivable therein. In some aspects, the total concentration of all DNA monomers is about 5 nM, 10 nM, 20 nM, 50 nM, 100 nM, 200 nM, 500 nM, 1 μM, 2 μM, 3 μM, 4 μM, or 5 μM.
In some aspects, the total concentration of all DNA monomers is 1x to 1000x (e.g., 1x to 100x, 1x to 50x, 10x to 1000x, 50x to 1000x, 100x to 1000x, 100x to 500x, or any range derivable therein) the concentration of partially double-stranded DNA seed molecules. In some aspects, the total concentration of all DNA monomers is 1x, 2x, 5x, 10x, 25x, 50x, 100x, 200x, 500x, or 1000x the concentration of partially double-stranded DNA seed molecules.
In one embodiment, provided herein are methods for linear assembly of DNA concatemers from a plurality of double-stranded DNA monomers, each monomer species comprising, from 5′ to 3′: a type IIS restriction site in the (+) orientation (S1), a designed Left sticky end DNA sequence (1), an insert sequence (A), a second designed Right sticky end DNA sequence (1*), and a type IIS restriction site in the (−) orientation (S1*); wherein at least two different DNA monomers comprise the same Left sticky end DNA sequence, wherein at least two different DNA monomers comprise the same Right sticky end DNA sequence, and wherein the Left sticky end DNA sequence and the Right sticky end DNA sequence are complementary to and can form Watson-Crick base pairs with each other; the method comprising: mixing the DNA monomers with a type IIS DNA restriction enzyme, a DNA ligase enzyme, and a chemical buffer suitable for the enzymatic functions of the type IIS DNA restriction enzyme and the DNA ligase enzyme; and thermal cycling the solution between 5 cycles and 100 cycles (e.g., 5-50 cycles, 10-50 cycles, 5-40 cycles, 10-40 cycles, or any range derivable therein), with each cycle comprising between 5 seconds and 5 minutes (e.g., 5-60 seconds, 5-120 seconds, 30-60 seconds, 30-120 seconds, 20-60 seconds, 20-120 seconds, or any range derivable therein) at a temperature between 30° C. and 45° C. (e.g., at 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, or 45° C., or any range derivable therein), and between 30 seconds and 30 minutes (e.g., 30-1000 seconds, 60-1000 seconds, 30-1500 seconds, 60-1500 seconds, 30-500 seconds, 60-200 seconds, 30-200 seconds, 60-500 seconds, 30-250 seconds, 60-250 seconds, or any range derivable therein) at a temperature between 10° C. and 25° C. (e.g., at 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25° C., or any range derivable therein).
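The thermal cycling parameters above can be captured in a small data structure to check a proposed program against the stated ranges and estimate its run time. The following Python sketch is illustrative only: the class and field names are hypothetical, the example program (30 cycles of 60 s at 37° C. and 120 s at 16° C.) is an assumed example rather than a protocol taken from the specification, and the comments associating the warm step with digestion and the cool step with ligation reflect general practice with type IIS enzymes and ligases rather than a statement in the text above.

```python
from dataclasses import dataclass


@dataclass
class CyclingProgram:
    cycles: int
    warm_seconds: float  # hold at 30-45 C (presumably favoring restriction digestion)
    warm_temp_c: float
    cool_seconds: float  # hold at 10-25 C (presumably favoring ligation)
    cool_temp_c: float

    def violations(self):
        """List the parameters that fall outside the ranges stated in the text."""
        checks = {
            "cycles": 5 <= self.cycles <= 100,
            "warm_seconds": 5 <= self.warm_seconds <= 300,
            "warm_temp_c": 30 <= self.warm_temp_c <= 45,
            "cool_seconds": 30 <= self.cool_seconds <= 1800,
            "cool_temp_c": 10 <= self.cool_temp_c <= 25,
        }
        return [name for name, ok in checks.items() if not ok]

    def hold_minutes(self):
        """Total hold time in minutes, ignoring ramping between temperatures."""
        return self.cycles * (self.warm_seconds + self.cool_seconds) / 60


# Hypothetical program used only to exercise the checks.
program = CyclingProgram(cycles=30, warm_seconds=60, warm_temp_c=37,
                         cool_seconds=120, cool_temp_c=16)
print(program.violations(), program.hold_minutes())
```

For the assumed program, the hold time is 30 × (60 s + 120 s) = 90 minutes, before accounting for instrument ramp times.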
In some aspects, a partially double-stranded DNA seed molecule is mixed with the monomer molecules before thermal cycling, the seed molecule comprising, from 5′ to 3′: a single-stranded Left sticky end DNA sequence (1); and a double stranded DNA region devoid of a type IIS restriction site (C). In some aspects, a partially double-stranded DNA seed molecule is mixed with the monomer molecules before thermal cycling, the seed molecule comprising, from 5′ to 3′: a Left sticky end DNA sequence (1); a double stranded DNA region devoid of a type IIS restriction site (C); and a Left sticky end DNA sequence (1). In some aspects, a partially double-stranded DNA seed molecule is mixed with the monomer molecules before thermal cycling, the seed molecule comprising, from 5′ to 3′: a Left sticky end DNA sequence (1); a double stranded DNA region devoid of a type IIS restriction site (C) and a unique barcode; and a sticky end DNA sequence (2) for appending adapters for nanopore sequencing.
In some aspects, the DNA monomers are generated by a method comprising:
amplifying a DNA template by multiplex polymerase chain reaction (PCR) amplification, comprising: adding to a DNA template solution (1) a set of forward DNA primers comprising, from 5′ to 3′: a type IIS restriction site in the (+) orientation (S1), a designed Left sticky end DNA sequence (1), and a gene-specific sequence; (2) a set of reverse DNA primers comprising, from 5′ to 3′: a type IIS restriction site in the (+) orientation (S1), a designed Right sticky end DNA sequence (1*), and a gene-specific sequence; (3) a DNA polymerase; and (4) a chemical buffer suitable for PCR amplification; thermal cycling the solution between 5 cycles and 60 cycles (e.g., 5-60 cycles, 10-60 cycles, 5-50 cycles, 10-50 cycles, 5-40 cycles, 10-40 cycles, or any range derivable therein), with each cycle comprising between 5 seconds and 1 minute (e.g., 5-60 seconds, 5-50 seconds, 30-60 seconds, 30-50 seconds, 20-60 seconds, 20-50 seconds, or any range derivable therein) at a temperature between 90° C. and 100° C. (e.g., at 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100° C., or any range derivable therein), and between 30 seconds and 2 minutes (e.g., 30-60 seconds, 60-120 seconds, 40-60 seconds, 40-120 seconds, or any range derivable therein) at a temperature between 55° C. and 72° C. (e.g., at 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, or 72° C., or any range derivable therein).
In some aspects, a set of gene-specific DNA Blockers are additionally added to the DNA template solution. In some aspects, the region of the DNA template that the Blockers bind overlaps with that of the forward DNA primers by between 4 and 15 nucleotides (e.g., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nucleotides, or any range derivable therein). In some aspects, the standard free energy of the forward primer displacing the Blocker at 60° C. in 5 mM Mg2+ is between 0 kcal/mol and +5 kcal/mol (e.g., 0-4, 0-3, 0-2, 0-1, 1-5, 1-4, 1-3, 1-2, 2-5, 2-4, 2-3, 3-5, 3-4, 4-5 kcal/mol, or any range derivable therein).
In one embodiment, provided herein are methods of generating DNA monomers for linear assembly, the method comprising: obtaining a DNA sample solution that comprises a DNA template; amplifying the DNA template by multiplex polymerase chain reaction (PCR) amplification, comprising: adding to the DNA solution (1) a set of forward DNA primers comprising, from 5′ to 3′: a type IIS restriction site in the (+) orientation (S1), a designed Left sticky end DNA sequence (1), and a gene-specific sequence; (2) a set of reverse DNA primers comprising, from 5′ to 3′: a type IIS restriction site in the (+) orientation (S1), a designed Right sticky end DNA sequence (1*), and a gene-specific sequence; (3) a DNA polymerase; and (4) a chemical buffer suitable for PCR amplification; thermal cycling the solution between 5 cycles and 60 cycles (e.g., 5-60 cycles, 10-60 cycles, 5-50 cycles, 10-50 cycles, 5-40 cycles, 10-40 cycles, or any range derivable therein), with each cycle comprising between 5 seconds and 1 minute (e.g., 5-60 seconds, 5-50 seconds, 30-60 seconds, 30-50 seconds, 20-60 seconds, 20-50 seconds, or any range derivable therein) at a temperature between 90° C. and 100° C. (e.g., at 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100° C., or any range derivable therein), and between 30 seconds and 2 minutes (e.g., 30-60 seconds, 60-120 seconds, 40-60 seconds, 40-120 seconds, or any range derivable therein) at a temperature between 55° C. and 72° C. (e.g., at 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, or 72° C., or any range derivable therein). In some aspects, the forward and/or reverse primers further comprise a UMI barcode.
In some aspects, a set of gene-specific DNA Blockers are additionally added to the DNA template solution. In some aspects, the region of the DNA template that the Blockers bind overlaps with that of the forward DNA primers by between 4 and 15 nucleotides (e.g., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nucleotides, or any range derivable therein). In some aspects, the standard free energy of the forward primer displacing the Blocker at 60° C. in 5 mM Mg2+ is between 0 kcal/mol and +5 kcal/mol (e.g., 0-4, 0-3, 0-2, 0-1, 1-5, 1-4, 1-3, 1-2, 2-5, 2-4, 2-3, 3-5, 3-4, 4-5 kcal/mol, or any range derivable therein).
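The primer layout of this embodiment (restriction site, sticky end, gene-specific sequence, each written 5′ to 3′ on the primer's own strand) can be sketched in Python as follows. Several details are assumptions made for illustration and are not taken from the specification: BsaI is used as the type IIS enzyme with its recognition site GGTCTC, a single spacer base is placed between the site and the sticky end because BsaI cuts one nucleotide downstream of its site on that strand, the Right sticky end (1*) is written as the reverse complement of the Left sticky end (1), and the function and variable names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


BSAI_SITE = "GGTCTC"  # S1: BsaI recognition site, (+) orientation on the primer strand
SPACER = "A"          # assumed single base between the site and the cut position


def design_primers(target, left_sticky, fwd_len=20, rev_len=20):
    """Return (forward, reverse) primers, each written 5'->3' on its own strand."""
    right_sticky = revcomp(left_sticky)  # (1*), complementary to (1)
    forward = BSAI_SITE + SPACER + left_sticky + target[:fwd_len]
    reverse = BSAI_SITE + SPACER + right_sticky + revcomp(target[-rev_len:])
    return forward, reverse


# Toy target and sticky end chosen only to make the layout visible.
fwd, rev = design_primers("AAAACCCCGGGGTTTA", "ACTG", fwd_len=4, rev_len=4)
print(fwd)
print(rev)
```

A practical design would also need to confirm that the gene-specific regions and the amplified insert do not themselves contain the recognition site of the chosen enzyme; that check is omitted here.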
In one embodiment, provided herein are methods for preparing a solution of heterogeneous DNA concatemers, the method comprising: preparing a set of DNA monomers from a DNA template sample according to the method of any one of the present embodiments; purifying the monomers to remove unreacted primers and enzymes; and performing linear DNA assembly according to the method of one of the present embodiments. In some aspects, purifying the monomers comprises using either an affinity column or magnetic beads.
In some aspects, a set of gene-specific DNA Blockers are additionally added to the DNA template solution. In some aspects, the region of the DNA template that the Blockers bind overlaps with that of the forward DNA primers by between 4 and 15 nucleotides (e.g., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nucleotides, or any range derivable therein). In some aspects, the standard free energy of the forward primer displacing the Blocker at 60° C. in 5 mM Mg2+ is between 0 kcal/mol and +5 kcal/mol (e.g., 0-4, 0-3, 0-2, 0-1, 1-5, 1-4, 1-3, 1-2, 2-5, 2-4, 2-3, 3-5, 3-4, 4-5 kcal/mol, or any range derivable therein).
In one embodiment, provided herein are methods for targeted nanopore sequencing of gene regions of interest, the method comprising: obtaining a DNA sample of interest comprising a DNA template; preparing a set of DNA monomers from the DNA template according to the method of any one of the present embodiments; purifying the monomers to remove unreacted primers and enzymes; performing linear DNA assembly according to the method of any one of the present embodiments; purifying the concatemers to remove unreacted monomers, Type IIS reaction side products, and enzymes; appending adapters for nanopore sequencing to the purified concatemers; purifying the adapter-appended concatemers to remove excess adapters and enzymes; and performing nanopore sequencing.
In one embodiment, provided herein are methods for constructing a monomer species comprising, from 5′ to 3′: a type IIS restriction site in the (+) orientation (S1), a designed Left sticky end DNA sequence (1), an insert sequence (A), a second designed Right sticky end DNA sequence (1*), and a type IIS restriction site in the (−) orientation (S1*); wherein at least two different DNA monomers comprise the same Left sticky end DNA sequence, wherein at least two different DNA monomers comprise the same Right sticky end DNA sequence, and wherein the Left sticky end DNA sequence and the Right sticky end DNA sequence are complementary to and can form Watson-Crick base pairs with each other; the method comprising: obtaining a solution of double-stranded DNA inserts of interest; performing a first ligation reaction on a first portion of the solution with a double stranded DNA adaptor comprising: a type IIS restriction site in the (+) orientation (S1), and a designed Left sticky end DNA sequence (1); performing a second ligation reaction on a second portion of the solution with a double stranded DNA adaptor comprising: a type IIS restriction site in the (+) orientation (S1), and a designed Right sticky end DNA sequence (1*); and mixing the products of the first and second ligation reactions in a solution in a chemical buffer conducive to ligation. In some aspects, the double-stranded DNA inserts are dA-tailed prior to performing the ligation.
In one embodiment, provided herein are methods for targeted nanopore sequencing of gene regions of interest, the method comprising: obtaining a DNA sample of interest comprising a DNA template; preparing a set of DNA monomers from the DNA template according to the method of one of the present embodiments; purifying the monomers to remove unreacted primers and enzymes; performing linear DNA assembly according to the method of any one of the present embodiments; purifying the concatemers to remove unreacted monomers, Type IIS reaction side products, and enzymes; appending adapters for nanopore sequencing to the purified concatemers; purifying the adapter-appended concatemers to remove excess adapters and enzymes; and performing nanopore sequencing.
In some aspects, the step of mixing the DNA monomers further comprises mixing with two single-stranded destructive probes, the first single-stranded destructive probe comprising, from 5′ to 3′: a type IIS recognition sequence (S1), and a Left sticky end DNA sequence (1); and the second single-stranded destructive probe comprising, from 5′ to 3′: a type IIS recognition sequence (S1), and the Right sticky end DNA sequence (1*). In some aspects, the concentration of the destructive probe is between 1x and 100x of the total concentration of the DNA monomers. In some aspects, the destructive probes have chemical modifications that prevent restriction digestion. In some aspects, the modifications are selected from phosphorothioate-substituted backbone, sugar modified nucleotides (e.g., 2′-Fluoro, 2′-OMe), inverted DNA nucleotides, methylated bases, DNA with carbon spacers, or DNA with polyethylene glycol (PEG) spacers.
As used herein, “essentially free,” in terms of a specified component, means that none of the specified component has been purposefully formulated into a composition and/or that the component is present only as a contaminant or in trace amounts. The total amount of the specified component resulting from any unintended contamination of a composition is therefore well below 0.05%, preferably below 0.01%. Most preferred is a composition in which no amount of the specified component can be detected with standard analytical methods.
As used herein in the specification, “a” or “an” may mean one or more. As used herein in the claim(s), when used in conjunction with the word “comprising,” the words “a” or “an” may mean one or more than one.
The use of the term “or” in the claims is used to mean “and/or” unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and “and/or.” As used herein “another” may mean at least a second or more.
Throughout this application, the term “about” is used to indicate that a value includes the inherent variation of error for the device, the method being employed to determine the value, the variation that exists among the study subjects, or a value that is within 10% of a stated value.
Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
Hetero-polymer assembly using LDA. Monomers of different family (A and B) have the same sticky ends 1 and 1*. Hybridization between family A and B monomers can occur during the ligation step leading to assembly of hetero-polymers containing monomers from different families. S1 is the type IIS restriction enzyme recognition site in the (+) orientation and S1* is the recognition site in the (−) orientation.
Preparation of DNA monomers by dA-tailing and ligation of LDA-adapters. LDA-adapter1 and LDA-adapter2 are ligated to dA-tailed insert DNA in two separate ligation reactions. The ligated monomers from the two reactions are mixed and included in the LDA reaction. Monomers with LDA-adapter1 have sticky end 1, while monomers with LDA-adapter2 have sticky end 1* after the restriction step. These two monomer populations can ligate with each other during the ligation step to form linear assemblies.
NS read throughput for DNA assembled by LDA (mean size of 1577 nt) is comparable to throughput of DNA monomers without assembly (mean size of 317 nt).
NA18537 and 0.1% variant sample is 0.1% human gDNA NA18562 in NA18537. gDNA sample NA18562 bears the two SNPs that were detected. BDA probes were designed for the SNP rs3789806(C>G), while the SNP rs9648696(T>C) occurs in cis. In the top panel, fraction of reads at each nucleotide position corresponding to the wildtype (NA18537 homozygous) allele is plotted. The two SNPs can be clearly detected in the 0.1% variant sample. The bottom panel shows the ΔVariant allele %, which is the fraction of reads mapped to the highest frequency variant allele in the 0.1% variant sample, minus the variant allele frequency in the matched normal 0% variant sample.
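The ΔVariant allele % described for the bottom panel can be written out for a single nucleotide position as in the following Python sketch. The function name and the read counts are hypothetical and serve only to illustrate the arithmetic; they are not data from the experiments described herein. The sketch assumes that the same variant allele (the most frequent non-wildtype base in the variant sample) is looked up in the matched normal sample.

```python
def delta_variant_percent(sample_counts, normal_counts, wildtype):
    """Return (top variant base, delta variant allele %) for one position.

    sample_counts / normal_counts map each base to its read count at the
    position in the variant sample and in the matched normal sample.
    """
    variants = {base: n for base, n in sample_counts.items() if base != wildtype}
    top = max(variants, key=variants.get)  # highest-frequency variant allele
    sample_frac = sample_counts[top] / sum(sample_counts.values())
    normal_frac = normal_counts.get(top, 0) / sum(normal_counts.values())
    return top, 100.0 * (sample_frac - normal_frac)


# Made-up read counts for illustration only.
sample = {"C": 600, "G": 380, "A": 10, "T": 10}
normal = {"C": 950, "G": 30, "A": 10, "T": 10}
allele, delta = delta_variant_percent(sample, normal, wildtype="C")
print(allele, round(delta, 2))
```

Subtracting the matched normal in this way removes the position-specific background that arises from the intrinsic NS error rate, so that only enrichment attributable to the variant remains.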
Provided herein are methods to assemble short DNA molecules (e.g., PCR amplicons) into long, linear concatemers using type IIS restriction enzyme digestion and ligation by DNA ligase. The provided methods and reagents improve assembly length. In nanopore sequencing, the number of DNA molecules that can be sequenced by a flow cell is similar regardless of the length of each DNA molecule, so the provided methods greatly improve the effective throughput of nanopore sequencing. The higher effective sequencing depth can also improve the limit of detection for mutations including single nucleotide variants and small insertions/deletions.
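Because the number of molecules read per flow cell is roughly independent of molecule length, the gain in sequenced bases per run can be approximated by the ratio of mean read lengths. The short Python sketch below applies this to the mean sizes reported in the drawings description (1577 nt for LDA-assembled DNA and 317 nt for unassembled monomers). The function name is hypothetical, and the result is an approximation that assumes equal read counts for the two libraries.

```python
def throughput_gain(mean_assembled_nt, mean_monomer_nt):
    """Approximate fold gain in bases per run, assuming equal read counts."""
    return mean_assembled_nt / mean_monomer_nt


print(round(throughput_gain(1577, 317), 2))  # 4.97
```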
The provided Linear DNA Assembly (LDA) methods use Type IIS restriction enzyme digestion and DNA ligation to assemble many DNA monomers into a long linear concatemer. These methods (1) discourage the formation of circular concatemer products (which cannot be sequenced by NS), (2) do not require assembly of different components in a pre-determined order and are suitable for a variety of NS panels with variable panel sizes, and (3) include several molecular innovations to increase the average length of the assembled concatemer.
I. Linear DNA Assembly
Linear DNA assembly (LDA) using type IIS restriction digestion and ligation requires each DNA monomer molecule to have one end with a type IIS restriction site in the plus (+) orientation (S1) followed by a designed base region (being 2-6 nucleotides in length, e.g., 4 bases) that serves as a sticky end sequence referred to as the Left sticky end sequence (1). The other end of the monomer has the type IIS restriction site in the minus (−) orientation (S1*) followed by a designed base region (being 2-6 nucleotides in length, e.g., 4 bases) that serves as a sticky end sequence referred to as the Right sticky end sequence (1*). The Left and Right sticky end sequences are designed to be complementary to each other.
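The monomer architecture described above can be modeled in silico to confirm that digestion exposes complementary sticky ends. The Python sketch below is a simplified model under stated assumptions, none of which come from the specification: BsaI (recognition site GGTCTC, cutting 1 nt downstream on the top strand and 5 nt downstream on the bottom strand) is the type IIS enzyme, a single spacer base separates each site from its sticky end, all sequences are written on the top strand, and the Right sticky end (1*) is the 4-nt overhang exposed on the bottom strand. The additional check that the sticky end is not palindromic is also an assumption of this sketch; a palindromic overhang would allow two Left ends to pair with each other.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


SITE = "GGTCTC"  # assumed enzyme: BsaI
SPACER = "A"     # assumed single base between site and cut position


def build_monomer(insert, left_sticky):
    """Top strand 5'->3': S1, spacer, (1), insert, (1) again, spacer, S1*."""
    return SITE + SPACER + left_sticky + insert + left_sticky + SPACER + revcomp(SITE)


def overhangs(monomer, length=4):
    """Return (left, right) 5' overhangs, each read 5'->3' on its own strand."""
    cut = len(SITE) + len(SPACER)             # cut position counted from each 5' end
    left = monomer[cut:cut + length]          # exposed on the top strand
    right = revcomp(monomer)[cut:cut + length]  # exposed on the bottom strand
    return left, right


monomer = build_monomer("TTGACCAGTGCC", "ACTG")  # toy insert and sticky end
left, right = overhangs(monomer)
print(left, right)
print(left == revcomp(right))  # Left and Right ends are complementary
print(left != revcomp(left))   # sticky end is not palindromic
```

In this model the Left overhang of one monomer pairs with the Right overhang of another, and the ligated junction no longer carries a recognition site, which is consistent with ligation products persisting in the presence of the restriction enzyme.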
A typical one pot assembly reaction contains 3 pmol to 7 pmol of DNA monomers, 30 U to 60 U of a type IIS restriction enzyme (e.g., BsaI), and 1000 U to 2000 U of DNA ligase (e.g., T4 DNA ligase) in buffer containing 50 mM Tris-HCl, 10 mM MgCl2, 10 mM DTT, and 1 mM ATP at pH 7.5. As shown in
In an exemplary homopolymer assembly reaction (
Double-stranded DNA inserts of any size can be assembled by linear DNA assembly, if the required end sequences as mentioned above are present. Adapters containing the end sequences can be added to any DNA insert by dA-tailing and ligation, as shown in
End sequences for assembly can be added to any DNA insert by PCR, as shown in
As mentioned above, self-hybridization of a molecule during assembly can lead to circularization (
IV. Blocking Side Product Assembly
During the restriction step of the assembly reaction, in addition to monomers with sticky ends (1 and 1*) two side products, SP1 and SP2 with Right sticky end sequence (1*) and Left sticky end sequence (1) respectively are formed (
Preliminary NS analysis of read length was performed for a 182 bp DNA assembled by LDA on a DNA seed by bi-directional assembly and using destructive probes (Table 1). The use of a DNA seed and destructive probes improved the length of the assembled DNA by around 56% compared to normal LDA.
Table 1. NS data showing increase in read length by LDA with seed and destructive probes
Long-read sequencing platforms based on nanopores, like Oxford Nanopore Sequencing (NS), have several advantages over short-read sequencers (e.g., Illumina), such as real-time data production, rapid library prep, portability, and low capital cost. But NS suffers from a higher intrinsic error rate of roughly 10%, compared to 0.2% for Illumina, and also produces fewer reads than Illumina. This prevents the use of NS for rare variant detection. Variant enrichment strategies that can enrich rare variants over the intrinsic error rate of NS can potentially enable use of NS for rare variant detection. Variant enrichment methods like Blocker Displacement Amplification (BDA), ICE COLD PCR, or PNA-blocker PCR produce short amplicons of 100 bp - 300 bp in length. PCR methods that produce short amplicons are routinely used in a number of diagnostic assays like cell-free DNA (cfDNA) analysis and in assays designed for short-read sequencing platforms. But sequencing short DNA (<300 bp) on NS produces reads of low quality and yield. Linear DNA assembly of short DNA into long assemblies can enable NS to produce reads of higher quality and yield for short amplicon sequencing.
NS can sequence ultra-long reads up to several Mb in size. Therefore, higher order assemblies of short amplicons are needed to utilize the full potential of NS. In molecular cloning, assembly by type IIS restriction and ligation (i.e., Golden Gate assembly) is one method for cloning up to 20 inserts into vectors. Gibson assembly, which is another method for cloning, is used for cloning up to only 5 inserts, due to lower efficiency of assembly for higher numbers of inserts. Gibson assembly also requires longer sticky ends, around 20 bases in length. This requires two separate PCR reactions to attach end sequences for assembly onto DNA inserts, since forward and reverse primers with 20-base complementarity will form primer dimers even at the elevated temperatures of around 55° C.-72° C. that are typically used during the annealing and extension steps of PCR, impairing PCR amplification of the desired DNA insert. In the LDA method provided herein, sticky ends (1 and 1*) are only 4 bases long and hence will not form primer dimers during the annealing and extension steps of PCR. As such, LDA-adapter forward and reverse primers designed as depicted in
Circularization of short DNA followed by Rolling Circle Amplification (RCA) of the circular DNA can generate long single-stranded DNA (ssDNA) composed of multiple copies (up to 50 copies) of the same DNA sequence. However, NS cannot sequence ssDNA directly, since dsDNA sequencing adapters containing bound motor proteins are ligated to the ends of the DNA to be sequenced. The motor proteins are needed for translocation of DNA through the nanopore for sequencing. Even if the ends of the DNA are made double-stranded by hybridizing short oligos to the ends, the presence of significant structure in the ssDNA region of the RCA product interferes with NS. To generate dsDNA from RCA, random hexamers can be used during RCA, but this generates highly branched DNA that needs to be debranched before NS. The presence of random hexamers also generates non-specific amplification products. Thus, although RCA can generate long DNA fragments from short DNA, it is laborious, involving multiple steps (circularization, exonuclease digestion to remove linear DNA, RCA, and conversion to dsDNA/debranching), which makes it incompatible with rapid library preparation for NS. In contrast, the LDA method provided herein, which can rapidly assemble short amplicons into relatively long assemblies, is ideally suited for NS.
Library preparation for NS involves ligating barcodes for sample identification followed by ligation of an NS adapter containing a motor protein. The motor protein on the NS adapter is necessary to regulate the speed of DNA translocation through the nanopore for proper interpretation of the DNA sequence. As depicted in
Here, the provided methods shorten NS library preparation time. The methods involve use of a barcode adapter seed that contains a single-stranded Left sticky end sequence (1), a double-stranded DNA region devoid of a type IIS restriction site but containing a unique barcode sequence for sample identification (BC), and a Right sticky end sequence (2) for NS adapter ligation (
Table 2. Comparison of NS by normal library prep after LDA and shortened library prep after LDA with barcode adapter seed
Low VAF detection is essential for diagnostic applications in cancer.
Commercial tests based on the Illumina platform, such as FoundationOne and whole exome sequencing, for analysis of tumor mutation burden provide detailed information on potential pathogenic mutations for guiding therapy selection. However, short-read sequencers like Illumina are less suitable for the analysis of large deletions, fusions, and copy number variations. In addition, library preparation for Illumina sequencing typically takes 24 hours, with the sequencing run taking another 2 days and bioinformatic interpretation taking 1-2 days. Consequently, analysis of cancer samples can take a minimum of 4 days from sample to answer. Illumina instruments also require significant capital investment. As such, samples have to be sent to a centralized location for sequencing, which adds additional time for sample processing. These limitations can be overcome by enabling low VAF detection on the NS platform. NS is already well-suited for the analysis of DNA structural variants and copy number variants due to its long-read capability. Adding the capability of low VAF detection to NS will make it the preferred platform for rapid and comprehensive analysis of cancer genomics.
The variant enrichment method, BDA (Wu et al., 2017; US 2017/0067090; and WO 2019/164885, each of which is incorporated herein by reference in its entirety), was combined with LDA (
Low VAF detection on NS, as demonstrated above and in Example 3, makes NS suitable for mutation detection in cancer. Acute Myeloid Leukemia (AML) is a type of blood cancer in which the bone marrow produces abnormal red blood cells, platelets, or myeloblasts. Previously, NS could detect only mutations with >20% VAF because of its high error rate. A 7-plex NS AML panel was designed to detect mutations at 7 loci in 6 genes involved in AML, with a sensitivity of 1% VAF. Mutations at all 7 loci are detected in a single multiplex reaction following the workflow in
Melanoma is a type of skin cancer in which pigment-producing cells called melanocytes become mutated, causing cancer. A 15-plex NS melanoma panel was designed to detect mutations at 15 loci in 9 genes involved in melanoma, with a sensitivity of 1% VAF, in a single reaction. The panel can detect mutations in MAP2K1, MAP2K2, AKT1, AKT3, NRAS, KRAS, PIK3CA, and BRAF genes.
The panel was further tested on genomic DNA extracted from a fresh frozen melanoma clinical tissue sample (
Next, the melanoma panel was applied to 25 clinical melanoma tissue samples, including both fresh/frozen (FF) and FFPE tissue (
Importantly, many of the 153 discordantly called variants based on a 20% VRF threshold could be real mutations missed by NGS. To confirm the discordant NS mutation calls, droplet digital PCR (ddPCR) was performed on 6 FFPE samples at 4 mutation loci (BRAF p.V600, KRAS p.G13, KRAS p.E62, and MAP2K1 p.P124). Of these 24 sample-locus combinations, 11 were called positive by NS and 13 were called negative by NS. NS was concordant with ddPCR for 10 of the positive calls and 11 of the negative calls (
Next, the reproducibility and robustness of the NS panel were characterized on different types of nanopore sequencing instruments and flow cells. The Oxford Nanopore Flongle flow cell, in particular, is relatively inexpensive at $90 and can further reduce turnaround time relative to the MinION by reducing the need for sample batching before sequencing. The NS panel was performed on all 25 melanoma samples on the Flongle. Quantitatively similar VRFs were observed as compared to the MinION (
These results show that BDA combined with LDA can enable NS to be used for cancer mutation profiling in the clinic.
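As a worked illustration of the ddPCR comparison described above, the agreement between NS and ddPCR can be computed directly from the reported counts. The following is a minimal sketch: the counts are taken from the text, while the variable names and the choice of summary fractions are assumptions made for illustration. It reads "concordant for 10 positive samples" as 10 of the 11 NS-positive calls being confirmed by ddPCR.

```python
# Counts reported for the ddPCR comparison (6 FFPE samples x 4 loci).
ns_positive_calls = 11
ns_negative_calls = 13
concordant_positive = 10  # NS-positive calls also positive by ddPCR
concordant_negative = 11  # NS-negative calls also negative by ddPCR

total_calls = ns_positive_calls + ns_negative_calls
concordant_calls = concordant_positive + concordant_negative

overall_concordance = concordant_calls / total_calls
positive_agreement = concordant_positive / ns_positive_calls
negative_agreement = concordant_negative / ns_negative_calls

print(f"Overall concordance: {concordant_calls}/{total_calls} "
      f"= {overall_concordance:.1%}")
print(f"NS-positive calls confirmed by ddPCR: {positive_agreement:.1%}")
print(f"NS-negative calls confirmed by ddPCR: {negative_agreement:.1%}")
```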
An embodiment of an algorithm to analyze NS reads from FAST5 files is described below. Similar algorithms operating on FAST5 or FASTQ files can be constructed by one of ordinary skill in the art of bioinformatic processing of sequencing data.
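By way of a hypothetical illustration only, one elementary step that such an analysis might include is parsing basecalled reads and computing a variant read fraction (VRF) at a position of interest. The sketch below is not the embodiment referred to above. It assumes standard four-line FASTQ records, takes VRF to mean the fraction of covering reads that carry the variant base, and treats position as a simple offset into each read, whereas a practical pipeline would first align the reads. The reads, function names, and position are invented for the example.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            return  # end of input
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        quality = handle.readline().strip()
        yield header[1:], sequence, quality


def variant_read_fraction(sequences, position, variant_base):
    """Fraction of reads covering `position` that show `variant_base` there."""
    covering = [s for s in sequences if len(s) > position]
    if not covering:
        return 0.0
    variant_reads = sum(1 for s in covering if s[position] == variant_base)
    return variant_reads / len(covering)


# Invented example reads; read3 carries a G>T change at offset 2.
example = io.StringIO(
    "@read1\nACGTAC\n+\nIIIIII\n"
    "@read2\nACGTAC\n+\nIIIIII\n"
    "@read3\nACTTAC\n+\nIIIIII\n"
    "@read4\nACGTAC\n+\nIIIIII\n"
)
sequences = [seq for _, seq, _ in parse_fastq(example)]
vrf = variant_read_fraction(sequences, position=2, variant_base="T")
print(f"VRF at offset 2: {vrf:.2f}")
```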
The following examples are included to demonstrate preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention.
The results presented in
Amplicons from a typical BDA reaction were used as the template for PCR with LDA-adapter primers as shown in
Melanoma is a type of skin cancer in which pigment-producing cells called melanocytes become mutated, causing cancer. A 15-plex NS melanoma panel (Table 3) was designed to detect mutations at 15 loci in 9 genes involved in melanoma, with a sensitivity of 1% VAF, in a single reaction. The panel can detect mutations in MAP2K1, MAP2K2, AKT1, AKT3, NRAS, KRAS, PIK3CA, and BRAF genes.
Next, the reproducibility and robustness of the NS panel were characterized on different types of nanopore sequencing instruments and flow cells. The NS panel was performed on all 25 melanoma samples on the Oxford Nanopore Flongle flow cell. Quantitatively similar VRFs were observed as compared to the MinION (
ATCCTCTCTCTGAAATCACTGAGCAGG/iSpC
All of the methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.
The following references, to the extent that they provide exemplary procedural or other details supplementary to those set forth herein, are specifically incorporated herein by reference.
The present application claims the priority benefit of U.S. provisional application No. 62/940,127, filed Nov. 25, 2019, the entire contents of which is incorporated herein by reference.
This invention was made with government support under Grant Nos. R01CA203964 and R01HG008752 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2020/062201 | 11/25/2020 | WO |
Number | Date | Country | |
---|---|---|---|
62940127 | Nov 2019 | US |