This application is the U.S. National Phase Entry of PCT App. No. PCT/GB2015/051736 filed on Jun. 12, 2015 and published in English as WO 2015/189637 on Dec. 17, 2015, which claims the benefit of GB 1410646.2 filed on Jun. 14, 2014, the contents of each of which are incorporated by reference in their entirety.
The present invention provides methods of increasing sequencing accuracy. For example, the methods of the invention use linear amplification to generate a typically relatively small population of complementary strands from template molecules, wherein the complementary strands generated from a particular template are retained in close proximity to each other, thus allowing errors to be corrected on a molecule-by-molecule basis. The linearly amplified populations may subsequently be amplified into a larger clonal population of DNA molecules, e.g. by exponential amplification, for sequencing.
Errors may be introduced into DNA during first extension in a clonal amplification protocol.
Incorporation errors made in the first or second cycle of the clonal amplification process produce a mixed signal when that particular position is sequenced.
In one embodiment, the method of the invention may be used to substantially reduce or eliminate high quality errors that may be generated during first extension.
In another embodiment, the methods of the invention may also be used to reduce errors caused by mis-incorporation of nucleotides during the first few cycles of amplification.
According to the present invention there is provided a method of sequencing with improved accuracy comprising:
providing a nucleic acid template;
producing, by linear amplification directly from the template nucleic acid, a population comprising a plurality of complementary strands retained in close proximity to each other or identifiable as being obtained from the same template nucleic acid; and
performing a sequencing reaction on said proximity retained (e.g. surface bound) oligonucleotides.
Preferably there are at least three rounds of linear amplification.
Having at least three rounds of linear amplification means that the effect of any introduced error is diluted compared with typical exponential amplification methods. For example, assume that, starting from a single-stranded DNA strand A on the surface, linear amplification makes one copy A′ in each round, whereas exponential amplification makes A′ in the 1st round and then copies both A and A′ in each subsequent round. A mistake made in synthesising A′ in the 1st round then results in 100% of the A′ strands being wrong after round 1 and 50% after round 2 (for both amplifications). After round 3, however, only ⅓ of the A′ strands carry the mistake with linear amplification, compared with 2/4 with exponential amplification. Thereafter, exponential amplification continues to produce A′ strands of which 50% carry the mistake, whereas with the present method, using linear amplification of the nucleic acid template, the mistake becomes ever more diluted (1/n of the A′ strands after n rounds). The following example shows a comparison of error propagation for linear amplification and exponential amplification, where A and A′ denote the strands being made, and bold and underlined text indicates a mutation-containing strand.
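The round-by-round arithmetic above can be reproduced with a small deterministic model. It is a sketch under the same simplifying assumptions as the text: one mis-incorporation when A′ is first made in round 1, no other errors, every copy succeeds, and in the exponential case every strand present is copied in every round. The function names are illustrative only.

```python
from fractions import Fraction


def linear_error_fraction(rounds: int) -> Fraction:
    """Fraction of A' strands carrying the round-1 error under linear amplification.

    Only the original strand A is ever copied, so after `rounds` rounds there
    are `rounds` A' strands and only the first one carries the mistake.
    """
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    return Fraction(1, rounds)


def exponential_error_fraction(rounds: int) -> Fraction:
    """Fraction of A' strands carrying the round-1 error under exponential amplification.

    Every strand present is copied in every round (A -> A', A' -> A), and a
    copy of a mutated strand inherits the mutation.
    """
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    strands = [("A", False)]  # (strand kind, carries the mistake?)
    for r in range(1, rounds + 1):
        new = []
        for kind, mutated in strands:
            copy_kind = "A'" if kind == "A" else "A"
            # The single mis-incorporation happens when A is copied in round 1.
            new.append((copy_kind, mutated or (r == 1 and kind == "A")))
        strands += new
    a_prime = [mutated for kind, mutated in strands if kind == "A'"]
    return Fraction(sum(a_prime), len(a_prime))


for r in range(1, 6):
    print(r, linear_error_fraction(r), exponential_error_fraction(r))
```

In this model the exponential fraction settles at 1/2 from round 2 onwards, while the linear fraction keeps falling as 1/n.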
Preferably the method further comprises the step of carrying out further (exponential) amplification of the population of complementary strands after the rounds of linear amplification and prior to performing the sequencing reaction.
Optionally, the complementary strands of the population obtained by linear amplification directly from the template nucleic acid are retained in close proximity to each other by being bound to a surface.
Alternatively, the complementary strands of the population obtained by linear amplification directly from the template nucleic acid are retained in close proximity to each other by being held in a gel or an emulsion that prevents or limits diffusion.
Preferably, a plurality of nucleic acid templates are provided and a plurality of populations of complementary strands are produced with each population being derived directly from a template and the members of each template derived population being retained in close proximity to each other.
Advantageously, by creating molecules by linear amplification of a particular surface-bound library molecule prior to clonal amplification, the molecules created by said linear amplification remain in close proximity. Surface binding in this manner leads very easily to further sequencing steps. Populations of linearly amplified molecules derived from the same template can also be retained in close proximity by hybridising or attaching the amplified molecules after linear amplification, e.g. if rolling circular amplification is used. Retention in close proximity can also be achieved by carrying out the linear amplification rounds inside a gel or an emulsion to prevent free diffusion. This allows the dilution effect to be exploited such that any mis-incorporation error is diluted out, as it is extremely unlikely that two molecules in the population of complementary strands created by linear amplification will carry the same mutation. Performing the linear amplification step entirely in solution (i.e. without retaining in close proximity the complementary strands obtained directly from the same nucleic acid template) would reduce overall error rates, but it would still allow “high quality errors” to be carried through: in that situation, a mis-incorporation (whether in the in-solution linear amplification step or in an exponential amplification step) would result in a cluster carrying the mutation. The present method allows artefacts due to extension errors, which are diluted out during linear amplification (surface-bound or otherwise proximity-retained or diffusion-limited), to be distinguished from real mutations in the sample itself.
Optionally the method includes the step of attaching the nucleic acid template to the surface. This may be by hybridising said nucleic acid template to a first primer located on the surface. Alternatively the template nucleic acid may be attached directly to the surface.
Preferably the further amplification of the population of complementary strands utilises further amplification primers attached to the surface, said further amplification primers not having been involved in the linear amplification steps.
Optionally the linear amplification (directly from the nucleic acid template) includes the steps of:
hybridising said nucleic acid template to a first primer;
extending the first primer to produce a complementary strand to the template;
denaturing to release the complementary strand which remains in close proximity (e.g. it remains bound to the surface and thus does not travel at all or does not diffuse far before re-hybridising nearby); and
repeating the hybridisation, extension and denaturation steps to produce a population of surface-bound complementary strands obtained directly from the template nucleic acid.
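The cycle of steps above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of any real polymerase: each cycle copies the original template exactly once, each base is mis-incorporated independently with a fixed probability, and a wrong base is drawn uniformly from the three incorrect options. The function names and the error rate are hypothetical.

```python
import random

# Watson-Crick pairing used to build each complementary strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def copy_with_errors(template: str, error_rate: float, rng: random.Random) -> str:
    """Extend a primer along `template`, returning the complementary strand
    written position-by-position against the template (not reversed).
    Each base is mis-incorporated with probability `error_rate`."""
    out = []
    for base in template:
        correct = COMPLEMENT[base]
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != correct]))
        else:
            out.append(correct)
    return "".join(out)


def linear_amplify(template: str, cycles: int, error_rate: float, seed: int = 0) -> list[str]:
    """Hypothetical model of the hybridise/extend/denature loop: every cycle
    copies the ORIGINAL template once, so a mis-incorporated strand is never
    itself used as a template and each error stays confined to one strand."""
    rng = random.Random(seed)
    return [copy_with_errors(template, error_rate, rng) for _ in range(cycles)]


population = linear_amplify("ACGTACGTAC", cycles=10, error_rate=0.01)
expected = "TGCATGCATG"
errors_per_strand = [sum(a != b for a, b in zip(s, expected)) for s in population]
print(errors_per_strand)
```

Because no copy is ever re-used as a template, an error in one strand cannot appear in any other strand of the population unless it is made again independently.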
Optionally the linear amplification could use recombinase polymerase amplification (RPA).
Optionally the linear amplification could use “Wildfire™” (Life Technologies) type amplification to make linear copies of a template strand by a template walking process.
Preferably the clonal amplification step is bridge amplification.
Preferably the method includes the step of sequencing.
Optionally the first primer is bound to a solid support.
Preferably the solid support is a planar element.
Preferably the solid support is a flow cell.
Optionally the solid support is a bead.
Alternatively the single stranded template nucleic acid is circularised prior to being attached to the surface of the solid support e.g. by hybridisation.
Circularisation may be performed with a single-stranded DNA ligase such as CircLigase.
Rolling circular amplification (RCA) results in a concatemer of complementary strands obtained directly from the template nucleic acid.
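A toy representation of the RCA product may help. This sketch assumes the circular template is written 5′→3′ starting at its ligation junction and that synthesis begins at that junction; the function names are hypothetical. Each unit of the concatemer is copied directly from the same circle, which is why RCA behaves as linear rather than exponential amplification.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement, written 5'->3'."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[b] for b in reversed(seq))


def rolling_circle_product(circle: str, turns: int) -> str:
    """Concatemer produced when a polymerase travels `turns` times round a
    single-stranded circular template; every unit is a direct copy of the
    same circle (no copy is itself copied)."""
    return reverse_complement(circle) * turns


def split_units(concatemer: str, unit_length: int) -> list[str]:
    """Cut the concatemer back into unit-length complementary copies."""
    return [concatemer[i:i + unit_length] for i in range(0, len(concatemer), unit_length)]


product = rolling_circle_product("ACGG", turns=3)
print(product)  # CCGTCCGTCCGT
print(split_units(product, 4))
```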
Preferably the concatemer is then bound to a solid support.
Preferably the solid support comprises a plurality of second primers or fragments thereof.
Fragments of the second primer are not of sufficient size to allow for direct extension under conditions used in the linear amplification step.
Preferably the method includes the step of hybridising reverse complement oligonucleotides to second primer fragments wherein said reverse complement oligonucleotides comprise additional bases at their 5′ end that are part of the full length second primer sequence.
The hybridised primers are copied in order to elongate the surface primers thus making them suitable for further use in e.g. exponential amplification.
Optionally the second primers or fragments thereof are grafted onto attachment points.
Grafting may be via a streptavidin/biotin linkage or a streptavidin/dual-biotin linkage.
Optionally only about one-half of grafting sites on the solid support are occupied by first primers.
Alternatively, the ratio could be skewed.
Optionally the grafting process may be controlled by oligonucleotide concentration, incubation time, and/or temperature.
Preferably second primers are subsequently grafted onto the remaining available grafting attachment points.
Grafting may use thiophosphate or click chemistry.
Click chemistry here encompasses chemistries that can be used to join small units together quickly and reliably.
Optionally, the linear amplification steps may be carried out at a temperature significantly above the melting temperature (Tm) of the second primers or fragments thereof, so that these do not participate, and the second part of the amplification is performed at a temperature below the Tm of the second primers.
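The temperature logic can be sketched with the Wallace rule of thumb (Tm ≈ 2 °C per A/T plus 4 °C per G/C), which is only a rough estimate for short oligonucleotides. The sequences, the two reaction temperatures and the 5 °C margin below are hypothetical toy values, not real adapter primers or validated conditions.

```python
def wallace_tm(seq: str) -> int:
    """Rough Tm in deg C by the Wallace rule (2 per A/T, 4 per G/C).
    Only a rule of thumb, best suited to short oligonucleotides."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def primer_participates(primer: str, temperature_c: float, margin_c: float = 5.0) -> bool:
    """Hypothetical criterion: a primer is treated as able to anneal and prime
    only if the reaction temperature is at least `margin_c` below its Tm."""
    return temperature_c <= wallace_tm(primer) - margin_c


# Hypothetical sequences, not real adapter primers.
full_length_first_primer = "GATCGATCGGCCAATTGGCCAT"  # Wallace Tm 68
short_second_primer = "GATCGATCGGCC"                 # Wallace Tm 40

for phase, temp in [("linear", 50.0), ("second part", 30.0)]:
    print(phase,
          primer_participates(full_length_first_primer, temp),
          primer_participates(short_second_primer, temp))
```

At the higher temperature only the full-length first primer takes part, so only the template is copied; lowering the temperature brings the short second primers into play for the second part of the amplification.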
Optionally when bound to a solid support at least some of the first primers are blocked.
Such primers can then be unblocked after at least initial linear amplification rounds using standard techniques and can be used for further amplification rounds.
According to a further aspect there is provided a method of sequencing comprising the linear amplification steps of the first aspect.
In order to provide a better understanding of the present invention, embodiments will be described, by way of example only and with reference to the following drawings in which;
In certain non-illustrated embodiments, the methods of the invention use a gel or emulsion to prevent significant diffusion of populations of complementary strands formed by linear amplification of specific template nucleic acids/library molecules.
In various embodiments, the methods of the invention use a solid surface that comprises a single “full length” surface-bound oligonucleotide primer (e.g., full length P5 primers) and linear amplification to produce a small clonal population of DNA molecules.
Complementary strand 515b is also bound to the surface of solid support 410 via P5 oligonucleotide primer 415.
In the linear amplification process, only the original seeded DNA molecules are used as templates to produce new molecules. Because only the original seeded DNA molecules are used as templates, errors that may be introduced during linear amplification are not propagated. For example, the likelihood of introducing the same error at the same position in all of the molecules that make up a small clonal population (produced by linear amplification) is extremely low. If the accuracy of the polymerase being used is 99.9% (i.e., one error every 1,000 nucleotides), the probability of getting the same error at the same position in n molecules is $(0.001)^n$. Therefore, the probability of having a mutation at the same position in two molecules is one in a million, and in three molecules one in a billion. Note that these calculations do not take into account that the mutations introduced could differ from one another; for example, if T is the correct base, one molecule could carry an A whereas another could carry a C. Creating multiple copies from the original molecule through linear amplification therefore leads to a significant improvement in accuracy, i.e., a virtually error-free technology compared with a 0.1% error rate in a standard amplification procedure that does not include an initial step of linear amplification. This high level of accuracy is particularly beneficial for certain applications. For example, when looking at somatic mutations (e.g. in cancer), consensus coverage cannot be used to “weed out” sequencing errors due to mis-incorporation during the first extension and bridge amplification steps. By using this method, one can have greater confidence that a particular SNP is a bona fide somatic mutation and not a mis-incorporation event.
In another example, if an error is introduced during the first round of linear amplification and an additional nine cycles of linear amplification then produce nine more molecules with the correct base, about 90% of the signal will be represented by the correct base, so it may still be possible to call the correct base. In contrast, in a standard clonal amplification protocol (i.e., first extension followed directly by surface amplification with both primers present on the flow cell surface and actively participating in the amplification step), an error in the first cycle of amplification may lead to approximately 50% of the molecules carrying the mutation (and, through stochastic effects, this percentage could be significantly higher). With approximately 50% or more of the molecules carrying the mutation, it is very difficult to ascertain the correct base.
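The probability calculation can be written out directly. The first function reproduces the text's $(0.001)^n$ figure. The second is a stricter variant that quantifies the caveat about differing mutations, under the added assumption (not from the source) that each of the three wrong bases is equally likely; both function names are hypothetical.

```python
def same_error_probability(error_rate: float, n_molecules: int) -> float:
    """Probability that a given position is mis-incorporated in all n
    independently copied molecules: error_rate ** n (as in the text, this
    ignores which wrong base is inserted)."""
    return error_rate ** n_molecules


def same_wrong_base_probability(error_rate: float, n_molecules: int) -> float:
    """Stricter variant, ASSUMING each of the three wrong bases is equally
    likely: all n molecules must carry the identical wrong base, i.e.
    3 choices of wrong base, each hit with probability error_rate / 3."""
    return 3 * (error_rate / 3) ** n_molecules


for n in (1, 2, 3):
    print(n, same_error_probability(0.001, n), same_wrong_base_probability(0.001, n))
```

For n ≥ 2 the stricter variant is lower still, so the text's figures are, if anything, conservative under this assumption.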
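The contrast between the two protocols can be sketched as a mixed-signal base call. This assumes every copy contributes equally to the signal, and the 0.7 calling threshold is a hypothetical value chosen only for illustration, not one used by any particular instrument.

```python
from typing import Optional


def correct_signal_fraction(total_copies: int, erroneous_copies: int = 1) -> float:
    """Fraction of a cluster's signal from the correct base, assuming each
    copy contributes equally to the signal."""
    return (total_copies - erroneous_copies) / total_copies


def call_base(signal: dict, min_fraction: float = 0.7) -> Optional[str]:
    """Hypothetical base caller: report the dominant base only if it accounts
    for at least `min_fraction` of the total signal; otherwise return None."""
    total = sum(signal.values())
    base, value = max(signal.items(), key=lambda item: item[1])
    return base if value / total >= min_fraction else None


# Linear: 1 erroneous + 9 correct copies gives ~90% correct signal.
linear = correct_signal_fraction(10)
print(call_base({"T": linear, "A": 1 - linear}))  # T
# Standard protocol: a first-cycle error in ~50% of molecules.
print(call_base({"T": 0.5, "A": 0.5}))  # None
```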
A small clonal population of complementary strands generated by linear amplification on a flow cell surface comprising short P7 oligonucleotide primers may be further amplified after converting the short P7 oligonucleotide primers to full length primers. Alternatively, further amplification cycles can be carried out at a lower temperature that enables the short P7 primers to participate in the amplification process, as described previously.
In another embodiment of the invention, the solid support may be a bead. In one example, a bead-based linear amplification may be performed in an emulsion to ensure clonality of the amplified population. In another example, individual beads may be seeded in separate nanowells to ensure clonality of the amplified population. In yet another example, the beads may be suspended inside a gel-like solution to prevent or substantially minimize free movement of the beads.
In another embodiment of the invention, linear amplification may be carried out on a bead. The bead (or beads) may be inside a flow cell during linear amplification, or the linear amplification can occur outside the flow cell and the bead or beads are flowed into the flow cell after the linear amplification rounds or steps. Nucleic acid (DNA) can then be copied onto the beads to make copies on the surface or, alternatively, the nucleic acid molecules could be released from the beads using standard techniques and allowed to diffuse only slightly on the surface of the flow cell so that they are kept localised in close proximity. Sequencing could then be carried out using known techniques, e.g. sequencing by synthesis, or clusters can be formed, e.g. using bridge amplification (effectively giving clusters formed from and within the original linearly amplified populations), and consensus sequences determined using known sequencing techniques. For example, if 10 clusters with the same genomic sequence have a G at position 9 and one has a T, it is possible to discard the T and call the base confidently as a G. Sequences can be grouped based on positional information and also on sequence information.
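The consensus step in the G/T example can be sketched as a simple majority vote. This assumes the reads have already been grouped (by position and/or sequence) and are of equal length; the 0.8 support threshold and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional


def consensus_call(bases: list[str], min_support: float = 0.8) -> Optional[str]:
    """Majority base among clusters grouped as sharing the same genomic
    sequence; None if the majority's share is below `min_support`."""
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_support else None


def consensus_sequence(reads: list[str], min_support: float = 0.8) -> str:
    """Position-by-position consensus over equal-length, already-grouped
    reads; 'N' marks positions with no sufficiently supported base."""
    return "".join(consensus_call(list(col), min_support) or "N" for col in zip(*reads))


# The example from the text: 10 clusters read G at position 9, one reads T.
print(consensus_call(["G"] * 10 + ["T"]))  # G
print(consensus_sequence(["ACGT", "ACGT", "ACTT"]))  # ACNT
```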
Unique molecular identifiers (UMIs) can also be used in the proposed methods. If UMIs were implemented in the library of template nucleic acids it would be possible to use the UMI information to identify “sister” molecules i.e. molecules that were derived from the same template nucleic acid.
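Identifying “sister” molecules by UMI can be sketched as a grouping step followed by a check for positions where sisters disagree. This assumes UMIs have already been extracted from each read, are matched exactly (no UMI error correction), and that sequences within a family are equal-length; all names are hypothetical.

```python
from collections import defaultdict


def group_by_umi(reads: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (umi, sequence) pairs: reads sharing a UMI are treated as
    "sister" molecules derived from the same template nucleic acid."""
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)
    return dict(families)


def discordant_positions(family: list[str]) -> list[int]:
    """Positions at which sister molecules disagree; such minority
    differences may point to amplification artefacts rather than real
    mutations in the sample."""
    return [i for i, column in enumerate(zip(*family)) if len(set(column)) > 1]


reads = [("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACGA"), ("CCG", "TTGA")]
families = group_by_umi(reads)
print(discordant_positions(families["AAT"]))  # [3]
```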
In yet another example, beads could be made to stick to glass wool or metal wool to prevent them from moving around the solution.
Number | Date | Country | Kind |
---|---|---|---|
1410646.2 | Jun 2014 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2015/051736 | 6/12/2015 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2015/189637 | 12/17/2015 | WO | A |
Number | Date | Country |
---|---|---|
20170137876 A1 | May 2017 | US |