This invention relates to the use of probes for the processing of nucleic acid regions of interest (ROIs), and to methods of probe hybridisation and repetitive sequence blocking with non-deoxy nucleic acid sequences, or their synthetic, non-natural equivalents. The various aspects of the invention increase the relative fidelity and effectiveness of probe hybridisation to nucleic acid ROIs to which they were designed to hybridise, versus other hybridisation events. The invention further relates to novel nucleic acid probes and their uses, as well as the use of novel non-deoxy nucleic acid sequences, or their synthetic, non-natural equivalents, used to block or mask surfaces.
For several decades, nucleotide sequences covalently attached to detectable chemistries (probes) have been used to hybridise to and detect or enrich regions of interest (ROIs) comprising nucleic acid sequences within the genomes and transcriptomes of numerous species. Probes of varying sizes have been used for numerous applications ranging from short synthetic oligonucleotides to detect single nucleotide changes in a single ROI, to whole genomes allowing analysis of structural variation between genomes. The techniques can also be applied such that nucleotide sequences are covalently attached to surfaces (surface probes), e.g. microarrays or beads, and the ROI DNA/RNA itself is labelled with a detectable chemistry.
Modern nucleotide and structural analyses often use next generation sequencing (NGS). NGS platforms process immense numbers of DNA fragments, resulting in extremely low cost per base of sequence. Nevertheless, current whole genome sequencing (WGS), performed in ways that ensure that most bases are sequenced a sufficient number of times to permit accurate analysis, costs in excess of £1000 per genome merely to generate the raw data. WGS also outputs vast amounts of sequence requiring storage and expert analysis, so it is not yet feasible to routinely sequence complex genomes in their entirety. This is particularly true in many healthcare and research applications where ROIs comprise only a few genes and WGS yields vast excesses of other genomic sequences. This also presents ethical considerations, such as whether the additional data should be stored and analysed outside of the remit of the investigation and whether the result of such analyses should be disclosed to the patient.
Targeting NGS towards only ROIs reduces the requirement for resources compared to WGS. A number of methods have been developed to recover ROIs, one of which is hybridisation target enrichment (hTE). hTE utilises nucleotide sequences (i.e. probes), or synthetic non-naturally occurring equivalents, attached to recoverable rather than detectable chemistries (recoverable probes are also referred to as ‘baits’). The recoverable probes preferentially hybridise to the ROIs and allow physical enrichment of the ROIs over other genomic regions. Elution from the recoverable probe results in a useful degree of purification of the ROIs. hTE can be used for many applications, but is commonly used to enrich ROIs prior to NGS making it more practical and affordable to sequence large numbers of samples, expanding the use of NGS to many more settings.
The cost effectiveness of the leading hTE technologies is good for ROIs of at least a few Mbp. Many hTE kits are optimised to recover whole exomes, and this increases associated costs and requirement for resources when only a subset of genes is the focus of the study (though the cost is still significantly less than for WGS). Various reasons exist for the tendency of these kits to target enrichment of exomes or specific gene panels, such as the fact that, when targeting larger ROIs, even poorly effective enrichment methods can still generate a product wherein the majority of the recovered DNA comes from the ROI rather than other genomic regions. The innate ‘Enrichment Power’ of a method is therefore important.
The Enrichment power (EP or EF) of a method is its efficiency at recovering the targeted ROI compared to its efficiency at recovering other genome regions, and it is calculated as:
Consider, for example, the alternatives of targeting ROIs comprising a whole exome (~60 Mb), a 3 Mb region, or a 300 kb region in the 3,000 Mb human genome, with a requirement that at least 80% of the final sequences from NGS overlap the ROI. These three scenarios would require EPs of at least 40, 800 and 8000 respectively, for a method to be suitable. An EP of ~8000 is unachievable using currently available products. Hence, enrichments of ROIs smaller than several hundred kb are more suited to approaches with vastly superior levels of enrichment specificity, e.g. PCR based procedures (though for larger ROIs, PCR based approaches become impractical). But if targeting a 3 Mb ROI, the required EP of ~800 falls within the top end of the range of current products. The whole exome enrichment, requiring an EP of ~40, is easily achievable using current products.
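The three EP figures quoted above can be checked with a short calculation. The sketch below assumes that EP is the fraction of recovered sequence lying in the ROI divided by the fraction of the genome occupied by the ROI. That working definition is an assumption chosen because it reproduces the quoted figures; the function name is illustrative only.

```python
def required_enrichment_power(roi_bp: float, genome_bp: float,
                              on_target_fraction: float) -> float:
    """EP needed for `on_target_fraction` of the final sequence to overlap the ROI.

    Assumed definition (not taken verbatim from the text):
        EP = (fraction of recovered sequence in the ROI)
             / (fraction of the genome occupied by the ROI)
    """
    if not 0 < roi_bp <= genome_bp:
        raise ValueError("ROI size must be positive and no larger than the genome")
    roi_fraction = roi_bp / genome_bp
    return on_target_fraction / roi_fraction


GENOME_BP = 3_000e6  # 3,000 Mb human genome, as stated in the text

for label, roi_bp in [("whole exome (60 Mb)", 60e6),
                      ("3 Mb region", 3e6),
                      ("300 kb region", 300e3)]:
    ep = required_enrichment_power(roi_bp, GENOME_BP, 0.80)
    print(f"{label}: required EP of about {ep:.0f}")
```

Under this assumption the required EP scales inversely with ROI size, which is why a tenfold smaller ROI demands a tenfold higher EP for the same 80% on-target requirement.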
While hTE is attractive when fully optimised for a particular ROI (or handful of ROIs), e.g. in a diagnostic setting, applying the same protocol to many different patients without prior optimisation typically leads to unpredictable EP and inconsistent read depth for each ROI base when NGS sequences are aligned to the ROI sequence. This is because genome sequence characteristics vary considerably from region to region, meaning that different ROIs will often have very different repeat sequence and base composition content which require different enrichment reaction conditions.
Regions of genomic DNA (gDNA) sequence containing a high (>70%) GC content tend to denature inefficiently even at very high temperatures. This is further confounded by rapid re-annealing of any fragments that are not fully denatured, once the temperature is subsequently reduced, resulting in these regions exhibiting poor accessibility to probe and poor recovery. In contrast, regions with a low (<30%) GC content tend to denature rapidly but hybridise poorly to probes, again leading to poor recovery. This also affects ROI detection with surface probes reliant on single stringency hybridisation and washing conditions. Since the extreme variability of GC content throughout the genome makes it unlikely that a single set of conditions will be suitable for every surface probe on a microarray, the consequential non-specific hybridisation of some sequences under any one set of conditions has contributed to a reduction in the popularity of microarray based approaches.
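The GC thresholds mentioned above (>70% and <30%) lend themselves to a simple screening step when assessing an ROI. The following is a minimal sketch only; the function names and category labels are hypothetical, and the thresholds are simply those quoted in the preceding paragraph.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) in `seq` that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def classify_gc(seq: str, low: float = 0.30, high: float = 0.70) -> str:
    """Label a sequence using the >70% and <30% GC thresholds quoted in the text."""
    gc = gc_fraction(seq)
    if gc > high:
        return "high-GC"   # tends to denature inefficiently and re-anneal rapidly
    if gc < low:
        return "low-GC"    # tends to hybridise poorly to probes
    return "intermediate"


print(classify_gc("GCGCGCGCAT"))  # 8 of 10 bases are G or C
print(classify_gc("ATATATATGC"))  # 2 of 10 bases are G or C
```

In practice such a check would be applied in windows along an ROI, so that sub-regions likely to be recovered poorly can be flagged before probes are designed.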
Complex genomes typically contain many sequences that are highly similar or identical to sequences at other places in the genome. These ‘repeat sequences’ comprise well over 60% of the human genome and the majority of exons in the human genome are within a few hundred bp of repeat sequence, or may even have repeat sequence within them. Repeat sequences represent challenges to methods based upon hybridisation because of cross-hybridisation between repeats in ROIs and similar non-ROI copies of those repeats, even under high stringency conditions. This can result in the formation of networks of many ROI and non-ROI DNA fragments that include repeat sequences, leading to: a) poor specificity when using probes to detect an ROI; and b) recovery of genomic regions from outside an ROI, resulting in reduced EP, when performing hTE.
Hybridisation based approaches rarely rely solely on the use of stringent conditions (e.g. high temperatures and low salt concentrations) to favour preferential hybridisation of probes and reduce networking. An excess amount of competitor DNA, e.g. Cot-1 DNA, is commonly used to preferentially hybridise to (mask or block) repeat sequences making them less available for non-preferential probe/probe hybridisation and network formation. A disadvantage of Cot-1 DNA is that it masks only a proportion of repetitive sequences and there is evidence that it may actually stabilise the above mentioned networks. Another disadvantage of Cot-1 DNA is that it cannot be easily removed from final reaction products, in DNA based applications. The ability to remove such a blocker would be advantageous as it would promote the destabilisation of the above mentioned networks.
Common to the majority of hybridisation based approaches is the requirement for a solid surface: e.g. nylon membranes (Southern blotting); glass microarray surfaces (microarray based ROIs detection and hTE); and coated paramagnetic beads (used to recover probes in solution based hTE). However, DNA and RNA can interact with surfaces largely or completely irrespective of DNA sequence content, resulting in poor specificity when detecting ROIs, and the unwanted recovery of non-targeted molecules when enriching ROIs. Surfaces can be pre-treated with various blocking agents, such as bovine serum albumin (BSA) and polyvinylpyrrolidone (PVP), or even the DNA/RNA of an unrelated species. Blocking agents interact with the surfaces and thereby shield and prevent the surfaces from interacting with sample DNA/RNA molecules.
For the above reasons, manufacturers prefer to produce specific, highly optimised, kits designed to recover pre-defined larger ROIs. So clearly there is still a need for improvements to hTE methodologies, to enable the highest possible enrichment power while reducing sensitivity to variations in ROI sequence properties, and improvements that lower the cost of hTE. This is particularly true for customisable hTE methods that target ROIs between a few tens of kb to a few Mb.
It is therefore an aim of embodiments of the present invention to overcome or mitigate at least one of the problems of the prior art.
It is also an aim of embodiments of the invention to provide methods of detecting and/or increasing enrichment of nucleic acid ROIs and to provide methods of cost effectively amplifying probes to detect ROIs.
According to a first aspect of the invention there is provided a method of hybridisation of one or more sample derived nucleic acids comprising one or more regions of interest, the method comprising the step of hybridisation of each sample nucleic acid and/or region of interest with a plurality of non-overlapping nucleic acid probes.
A region of interest (hereinafter “ROI”) is a contiguous genome nucleic acid sequence or non-contiguous set of genome nucleic acid sequences targeted for detection or recovery in an experiment. Examples of nucleic acids would include DNA, hnRNA (heterogeneous nuclear RNA), mRNA, tRNA or rRNA sequences.
“Sample derived nucleic acids” means one or more samples of nucleic acids from a biological sample or material.
In some embodiments there are at least 50%, 60%, 70%, 80%, 90% or 95% non-overlapping probes used in the hybridisation method. There may be a plurality of overlapping probes used in the hybridisation method, but they should preferably make up no more than 25%, 20%, 15%, or 10% of the total number of probes used in the methods of the invention, and in some embodiments no more than 10% or 5% of the total number of probes.
The methods of the invention therefore utilise mutually largely non-overlapping probes for each ROI, in order to maximise hybridisation coverage of each ROI.
The method of the first aspect of the invention is particularly suited for use in hybridisation target enrichments (hTE) processes, and also in ROI detection methods.
The method may comprise hybridisation of a ROI that has been broken into a plurality of nucleic acid fragments. Each fragment may be as described herein below. These embodiments are particularly suited for hTE processes and therefore the method may comprise a process of hTE comprising hybridisation of one or more nucleic acid ROIs, with the method comprising the step of fragmenting the ROI nucleic acid sequences and hybridising the resulting fragments with a plurality of non-overlapping probes.
hTE methods require that a total nucleic acid sample is first fragmented into pools with fragment sizes of at least 500 bases, 700 bases, 900 bases, 1000 bases, 1200 bases, 1400 bases or 1500 bases. ROIs will be present within a subset of these fragments. It has been found that the method is particularly useful for recovering the ROI containing fraction from nucleic acid fragment pools with an average fragment size of between 900 bp and 1.2 kb, 1 kb and 1.5 kb, or between 1.1 kb and 1.4 kb, or between 1.2 kb and 1.3 kb.
In some embodiments the method of the first aspect of the invention comprises hybridisation of one or more nucleic acid fragments comprising at least a portion of one or more ROIs, wherein each fragment comprises at least 500 bases, 700 bases, 900 bases or 1000 bases. In some embodiments the method may comprise hybridisation of one or more nucleic acid fragments comprising at least a portion of one or more ROIs, wherein each fragment comprises no more than 2000 bases, 1800 bases, 1600 bases or 1500 bases.
In other embodiments the method of the first aspect of the invention may comprise a method of detecting a ROI. In such embodiments the ROI may comprise a relatively large number of nucleic acid bases, such as whole genes, for example.
The ROI may be greater than 50 kb, 100 kb, 250 kb, 500 kb, 1 Mb or 2 Mb for example. In other embodiments of the invention the ROI may be at least 50 Mb, 100 Mb, 150 Mb or 200 Mb.
In some embodiments the number of probes designed to hybridise per 1 kb of each nucleic acid ROI on average is at least 1 probe, at least 3 probes, at least 4 probes, or at least 5 probes. In some embodiments at least 3 probes, or at least 4 probes, or at least 5 probes are designed to hybridise per 1 kb of each ROI on average. In some embodiments there may be up to 20 probes designed to hybridise per 1 kb of each ROI on average.
In some embodiments the method may comprise, in addition to hybridisation of a plurality of non-overlapping probes to each ROI, hybridisation of portions of one or more probes to regions outside of and possibly flanking the ROI or ROI fragments. In some embodiments the method may comprise annealing portions of at least one probe to a region extending up to 100 bp, 200 bp or 300 bp outside of and possibly flanking the ROI or ROI fragments. These embodiments are particularly suitable for methods of hTE in which it is important to ensure efficient recovery of all sub-regions of an ROI.
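By way of illustration only, non-overlapping probes can be laid end to end across an ROI so that any excess length falls in the flanking sequence. The sketch below assumes a fixed probe length of 120 bases, which is a hypothetical value not taken from the text, and uses half-open coordinates (start inclusive, end exclusive). The function names are likewise illustrative.

```python
import math


def tile_probes(roi_start: int, roi_end: int, probe_len: int = 120):
    """Return end-to-end, non-overlapping probe intervals covering the ROI.

    The tiling is centred, so any excess length is split between the two
    flanks. probe_len=120 is an illustrative assumption only.
    """
    roi_len = roi_end - roi_start
    if roi_len <= 0 or probe_len <= 0:
        raise ValueError("ROI length and probe length must be positive")
    n_probes = math.ceil(roi_len / probe_len)
    overhang = n_probes * probe_len - roi_len   # bases falling outside the ROI
    first = roi_start - overhang // 2           # shift left by half the overhang
    return [(first + i * probe_len, first + (i + 1) * probe_len)
            for i in range(n_probes)]


def probes_per_kb(probes, roi_start: int, roi_end: int) -> float:
    """Average number of probes designed per 1 kb of ROI."""
    return len(probes) / ((roi_end - roi_start) / 1000)


probes = tile_probes(1000, 2000)
print(probes[0], probes[-1])              # first and last probe intervals
print(probes_per_kb(probes, 1000, 2000))  # probe density per kb
```

With these assumed values a 1 kb ROI receives 9 probes, and the outermost probes extend 40 bases into each flank, which is within the flank distances described above.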
In the first aspect of the invention, the method has been found to provide a number of advantages over known hybridisation enrichment or detection methods. With respect to hTE the method enables accurate recovery of relatively long (800-1500 bp) target nucleic acid fragments that contain ROIs using a plurality of non-overlapping probes, compared to current techniques which utilise shorter target nucleic acid fragments (200-500 bp) and a plurality of frequently overlapping probes. The use of longer target nucleic acid fragments in the method: leads to more efficient recovery of ROI bases situated near to junctions with non-ROI bases; increases the uniformity of recovery throughout ROIs; promotes better recovery of “difficult” regions such as regions with secondary structure or particularly high or low proportions of C+G base content; and maximises the number of base pairs formed between probes and ROI nucleic acids, which thereby increases resistance to stringent washing and so improves the specificity of product recovery. The use of a plurality of non-overlapping probes in the method counters problematic steric hindrance and competition at regions where probes overlap.
In some embodiments the probes hybridise with at least 50%, 60%, 70%, 80% or 90% of the length of a given ROI. In some embodiments, the probes hybridise with at least 95%, 96%, 97%, 98% or 99% of the length of a given ROI.
In some embodiments at least one probe or bait is annealed within 5, 10, 15, 25, 50 or 100 bases from an end of each ROI or each ROI fragment. In some embodiments at least one probe is annealed within 5, 10, 15, 25, 50 or 100 bases from both ends of each ROI or each ROI fragment.
In some embodiments at least 5%, 10%, 15%, 20%, 25%, 30%, 40% or 50% of the probes are non-overlapping on the ROI or each ROI fragment. In some embodiments at least 75%, 80%, 85%, 90% or 95% of the probes are non-overlapping on the ROI or each ROI fragment. In one embodiment 100% of the probes are non-overlapping.
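The proportions described above (fraction of ROI length hybridised by probes, and fraction of probes that are non-overlapping) can be computed directly from a set of probe coordinates. The following is a sketch under the assumption that probes are given as half-open (start, end) intervals on the ROI coordinate system; the function names and the example probe set are hypothetical.

```python
def overlaps(a, b) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def non_overlapping_fraction(probes) -> float:
    """Fraction of probes that share no base with any other probe."""
    if not probes:
        raise ValueError("no probes supplied")
    clean = sum(
        not any(overlaps(p, q) for j, q in enumerate(probes) if j != i)
        for i, p in enumerate(probes)
    )
    return clean / len(probes)


def covered_fraction(probes, roi_start: int, roi_end: int) -> float:
    """Fraction of ROI bases covered by at least one probe."""
    covered = 0
    cursor = roi_start                     # everything left of cursor is counted
    for start, end in sorted(probes):
        start = max(start, cursor)         # skip bases already counted
        end = min(end, roi_end)            # ignore bases beyond the ROI
        if end > start:
            covered += end - start
            cursor = end
    return covered / (roi_end - roi_start)


# Hypothetical probe set: the second and third probes overlap by 10 bases.
example = [(0, 120), (120, 240), (230, 350), (400, 520)]
print(non_overlapping_fraction(example))   # 2 of 4 probes are free of overlap
print(covered_fraction(example, 0, 520))   # 470 of 520 bases are covered
```

Probes that merely abut (one ending where the next begins) are treated as non-overlapping, in keeping with end-to-end tiling.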
In some embodiments the method comprises immobilising the probes onto a surface to provide a microarray of immobilised probes. The method may comprise hybridising sample nucleic acids including the ROIs to a plurality of the immobilised probes, followed by washing, to preferentially denature and remove non-ROI derived and non-annealed nucleic acids.
In other embodiments the method may comprise in-solution hybridisation, wherein the probes and ROIs are first hybridised in-solution. The probes may be labelled with biotin or any other suitable tag or label and recovered using Streptavidin coated, or otherwise suitably coated, paramagnetic or other beads or other suitable coated solid surface, to facilitate the recovery of these surfaces and the nucleic acids attached to them. The method may then comprise the application of stringent wash conditions to preferentially remove non-hybridised or non-specific hybridised nucleic acid.
In some embodiments, the first aspect of the invention is a method of hTE of a ROI. In other embodiments the first aspect of the invention is a method of detection of a ROI.
According to a second aspect of the invention there is provided ROI sequences from a sample hybridised with a plurality of non-overlapping probes.
The ROI and probes may be as described above for the first aspect of the invention. The target-probe duplex may be produced according to the methods of the first aspect of the invention.
According to a third aspect of the invention there is provided a nucleic acid probe labelled with a plurality of the same or different labels per probe molecule.
In some embodiments the probe nucleic acid comprises at least 6, 8, 10, 12, 14 or 15 labels per molecule.
In some embodiments the nucleic acid probe comprises a label within 10, 5, 3, 2, 1 or 0 bases from an end of the nucleic acid probe. This could be an end of a probe that comprises additional bases not designed to hybridise with any ROI bases. Such non-targeting ends of the nucleic acid probe, if included, may comprise the 5′ end or the 3′ end or both ends of the molecule, and the label may be placed within 10, 5, 3, 2, 1 or 0 bases of such an end. With respect to recoverable probes the label is typically an entity that facilitates physical recovery of the label and the nucleic acids adjoined to it. The 3′ end of the probe may comprise a dideoxynucleotide so as to prevent polymerase based extension, and thereby enable polymerase chain reactions to be used to amplify and hence recover target sequences that have been captured.
Each label may independently comprise a fluorescent marker, a luminescent marker, a recoverable marker, a radioactive marker, or the like.
Each label may independently comprise biotin.
The probes that have the structure described in the third aspect of the invention may be usefully employed in the method of the first aspect of the invention, and accordingly in a fourth aspect of the invention there is provided the method of the first aspect of the invention using at least one probe of the third aspect of the invention. The method may comprise using a plurality of probes of the third aspect of the invention and in some embodiments all of the probes used are as described for the third aspect of the invention. In some embodiments of the invention the method may comprise using a plurality of non-overlapping probes of the third aspect of the invention.
The probe or probes of the third aspect of the invention ensure that their use in hybridisation events creates target-probe duplex structures in which multiple copies of the label are present, which facilitates improved ease and strength of detection or recovery.
Included non-targeting ends may comprise at least 2, 3, 4, 5, 6, 7, 8, 9 or at least 10 labels. In some embodiments the non-targeting ends may comprise more than 10 labels.
According to a fifth aspect of the invention there is provided the use of a non-deoxy ribonucleic acid molecule to block or mask a surface or to block or mask repetitive DNA sequences.
According to a sixth aspect of the invention there is provided a method of blocking or masking repetitive DNA sequences comprising mixing at least one sample nucleic acid with a non-deoxy ribonucleic acid molecule.
The non-deoxy ribonucleic acid molecule may comprise RNA which is a transcription product from whole genomic DNA or from fractionated genomic DNA.
The non-deoxyribonucleic acid molecule may be natural or synthetic.
There may be more than one non-deoxy ribonucleic acid molecule, or there may be one or more deoxyribonucleic acid molecule and at least one non-deoxyribonucleic acid molecule as blocking or masking agents. For example there may be a non-deoxyribonucleic acid and a DNA molecule as blocking or masking agents.
The RNA transcription product may be the transcription product from any prokaryote, eukaryote or archaea, for example mammalian DNA (including human DNA) or DNA of fish, reptile, bird, amphibian, plant or fungal species, or any combination thereof. Suitable fish DNA includes salmon gDNA.
The or each RNA transcription product may be derived by transcription from whole genomic human DNA, human Cot-1 DNA, or salmon genomic DNA, or any combination thereof, for example.
In some embodiments, the RNA transcription product may comprise a mixture of RNA transcription products selected from mammalian DNA, fish DNA, bird DNA, reptile DNA, plant DNA and fungal DNA, such as a combination of mammalian and fish DNA, which may be RNA transcription products of whole genomic DNA. In some embodiments the combination comprises the RNA transcription products of human DNA and salmon DNA, especially of whole genomic human and salmon DNA.
It has surprisingly been found that utilising combinations of non-deoxy ribonucleic acids as blocking or masking agents may provide up to four or more times the enrichment power of DNA-based blocking or masking agents.
In another example, the blocking or masking may be effected by mixing a non-deoxyribonucleic acid molecule and a deoxyribonucleic acid molecule, with at least one sample nucleic acid, such as a mixture of an RNA transcription product of a DNA molecule and a Cot-1 DNA or salmon genomic DNA molecule, for example.
The RNA transcription product may later be eliminated from the reaction (e.g., from employed surfaces to which it has become bound, or from repetitive DNA fragments to which it has hybridised) by treatment with a removal agent, which may be an RNase (such as RNase A, RNase If or RNase H, for example).
According to a seventh aspect of the invention there is provided a method of manufacturing a surface blocking or masking agent or repetitive DNA blocking or masking agent, the method comprising:
Step b) may typically include ligating the target fragments to DNA sequences that encode one or more RNA polymerase promoters (such as T7), amplification procedures, and incubation in the presence of RNA polymerases to transcribe the DNA.
The DNase may be DNase I. The proteinase may be proteinase K.
There may be a step e) of purifying the RNA transcription product and protecting the product by addition of a reversible RNase inhibitor.
According to an eighth aspect of the invention there is provided a method of nucleic acid sequence hybridisation comprising the steps of:
The non-deoxy ribonucleic acid reagent may be a RNA transcription product as described hereinabove for the fifth, sixth and seventh aspects of the invention. In some embodiments step b) may comprise adding two or more non-deoxy ribonucleic acid molecules, such as a combination of RNA transcription products. In preferred embodiments step b) comprises adding the RNA transcription product of mammalian DNA and fish DNA, such as a combination of the transcription products of human DNA and salmon DNA, preferably of whole genomic DNA.
During hybridisation of target DNA sequences to probes, repetitive sequences within the sample DNA and/or probes may give rise to unwanted hybridisation events involving ROI and/or non-ROI related sequences. Such hybridisation between repetitive sequences can create networks of DNA fragments which can lead to unintended detection or recovery of non-ROI based sequences. To counter this tendency, repeat sequence containing blocking reagents such as Cot-1 DNA may be added during hybridisation to bias network formation towards interactions between sample derived DNA fragments and blocker molecules rather than only between sample derived DNA fragments. Multiple target derived DNA fragments are therefore less likely to become joined together in any one network, and so this minimises the recovery or detection of non-ROI sequences. Furthermore, if one includes repeat sequence blockers comprised of non-deoxy ribonucleic acids (such as a genomic DNA derived RNA transcription product) in the hybridisation process, and follows this by treating with RNase, this serves to break up repetitive element networks, so that subsequent washing is able to remove much of the destroyed network and hence reduce the level of detection or recovery of non-ROI based sequences.
The method of the eighth aspect of the invention may be combined with the method of the first aspect of the invention, to provide a method of hybridisation of a nucleic acid ROI with a plurality of probes, the method further comprising the addition of a non-deoxy ribonucleic acid molecule, such as a RNA transcription product of genomic DNA, during the hybridisation reaction. The various embodiments of the first to third aspects of the invention may be combined with the method of the eighth aspect of the invention. The probes may be as described for the third aspect of the invention and may comprise one or more probes having multiple labels as described hereinabove. It has been found that hybridisation using the method of the first aspect of the invention and the probes of the third aspect of the invention typically produces an EP of at least 250 and highly uniform rates of non-repetitive sequence recovery across an ROI, whilst being cost effective compared to related competing technologies. When the method of hybridisation of the first aspect of the invention using the probes of the third aspect of the invention is also combined with the method of the eighth aspect of the invention, the EP increases to >2000, which is believed to match or exceed the capabilities of alternative contemporary market-leading hTE methods.
In a ninth aspect of the invention there is provided a method of blocking a solid surface comprising the steps of
The blocking reagent in step b) may be a nucleic acid or synthetic non-natural equivalent and may be a non-deoxy ribonucleic acid as described hereinabove for the fifth to eighth aspects of the invention. The blocking reagent may be a transcription product from any prokaryote, eukaryote or archaea, for example mammalian DNA (including human DNA) or DNA of fish, reptile, bird, amphibian, plant or fungal species. Suitable fish DNA includes salmon gDNA. The blocking reagent may comprise a surface blocking or masking agent manufactured according to the seventh aspect of the invention.
Common to the majority of hybridisation based methods is the involvement of a solid surface, such as a nylon membrane (e.g., as in Southern blotting); glass surfaces (e.g., as in microarray-based ROI detection and hTE); or coated paramagnetic beads (e.g., as used to recover probes in solution-based hTE). DNA and RNA can form interactions with surfaces, resulting in non-specific signals when detecting ROIs and the recovery of non-ROI sequences when enriching ROIs. Surfaces can be pre-treated with blocking agents, such as bovine serum albumin (BSA) and polyvinylpyrrolidone (PVP), or even the DNA/RNA of an unrelated species. Blocking agents interact with the surfaces and thereby shield the surface from interaction with and binding to the sample DNA/RNA, hence significantly reducing the detection or recovery of unintended DNA sequences.
According to a tenth aspect of the invention there is provided a method of amplification of short, mixed nucleic acid sequences, comprising the steps of:
a) providing between around 1 fg (femtogram) and around 500 pg (picogram), though preferably between around 1 pg and around 250 pg, of a complex pool of single-stranded nucleic acid fragments having common sequences at their 5′ ends and having common sequences at their 3′ ends; and
b) amplifying the nucleic acid fragments, preferably by polymerase chain reaction with a suitable primer or pair of primers.
In some embodiments the nucleic acid fragments in step a) have a length of ≤1.5 kb, ≤1 kb, less than 500 bases, less than 400 bases, less than 300 bases, less than 250 bases, or less than 200 bases. In some embodiments the nucleic acids in step a) have a length of between 60 and 250 bases, or between 80 and 200 bases, or between 100 and 200 bases.
Step b) should be undertaken such that there is no significant change to the diversity of the complex pool.
The nucleic acid fragments in step a) may have a plurality of common sequences at their 5′ ends and/or their 3′ ends.
It has been found that one can effectively and with high fidelity amplify these nucleic acid sequences, such as the probes described hereinabove for the various aspects of the invention, by starting from such tiny amounts of this type of starting material, in the femtogram to picogram range (which is far less than is generally used in such amplifications). When higher amounts of such short nucleic acid fragments are used in standard PCRs the molecules tend to interact with each other such that 3′ ends become non-specifically hybridised and then extended by copying whatever other sequence to which they had spuriously hybridised. As the PCR cycles progress, progressively more and longer spurious products are generated. However, if the starting concentration of these PCR targets is low, they will instead be preferentially primed as intended by primers matched to their common ends, and the desired products thereby amplified. This noted problem with such PCRs is vastly exaggerated when the starting mixture of short fragment targets is derived by synthesis on microarrays. Such array-derived DNA pools often contain a very high proportion of truncated molecules (~30% to ~99%) (LeProust et al., 2010), which therefore cannot be primed and amplified as intended, but can become involved in cross-priming with other target fragments.
In some embodiments step a) comprises providing at least 10 fg, 100 fg or 500 fg of nucleic acid fragments. In some embodiments step a) comprises providing no more than 450 pg, 400 pg, 350 pg or 300 pg of nucleic acid fragments.
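To give a sense of scale for the input masses described above, a mass of single-stranded DNA can be converted to an approximate number of molecules. The sketch below assumes an average mass of about 330 g/mol per nucleotide of single-stranded DNA (a common approximation) and a fragment length of 150 bases, chosen here only as an example within the 100 to 200 base range mentioned above. The function name is illustrative.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
AVG_NT_MASS_G_PER_MOL = 330.0  # approximate average for one ssDNA nucleotide (assumption)


def ssdna_copies(mass_grams: float, length_nt: int) -> float:
    """Approximate number of single-stranded DNA molecules in a given mass."""
    if mass_grams < 0 or length_nt <= 0:
        raise ValueError("mass must be non-negative and length positive")
    moles = mass_grams / (length_nt * AVG_NT_MASS_G_PER_MOL)
    return moles * AVOGADRO


for label, grams in [("1 fg", 1e-15), ("1 pg", 1e-12),
                     ("250 pg", 250e-12), ("500 pg", 500e-12)]:
    print(f"{label} of 150-nt ssDNA: about {ssdna_copies(grams, 150):.2e} molecules")
```

On these assumptions 1 pg corresponds to roughly twelve million 150-base molecules, so even the lower end of the preferred range can still represent a complex pool many times over, depending on the number of distinct sequences in the pool.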
The probes and blocking or masking reagents described hereinabove may also be used in other applications such as fluorescence in situ hybridisation (FISH), for example.
Embodiments of the various aspects of the invention will now be described by way of example only, with reference to the accompanying drawings of which:
An optimised method for the amplification of complex pools containing array-synthesised short (<200 bp) single-stranded DNA molecules was developed. A ‘model’ pool (produced by conventional long oligonucleotide synthesis) was used to evaluate various reaction parameters. The model pool as shown in
All PCRs were prepared on ice. 30-50 μl PCRs contained 1× of the supplied PCR Buffer, 0.15 pmol/μl ProAmpF04E, 0.15 pmol/μl ProAmpR01D, 0.2 mM dNTPs, 0.025 U/μl of the required DNA Polymerase, and the required mass of mixtures of full length and truncated single-stranded DNA molecules. Reactions were sealed with a heat sealable PCR film or PCR strip-caps (Thermo Fisher Scientific, Loughborough, Leics, UK).
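The stated final concentrations translate into per-reaction amounts by simple multiplication with the reaction volume. The sketch below reads the polymerase concentration as 0.025 U/μl, which is an assumption about the intended value, and takes the dNTP figure as stated without assuming whether it is per nucleotide or in total. The function name and dictionary keys are illustrative only.

```python
def reaction_amounts(volume_ul: float) -> dict:
    """Total amount of each component in one PCR of the given volume (in microlitres)."""
    if volume_ul <= 0:
        raise ValueError("reaction volume must be positive")
    return {
        "ProAmpF04E_pmol": 0.15 * volume_ul,  # 0.15 pmol per microlitre
        "ProAmpR01D_pmol": 0.15 * volume_ul,  # 0.15 pmol per microlitre
        "dNTP_nmol": 0.2 * volume_ul,         # 0.2 mM equals 0.2 nmol per microlitre
        "polymerase_U": 0.025 * volume_ul,    # assumed reading: 0.025 U per microlitre
    }


for volume in (30, 50):
    print(volume, "ul:", reaction_amounts(volume))
```

For a 50 μl reaction this gives 7.5 pmol of each primer, 10 nmol of dNTPs and 1.25 U of polymerase under the stated assumptions.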
Optimal thermal cycling conditions were determined to entail the following: 98° C. for 30 sec, 5× (98° C. for 30 sec, 65° C. for 10 sec), 25× (unless stated elsewhere) (98° C. for 10 sec, 70° C. for 10 sec) 72° C. for 1 minute then held at 15° C. Following cycling, 10 μl of the PCRs were subject to electrophoresis alongside 1 μg 50 bp ladder (NEB, Hitchin, Herts, UK) on a 2.5% LE agarose gel stained with 0.2 μg/ml EtBr. Completed PCRs were stored at −20° C. The 5° C. reduction in annealing temperature for the first 5 PCR cycles allowed the primers to initially anneal to the primer annealing sites <20 nt in length. Once these had been extended by polymerase extension, the annealing temperature could be raised to 70° C.
A wide range of DNA polymerases was evaluated, including: Amplitaq Gold (Applied Biosystems), Pfu Ultra high fidelity DNA polymerase (Agilent), Phusion high fidelity DNA polymerase (NEB), iProof high fidelity DNA polymerase (BioRad) and Velocity (Bioline). These investigations showed that iProof High-Fidelity DNA polymerase (BioRad) worked particularly well for complex pool amplification, along with Phusion and Velocity.
These methods produced a robust and greatly improved method for complex pool amplification. But beyond the PCR conditions, other factors (complex pool quality and complex pool quantity) were also found to be of great importance, as described below.
Emulsion PCR (EMPCR) has been proposed as a means to improve troublesome PCRs, especially if they involve complex template DNA mixtures. EMPCR entails creating, in one tube, millions of femtolitre sized droplets of oil-coated water (including PCR buffer, primers etc), such that each of these volumes acts as a separate reaction vessel within which PCR amplification can occur starting from a few template molecules. Since this arrangement reduces the chances of cross-priming and other undesirable interactions between different templates and their products, there is theoretically a limited risk of generating many different false products. Also, should cross-priming occur, the encapsulation limits the resources available to the un-desirable product thus preventing over amplification. This does not, however eliminate the possibility of false internal priming within synthesized strands (by primers or products strands), or concatamerisation between single-stranded amplification products, within each sub-reaction. Nevertheless, EMPCR has been adopted by many researchers to try to improve the effectiveness of challenging complex pool amplifications in order to enhance product quality.
To test the actual effectiveness of EMPCR for complex pool amplification, 30 μl PCRs were seeded with 10 ng of the model complex pools, and standard HF buffer (BioRad) was replaced with a detergent free formulation of the same buffer (BioRad) to prevent dispersal of the emulsion. Emulsification was performed by overlaying the reactions with 170 μl of a pre-prepared and chilled mixture containing 73% Tegosoft DEC (Evonik, Essen, Germany) 3% Abil WE09 (Evonik, Essen, Germany) and 20% Light mineral Oil (Sigma Aldrich, Gillingham, Dorset, UK), followed by transfer into a 4° C. constant temperature room and shaking at maximum speed on a vortex device for 10 min. EMPCRs were then performed for 20 to 30 thermal cycles.
The emulsions were broken by addition of 500 μl Butanol (Thermo Fisher Scientific) and the samples briefly vortexed. Then, 150 μl of buffer PB (Qiagen, Crawley, West Sussex, UK) was added and mixed into each sample by brief vortexing. Products were recovered from the whole sample by purification upon Qiagen MinElute PCR columns according to the manufacturer's protocols. Purified reaction products were eluted in 30 μl buffer EB (Qiagen).
Agarose gel electrophoretic analysis of the products revealed that the amplified DNA fragments were all of the desired size range (a single gel band), but that each 10 μl PCR volume was able to generate only a few ng of material, no matter whether the reactions were seeded with a high quality (100% equivalent to 10 ng of amplifiable full-length molecules) or a low quality (0.1% equivalent to 100 pg of amplifiable full-length molecules) complex pool template.
Another downside with EMPCR relates to the unavoidable cost, time and complexity of the process of emulsion breaking and subsequent product purification. Solution phase PCRs can be de-salted and purified by running through a chromatography column (e.g., Microbiospin, BioRad) or micro filter column (e.g., Amicon Ultra, Millipore). But to purify EMPCRs, special columns are required to remove the emulsion oils. Such columns are more likely to allow passage of contaminants such as ethanol and chaotropic salts into the eluted product.
These results show that EMPCR can amplify 10 ng of a complex pool without generating excessive amounts of spurious product, however the poor PCR dynamics within the emulsion cause the approach to generate far too little material for the needs of most downstream applications.
Spurious products in complex pool PCR may be caused by ‘over-cycling’; especially since the problem worsens as the total number of thermal cycles increases. The concentration of genuine product will rise so high in the later cycles that DNA strands can; a) start to cross-prime onto each other, generating false longer products, and b) become available for internal priming by the common primers, generating false shorter products. However this hypothesis fails to explain why the same type of events would not also occur for many of the amplified target sequences within their individual droplets in EMPCR.
The problem may be triggered by events that occur towards the start rather than at the end of the PCR, especially in PCRs with an excessive starting concentration of complex single-stranded DNA molecules. These events then create a low background of various artefacts some of which could amplify as efficiently as genuine products, such that they come to dominate the genuine products as more and more reaction cycles are performed. The nature of these initial ‘trigger’ events would also have to be such that they cannot occur (or are very much minimized) in the EMPCR context, wherein the target molecules are mostly isolated from one another into small clusters within the oil droplets.
A PCR seeded with 10 ng of human genomic DNA will have within it few free 3′ ends and only ˜6×10³ amplifiable target strands (10 ng/3 pg (mass of a single haploid genome)×2 (to convert to single-stranded molecules)). In contrast, a PCR seeded with 10 ng of a complex pool of short single-stranded DNA molecules (which is perhaps up to 10% of the original pool that will have been supplied/purchased) will contain ˜2×10¹¹ amplifiable molecules, with an equally large number of free 3′ ends. This >>10⁷ fold relative excess is enormous, and it means the starting situation of a complex pool PCR is analogous to the situation that will exist in a regular genomic DNA target PCR at the end of the whole reaction (˜25 cycles). These ˜2×10¹¹ targets therefore represent a mass of diverse sequence primers which can diffuse quickly and use their free 3′ ends to prime on other molecules, and since the pool is also composed of a myriad of different sequences there will be great potential for internal cross-priming of one sequence onto another. It is therefore believed that the cross-priming and mis-priming events seen towards the end of an ‘over-loaded’ complex pool PCR probably start happening excessively in the very first few cycles of such PCRs. This creates various artefacts that then further amplify and can eventually outnumber the desired products as the amplification of the desired products plateaus during the later stages of the PCR.
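The copy-number comparison above can be checked with a short calculation. The following Python sketch is illustrative only: the 3 pg haploid genome mass is taken from the text, whereas the 100 nt molecule length, the 330 g/mol average mass per nucleotide and the function names are assumptions introduced here.

```python
AVOGADRO = 6.022e23  # molecules per mole


def genomic_target_strands(input_ng, haploid_genome_pg=3.0):
    """Single-stranded copies of a single-copy target in a genomic DNA input."""
    haploid_genomes = (input_ng * 1000.0) / haploid_genome_pg  # ng -> pg
    return haploid_genomes * 2  # two strands per double-stranded copy


def ssdna_pool_molecules(input_ng, length_nt=100, g_per_mol_per_nt=330.0):
    """Molecules in a pool of short single-stranded DNA.

    length_nt and g_per_mol_per_nt are assumed values, not taken from the text.
    """
    grams = input_ng * 1e-9
    moles = grams / (length_nt * g_per_mol_per_nt)
    return moles * AVOGADRO


genomic = genomic_target_strands(10)
pool = ssdna_pool_molecules(10)
print(f"genomic target strands: {genomic:.1e}")
print(f"complex pool molecules: {pool:.1e}")
print(f"fold excess:            {pool / genomic:.1e}")
```

Under these assumptions the two inputs differ by more than seven orders of magnitude, in line with the figures quoted above; a different assumed molecule length shifts the pool count proportionally.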
This problem does not exist in EMPCR, since the original templates are physically separated from one another from the start, and the resources contained within each reaction droplet are very limited. Furthermore, the overall negative impact of this undesirable mis-priming and inter-molecule priming is likely to be proportional to the fraction of target molecules that are full length (not truncated at their 5′ end), as only this class of original template can be internally primed and copied to generate an artefact with a common priming site at its newly synthesised (3′) end.
In order to overcome these problems with complex pool PCR a method was performed according to the tenth aspect of the invention in which a significantly reduced amount of template pool was used.
Duplicate PCRs were performed in 30 μl volumes seeded with 1 μl of 10× serial dilutions of each of the different quality model pools, using 30 thermal cycles. The input pools contained 10 ng, 1 ng, 100 pg, 10 pg and 1 pg of single-stranded DNA molecules. Example results from such experiments using the optimum enzyme and reaction conditions, detailed above in Example 1, are shown in
As can be seen in
For reactions seeded with 10 pg of total template, the 1%-100% quality models amplified the desired fragment mixture very cleanly. Thus 0.1-10 pg of full length target was sufficient, and 9.9 pg of truncated target did not compromise these reactions.
For reactions seeded with 100 pg of total template, the 0.1%-1% quality models amplified the desired fragment mixture very cleanly. Thus 0.1-1 pg of full length target was sufficient, and 99.9 pg of truncated target did not compromise these reactions. However, the reaction with 10 pg of full length target was compromised (overtaken by artefacts) by the presence of 90 pg of truncated target. The reactions with 50 pg and 100 pg of full length target also generated a lot of artefacts.
These results suggest that the main factor that determines whether artefacts are formed in a complex pool PCR is the absolute amount of full-length target present at the start of the reaction. To ensure good quality amplifications, this quantity should be of the order of 1 pg, though it is quite robust to an order of magnitude difference up or down. The amount of 5′ truncated target has a far smaller influence on the reaction fidelity, even up to the 10-100 pg range, though if more than this is present in the starting reaction then more artefacts will be produced, and a mild excess of truncated template seems to cooperate with a mild excess of full length template in generating undesirable products.
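The relationship between total input, pool quality and the mass of full-length template discussed above reduces to simple arithmetic, sketched below in Python. The function name is hypothetical, and the quality values listed are only those mentioned in the text, not a complete record of the pools tested.

```python
def template_masses(total_pg, quality_percent):
    """Split a template input into full-length and truncated mass (pg).

    quality_percent is the percentage of the pool that is full length.
    """
    full_length_pg = total_pg * quality_percent / 100.0
    truncated_pg = total_pg - full_length_pg
    return full_length_pg, truncated_pg


# Inputs and quality scores mentioned in the text above.
for total_pg in (10, 100):
    for quality in (0.1, 1, 10, 50, 100):
        full, truncated = template_masses(total_pg, quality)
        print(f"{total_pg:>4} pg input, {quality:>5}% quality: "
              f"{full:7.2f} pg full length, {truncated:7.2f} pg truncated")
```

The sketch only reports masses. It does not predict reaction quality, since the results above indicate that full-length and truncated template can act together in generating artefacts.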
Performing fewer PCR cycles reduces the likelihood of errors within the amplified sequences. Q5 polymerase (NEB) has an error rate >100 fold lower than that of Taq DNA polymerase, a fidelity which relies on efficient 3′ to 5′ exonuclease activity. However, the efficient 3′ to 5′ exonuclease activity also degrades primers during PCR (Pers. Comm, NEB technical support). Using a single phosphorothioate bond at the 3′ end of PCR primers would prevent 3′ to 5′ exonuclease activity but would also block desirable exonuclease activity, e.g. λ exonuclease.
A series of PCRs with differing primer concentrations was carried out on complex pools with quality scores of 10% and 1%, using the following conditions: 50 μl PCRs contained 1× of the supplied PCR Buffer, 1.5 pmols/μl ProAmpF04E, 1.5 pmols/μl ProAmpR01, 0.2 mM dNTPs, 0.025 U/μl of Q5 DNA Polymerase, and the required mass of template pool. Reactions were sealed with a heat sealable PCR film or PCR strip-caps (Thermo Fisher Scientific, Loughborough, Leics, UK). The reactions were cycled as follows: 98° C. for 30 sec, 5× (98° C. for 30 sec, 65° C. for 10 sec), 20× (98° C. for 10 sec, 70° C. for 10 sec) 72° C. for 1 minute then held at 15° C. Additionally, the PCRs were performed with and without supplementing with 0.01 μU/μl of Thermostable Pyrophosphatase (NEB).
Increasing the primer concentration in the PCRs and supplementing with Thermostable Pyrophosphatase increased the yield that could be generated per 50 μl PCR from ˜0.2 μg to >0.5 μg using 5 to 6 fewer PCR cycles and 10 fold less template.
Further optimisations showed that equivalently high yields could be achieved by seeding PCRs with 10 pg to 20 pg of the complex pool and performing 15 to 17 PCR cycles (
Comparison with EMPCR
The above in-solution complex pool PCR technique of the invention is shown to be more efficient than EMPCR.
Considering a complex pool with a quality score of ˜70% and a yield of ˜300 ng:
The inventive technique is also faster to set up, as EMPCR requires a long emulsification step prior to thermal cycling, and a long demulsification step following thermal cycling.
The inventive technique allows easier purification of PCR products, as it is compatible with a wide range of purification platforms e.g. Silica membrane columns (Qiagen), Silica coated beads (Qiagen), AmPure XP beads (Beckman), and Polyacrylamide gel buffer exchange (BioRad) etc. The emulsifying oils used for EMPCR limit compatibility with some of these purification platforms.
EMPCR is also intolerant of soap containing buffers. iProof polymerase has a soap free buffer available, but many other polymerases such as Q5 polymerase are optimised for use in soap containing buffers. The inventive technique is compatible with a range of buffers. The inventive technique may be adapted to amplify ng masses of complex pools. Finally, since EMPCR compartmentalises the reaction, it is possible that the separate compartments might consume their resources at different rates, which may result in an uneven product.
Complex pool PCRs were performed using standard optimised conditions but with substitution of dCTP for differing ratios of 17 Biotin-16-Aminoallyl-2′-dCTP and dCTP (Trilink). Biotin-16-Aminoallyl-2′-dCTP has a flexible linker arm making it more efficient for use in PCR than other biotinylated nucleotides. It was found that a ratio of 0.65 17 Biotin-16-Aminoallyl-2′-dCTP gave an optimal balance between yield and biotin incorporation.
PCR amplified complex pools are double-stranded. To generate multi-biotinylated probes for a hybridisation based target capture, the multi-biotinylated double-stranded pool was transformed into a single-stranded pool. To achieve this, the 3′ primer site was removed with the Bts I restriction enzyme (NEB) and the unwanted strand removed with λ-exonuclease (NEB).
To allow direct PCR recovery of captured DNA fragments following targeted enrichment, it was necessary to protect the 3′ end of the probes from primer extension by DNA polymerases. Terminal Transferase (NEB) was used to add di-deoxy ATP (ddATP, Trilink) to the 3′ end of the probe strands prior to the removal of the un-desired strand by λ-exonuclease.
The output from processing is a pool comprising single-stranded multi-biotinylated probe with a non-target end as shown in
The multi-biotinylated DNA probe of the third aspect of the invention produced for example by the method described above has several potential advantages over existing DNA and RNA probes:
The method of producing the multi-biotinylated probe library was as follows:
a. PCR Amplification of a Template Probe Library
The template probe was diluted in 10 mM Tris HCl (pH 8.5). A PCR master mix sufficient for ˜100 PCRs was prepared containing 1× Q5 high fidelity PCR buffer (NEB), 1.5 to 3 pmol/μl of 5′ biotinylated ProAmp-F primer, 1.5 to 3 pmol/μl 5′ phosphorylated ProAmp-F primer, 3 μM dGTP, 3 μM dATP, 3 μM dTTP, 105 μM dCTP, 195 μM Biotin-16-AA-CTP (Trilink), 0.02 U/μl Thermostable inorganic Pyrophosphatase (NEB), 0.05 U/μl Q5 hot start high fidelity DNA polymerase (NEB). The master-mix was vortexed.
Several 49 μl DNA free controls were aliquoted, to which 1 μl of water was added. The template pool was added to the remaining master-mix to a concentration of 0.02 to 0.4 pg/μl. Following vortexing, the master-mix was aliquoted into 50 μl reactions and the PCR tubes sealed. The reactions were cycled as follows: 98° C. for 30 sec, 5× (98° C. for 30 sec, 65° C. for 10 sec), 10 to 20× (98° C. for 10 sec, 70° C. for 10 sec) 72° C. for 1 minute then held at 15° C. Following PCR, samples were stored at −20° C.
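Two quantities in step a can be cross-checked against figures given elsewhere in this description: the fraction of the cytidine nucleotide supplied in biotinylated form, and the template mass per 50 μl reaction. The Python sketch below does this arithmetic; the function names are illustrative and not part of the method.

```python
def modified_fraction(modified_um, unmodified_um):
    """Fraction of a nucleotide supplied in its modified (biotinylated) form."""
    return modified_um / (modified_um + unmodified_um)


def template_per_reaction_pg(conc_pg_per_ul, reaction_ul=50.0):
    """Template mass (pg) contained in one aliquoted reaction."""
    return conc_pg_per_ul * reaction_ul


# Concentrations from step a: 195 uM Biotin-16-AA-CTP and 105 uM dCTP.
biotin_fraction = modified_fraction(195.0, 105.0)

# Template range from step a: 0.02 to 0.4 pg/ul in 50 ul reactions.
low_pg = template_per_reaction_pg(0.02)
high_pg = template_per_reaction_pg(0.4)

print(f"biotinylated fraction: {biotin_fraction:.2f}")
print(f"template per reaction: {low_pg:.0f} to {high_pg:.0f} pg")
```

The biotinylated fraction corresponds to the 0.65 ratio reported above as giving the best balance of yield and biotin incorporation, and the template range works out at 1 to 20 pg per reaction.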
b. Purification and Concentration of PCRs
Several PCRs were pooled and vortexed. 200 to 500 μl aliquots of the pooled PCRs were purified using MinElute columns (Qiagen) using the standard operating procedure, ensuring that the binding capacity of the column was not exceeded, with the following exceptions: All centrifugations were performed at 16000 RCF. Elution buffer (EB: 10 mM Tris HCl pH 8.5) was heated to 70° C. 10 μl of heated EB was added directly to each column followed by a 5 min incubation at 70° C. The eluate was recovered by centrifugation. A further 10 μl of pre-heated EB was added to each column, incubated for 1 min at 70° C. and the eluate recovered by centrifugation. Following purification, all eluates were pooled and vortexed.
c. Quantification and Quality Assessment of PCR
A DNA 1000 chip for the bioanalyser 2100 (Agilent) was used to assess the quality of the amplification. A single broad peak (broad due to the random incorporation of Biotin-16-AA-CTP) was identified with the crest of the peak at ˜200 bp. The increased peak size was caused by retardation of the PCR fragments due to incorporation of Biotin-16-AA-CTP.
Following bioanalyser 2100 analysis, the concentration of the amplified complex pool was determined using a NanoDrop spectrophotometer (Thermo).
d. Resolution of PCR into a Single-Stranded Probe Library
The total Mass of the amplified complex pool was determined. A reaction was prepared on ice such that every 20 μl contained 2 μg amplified complex pool, 1× Terminal Transferase buffer (NEB), 1× CoCl2 (NEB), 0.125 U/μl BtsI, 0.2 μg/μl BSA (NEB) and 500 μM ddATP (Trilink). The reaction was mixed by vortexing and incubated for 30 min at 55° C. The reaction was incubated on ice for 5 min.
3 μl of a mixture containing 2.5 μl of Terminal Transferase at 20,000 U/ml (NEB) in 1× Terminal Transferase buffer was added per 20 μl of the reaction. The reaction was vortexed to mix and incubated for 60 min at 37° C. The reaction was incubated on ice for 5 min.
3 μl of a mixture containing 2.5 μl of λ exonuclease at 5000 U/ml (NEB) in 1× Terminal Transferase buffer was added per 20 μl of the initial reaction volume. The reaction was vortexed to mix and incubated for 20 min at 37° C. and 20 min at 80° C.
e. Purification of the Probe Library
Sufficient MicroBioSpin p6 columns (BioRad) were warmed to room temperature such that 75 μl of un-purified probe library could be passed through each column. The probe library was purified according to the manufacturer's standard operating procedure. Following purification, the eluates were pooled and gently vortexed.
f. Quantification and QC of the Purified Probe Library
The purified probe library was analysed using an RNA 6000 nano chip for the Bioanalyser 2100 (Agilent) and quantified using a NanoDrop spectrophotometer (Thermo). An ideal probe library should have a concentration of ≥50 ng/μl and an OD 260:280 of 1.7-2.0.
Preparation of Human gDNA Fragment Libraries
gDNA Fragmentation
This method describes fragmentation using a Bioruptor sonicator. Note: Other DNA fragmentation options may be implemented, for example the Covaris system (Covaris), nebulisation (Roche), or by NEBNext dsDNA Fragmentase (NEB).
The gDNA was diluted in 10 mM Tris HCl (pH 8.5) to a concentration of 20 ng/μl. 110 μl of the diluted DNA was aliquoted into separate 1.5 ml sonication tubes (Diagenode), vortexed and centrifuged briefly prior to incubation on ice until the Bioruptor (Diagenode) was prepared.
To prepare the Bioruptor, the shearing bath was chilled for 30 min with water containing an ˜0.5 cm layer of crushed ice. Following preparation, the aliquots of gDNA were placed into the Bioruptor's sample cradle and the device was assembled according to the manufacturer's guidelines.
The samples were sonicated as follows:
Following sonication the fragmented DNAs were pooled and stored at −20° C.
Aliquots of the pooled sheared gDNA were purified using 1.2× to 1.8× AmpureXP beads (Beckman Coulter), dependent on the required fragment size, according to the manufacturer's standard operating procedure. Finally the DNA was eluted in 10 mM Tris HCl (pH 8.5) with incubation at 65° C. for 5 min prior to removal of the magnetic beads. Purified sheared gDNAs were quantified using a NanoDrop spectrophotometer and the fragment size determined using a DNA 7500 chip for the bioanalyser 2100 (Agilent). Purified sheared gDNA was stored at −20° C.
25 μl reactions were prepared on ice containing 500 ng to 1000 ng of fragmented gDNA, 1× Thermopol buffer (NEB), 2% PEG 4000 (Fermentas) 1.0 mM ATP (Thermo), 0.4 mM dNTPs (Promega) 0.4 U/μl T4 polynucleotide kinase (Fermentas), 0.1 U/μl T4 DNA polymerase (Fermentas), 0.05 U/μl Taq DNA polymerase (Kapa biosystems). Reactions were vortexed briefly to mix and incubated for 20 min at 25° C. followed by 72° C. for 20 min.
The reactions were placed on ice. A 5 μl solution containing 1× Thermopol buffer (NEB), 10 times (fragments >700 bp) to 30 times (fragments <700 bp) the molar equivalent of the R.Block T7 adapter and 5 units of T4 DNA ligase (Fermentas) was added directly to each reaction. Reactions were vortexed to mix and incubated for 60 min at 22° C. and for 15 min at 65° C. Similar reactions were pooled and mixed by vortexing. Samples were stored for no longer than 24 hours over night.
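The adapter quantity in the ligation step is specified as a molar multiple of the fragments, so it depends on fragment mass and length. The Python sketch below shows one way to estimate it. It assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, interprets "molar equivalent" as moles of fragment molecules (not fragment ends), and applies the 30× multiple at exactly 700 bp, a boundary the text leaves undefined. The function names are hypothetical.

```python
def dsdna_pmol(mass_ng, mean_length_bp, g_per_mol_per_bp=660.0):
    """Picomoles of double-stranded fragments in a given mass.

    ng divided by g/mol gives nmol; multiplying by 1000 converts to pmol.
    """
    return mass_ng / (mean_length_bp * g_per_mol_per_bp) * 1000.0


def adapter_pmol(mass_ng, mean_length_bp):
    """Adapter amount: 10x molar excess above 700 bp, otherwise 30x (assumed at 700 bp)."""
    excess = 10 if mean_length_bp > 700 else 30
    return excess * dsdna_pmol(mass_ng, mean_length_bp)


for mass_ng, length_bp in ((1000, 1000), (500, 500)):
    print(f"{mass_ng} ng of ~{length_bp} bp fragments: "
          f"{dsdna_pmol(mass_ng, length_bp):.2f} pmol fragments, "
          f"{adapter_pmol(mass_ng, length_bp):.1f} pmol adapter")
```

The masses and lengths in the loop are example inputs within the 500 ng to 1000 ng range stated above, not values reported for any particular experiment.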
Reactions were fractionated on an LE agarose gel stained with 1× SYBR Green. Using a Dark Reader transilluminator, gel slices containing fragments in the range of 800 bp to 1200 bp (Illumina sequencing) or 1200 bp to 1600 bp (454 sequencing) were excised. DNA fragments were recovered using Qiagen gel extraction columns and eluted in 50 μl 5 mM Tris HCl pH 8.5.
50 μl PCRs contained 1× LongAmp buffer (NEB), 1 pmol/μl of each LMPCR primer, 1 μg/μl Ultra Pure BSA (Ambion), 0.3 mM dNTPs, 0.1 U/μl LongAmp DNA polymerase (NEB) and 20 μl of the purified ligated gDNA fragments.
PCRs were cycled as follows: 95° C. for 2 min, 10× to 16× (95° C. for 30 sec, 60° C. for 30 sec, 72° C. for 1 min to 1.5 min), 72° C. for 5 min then held at 15° C.
PCRs were purified using MinElute columns (Qiagen) using the standard operating procedure, with the following exceptions: All centrifugations were performed at 16000 RCF. Elution buffer (EB—10 mM Tris HCl pH 8.5) was heated to 70° C. 10 μl of heated EB was added directly to each column followed by 5 min incubation at 70° C. The eluate was recovered by centrifugation. A further 10 μl of pre-heated EB was added to each column, incubated for 1 min at 70° C. and the eluate recovered by centrifugation. Following purification, all eluates were pooled and vortexed. Eluted samples were stored at −20° C.
Fragment size and linker carry over were assessed using a DNA 7500 chip for the Bioanalyser 2100 (Agilent). The majority of fragments ranged from 800 bp to 1200 bp for Illumina NGS fragment libraries and 1200 to 1600 bp for Roche 454 NGS fragment libraries.
Each library was quantified using a NanoDrop spectrophotometer (Thermo).
A series of in solution target capture experiments were undertaken to test the performance of the multi-biotinylated probe. In solution target capture workflow:
Hybridisation mixes contained: 0.75 μg to 1 μg of a gDNA fragment library (average fragment size ˜1 kb (Illumina MiSeq sequencing) or ˜1.4 kb (Roche 454 GS FLX plus sequencing)); 5 μg to 10 μg of a repetitive sequence blocker (as described in Example 8); 0 to 33 pmol/μl of oligonucleotides complementary to the library linkers (library blocking oligos); 1× SUPERase .IN RNase inhibitor; and 0.08 μM (˜2500 individual probe sequences) to 0.13 μM (˜16,000 individual probe sequences) of multi-biotinylated probe. The mixes were diluted to 35 μl in a proprietary hybridisation buffer (0.02% Ficoll, 0.04% PVP, 45 mM Tris-HCl, 11 mM Ammonium Sulphate, 20 mM MgCl2, 6.8 mM 2-Mercaptoethanol and 4.4 mM EDTA, pH 8.5).
The hybridisation mixes were: incubated at 95° C. for 2 min; cooled at a rate of 1° C. every 10 sec to 10° C. above a predefined optimal annealing temperature; step-down incubated for 60 sec at every ° C. above the optimal annealing temperature and cooled at a rate of 1° C. every 10 sec between each ° C.; and incubated at the optimal annealing temperature for 24 hours.
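The step-down hybridisation profile above can be written out as an explicit list of holds and ramps, which may help when programming a thermal cycler. The Python sketch below is one reading of the description: it assumes whole-degree temperatures, a 60 sec hold at every degree from 10° C. above the annealing temperature down to 1° C. above it, and a 10 sec ramp per degree. The 65° C. annealing temperature in the usage lines is a hypothetical example, since the text refers only to a predefined optimum.

```python
def hybridisation_schedule(anneal_c, start_c=95, offset_c=10):
    """Return (kind, temperature_c, seconds) steps for the step-down profile."""
    steps = [("hold", start_c, 120)]                  # 95 C for 2 min
    top = anneal_c + offset_c
    steps.append(("ramp", top, (start_c - top) * 10))  # 1 C every 10 sec
    for temp in range(top, anneal_c, -1):
        steps.append(("hold", temp, 60))              # 60 sec at each degree above optimum
        steps.append(("ramp", temp - 1, 10))          # 10 sec to drop one degree
    steps.append(("hold", anneal_c, 24 * 3600))       # 24 h at the annealing temperature
    return steps


schedule = hybridisation_schedule(65)  # 65 C is an assumed example value
total_seconds = sum(seconds for _, _, seconds in schedule)
for kind, temp, seconds in schedule[:4]:
    print(kind, temp, seconds)
print("total time (h):", round(total_seconds / 3600, 2))
```

Almost all of the run time is the final 24 hour incubation; the denaturation and step-down phases add under twenty minutes in this example.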
A schematic representation of the hybridised target DNA with multiple non-overlapping multi-biotinylated probes is shown in
Referring to
Referring now to
MyOne Streptavidin T1 paramagnetic dynabeads (Invitrogen) (1 mg) were washed twice in the proprietary hybridisation buffer (as defined in Example 7) either containing or not containing a nucleotide based blocking agent (R.block or DNA based).
The dynabeads were then re-suspended in 20 μl to 65 μl of the hybridisation buffer and incubated at 55° C. for 30 min prior to heating to the pre-defined optimal annealing temperature.
Hybridisation mixes were then transferred to the binding solution, mixed with gentle pipetting and incubated at the optimal annealing temperature for 20 min.
Following hybridisation, the dynabeads were concentrated, re-suspended in 150 μl of a pre-heated proprietary wash buffer and incubated at a predefined washing temperature for 5 min. This was repeated once.
The dynabeads were concentrated, re-suspended in hybridisation buffer supplemented with 5 U of Hybridase thermostable RNase H (Epicentre) (total volume 50 μl); incubated at 55° C. for 30 min, and finally incubated at the predefined wash temperature for 5 min.
The dynabeads were concentrated, re-suspended in 150 μl of a pre-heated proprietary wash buffer (50 mM HEPES, 0.04% PVP, 10 mM MgCl2, 6.8 mM 2-MercaptoEthanol. pH 8.5) and incubated at a predefined washing temperature for 5 min.
The dynabeads were concentrated, re-suspended in 50 μl 10 mM Tris HCl (pH 8.5).
Samples were eluted from the bead-captured probes by PCR prior to purification and NGS analysis using the Roche 454 GS FLX plus sequencing platform or the Illumina MiSeq sequencing platform.
Enrichment power (EP) is a measurement of how well a target capture method performs.
Firstly, the ratio of NGS reads that overlap the targeted region over reads that do not overlap the target is calculated (fr).
Secondly, the fraction of the genome that is targeted is calculated (ft).
EP can then be calculated. EP=fr÷ft.
EP of 2000 to 3000 fold was achieved.
In all cases 90-95% of the target was recovered at a depth ≥20% of the average per base read depth, with ˜80% recovered at ≥50% of the average.
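The enrichment power calculation defined above (EP = fr ÷ ft) is sketched below in Python. The function name and all read counts and sizes in the usage lines are hypothetical values chosen for illustration; they are not data from the experiments described.

```python
def enrichment_power(on_target_reads, off_target_reads, target_bp, genome_bp):
    """Enrichment power: EP = fr / ft.

    fr is the ratio of reads overlapping the target to reads that do not,
    and ft is the fraction of the genome that is targeted.
    """
    if off_target_reads <= 0 or target_bp <= 0 or genome_bp <= 0:
        raise ValueError("read counts and sizes must be positive")
    fr = on_target_reads / off_target_reads
    ft = target_bp / genome_bp
    return fr / ft


# Hypothetical example: 1 Mb target in a 3 Gb genome, 40% of reads on target.
ep = enrichment_power(
    on_target_reads=400_000,
    off_target_reads=600_000,
    target_bp=1_000_000,
    genome_bp=3_000_000_000,
)
print(f"EP = {ep:.0f} fold")
```

Because fr is defined as a ratio of on-target to off-target reads and not as a fraction of all reads, EP grows without bound as off-target reads approach zero.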
Eukaryotic gDNA was randomly fragmented to a range of sizes between 100 bp and 9000 bp to suit different applications. Adapters containing a T7 RNA polymerase promoter, or any other RNA polymerase promoter, were annealed to the fragmented DNAs, including Cot-1 DNA and repetitive sequence rich DNA from other eukaryotes.
The adapter ligated DNA fragments were either amplified by PCR prior to transcription to increase yield, or transcribed without amplification.
The fragments were transcribed from the promoter by T7 RNA polymerase, or any other RNA polymerase if the adapter contained a promoter other than the T7 promoter.
Following transcription, DNase I was used to remove contaminating DNA. Following DNase I treatment, Proteinase K was used to remove contaminating DNase and RNase. The RNA product was then purified and protected by the addition of a temperature reversible RNase inhibitor (SUPERase .IN—Ambion) or any other suitable RNase inhibitor.
The resultant product of the invention will hereinafter be called “R.Block”.
Three R.Block types were produced using the above methods, namely:
A sample of DNA similar to the source of DNA for ultimate enrichment must be obtained. For example, if target enrichment of a human genomic DNA sample is required, either extract human genomic DNA from an un-related donor or purchase the DNA from a trusted supplier. The desired DNA was extracted according to standard procedures, and dissolved in 10 mM Tris HCl (pH 8.5).
gDNA Fragmentation
The gDNA was diluted in 10 mM Tris HCl (pH 8.5) to a concentration of 20 ng/μl. 110 μl of the diluted DNA was aliquoted into separate 1.5 ml sonication tubes (Diagenode), vortexed and centrifuged briefly prior to incubation on ice until the Bioruptor (Diagenode) was prepared.
To prepare the Bioruptor®, the shearing bath was chilled for 30 min with water containing an ˜0.5 cm layer of crushed ice. Following preparation, the aliquots of gDNA were placed into the Bioruptor's® sample cradle and the device was assembled according to the manufacturer's guidelines.
The samples were sonicated as follows:
Following sonication the fragmented DNAs were pooled and stored at −20° C.
Aliquots of the pooled sheared gDNA were purified using 1.2× to 1.8× AmpureXP beads (Beckman Coulter), dependent on the required fragment size, according to the manufacturer's standard operating procedure. Finally the DNA was eluted in 10 mM Tris HCl (pH 8.5) with incubation at 65° C. for 5 min prior to removal of the magnetic beads. Purified sheared gDNAs were quantified using a NanoDrop spectrophotometer and the fragment size determined using a DNA 7500 chip for the bioanalyser 2100 (Agilent). Purified sheared gDNA was stored at −20° C.
25 μl reactions were prepared on ice containing 500 ng to 1000 ng of fragmented gDNA, 1× Thermopol buffer (NEB), 2% PEG 4000 (Fermentas) 1.0 mM ATP (Thermo), 0.4 mM dNTPs (Promega) 0.4 U/μl T4 polynucleotide kinase (Fermentas), 0.1 U/μl T4 DNA polymerase (Fermentas), 0.05 U/μl Taq DNA polymerase (Kapa biosystems). Reactions were vortexed briefly to mix and incubated for 20 min at 25° C. followed by 72° C. for 20 min.
The reactions were placed on ice. A 5 μl solution containing 1× Thermopol buffer (NEB), 10 times (fragments >700 bp) to 30 times (fragments <700 bp) the molar equivalent of the R.Block T7 adapter and 5 units of T4 DNA ligase (Fermentas) was added directly to each reaction. Reactions were vortexed to mix and incubated for 60 min at 22° C. and for 15 min at 65° C. Similar reactions were pooled and mixed by vortexing. Samples were stored for no longer than 24 hours over night.
Reactions were purified using 1.8× AmpureXP beads (Beckman Coulter), according to the manufacturer's standard operating procedure. Finally the DNA was eluted in 25 μl 10 mM Tris HCl (pH 8.5) with incubation at 65° C. for 5 min prior to removal of the magnetic beads. Fragment size and linker carry over was assessed using a DNA high sensitivity chip for the bioanalyser 2100 (Agilent). Purified sheared gDNA was stored at −20° C.
LMPCR of the R.Block template library
50 μl PCRs contained 1× LongAmp buffer (NEB), 1 pmol/μl of each LMPCR primer, 1 μg/μl Ultra Pure BSA (Ambion), 0.3 mM dNTPs, 0.1 U/μl LongAmp DNA polymerase (NEB) and 20 μl of the purified ligated gDNA fragments.
PCRs were cycled as follows: 95° C. for 2 min, 10× to 16× (95° C. for 30 sec, 60° C. for 30 sec, 72° C. for 1 min to 1.5 min), 72° C. for 5 min then held at 15° C.
Several pooled PCRs were purified using MinElute columns (Qiagen) using the standard operating procedure, with the following exceptions: All centrifugations were performed at 16000 RCF. Elution buffer (EB—10 mM Tris HCl pH 8.5) was heated to 70° C. 10 μl of heated EB was added directly to each column followed by 5 min incubation at 70° C. The eluate was recovered by centrifugation. A further 10 μl of pre-heated EB was added to each column, incubated for 1 min at 70° C. and the eluate recovered by centrifugation. Following purification, all eluates were pooled and vortexed. Eluted samples were stored at −20° C.
Fragment size and linker carry over were assessed using a DNA 7500 chip for the bioanalyser 2100 (Agilent). The majority of fragments ranged from 100 bp to 500 bp for R.Block-Hc (derived from human Cot-1 DNA), 200 to 700 bp for R.Block-Hg (genomic sequence derived from human DNA) and >700 bp for R.Block-Sg (genomic sequence derived from Salmon DNA).
Each library was quantified using a NanoDrop spectrophotometer (Thermo).
25 μl Transcription reactions contained 1 μg of an R.Block template library, 1× RNAMaxx transcription buffer (Agilent), 4 mM of each rNTP, 30 mM DTT (Agilent), 0.015 U/μl Yeast inorganic Pyrophosphatase (Agilent), 1 U/μl SUPERase .IN (Ambion) and 8 U/μl T7 RNA polymerase (Agilent). Reactions were incubated for 2 hours at 37° C.
To stop the reactions 1 μl Turbo DNase (2 U/μl) was added to each separate reaction and incubated for 30 min at 37° C.
A mixture of 6 μl RNAMaxx 5× transcription buffer, 2.5 μl SUPERase. IN, 23.5 μl 5 M Urea and 3 μl proteinase K (recombinant) (Thermo) was added to each reaction. Reactions were incubated for 30 min at 37° C. Reactions were held on ice and were not stored until purified.
Sufficient MicroBioSpin p6 columns (BioRad) were warmed to room temperature such that 75 μl of un-purified probe library could be passed through each column. The probe library was purified according to the manufacturer's standard operating procedure. Following purification, eluates were pooled prior to the addition of one 20th the volume of SUPERase. IN (Ambion). R.Blocks were gently mixed and stored at −80° C.
R.Block fragment size and linker carry over were assessed using an RNA 6000 nano chip for the bioanalyser 2100 (Agilent). A high quality R.Block had the following features: the majority of fragments were >100 nt for R.Block-Hc, >200 nt for R.Block-Hg and >800 nt for R.Block-Sg (genomic sequence derived from Salmon DNA); >80 μg total mass of R.Block per transcription; very little primer or linker contamination.
R.Blocks were stored at −80° C.
A custom oligonucleotide design software (Lancaster, O. et al., Unpublished) was used to design non-overlapping (minimum gap=5 nt) nucleotide sequences (average 60 nt). Probes were designed to have a Tm between 65° C. and 75° C., and were extended or contracted by up to 10 nt to fit within the Tm range. No probes were placed within 10 bp of repetitive sequences. Each probe was permitted to match the genome ≤5 times. The software then calculated the average Tm of all identified sequences. Subsequently, for each kb of targeted sequence, the 10 sequences that most closely matched the average Tm were selected. The remaining sequences were discarded.
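By way of illustration only, the final selection step described above may be sketched as follows. This is not the unpublished design software: the function name `select_probes`, the tuple layout and the use of pre-computed Tm values are assumptions made for the example.

```python
from statistics import mean


def select_probes(candidates, per_kb=10, tm_min=65.0, tm_max=75.0):
    """Select probes closest to the average Tm, per kb of target.

    candidates: list of (probe_id, start_bp, tm_celsius) tuples.
    This layout is hypothetical; Tm values are assumed to be pre-computed.
    """
    # Keep only probes whose Tm lies inside the permitted range.
    in_range = [c for c in candidates if tm_min <= c[2] <= tm_max]
    if not in_range:
        return []

    # Average Tm of all retained candidate sequences.
    avg_tm = mean(c[2] for c in in_range)

    # Group probes into 1 kb windows by start coordinate (an assumption).
    by_window = {}
    for probe in in_range:
        by_window.setdefault(probe[1] // 1000, []).append(probe)

    # Within each window keep the per_kb probes nearest to the average Tm.
    selected = []
    for window in sorted(by_window):
        ranked = sorted(by_window[window], key=lambda p: abs(p[2] - avg_tm))
        selected.extend(ranked[:per_kb])
    return selected
```

Grouping probes into 1 kb windows by start coordinate is only one possible reading of "for each kb of targeted sequence"; the design software may have partitioned the target differently.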
The software output the probe sequences as a FASTA file (9), which was submitted to Mycroarray (Mycroarray, MI, USA). We developed a custom perl script to add primer annealing sites to each sequence (5′ CTGGCAGACGAGAGGCAGTG/genomic sequence/GTAGACCTCACCAGCGACGC 3′). The resulting FASTA file was then converted (using the same custom perl script) into a tab delimited text based table.
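The custom perl script is not reproduced in this description. A minimal Python sketch of the two operations it is described as performing (flanking each probe with the primer annealing sites, then writing a tab delimited table) might look as follows. The function names and the two-column layout (name, full sequence) are assumptions; only the two flanking sequences are taken from the text.

```python
FWD_SITE = "CTGGCAGACGAGAGGCAGTG"  # 5' primer annealing site (from the text)
REV_SITE = "GTAGACCTCACCAGCGACGC"  # 3' primer annealing site (from the text)


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def to_table(fasta_text):
    """Return tab delimited rows: probe name, then the flanked sequence."""
    rows = []
    for name, seq in parse_fasta(fasta_text):
        rows.append(f"{name}\t{FWD_SITE}{seq}{REV_SITE}")
    return "\n".join(rows)
```

The column order actually required by the array manufacturer is not stated here, so the table layout should be adapted as needed.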
The template probe pool, which contained all the sequences in the tab delimited text based table, was synthesised so that each individual probe was synthesised at seven different loci on a microarray (Mycroarray). Following synthesis, the probes were harvested and lyophilised by the manufacturer prior to shipping. The probes were re-constituted in 10 mM Tris HCl (pH 8) (Qiagen) to a stock concentration of ˜10 ng/μl. Working concentrations of ˜10 pg/μl were prepared by serially diluting the stock probe pool with Tris HCl (pH 8).
50 μl PCRs contained 1× Q5 reaction buffer (NEB), 1.5 μM ProAmpFO4E (5′ phosphate-CTGGCAGACGAGAGGCAGTG 3′), 1.5 μM ProAmpR01 (5′ biotin-TEG-GCGTCGCTGGTGAGGTCTAC 3′), 300 μM dTTP, 300 μM dATP, 300 μM dGTP, 105 μM dCTP (Promega), 195 μM Biotin-16-Aminoallyl-2′-dCTP (Trilink BioTechnologies), 1 U Thermostable Inorganic Pyrophosphatase (NEB), 0.5 U Q5 DNA polymerase (NEB) and 10 pg-20 pg template probe pool (Mycroarray). Reactions were sealed with a heat sealable PCR film or PCR strip-caps (Thermo Fisher Scientific). The reactions were cycled as follows: 98° C. for 2 min, 17× (98° C. for 15 sec, 72° C. for 25 sec), 72° C. for 1 minute then held at 15° C.
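The nucleotide concentrations above imply that unlabelled dCTP and its biotinylated analogue together match the 300 μM used for each of the other dNTPs, with 65% of the dCTP pool labelled. The arithmetic can be checked with a short sketch (the function name is illustrative only):

```python
def labelled_fraction(unlabelled_um, labelled_um):
    """Return (total concentration in uM, fraction of the pool that is labelled)."""
    total = unlabelled_um + labelled_um
    return total, labelled_um / total


# Values taken from the reaction composition: 105 uM dCTP, 195 uM biotin-dCTP.
total_c_um, biotin_fraction = labelled_fraction(105.0, 195.0)
```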
100 PCRs were pooled and vortexed. 200 μl aliquots of the pooled PCRs were purified using MinElute columns (Qiagen) using the standard operating procedure, with the following exceptions: All centrifugations were performed at 16000 RCF. Elution buffer (EB: 10 mM Tris HCl pH 8.5) was heated to 70° C. 10 μl of heated EB was added directly to each column and incubated at 70° C. for 5 min. The eluate was recovered by centrifugation for 1 min. A further 10 μl of pre-heated EB was added to each column, incubated for 5 min at 70° C. and the eluate recovered by centrifugation for 1 min. Following purification, all eluates were pooled and vortexed.
A bulk reaction was prepared on ice such that every 20 μl contained 2 μg amplified complex pool, 1× Terminal Transferase buffer (NEB), 1× CoCl2 (NEB), 2.5 U BtsI, 4 μg BSA (NEB) and 500 μM ddATP (Trilink). The reaction was mixed by vortexing and incubated for 30 min at 55° C. The reaction was incubated on ice for 5 min.
3 μl of a mixture containing 50 U of Terminal Transferase (NEB) in 1× Terminal Transferase buffer (NEB) was added per 20 μl of the reaction. The reaction was vortexed to mix and incubated for 60 min at 37° C. The reaction was incubated on ice for 5 min.
3 μl of a mixture containing 12.5 U of λ exonuclease (NEB) in 1× Terminal Transferase buffer (NEB) was added per 20 μl of the initial bulk reaction volume. The reaction was vortexed to mix and incubated for 20 min at 37° C. and 20 min at 80° C.
Sufficient MicroBioSpin p6 columns (BioRad) were warmed to room temperature such that 75 μl of un-purified probe library could be passed through each column. The probe library was purified according to the manufacturer's standard operating procedure. Following purification, the eluates were pooled and gently vortexed.
The purified probe library was analysed using an RNA 6000 nano chip for the Bioanalyser 2100 (Agilent) and quantified using a NanoDrop spectrophotometer (Thermo). An ideal hTE capture probe library had a concentration of ≥50 ng/μl, an OD 260:280 of 1.7-2.0 and an average fragment size of ˜150 nt (the fragment size is >100 nt due to the presence of biotin molecules retarding migration through the gel matrix).
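The two numeric acceptance criteria above lend themselves to a simple automated check. The following sketch is illustrative only; the function name and the returned dictionary keys are assumptions, and the approximate fragment size criterion is deliberately left to manual inspection because no tolerance is stated.

```python
def probe_library_qc(conc_ng_per_ul, od_260_280):
    """Return pass/fail flags for the concentration and purity criteria."""
    return {
        # Concentration of at least 50 ng/ul.
        "concentration_ok": conc_ng_per_ul >= 50.0,
        # OD 260:280 ratio between 1.7 and 2.0 inclusive.
        "purity_ok": 1.7 <= od_260_280 <= 2.0,
    }
```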
Fragmentation of Human gDNA for Fragment Library Preparation
Human gDNA was diluted in 10 mM Tris HCl (pH 8.5) to a concentration of 20 ng/μl. 110 μl of the diluted DNA was aliquoted into separate 1.5 ml sonication tubes (Diagenode), vortexed and centrifuged briefly prior to incubation on ice until the Bioruptor (Diagenode) was prepared. The Bioruptor's shearing bath was chilled for 30 min with water containing a ˜0.5 cm layer of crushed ice. Following preparation, the aliquots of gDNA were placed into the Bioruptor's sample cradle and the device was assembled according to the manufacturer's guidelines. The samples were sonicated as follows: power setting low; sonication cycle 15 sec on followed by 90 sec off, for 14 to 16 cycles.
Aliquots of the pooled sheared gDNA were purified using 0.8× AmpureXP beads (Beckman Coulter) according to the manufacturer's standard operating procedure. The DNA was eluted in 10 mM Tris HCl (pH 8.5) with incubation at 65° C. for 5 min prior to removal of the magnetic beads. Purified sheared gDNAs were quantified using a NanoDrop spectrophotometer and the fragment size determined using a DNA 7500 chip for the bioanalyser 2100 (Agilent). An ideally fragmented library had an average fragment size of ˜1200 bp with >50% of fragments falling in the range of 1000 to 2000 bp. Purified sheared gDNA was stored at −20° C.
End Repair and Ligation:
Illumina TruSeq adaptors were added to 2.5 μg aliquots of fragmented human gDNA using the NEBNext DNA Library Prep Master Mix Set for Illumina sequencing (E6040) and the NEBNext Multiplex Oligos for Illumina sequencing (Index Primers Set 1) (E7335).
Size Selection:
Reactions were fractionated on a 1.0% LE agarose gel stained with 0.2 mg/ml EtBr. Using a Dark Reader transilluminator (Clare Chemical Research), gel slices containing fragments in the range of 1000 bp to 2000 bp were excised. DNA fragments were recovered using Qiagen gel extraction columns and eluted in 22 μl 10 mM Tris HCl pH 8.5.
Linker Mediated PCR Enrichment for Ligated Fragments:
For each ligated DNA library 4×100 μl PCRs contained 1×Q5 High-Fidelity 2× Master Mix, 1 μM NEBNext Universal PCR Primer for Illumina, 1 μM NEBNext Index Primer for Illumina and 5 μl of ligated fragments.
PCRs were cycled as follows: 98° C. for 30 sec, 8× to 10× (98° C. for 10 sec, 65° C. for 1 min 15 sec), 65° C. for 1 min then held at 15° C. PCRs were pooled and purified using 0.5× AmpureXP beads (Beckman Coulter) according to the manufacturer's standard operating procedure. Recovered fragment library DNA was eluted in 25 μl 10 mM Tris HCl (pH 8.5) with incubation at 65° C. for 5 min prior to removal of the magnetic beads. Eluted samples were stored at −20° C.
QC:
Fragment size and linker carry over were assessed using a DNA 7500 chip for the bioanalyser 2100 (Agilent). The average fragment size was ˜1300 bp. Each library was quantified using a NanoDrop spectrophotometer (Thermo Fisher Scientific).
Fragmentation of gDNA Samples for R. Block Production
Human and salmon gDNA were diluted in 10 mM Tris HCl (pH 8.5) to a concentration of 20 ng/μl. 110 μl of the diluted DNA was aliquoted into separate 1.5 ml sonication tubes (Diagenode), vortexed and centrifuged briefly prior to incubation on ice until the Bioruptor® (Diagenode) was prepared. The Bioruptor's® shearing bath was chilled for 30 min with water containing a 0.5 cm layer of crushed ice. Following preparation, the aliquots of gDNA were placed into the Bioruptor's® sample cradle and the device was assembled according to the manufacturer's guidelines. The samples were sonicated as follows: power setting low; 15 to 30 sec on followed by 90 sec off, for 2 to 4 cycles for the salmon gDNA and 22 to 24 cycles for the human gDNA.
Aliquots of the pooled sheared gDNA were purified using 1.8× AmpureXP beads (Beckman Coulter), dependent on the required fragment size, according to the manufacturer's standard operating procedure. Finally the DNA was eluted in 10 mM Tris HCl (pH 8.5) with incubation at 65° C. for 5 min prior to removal of the magnetic beads. Purified sheared gDNAs were quantified using a NanoDrop spectrophotometer and the fragment size determined using a DNA 7500 chip for the bioanalyser 2100 (Agilent). The average size was ˜500 bp for the human gDNA and 3000 bp for the salmon gDNA. Purified sheared gDNA was stored at −20° C.
End Repair:
25 μl reactions were prepared on ice containing 1000 ng of fragmented gDNA or human Cot-1 DNA, 1× Fast Digest buffer (Fermentas), 1 mM ATP (Thermo), 0.4 mM dNTPs (Promega), 10 U T4 polynucleotide kinase (Fermentas), 2.5 U T4 DNA polymerase (Fermentas) and 1.25 U Taq DNA polymerase (Kapa Biosystems). Reactions were vortexed briefly to mix and incubated for 20 min at 25° C. followed by incubation at 72° C. for 20 min.
Ligation:
The reactions were placed on ice. 0.4 μM R.Linker (5′ CGACCGACTGCCACCTGCGCTAATACGACTCACTATAGGGCTAGTGCTTCGCATCCGA*A*G*T* 3′; 5′ phosphate-CTTCGGATGCGAAGCACTAGGGCGTGCAGCCTGTGGC*A*G*C 3′; where * denotes a phosphorothioate bond) and 5 U of T4 DNA ligase (Fermentas) were added directly to each reaction. Reactions were vortexed to mix and incubated for 20 min at 25° C.
Linker Removal:
Samples were purified using 1.8× AmpureXP beads. Fragments were recovered in 50 μl 10 mM Tris HCl (pH 8).
Linker mediated PCR enrichment for ligated fragments:
100 μl PCRs contained 1× FastStart high fidelity buffer (Roche), 1 μM of each fragment library Linker Mediated PCR (LMPCR) primer (5′ CGACCGACTGCCACCTGCGC 3′; 5′ GCTGCCACAGGCTGCACGCC 3′), 2% DMSO (Sigma-Aldrich), 0.2 mM dNTPs, 5 U FastStart DNA polymerase blend (Roche) and 50 μl of the ligated gDNA fragments. PCRs were cycled as follows: 95° C. for 10 min, 12× (95° C. for 30 sec, 64° C. for 30 sec, 72° C. for 3 min), 72° C. for 7 min then held at 15° C. PCRs were purified using 1.8× AmpureXP beads (Beckman Coulter) according to the manufacturer's standard operating procedure. Recovered fragment library DNA was eluted in 25 μl 10 mM Tris HCl (pH 8.5) with incubation at 65° C. for 5 min prior to removal of the magnetic beads. Eluted samples were stored at −20° C.
QC:
Fragment size and linker carry over were assessed using a DNA 7500 chip for the bioanalyser 2100 (Agilent). Each library was quantified using a NanoDrop spectrophotometer (Thermo Fisher Scientific).
25 μl transcription reactions contained 1 μg of an R.Block DNA template library (human gDNA, salmon gDNA or human Cot-1 DNA), 1× RNAMaxx transcription buffer (Agilent), 4 mM of each rNTP, 30 mM dithiothreitol (Agilent), 0.015 U/μl yeast inorganic pyrophosphatase (Agilent), 25 U SUPERase .IN (Ambion) and 200 U T7 RNA polymerase (Agilent). Reactions were incubated for 2 hours at 37° C. To stop the reactions, 2 U Turbo DNase (Thermo Fisher Scientific) was added to each separate reaction and incubated for 30 min at 37° C.
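The total unit amounts stated above correspond to final concentrations of 1 U/μl SUPERase .IN and 8 U/μl T7 RNA polymerase in a 25 μl reaction, in agreement with the per-microlitre figures given for the earlier transcription reaction. A minimal sketch of the conversion (the helper name is illustrative only):

```python
def units_per_ul(total_units, reaction_volume_ul):
    """Convert a total enzyme amount (U) to a final concentration (U/ul)."""
    return total_units / reaction_volume_ul


REACTION_UL = 25  # reaction volume stated in the text

superase_u_per_ul = units_per_ul(25, REACTION_UL)   # SUPERase .IN
t7_u_per_ul = units_per_ul(200, REACTION_UL)        # T7 RNA polymerase
```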
A mixture of 1×RNAMaxx transcription buffer, 50 U SUPERase. IN, 23.5 μl 3.35M Urea and 6 mg proteinase K (recombinant, PCR grade) (Thermo Fisher Scientific) was added to each reaction. Reactions were incubated for 30 min at 37° C.
Sufficient MicroBioSpin p6 columns (BioRad) were warmed to room temperature such that 75 μl of un-purified probe library could be passed through each column. The probe library was purified according to the manufacturer's standard operating procedure. Following purification, eluates were pooled prior to the addition of one 20th the volume of SUPERase. IN (Ambion).
The R.Block products produced are hereinafter labelled R.Block-Hg (derived from human genome DNA), R.Block-Hc (derived from human Cot-1 DNA) and R.Block-Sg (derived from salmon genome DNA).
R.Block fragment size and linker carry over were assessed using an RNA 6000 nano chip for the bioanalyser 2100 (Agilent). A high quality R.Block had the following features: the majority of fragments were >200 nt for R.Block-Hg (derived from human gDNA) and R.Block-Hc (derived from human Cot-1 DNA) and >800 nt for R.Block-Sg (derived from salmon gDNA); >80 μg total mass of R.Block per transcription; very little primer or linker contamination. R.Blocks were stored at −80° C.
Optimised in-Solution hTE Protocol
30 μl hybridisation mixes contained: 1× hybridisation buffer (0.02% Ficoll, 0.04% PVP, 45 mM Tris-HCl, 11 mM ammonium sulphate, 20 mM MgCl2, 6.8 mM 2-mercaptoethanol and 4.4 mM EDTA, pH 8.5), 0.5 μg DNA fragment library (above), 10 μg R.Block (Hg, Hc or Sg) (unless stated otherwise), 30 U SUPERase .IN RNase inhibitor and 60 ng multi-biotinylated probe (as above).
The hybridisation mixes were: incubated at 98° C. for 2 min; cooled at a rate of 1° C. per second to 72° C.; step-down incubated for 60 sec at 1° C. intervals, cooled at a rate of 1° C. per second between each interval; and incubated at 62° C. for 24 hours.
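The thermal profile above can be written out as an explicit list of steps, for example when programming a thermal cycler. The sketch below is one reading of the description and rests on stated assumptions: the 60 sec holds are taken to occur at each whole degree from 72° C. down to 63° C., and the 24 hour incubation is taken as the only hold at 62° C. The function name and tuple layout are hypothetical.

```python
def hybridisation_schedule(denature_c=98, start_c=72, final_c=62,
                           hold_s=60, ramp_c_per_s=1.0, final_hold_h=24):
    """Return a list of (step, target_temperature_c, seconds) tuples."""
    # Initial denaturation: 2 min at the denaturation temperature.
    steps = [("denature", denature_c, 120)]
    # Cool to the first step-down temperature at the stated ramp rate.
    steps.append(("ramp", start_c, (denature_c - start_c) / ramp_c_per_s))
    temp = start_c
    # Step down one degree at a time, holding at each temperature above final_c.
    while temp > final_c:
        steps.append(("hold", temp, hold_s))
        temp -= 1
        steps.append(("ramp", temp, 1 / ramp_c_per_s))
    # Long hybridisation at the final temperature.
    steps.append(("hold", final_c, final_hold_h * 3600))
    return steps
```

Under these assumptions the denaturation and step-down phase takes a little under 13 minutes before the 24 hour incubation begins.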
In other examples the incubation steps may be performed at a temperature in the range of 50° C. to 80° C., depending on the molecules hybridised, as will be determined by the skilled person.
0.75 mg MyOne Streptavidin C1 paramagnetic dynabeads (Invitrogen) were washed twice in 100 μl 1× hybridisation buffer. The dynabeads were then re-suspended in 20 μl 1× hybridisation buffer supplemented with 10 μg of R.Block or other blocker (unless stated in the results). The resulting binding solutions were incubated at 55° C. for 30 min prior to heating to 62° C. The hybridisation mixes were then transferred to the binding solutions, mixed with gentle pipetting and incubated at 62° C. for 20 min.
In other examples the binding steps may be performed at a temperature in the range of 50° C. to 80° C., depending on the molecules hybridised, as will be determined by the skilled person.
Following hybridisation, the dynabeads were concentrated, and the hybridisation solution removed. The samples were returned to 62° C. prior to re-suspension of the dynabeads in 150 μl of pre-warmed (62° C.) 1× wash solution (50 mM HEPES, 0.04% PVP, 10 mM MgCl2, 6.8 mM 2-mercaptoethanol, pH 8.5). The samples were incubated at 62° C. for 5 min.
The dynabeads were concentrated, and the wash solution removed. The samples were returned to 62° C. prior to re-suspension of the dynabeads in 50 μl of 1× hybridisation buffer supplemented with 5 U of Hybridase Thermostable RNase H (Epicentre). The samples were incubated at 62° C. for 15 min.
The dynabeads were concentrated and the RNase solution was removed. The beads were washed once more (as above) in 150 μl pre-heated wash solution, incubated at 62° C. for 5 min.
The dynabeads were concentrated and the wash solution was removed. The dynabeads were re-suspended, at room temperature, in 50 μl 10 mM Tris HCl (pH 8.5).
In other examples, washing steps may be performed at a temperature in the range of 50° C. to 80° C., depending on the molecules hybridised, as will be determined by the skilled person.
4×100 μl PCRs contained 1× Q5 PCR master-mix (NEB), 2 μM of each library amplification primer (5′ AATGATACGGCGACCACCGAG 3′; 5′ CAAGCAGAAGACGGCATACGAG 3′) and 10 μl of the bead bound captured DNA library. PCRs were cycled as follows: 98° C. for 30 sec, 10× (98° C. for 30 sec, 65° C. for 1.5 min), 65° C. for 5 min then held at 15° C.
PCRs were purified using 0.5× AmpureXP beads (Beckman Coulter) according to the manufacturer's standard operating procedure. Recovered fragment library DNA was eluted in 50 μl 10 mM Tris HCl (pH 8.5) with incubation at 65° C. for 5 min prior to removal of the magnetic beads. Eluted samples were stored at −20° C.
All libraries had been prepared with linkers containing multiplex identifier sequences to allow sample pooling prior to sequencing. Illumina MiSeq sample pooling and sequencing was performed at the University of Leicester Genomics Service Facility (NUCLEUS). Sequencing was performed using the MiSeq Reagent Kit v3 (2×300 nt) sequencing chemistry (Illumina).
Probe sequences were aligned to the human genome (GRCh37.p13; http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/1) using the Bowtie 2 alignment algorithm (10), specifying the -f flag to state that the probe sequences were in the FASTA format (above) and the -S flag to indicate that the output should be written into files in the SAM format. The alignments generated by Bowtie 2 in the SAM format were converted into a sorted and indexed BAM format using SAMtools (11). Finally, the bamToBed function of BEDtools (12) was used to tabulate the coordinates of each probe sequence in the BED format: chromosome number; start coordinates; and end coordinates.
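For illustration, the probe alignment steps above can be assembled as a list of command lines. The sketch only builds the commands and does not run them. The file names, the index name and the helper name are hypothetical, and the SAMtools invocation uses the current `sort -o` syntax, which may differ from the version used at the time.

```python
import shlex


def probe_alignment_commands(index, probes_fa, prefix):
    """Build command lines for aligning FASTA probes and tabulating them as BED."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # -f: input is FASTA; -U: unpaired input file; -S: write SAM output.
        ["bowtie2", "-f", "-x", index, "-U", probes_fa, "-S", sam],
        # Sort alignments by coordinate and write BAM.
        ["samtools", "sort", "-o", bam, sam],
        # Index the sorted BAM.
        ["samtools", "index", bam],
        # Convert BAM records to BED; output goes to stdout and must be redirected.
        ["bedtools", "bamtobed", "-i", bam],
    ]


if __name__ == "__main__":
    for cmd in probe_alignment_commands("GRCh37", "probes.fa", "probes"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run` once the tools are installed; the last command's standard output would be captured into a `.bed` file.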
FASTQ files were returned as standard from Illumina MiSeq sequencing. The Bowtie 2 alignment tool was used to align the NGS sequences to the human genome (GRCh37.p13). The -q flag was used to indicate that the sequences were in the FASTQ format, the -1 and -2 flags indicated that the NGS data comprised sequence pairs and the -S flag indicated that the output should be written into files in the SAM format. The output SAM files were converted into sorted and indexed BAM files using SAMtools. Copies of the BAM files were made, and from these copied files, sequence duplicates were removed using the MarkDuplicates tool (Remove_Duplicates=True) of the Picard tool set (http://broadinstitute.github.io/picard/). hTE quality metrics were calculated using the Target Enrichment Quality Control (TEQC) (13) library for the R statistical package (14) (Results).
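The paired-read alignment and duplicate removal described above can likewise be expressed as command lines. Again this is a sketch that only constructs the commands: the file names, the location of the Picard jar and the helper name are assumptions, and Picard is shown with its classic `KEY=value` argument style.

```python
def read_alignment_commands(index, fastq_1, fastq_2, prefix, picard_jar="picard.jar"):
    """Build command lines for paired-end alignment and duplicate removal."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    dedup = f"{prefix}.dedup.bam"
    return [
        # -q: input is FASTQ; -1/-2: the two mate files; -S: write SAM output.
        ["bowtie2", "-q", "-x", index, "-1", fastq_1, "-2", fastq_2, "-S", sam],
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Picard writes a new BAM, so the raw sorted BAM is left untouched.
        ["java", "-jar", picard_jar, "MarkDuplicates",
         f"I={bam}", f"O={dedup}", f"M={prefix}.dup_metrics.txt",
         "REMOVE_DUPLICATES=true"],
    ]
```

Because MarkDuplicates writes to a separate output file, both the raw and the de-duplicated BAM files remain available for downstream analysis.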
The raw BAM files were imported into TEQC. TEQC was used to select valid NGS sequence pairs (read-pairs), with the maximum distance permitted between paired reads set to 5 kb.
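The effect of the 5 kb limit can be illustrated with a plain Python analogue. This is not TEQC's implementation: the function name, the input tuple layout and the definition of distance as the outer span of the two reads are all assumptions made for the example.

```python
def filter_pairs(pairs, max_distance=5000):
    """Keep read pairs on the same chromosome whose outer span is <= max_distance.

    pairs: iterable of (chrom1, start1, end1, chrom2, start2, end2) tuples.
    Returns a list of (chrom, pair_start, pair_end) tuples for retained pairs.
    """
    kept = []
    for c1, s1, e1, c2, s2, e2 in pairs:
        # Mates mapped to different chromosomes are never a valid pair.
        if c1 != c2:
            continue
        start, end = min(s1, s2), max(e1, e2)
        if end - start <= max_distance:
            kept.append((c1, start, end))
    return kept
```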
The potential advantages of blocking with an R.Block molecule (such as produced by the methods of the invention, as exemplified in Examples 9 and 10) are:
A series of investigations to determine whether R.Block effectively blocks network formation via interspersed repeat DNA were performed.
The R.Block products and multi-biotinylated probes used were manufactured according to the process described in Example 10. The investigation was an in solution target DNA capture.
The following R.Block preparations made according to the method of Example 10 were tested as blocking agents to block network binding of interspersed repetitive DNA sequences:
In brief it was found that R.Block based on human gDNA was a more effective network blocker than R.Block based on Cot-1 DNA. 10 μg of R.Block performed more effectively than 5 μg of R.Block, as shown in
For this investigation, “hybridisation mixes” contained: 1 μg of a gDNA fragment library of Example 6 (Average fragment size ˜1.2 kb); one blocker selected from:
The hybridisation mixes were: incubated at 95° C. for 2 min; cooled at a rate of 1° C. every 10 sec to 10° C. above a pre-defined optimal annealing temperature; step-down incubated for 30 sec at every ° C. above the optimal annealing temperature and cooled at a rate of 1° C. every 10 sec between each ° C.; and incubated at the optimal annealing temperature for 24 hours.
1 mg of streptavidin coated paramagnetic dynabeads was washed twice in the proprietary hybridisation buffer.
Two different dynabeads were used for this investigation.
It was found that MyOne Streptavidin T1 tended to clump at temperatures ≥60° C., so its use was stopped.
Washing of the MyOne streptavidin C1 dynabeads at 65° C. reduced non-specific interaction between the dynabead surfaces and gDNA fragments better than washing at 55° C.
The dynabeads were then re-suspended in the hybridisation buffer and one of the following surface blocking agents was added:
The surface blocking agents act to mask or block repetitive sequence binding to the dynabeads.
These binding mixes were incubated at 55° C. for 30 min prior to heating to the pre-defined optimal annealing temperature.
Hybridisation mixes were then transferred to the binding solution, mixed with gentle pipetting and incubated at the optimal annealing temperature for 20 min.
Following hybridisation, the dynabeads were concentrated, re-suspended in a wash buffer (50 mM HEPES, 0.04% PVP, 10 mM MgCl2, 6.8 mM 2-mercaptoethanol, pH 8.5) and incubated at a predefined washing temperature for 5 min.
The dynabeads were concentrated, re-suspended in: 1× RNase If buffer (NEB); 50 U RNase If (NEB) (unless stated); and 1% Triton X-100 (Fluka) (total volume 50 μl); incubated at 37° C. for 15 min, and finally incubated at the predefined wash temperature for 5 min.
The dynabeads were again concentrated, re-suspended in a proprietary wash buffer and incubated at a predefined washing temperature for 5 min.
Finally, the dynabeads were concentrated, re-suspended in 50 μl 10 mM Tris HCl (pH8.5).
qPCR
Control curve: An aliquot of the fragment library used for this investigation was initially diluted to 1000 ng/μl. An aliquot was further diluted to 500 ng/μl. These samples were serially diluted by a factor of 1 in 10 to cover the range from 1 ng/μl to 0.0005 ng/μl.
Primary PCR: Duplicate 25 μl PCRs contained 1× Maxima SYBR Green hot start qPCR master mix (Maxima HS) (Thermo), 0.96 μM Rapid A PCR primer, 0.96 μM Rapid B PCR primer and 10 μl vortexed test dynabeads (see above) or control DNA. PCRs were heated to 95° C. for 10 min followed by 7 cycles of (95° C. for 30 sec; 64° C. for 30 sec and 72° C. for 3 min). Finally PCRs were incubated at 72° C. for 5 min.
Secondary PCR: 25 μl PCRs contained 1× Maxima HS, 0.96 μM Rapid A PCR primer, 0.96 μM Rapid B PCR primer and 1 μl primary PCR following magnetic concentration of the beads (concentration not required for the control PCRs). PCRs were performed on the Light Cycler 480 (Roche). PCRs were heated to 95° C. for 10 min followed by 30 cycles of (95° C. for 30 sec; 64° C. for 30 sec; 72° C. for 3 min; and imaging).
Analysis of the qPCR Data
A standard curve was plotted for the control series. The mass of gDNA library bound to each 0.2 mg of dynabeads was determined relative to the standard curve. The recovered mass was used to calculate the percentage of library fragment recovery caused by interactions with the dynabeads surface.
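The standard-curve calculation described above can be sketched as a least-squares fit of quantification cycle (Cq) against log10 concentration, followed by interpolation of the test samples. The function names are illustrative, and the conversion of an interpolated concentration into a recovered mass depends on reaction volumes that should be taken from the protocol.

```python
import math


def fit_standard_curve(concs_ng_per_ul, cq_values):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concs_ng_per_ul]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def conc_from_cq(cq, slope, intercept):
    """Interpolate a concentration (ng/ul) from a measured Cq."""
    return 10 ** ((cq - intercept) / slope)


def percent_recovery(recovered_ng, input_ng):
    """Percentage of the input library mass recovered on the beads."""
    return 100.0 * recovered_ng / input_ng
```

A background recovery of 0.3% in this scheme would correspond, for example, to 3 ng recovered from 1000 ng of input library.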
Results: Cot-1 DNA offered no significant reduction in bead surface to DNA fragment interaction when compared to un-blocked beads. Background recovery in both cases was >0.3%.
Samples containing combinations of salmon gDNA with Cot-1 DNA, R.Block-Hc or R.Block-Hg reduced background DNA fragment recovery to <0.01%. 5 μg and 10 μg of R.Block based on salmon gDNA (“R.Block-Sg”) were also tested. Results indicated that the efficacy of blocking non-specific capture of DNA was, in order, R.Block-Sg>R.Block-Hg>R.Block-Hc.
For this investigation, the hybridisation and binding protocol of Example 10 was used, with varying combinations of blocking agent, one for use in the hybridisation mix (as network blocker during the hybridisation step of Example 10) and the other in the surface blocking mix (binding mix during the binding step of Example 10).
The target DNA comprised 1 μg of a gDNA fragment library (average fragment size ˜1 kb). MyOne Streptavidin C1 paramagnetic dynabeads were used instead of MyOne Streptavidin T1 (Invitrogen). The dynabeads were washed three times in 1× hybridisation buffer at room temperature. Finally the dynabeads were re-suspended in 20 μl to 65 μl of 1× hybridisation buffer containing 1 U/μl SUPERase .IN (Ambion) and 5 μg of the relevant blocking agent. This was incubated at 55° C. for 30 min prior to being heated to a pre-determined binding temperature and the addition of the hybridisation mix. Next generation sequencing was performed on the Illumina MiSeq platform.
The results for mixes 1, 2, 3, 5 and 7 are shown in
The combination of network blocking with R.Block-Hg and surface blocking with R.Block-Sg was in fact approximately 4 times as effective as using Cot-1 DNA blocker alone and approximately 2 times as effective as using Cot-1 DNA and salmon DNA (as network and surface blockers respectively), as shown in
In addition, several potential advantages of the R.Block have been identified over the use of DNA based blockers. For example R.Block-Hc, -Sg and -Hg not only block surface interactions, but also mask interspersed repetitive sequences. This is beneficial when performing in solution target capture.
The above examples and embodiments are described by way of example only. Many variations are possible without departing from the scope of the invention as defined in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
1509395.8 | Jun 2015 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2016/051531 | 5/26/2016 | WO | 00 |