The present invention is in the field of biochemistry and diagnostics. It is more particularly in the field of molecular biology and generation of nucleic acid libraries and more specifically relates to generation of nucleic acid library templates for sequencing.
Many methods in basic research, diagnostics, therapeutics and forensics are based on the analysis of nucleic acids. More recently, highly parallel processes such as microarrays and next generation sequencing (also known as “massive parallel sequencing”) have been developed that allow analysis of thousands, millions and even beyond billions of different nucleic acid molecules simultaneously. For this purpose, nucleic acids need to be transformed into a common format to be processed in parallel as a nucleic acid library. The optimal library format is to supply unknown nucleic acids with common flanking sequences, so-called adapters. Since many of the analysis processes require nucleic acid amplification steps such as PCR, the flanking sequences represent binding sites for at least two different specific primers. The classic method to add adapters to unknown nucleic acids is to first generate blunt-ended double-stranded DNA fragments. These fragments are then incubated with two different blunt-ended double-stranded adapters to be ligated with a ligase such as T4 DNA ligase. In order to avoid excess adapter dimers, only the blunt-ended double-stranded DNA fragments are phosphorylated and only one end of the adapters can be ligated. However, the ligation efficiency with blunt-ended DNA is lower than with sticky ends. In addition, due to permutations of fragments being ligated to two different adapters, only one half of the ligation products carry different adapters at both ends. Therefore, only one half of the ligation products can be efficiently amplified in PCR. Newer approaches are directed at using one adapter with two different primer binding sites that are ligated to both 5′ and 3′-ends of each terminus of a given double-stranded DNA fragment. Thus, each of the two strands of a double-stranded DNA fragment is supplied with different primer binding sites for efficient PCR amplification.
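The one-half figure for the classic two-adapter ligation can be checked by enumerating the possible end combinations. The following is a minimal sketch, assuming that each fragment end receives either adapter with equal probability; the adapter labels A and B and the function name are illustrative and do not appear in the original text.

```python
from itertools import product

# Hypothetical labels for the two different blunt-ended adapters.
ADAPTERS = ("A", "B")

def amplifiable_fraction(adapters=ADAPTERS):
    """Fraction of ligation products carrying different adapters at both ends.

    Assumes every fragment end is ligated to any adapter with equal probability.
    """
    # Each product is described by (adapter at left end, adapter at right end).
    products = list(product(adapters, repeat=2))
    mixed = [p for p in products if p[0] != p[1]]
    return len(mixed) / len(products)

print(amplifiable_fraction())  # 0.5
```

Of the four combinations (A-A, A-B, B-A, B-B), only A-B and B-A provide two different primer binding sites, which corresponds to the one half stated above.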
By adding a defined 3′ single base overhang to the double-stranded DNA fragment and utilising adapters with a complementary single base overhang, the ligation reaction is much more efficient and does not yield any dimers or other multimers. Currently, two commercial approaches solve these problems associated with the classical adapters. One approach disclosed in WO 2007/052006 A1 is based on a Y-shaped adapter format which represents two annealed oligonucleotides comprising a complementary double-stranded region at the ligatable end and a forked, or tailed end of unpaired nucleotides comprising distinct sequences that can be used as specific primer binding sites. Another approach disclosed in WO 2009/133466 A2, is based on a hairpin-forming adapter comprising one partially annealed oligonucleotide which comprises a complementary double-stranded region at the ligatable 5′ and 3′ termini and a looping, connecting part comprising distinct sequences that can be used as specific primer binding sites. The loop comprises a scissile moiety which has to be cleaved after ligation in an additional processing step in order to be compatible with downstream nucleic acid library processing steps such as PCR. The hairpin and Y-adapter formats yield common sequences flanking both ends of the nucleic acid library insert, which requires longer primers in sequencing in order to render the sequencing reaction specific for only one of the ends. Another, yet non-commercial adapter format disclosed in WO 2009/133466 A2 is based on an adapter that yields different restriction enzyme cleavage sites downstream of the primer binding site after the first primer extension step. This allows ligation with a second adapter comprising a different primer binding site after cleavage. However, these methods comprise additional and cumbersome processing steps or require long adapter oligonucleotides which are less efficient in chemical synthesis and ligation compared to simple double-stranded adapters. 
There is a need to provide a method that does not require complicated enzymatic processing steps after ligation prior to PCR and utilises shorter, double-stranded adapter molecules without complex structures that do not yield common flanking sequences at the generated nucleic acid library.
The present invention solves the technical problem of providing a method based on a single adapter molecule without elongated oligonucleotides comprising forked or hairpin structures that is well accepted by ligating enzymes and does not require additional steps after ligation for generating a nucleic acid library without extensive common sequences at the termini.
A solution to the technical problem is providing an at least partially double-stranded adapter, which sequence is converted after ligation, said adapter comprising one or more modified bases in the region of a primer binding site that is ligated to both ends of a nucleic acid fragment and converted to at least two distinct primer binding sites for an efficient amplification reaction with more than one specific primer.
The method according to the invention has the advantage of reducing the length of adapter nucleotides to the primer binding sites. Alternatively, a substantially single-stranded adapter can be used of which only one strand needs to be ligated, which is advantageous in combination with ligating enzymes such as topoisomerases and transposases that can only join one strand. Adapter size reduction results in a more efficient and economic synthesis of the oligonucleotides and minimises common flanking sequences in a nucleic acid library to optional elements such as barcode and enzyme recognition sites. Therefore, a sequencing primer may not comprise 3′ ends that could anneal to both ends of a template, which also reduces the requirement for stronger and specific binding of the 5′ part of the primer. Ligating enzymes require defined double stranded nucleic acids for efficient ligation, which is achieved by omitting problematic structures such as single-stranded tails and loops. The whole library generation process may not require separate and intermediate enzymatic polishing steps after ligation as the sequence conversion by a polymerase represents an integral part of the mandatory amplification step.
A first aspect of the invention relates to a method of producing an asymmetrically tagged nucleic acid fragment, said method comprising the steps:
In certain embodiments, the primer-binding region of the adapter comprises one, preferably two or more base(s) that are directly converted in step ii) of the method of producing an asymmetrically tagged nucleic acid fragment to differently coding base(s).
In certain embodiments, the direct conversion of the base(s) in the primer binding region in the method of producing an asymmetrically tagged nucleic acid fragment is performed by an enzymatic, chemical or photochemical reaction.
In certain embodiments, the enzymatic direct conversion of bases in the method of producing an asymmetrically tagged nucleic acid fragment is performed by a nucleic acid repair enzyme, an acetyl esterase and/or penicillin G acylase.
In certain embodiments, the chemical direct conversion of bases in the method of producing an asymmetrically tagged nucleic acid fragment is performed by removal of a masking group by a Staudinger reaction.
In certain embodiments, the method of the invention of producing an asymmetrically tagged nucleic acid fragment comprises an adapter comprising one or more universal bases and/or convertible bases in the primer binding region that are not unmodified adenine, cytosine, guanine, thymidine, or uracil.
In certain embodiments, the universal base is selected from inosine (hypoxanthine), xanthine, oxanine, 8-oxo-guanine, nebularine, 3-nitropyrrole, 5-nitroindole, 4-methylindole, O4-methyl-thymidine (O4mT), O4-ethyl-thymidine (O4eT), 6H,8H-3,4-dihydro-pyrimido[4,5-c][1,2]oxazin-7-one (P), N6-methoxy-2,6-diaminopurine (K), a 5-fluoroindole base, pyrrolidine, a dSpacer (1′,2′-dideoxyribose) or an abasic site.
In another embodiment, the method of the invention of producing an asymmetrically tagged nucleic acid fragment comprises an adapter comprising one or more universal bases and/or convertible bases in the primer binding region, wherein particularly one or more bases comprised within the adapter, particularly one or more universal bases and/or convertible bases, are modified bases.
In certain embodiments, the convertible base comprises O4-methyl-thymidine, O6-methyl-guanine, S4-methyl-thio-uridine, S4-methyl-thio-thymidine, S6-methyl-thio-guanine, N1-methyl-adenine, N3-methyl-adenine, C8-methyl-guanine, N2,3-etheno-guanine, O6-methyl-hypoxanthine, O4-ethyl-thymidine, O6-ethyl-guanine, S4-ethyl-thio-uridine, S4-ethyl-thio-thymidine, S6-ethyl-thio-guanine, N1-ethyl-adenine, N3-ethyl-adenine, C8-ethyl-guanine, O6-ethyl-hypoxanthine, S4-thio-uridine, S4-thio-thymidine, S6-thio-guanine, 1-ethynyl-dSpacer (abasic site with an alkyne group), or a 1-thiol-dSpacer (abasic site with a thiol group).
In certain embodiments, the first adapter sequence comprising one or more universal bases and/or convertible bases in the primer binding region is converted by a first polymerase with a first base bias in step ii) of the method of producing an asymmetrically tagged nucleic acid fragment; and wherein a primer is specifically hybridised to the primer binding site converted by a second polymerase with a second base bias in step iii).
In certain embodiments, the primer binding region of the adapter in the method of producing an asymmetrically tagged nucleic acid fragment is converted by at least two polymerases with different base incorporation bias, wherein the different bias preferably obeys the “A” rule or “C” rule.
In certain embodiments, the adapter primer-binding region in the method of producing an asymmetrically tagged nucleic acid fragment comprising one or more universal and/or convertible bases is double-stranded and both strands are ligated to both ends of the nucleic acid fragment.
In certain embodiments, the one or more universal base in the method of producing an asymmetrically tagged nucleic acid fragment is converted by specific primer annealing thereto and subsequent polymerase extension.
In certain embodiments, the method of producing an asymmetrically tagged nucleic acid fragment comprises the steps of:
In certain embodiments, the at least one convertible base and the at least one converted base preferably pair with different bases.
In certain embodiments, the conversion of the at least one convertible base comprised within the primer binding region of the first and/or said second adapter nucleic acid molecule is performed by an enzymatic, chemical or photochemical reaction, particularly by a nucleic acid repair enzyme, an acetyl esterase and/or penicillin G acylase.
In certain embodiments, the method of the invention comprises
In certain embodiments, the method of the invention comprises
In certain embodiments, the first extension step is performed by a first polymerase with a first base bias, and the second extension step is performed by a second polymerase with a second base bias, e.g. the first polymerase follows the A-rule and the second polymerase follows the C-rule or vice versa.
In certain embodiments, a primer anneals to the strand of the at least partially double stranded second adapter nucleic acid molecule ligated to the template strand and to the first and/or third antisense adapter nucleic acid molecule at different temperatures, particularly differing by at least 2° C., 5° C. or 10° C.
In certain embodiments, a primer anneals to the strand of said at least partially double stranded first adapter nucleic acid molecule ligated to the complementary strand and to the second and/or fourth antisense adapter nucleic acid molecule at different temperatures, particularly differing by at least 2° C., 5° C. or 10° C.
In certain embodiments, the adapter ligation to the nucleic acid fragment in step i) of the method of producing an asymmetrically tagged nucleic acid fragment is performed by a ligase, a topoisomerase, a recombinase or a transposase.
In certain embodiments, the adapter, particularly the first and/or second adapter nucleic acid molecule, of the method of producing an asymmetrically tagged nucleic acid fragment comprises one or more of the following: a barcode, a recombination site, a topoisomerase recognition site, or a transposase recognition site.
In certain embodiments, in the method of the invention of producing an asymmetrically tagged nucleic acid fragment, the sequences of the converted primer-binding sites have annealing temperatures with the respective non-cognate primer that are lower by at least 2° C., preferably more than 5° C. and most preferably more than 10° C.
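The temperature thresholds named above can be expressed as a simple check. The following is a minimal sketch, assuming that annealing temperatures for the cognate and non-cognate primer have already been obtained by some method (measured or predicted); the function name, tier labels and example values are hypothetical.

```python
def specificity_tier(t_cognate_c, t_noncognate_c):
    """Classify the annealing temperature gap (in degrees Celsius).

    The gap is the cognate annealing temperature minus the non-cognate one.
    """
    delta = t_cognate_c - t_noncognate_c
    if delta > 10:
        return "most preferred"
    if delta > 5:
        return "preferred"
    if delta >= 2:
        return "acceptable"
    return "insufficient"

# Hypothetical annealing temperatures for one converted primer-binding site.
print(specificity_tier(60.0, 48.0))  # most preferred
```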
In certain embodiments, the first extension step for converting the first adapter sequence into a primer binding site to generate an asymmetrically tagged nucleic acid fragment comprises annealing conditions that differ from later amplification steps, wherein preferably the temperature is lower in said first extension step.
In certain embodiments, the method of the invention of producing an asymmetrically tagged nucleic acid fragment uses primers with 3′ blocking ends that can be unblocked upon specific hybridisation to the cognate primer-binding site, said unblocking preferably performed by a nucleic acid repair enzyme.
In certain embodiments, the method of the invention of producing an asymmetrically tagged nucleic acid fragment uses RNAseH or Endonuclease IV for highly specific 3′ unblocking of annealed primers.
In certain embodiments, the method of the invention of producing an asymmetrically tagged nucleic acid fragment comprises the use of a polymerase for extension and/or amplification selected from the group consisting of: a RNA polymerase, a mesophilic DNA polymerase, a reverse transcriptase, a repair polymerase (family X), and a thermophilic DNA polymerase capable of copying a universal base in a second nucleic acid strand of the adapter.
In certain embodiments, the method of the invention of producing an asymmetrically tagged nucleic acid fragment comprises synthesis steps performed on a solid phase, preferably a bridge amplification using immobilised primers.
In certain embodiments, the method of the invention of producing an asymmetrically tagged nucleic acid fragment is used for generating nucleic acid libraries suitable for massive parallel sequencing.
In a second aspect, the invention relates to an at least partially double-stranded adapter comprising:
In certain embodiments, the one or more universal and/or convertible bases of the at least partially double-stranded adapter are modified bases and not unmodified bases such as adenine, cytosine, guanine, thymidine or uracil.
In certain embodiments, the at least partially double-stranded adapter nucleic acid molecule comprises:
In certain embodiments, the universal base is selected from inosine (hypoxanthine), xanthine, oxanine, 8-oxo-guanine, nebularine, 3-nitropyrrole, 5-nitroindole, 4-methylindole, O4-methyl-thymidine (O4mT), O4-ethyl-thymidine (O4eT), 6H,8H-3,4-dihydro-pyrimido[4,5-c][1,2]oxazin-7-one (P), N6-methoxy-2,6-diaminopurine (K), a 5-fluoroindole base, pyrrolidine, a dSpacer (1′,2′-dideoxyribose) or an abasic site.
In certain embodiments, the convertible base comprises O4-methyl-thymidine, O6-methyl-guanine, S4-methyl-thio-uridine, S4-methyl-thio-thymidine, S6-methyl-thio-guanine, N1-methyl-adenine, N3-methyl-adenine, C8-methyl-guanine, N2,3-etheno-guanine, O6-methyl-hypoxanthine, O4-ethyl-thymidine, O6-ethyl-guanine, S4-ethyl-thio-uridine, S4-ethyl-thio-thymidine, S6-ethyl-thio-guanine, N1-ethyl-adenine, N3-ethyl-adenine, C8-ethyl-guanine, O6-ethyl-hypoxanthine, S4-thio-uridine, S4-thio-thymidine, S6-thio-guanine, 1-ethynyl-dSpacer (abasic site with an alkyne group), or a 1-thiol-dSpacer (abasic site with a thiol group).
In certain embodiments, the at least partially double-stranded adapter, particularly the at least partially double-stranded adapter nucleic acid molecule, comprises a blocking modification between primer-binding regions in one strand.
In certain embodiments, the at least partially double-stranded adapter, particularly the at least partially double-stranded adapter nucleic acid molecule, comprises one or more backbone and/or base modifications that increase the melting temperature at the primer binding site.
In certain embodiments, the at least partially double-stranded adapter, particularly the at least partially double-stranded adapter nucleic acid molecule, comprises one or more of the following: a barcode, a restriction enzyme site, a recombination site, or a topoisomerase recognition site.
In certain embodiments, the at least partially double-stranded adapter, particularly the at least partially double-stranded adapter nucleic acid molecule, comprises a sequence which can be converted to one of the sequences of SEQ-ID NO:1-11 according to the method of producing an asymmetrically tagged nucleic acid fragment with a homology of less than 2, preferably 1 and most preferably no base exchanges.
In a third aspect, the invention relates to the use of an at least partially double-stranded adapter nucleic acid molecule, particularly an at least partially double-stranded adapter nucleic acid molecule according to the above aspect or embodiments of the invention, in an amplification reaction of a nucleic acid, wherein the at least partially double-stranded adapter nucleic acid molecule particularly comprises one or more universal and/or convertible bases in a primer binding region, wherein the sequences of the converted primer-binding sites have a lower melting temperature when hybridised to each other than with the cognate primer; and a ligation site positioned on a first end of said adapter configured to allow ligation of said adapter to a compatible end of a double-stranded nucleic acid fragment.
In a fourth aspect, the invention relates to a kit for use in preparing a library of asymmetrically tagged nucleic acid molecules comprising:
In certain embodiments, the kit may also comprise at least one converting reagent, wherein the converting reagent is an enzyme, chemical or photochemical reagent.
In certain embodiments, the kit may also comprise at least one converting reagent, wherein the converting enzyme is a nucleic acid repair enzyme, an acetyl esterase and/or penicillin G acylase.
In certain embodiments, the kit may also comprise at least one converting reagent, wherein the converting nucleic acid repair enzyme is an alkylation damage repair enzyme, preferably selected from one of the following: AlkA, AlkB, Ada, Aid and Ogt.
In certain embodiments, the kit may also comprise at least one polymerase with a bias for cytosine incorporation opposite of universal bases, preferably opposite of an abasic site or hydrophobic stacking bases.
In certain embodiments, the kit may also comprise at least one blocked primer and at least one unblocking enzyme, wherein said unblocking enzyme is preferably an RNAseH or Endonuclease IV.
In a fifth aspect, the invention relates to a method of producing an asymmetrically tagged nucleic acid fragment for bisulfite sequencing comprising the following steps:
Ligase
The term “ligase” used in its broadest sense refers to an enzyme belonging to the category EC: 6.5 (Enzyme Commission number 6.5) that is able to join ends of nucleic acids. Ligases may utilise nucleotide triphosphates, nicotinamide adenine dinucleotide (NAD), ADP-ribosylated 5′-ends, or cyclic 2′,3′ phosphates at 3′ ends for joining the ends of nucleic acids. Truncated or otherwise mutated variants of ligases that are only able to ligate pre-activated ends such as 5′-adenylated termini are also within the scope of the invention. Preferred are ligases that are able to ligate sticky ends or single base overhangs. Particularly preferred is the DNA ligase of bacteriophage T4.
Topoisomerase
The term “topoisomerase” refers to an enzyme belonging to the categories EC: 5.99.1.2 and 5.99.1.3. Preferably, the term refers to a type I topoisomerase (EC: 5.99.1.2). More preferably, the topoisomerase used for ligating an adapter to a nucleic acid fragment is a type IB topoisomerase. Most preferably, the enzyme is related to the vaccinia virus topoisomerase and recognises a 5′-(C/T)CCTT target site for a site-specific cleavage. A suicide substrate bearing a terminal target site in double-stranded form with a single-stranded 3′ extension can be cleaved by the topoisomerase, resulting in a covalent enzyme adduct which can be transferred to 5′ OH-groups of double-stranded nucleic acid fragments.
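Locating the 5′-(C/T)CCTT target site in a candidate adapter strand can be illustrated with a pattern search. The following is a minimal sketch that scans only the strand supplied (written 5′ to 3′); the function name and the example sequence are hypothetical and are not taken from the sequence listing.

```python
import re

# Pattern for the 5'-(C/T)CCTT target site described above.
# The lookahead lets overlapping occurrences be reported as well.
TARGET_SITE = re.compile(r"(?=([CT]CCTT))")

def find_target_sites(sequence):
    """Return (0-based start, matched site) pairs on the given strand only."""
    return [(m.start(), m.group(1)) for m in TARGET_SITE.finditer(sequence.upper())]

# Hypothetical adapter strand, written 5'->3'.
print(find_target_sites("AAGCCCTTGGTCCTTAA"))  # [(3, 'CCCTT'), (10, 'TCCTT')]
```

A search of the complementary strand would require passing its sequence separately; this sketch does not compute reverse complements.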
Polymerase
The term “polymerase” used in its broadest sense refers to an enzyme belonging to the category EC: 2.7.7 (Enzyme Commission number 2.7.7) that is able to catalyse an extension of the 3′-end of a nucleic acid strand by one nucleotide at a time. Preferred are DNA-directed RNA polymerases (EC: 2.7.7.6), DNA nucleotidylexotransferases (EC: 2.7.7.31) and RNA-directed RNA polymerases (EC: 2.7.7.48). More preferred are DNA-directed DNA polymerases (EC: 2.7.7.7) and RNA-directed DNA polymerases (EC: 2.7.7.49). Most preferred are polymerases that are able to maintain their activity at elevated temperature such as Thermus aquaticus (Taq) DNA polymerase, Thermus thermophilus (Tth) DNA polymerase, Pyrococcus furiosus (Pfu) DNA polymerase, Thermococcus sp. (9° N) DNA polymerase, Thermococcus gorgonarius (Tgo) DNA polymerase, Thermococcus litoralis DNA polymerase (Vent), and truncations and domain fusions thereof. Especially preferred are mutant enzymes thereof which are able to copy universal bases like inosine or 8-oxo-guanine and/or can efficiently read through lesions such as abasic sites, and which may display a different bias for base incorporation such as the “C” rule. In addition, natural polymerases that have translesion synthesis activity such as Family Y polymerases or primases are preferred polymerases to be used in combination with more processive enzymes. It is preferred that polymerases involved in the copying step of a universal base lack a 3′-exonuclease and/or lyase activity.
Nucleic Acid
The term “nucleic acid” refers to a polymer of nucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g. 2-aminopurine, 2-aminoadenosine, 2-thiothymidine, inosine (hypoxanthine), pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 5-hydroxybutynyl-2′-deoxyuridine, 6-thioguanine, S6-methyl-thioguanine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O6-methylguanine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, abasic sites, ribose sugars (RNA), 2′-deoxyribose sugars (DNA), terminal 3′-deoxyribose or 2′,3′-dideoxyribose sugars, modified sugars (e.g., 2′-fluororibose, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5′-N-phosphoramidite linkages). Furthermore, the backbone may include modified locked (LNA), unlocked (UNA), bridged (BNA), glycol (GNA) sugars, triazole-sugar (made by click chemistry), or a peptide backbone (PNA) or a mixture thereof.
Oligonucleotide
The term “oligonucleotide” refers to a nucleic acid with a length of 4 to 100 nucleotides. Oligonucleotides may be modified to include labels for detection or identification such as fluorescent dyes or radioactive isotopes, haptens for capture, detection, or immobilisation such as biotin or digoxigenin, or reactive groups such as hydroxyl, phosphate, sulfonate ester, thiol, alkyne, azide, or EDC, for immobilisation, crosslinking, or derivatisation.
Primer
The term “primer” herein may denote target-specific primers, each set comprising a forward and a reverse primer, random primers, degenerate primers, or random exonuclease-resistant primers depending on the number of nucleic acids to be amplified and the amplification method used. Primers can specifically anneal to their complementary target sequence under physiological buffer conditions. A free 3′-end may serve as a starting point for polymerases to elongate the primer strand using nucleotides as building blocks. However, primers may also comprise a blocked 3′-end which can be unblocked upon hybridisation for highly specific priming of target sequences. Primers herein also refer to low melting point primers, which usually have a length between 4 and 10 nucleotides, on average 6 nucleotides. The term “primer” also comprises a reaction mixture of nucleotides and a primase, which may be used to synthesise primers. Preferred primers bind specifically to their cognate targets at temperatures higher than 30° C. and have a length of about 10-100, more preferably about 15-50, most preferably about 18-40 bases. Primers may be prepared using any suitable method, such as, for example, the phosphotriester and phosphodiester methods or automated embodiments thereof. In one such automated embodiment diethylphosphoramidites are used as starting materials. It is also possible to use a primer which has been isolated from a biological source (such as a restriction endonuclease digest).
Primer-Binding Site
The term “primer-binding site” refers to a site within a nucleic acid strand to which a primer can specifically hybridise under stringent conditions. Preferably, the primer is bound to the primer-binding site by a sense-antisense interaction without base mismatches. In addition, the primer-binding site may also comprise one or more modified bases such as abasic sites which do not form hydrogen bonds.
Primer-Binding Region
The term “primer-binding region” refers to a region within an adapter comprising a sequence or base composition representing a primer-binding site, or giving rise to a primer-binding site by copying and/or conversion. The primer-binding region of an adapter may be single-stranded or double-stranded and may comprise more than one primer-binding site. Preferably, the primer-binding region comprises one or more universal or directly convertible base(s).
Nucleic Acid Triphosphate
The term “nucleic acid triphosphate” or “nucleotide” refers to any of the naturally occurring ribonucleotides (ATP, CTP, GTP, and UTP), deoxyribonucleotides (dATP, dCTP, dGTP, TTP, and dUTP), their derivatives, and combinations thereof. Derivatives may include but are not limited to base modifications of nucleic acid triphosphates such as 5-methyl-cytosine, 7-deaza-guanine, 5-bromo-uracil, 8-oxo-guanine, 2-aminopurine, N6-methyl-adenine, Cy5-uracil, Cy3-uracil, Cy5-cytosine, and Cy3-cytosine. Furthermore, sugar modifications may include locked nucleic acids (L-NTP), unlocked nucleic acids (U-NTP), 2′-fluoro-NTP (2F-NTP), 2′-O-methyl-NTP (2OMe-NTP), 2′-azido-NTP, arabinose-NTP (ara-NTP), and dideoxides (ddNTP). Phosphate modifications of nucleotides such as alpha-phosphorothioates may also be included.
Universal Base
The term “universal base” refers to a base or analogue thereof which is able to pair with more than one of the four natural bases or does not lead to a mismatch which destabilises the helical structure of a nucleic acid duplex. A universal base may interact with its opposite base by hydrogen bonds or form hydrophobic stacking interactions with adjacent bases. In the case of abasic sites, no direct interaction with a base in the opposite strand is necessary. Base interactions can occur by Watson-Crick or Hoogsteen pairing. Preferred universal bases are nucleotides comprising inosine (hypoxanthine), xanthine, oxanine, 8-oxo-guanine, nebularine, 3-nitropyrrole, 5-nitroindole, 4-methylindole, O4-methyl-thymidine (O4mT), O4-ethyl-thymidine (O4eT), 6H,8H-3,4-dihydro-pyrimido[4,5-c][1,2]oxazin-7-one (P), N6-methoxy-2,6-diaminopurine (K), and 5-fluoroindole bases, pyrrolidine, and nucleotides with missing bases (abasic sites) such as dSpacer (1′,2′-dideoxyribose), and other spacers such as diethyleneglycol (DEG), and butanol (C4 spacer). More preferred are universal bases such as inosine, 8-oxo-guanine and abasic sites, which can be copied by a polymerase with a preference for only one of the naturally occurring bases.
Convertible Base
The term “convertible base” or “directly convertible base” refers to a base or analogue thereof which can be directly converted without a copying step to another base with a different coding property. Natural bases such as cytosine and adenine can be converted by deamination to uracil and inosine, respectively. However, convertible bases are preferably modified and placed in predefined sequences or oligonucleotides such as adapters. Such specifically convertible bases may comprise groups masking one or more of the positions in a base that are able to form hydrogen bonds with an opposite base in Watson-Crick and/or Hoogsteen base pairings. Such masking groups may be simple alkyl groups and/or protection groups known from chemical synthesis. Preferably, the protection group for amine is a hydrazine (N2), resulting in an azido-group (—N3), and for oxygen and sulfur groups an O- or S-azidomethyl group, respectively. N3-protection groups can be cleaved off in a Staudinger reaction by strongly reducing reagents such as TCEP. Alternatively, the convertible base may have groups that can be selectively modified with masking groups. Yet another directly convertible base is a cleavable base that gives rise to an abasic site by conversion. Alternatively, an abasic site may comprise a modification at the sugar moiety that allows specific attachment of a base, such as 1-ethynyl-dSpacer, which can be reacted with an azide-bearing base analogue by click chemistry. The base conversion can be performed enzymatically, chemically or photochemically. Preferably, the direct conversion reaction is compatible with enzymatic activities such as ligation or nucleic acid polymerase reactions and the respective buffers.
Preferred directly convertible bases are nucleotides comprising O4-methyl-thymidine, O6-methyl-guanine, S4-methyl-thio-uridine, S4-methyl-thio-thymidine, S6-methyl-thio-guanine, N1-methyl-adenine, N3-methyl-adenine, C8-methyl-guanine, N2,3-etheno-guanine, O6-methyl-hypoxanthine, O4-ethyl-thymidine, O6-ethyl-guanine, S4-ethyl-thio-uridine, S4-ethyl-thio-thymidine, S6-ethyl-thio-guanine, N1-ethyl-adenine, N3-ethyl-adenine, C8-ethyl-guanine, O6-ethyl-hypoxanthine, S4-thio-uridine, S4-thio-thymidine, S6-thio-guanine, 1-ethynyl-dSpacer (abasic site with an alkyne group), and 1-thiol-dSpacer (abasic site with a thiol group).
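The effect of a direct conversion on the coding property of an adapter strand can be illustrated as follows. This is a minimal sketch under simplifying assumptions that are not stated in the text above: O6-methyl-guanine is treated as being read like adenine (pairing with thymine) and O4-methyl-thymidine as being read like cytosine (pairing with guanine) before conversion, and removal of the alkyl group is treated as restoring guanine and thymidine. The one-letter codes "m" and "e", the function names and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical one-letter codes for two convertible bases.
# "reads_like" is the assumed coding behaviour before conversion.
CONVERTIBLE = {
    "m": {"reads_like": "A", "converted": "G"},  # O6-methyl-guanine -> guanine
    "e": {"reads_like": "C", "converted": "T"},  # O4-methyl-thymidine -> thymidine
}

def convert(sequence):
    """Apply the direct conversion (e.g. removal of the masking alkyl group)."""
    return "".join(CONVERTIBLE[b]["converted"] if b in CONVERTIBLE else b
                   for b in sequence)

def copy_strand(template):
    """Return the strand a polymerase would synthesise (both written 5'->3')."""
    coding = [CONVERTIBLE[b]["reads_like"] if b in CONVERTIBLE else b
              for b in template]
    return "".join(COMPLEMENT[b] for b in reversed(coding))

adapter = "ACmTeGA"  # hypothetical adapter strand with two convertible bases
before = copy_strand(adapter)           # copy made without conversion
after = copy_strand(convert(adapter))   # copy made after direct conversion
print(before, after)  # TCGATGT TCAACGT
```

The two copies differ at the positions opposite the convertible bases, so the same adapter sequence can give rise to two distinguishable primer-binding sites.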
The invention relates to a method and a kit comprising single-stranded or at least partially double-stranded adapters comprising sequences with base modifications that are ligated to nucleic acid fragments and converted to generate asymmetric ends for specific recognition. The method is based on two main sequence conversion mechanisms, a direct conversion of specific bases and an indirect conversion by copying, wherein both mechanisms may be combined.
I) Indirect Sequence Conversion
The indirect sequence conversion mechanism for universal bases is based upon differential reading and/or annealing to bases that are incorporated in one or both of the ligated universal base adapter strands. Universal bases can bind to more than one base in the opposite strand without mismatch and can be converted to a different base by a polymerase or primer binding. For instance, abasic sites and other universal bases that do not form hydrogen bonds can be converted to A or C in the opposite strand depending on the polymerase. Furthermore, the bias of a polymerase can be forced toward incorporation of a desired base by changing the balance of nucleotides supplied in the reaction.
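The indirect conversion by polymerases with different incorporation bias can be sketched as follows. This is a simplified illustration, assuming that a polymerase following the “A” rule always incorporates A and one following the “C” rule always incorporates C opposite a non-instructive universal base; the placeholder letter "X" for the universal base, the function name and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Base assumed to be incorporated opposite a universal base ("X"),
# e.g. an abasic site, depending on the bias of the polymerase.
RULES = {"A": "A", "C": "C"}

def copy_strand(template, rule):
    """Return the strand synthesised opposite `template` (both written 5'->3')."""
    incorporated = RULES[rule]
    return "".join(incorporated if b == "X" else COMPLEMENT[b]
                   for b in reversed(template))

adapter = "GATXCTXAGXTC"  # hypothetical adapter strand with three universal bases
site_a = copy_strand(adapter, "A")
site_c = copy_strand(adapter, "C")
print(site_a)  # GAACTAAGAATC
print(site_c)  # GACCTCAGCATC
```

In this sketch the two copies differ at every position opposite a universal base, which is what allows one ligated adapter sequence to be converted into two distinct primer-binding sites.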
A convertible adapter with universal bases of which both strands are ligated to a nucleic acid fragment preferably comprises two annealed strands comprising a double-stranded region. The double-stranded region comprises at least a ligatable end. The first of the annealed strands comprises a primer binding site and a ligatable 5′ end. The primer-binding site may comprise one or more universal bases and thus may give rise to a new primer binding site after conversion in a primer extension reaction. The second annealed strand of the universal adapter may comprise a precursor primer binding site in the primer-binding region and a ligatable 3′ end. The second strand may comprise one or more universal bases in a precursor primer binding site which may also give rise to a new primer binding site after conversion by a polymerase. The primer-binding region of the convertible adapter comprises both the primer binding site in the first adapter strand and, if present, the precursor primer binding site in the second strand. The two annealed strands may be joined at their ends by a linker or a nucleic acid loop. Such loops can be useful to generate a covalently closed double-stranded ligation product that can be amplified by a rolling circle mechanism or represent a template for single molecule sequencing according to the single molecule real time (SMRT) DNA sequencing technology. However, it is preferred to use two discontinuous strands that are annealed to form an at least partially double-stranded adapter according to the invention.
Universal bases are converted by two different copy mechanisms depending on their incorporation in one of the strands of a ligated adapter. The primer-binding strand is ligated to the nucleic acid fragment via its 5′ and has a free, preferably blocked and non-ligatable, 3′-terminus. The primer-binding site of the first strand is not copied by a polymerase, but converted to any sequence depending on the first primer that anneals to the first strand at the primer binding site of the adapter molecule (see
The universal base conversion can also be conducted by differently biased polymerases after ligation. A different incorporation bias for one or more universal base(s) can be exploited to generate different primer-binding sites at both ends of a ligated nucleic acid fragment in subsequent amplification cycle(s). This can be achieved by using polymerases with different incorporation bias in the first and following amplification steps. Ideally, such polymerases follow different rules in non-templated base incorporation that will convert abasic sites and/or universal bases with hydrophobic stacking interaction accordingly. Preferably, the first DNA polymerase is thermolabile and the second polymerase is a hot-start polymerase. For example, the first polymerase is human DNA polymerase beta (C-rule) and the second is a hot-start Taq DNA polymerase (A-rule). Thus, the first extension step is performed by the first polymerase at a lower temperature, and after heating all further amplification steps are performed by the second polymerase. Preferably, the first two extension cycles are performed for longer than the following cycles in order to allow efficient translesion synthesis to occur.
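The effect of two differently biased polymerases on an abasic site can be illustrated with a minimal sketch. It assumes, as a simplification, that each polymerase always inserts one fixed base opposite an abasic site, whereas real enzymes are only biased towards that base. The symbol `X` for the abasic site, the template sequence and the function name `copy_strand` are hypothetical and chosen for illustration only.

```python
# Hypothetical model: 'X' marks an abasic site in a template strand (5'->3').
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Assumed simplification: each polymerase inserts exactly one fixed base
# opposite an abasic site ("A-rule" or "C-rule"); real enzymes are only biased.
RULES = {"A-rule": "A", "C-rule": "C"}


def copy_strand(template, rule):
    """Return the newly synthesised strand (5'->3') copied from `template`."""
    inserted = RULES[rule]
    copied = [inserted if base == "X" else COMPLEMENT[base] for base in template]
    # The new strand is antiparallel to the template, so reverse it.
    return "".join(reversed(copied))


template = "GATCXTGCA"  # illustrative sequence, not taken from the disclosure
first_copy = copy_strand(template, "C-rule")   # e.g. first extension step
second_copy = copy_strand(template, "A-rule")  # e.g. later amplification cycles
print(first_copy)   # TGCACGATC
print(second_copy)  # TGCAAGATC
```

The two copies differ only at the position opposite the abasic site, which is the single-base difference that a pair of specific primers could subsequently exploit.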
Convertible Adapter Design
Depending on the desired adapter sequences, different approaches can be chosen. Where only one of the two asymmetric ends needs to yield a predefined sequence, the second strand is mainly converted to the desired sequence when copied, and the primer binding strand of the adapter is sufficiently different from the copied second strand to allow specific second primer binding and extension. The primer binding strand of the adapter can be altered to incorporate modifications that tolerate base pairing or wobble and that preferably confer a higher melting temperature than the same sequence of the upper strand and/or its copy.
In some cases, both ends of the ligated adapter need to result in predefined sequences to be compatible with existing formats such as those already used in next generation sequencers (see SEQ ID NO:1-7), or for cDNA transcription/expression using specific promoter and/or terminator sequences (see SEQ ID NO:8-11). One way to generate such defined sequences is to generate an adapter as described above for the first converted sequence and simply add the required secondary sequence as an extension. This extension can be made by fusing the sequence 3′ to the lower adapter strand, or by using a primer that comprises the desired region to be added by polymerase extension. Yet another approach can be chosen to yield defined ends in which a common, unchanged sequence is present that is directly ligated to the template. In order to allow differential amplification of the end by two different primers, the first primer binding strand comprises several modifications and a unique single stranded overhang that preferentially allows binding to only one primer under stringent conditions.
The universal base can be chosen depending on which sequences need to be generated after conversion. For instance, an inosine can be used in the double stranded region of the adapter as a matched I·A base pair that can be converted to a mismatching G (via incorporation of a C in the opposite strand) and A which does not pair at all. Slightly different choices can be made for oxanine or xanthine by matching the universal bases with T which can be converted to a mismatching G (via incorporation of a C in the opposite strand) and T. 8-oxoguanine can match with A and can be converted to yield a mismatching G·A. Other universal bases that use hydrophobic stacking rather than hydrogen bonds for pairing can be used as wildcards to replace any base but T or U when using a polymerase that follows the “A-rule”. Since some polymerases such as human DNA polymerase beta, follow the “C-rule”, any base but G can be replaced in a similar manner to yield converted sequences that mismatch in the primer binding site of the second strand. Polymerase mutants can be also selected for following a different rule. However, it is preferred that polymerases lack a proofreading function such as 3′ exonuclease and/or lyase activity. Thus, abasic sites can be used according to the rule of the polymerase to be used in extension and/or amplification. In contrast to other universal bases it is preferred to use only one abasic site in combination with neighbouring bases in order to keep a duplex structure under annealing and ligation conditions. Abasic sites should not be placed directly at the terminal base ligation site. Preferably, the abasic site is placed 2 bases and more preferably more than 3 bases away from the ligation site. Additional guidance for using universal bases for sequence conversion and compensating for lower annealing temperature is given in further detail below and in Table 1.
Requirements for Inosine
As a universal base, inosine can essentially pair with all bases. However, inosine pairs with the following bias: I·C>I·A>I·T, I·G>I·I. As all polymerases incorporate C opposite of inosine, essentially all G in the adapter sequence can be exchanged for I and still yield the same sequence after conversion. Both strands can be modified with inosine and yield very distinct primer binding sequences for an efficient PCR. Preferably, the base opposite to inosine in the universal base adapter is exchanged to an A to preserve a double-stranded conformation while also maintaining a high melting temperature.
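The following minimal sketch illustrates why a G-to-I exchange leaves the copied sequence unchanged while the adapter's own opposite strand (with A opposite I) differs from that copy. The 10-base sequence and the helper names are hypothetical; the only rule taken from the text is that a polymerase incorporates C opposite inosine.

```python
# Inosine ('I') is read like G by a polymerase, i.e. C is incorporated opposite it.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "I": "C"}


def reverse_complement(seq):
    """Strand synthesised by a polymerase copying `seq` (both written 5'->3')."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def g_to_i(seq):
    """Exchange every G for inosine."""
    return seq.replace("G", "I")


def adapter_partner(seq):
    """Opposite adapter strand with A placed opposite every inosine."""
    return "".join("A" if b == "I" else COMPLEMENT[b] for b in reversed(seq))


original = "ACGGTCAGGT"          # hypothetical sequence, for illustration only
modified = g_to_i(original)      # "ACIITCAIIT"

copy_of_original = reverse_complement(original)
copy_of_modified = reverse_complement(modified)
partner = adapter_partner(modified)

# Copying the inosine strand gives the same product as copying the G strand.
print(copy_of_original == copy_of_modified)  # True
# The annealed partner strand (A opposite I) differs from the copy at every I.
mismatches = sum(a != b for a, b in zip(partner, copy_of_modified))
print(mismatches)  # 4
```

In this toy case the four mismatching positions are what distinguishes the converted primer-binding site from the original partner strand.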
Requirements for Abasic Site
Abasic sites can form natural duplex structures without mismatches or bulges. Especially, pyrimidines opposite the abasic sites result in stable helical conformations. Most polymerases follow the so-called “A-rule” which yields incorporation of dATP opposite to abasic sites. Thus, it is preferred to incorporate a guanine opposite to an abasic site in the adapter in order to provide a sequence conversion, especially when copied by a polymerase. However, at temperatures typically used for ligation reactions, all natural bases are well tolerated opposite abasic sites. Still, berberine alkaloids, flavines or other abasic site-binding small molecules can enhance the duplex structure and elevate melting temperature. This also applies to purines in the opposite strand. As it is known that berberine and related molecules do not inhibit polymerases, they can be used in the amplification reaction and enhance sequence conversion of abasic sites without introducing deletions.
Compensating for Low Melting Temperature
Although it is preferred to use bases with higher melting temperatures in the opposite strand of universal bases, it is possible to compensate for lower affinities or melting temperatures. It is known that different nucleic acid backbones result in different melting temperatures even with the same base composition. For example, PNA>LNA>RNA>DNA is the general order of melting temperatures of different nucleic acids annealed to a given DNA sequence. As PNA cannot be copied by a polymerase and LNA requires specific polymerase mutants, it is preferred to restrict such backbone modifications to the primer binding site of the primer binding strand of the adapter. In addition, some modified bases can be used to increase the melting temperature as well. Non-limiting examples are 2,6-diaminopurine (2-amino-dA), 5-methyl dC, and Super T (5-hydroxybutynyl-2′-deoxyuridine). These base modifications are faithfully copied by most polymerases and can therefore be incorporated instead of the natural bases in both of the adapter strands, preferably adjacent to a potentially destabilising universal base.
II) Direct Sequence Conversion
Universal bases can be introduced after the ligation step by site-specific modification of bases in the primer binding region. For example, uracil can pair with adenine (U·A) in a duplex which is recognised by Uracil-DNA glycosylase, also known as UNG or UDG, leading to excision of the uracil and generation of an abasic site. By using a polymerase not following the “A-rule”, a different pair other than T·A will be generated by conversion. The deglycosylated uracil site (abasic site) can also be used in the first strand of the universal base adapter to bind to any base in a hybridised primer. Many other bases that specifically pair with opposite bases can be excised specifically to yield abasic sites. However, it must be considered that this may also convert modified bases present in the ligated nucleic acid fragment of a given sample. It is also possible to utilise a chemically or photocleavable base which is removed after ligation. For example, it is known that alkylated bases such as N7-alkylguanine are much more susceptible to deglycosylation than natural bases.
Alternatively, a base is directly converted to a different base by an enzyme, a chemical or photochemical reaction. A preferred approach is to differentially expose groups of a base capable of forming hydrogen bonds with other bases by either Watson Crick or Hoogsteen interactions in order to alter the coding properties. This can be achieved by using enzymes that specifically recognise and change groups of the base such as activation-induced cytidine deaminase (AID) and APOBEC1 that convert cytosine into uracil.
Yet another, more specific enzymatic conversion is possible using O4-methylthymidine (O4mT) or O4-ethylthymidine (O4eT) which can pair with G and is preferably read as a C by polymerases. It has been discovered that O4eT is more useful as it is more efficiently read as a C despite a lesser bypass efficiency by most polymerases. Another useful alkylated base according to the invention is O6-methylguanine (O6mG) which pairs well with the natural bases T or U and is read as an A by polymerases. The alkyl groups can be removed enzymatically using Ogt or Ada methyltransferases in order to convert O4mT or O6mG into normal bases (T and G, respectively). Especially the Klenow fragment mutant M747K of Taq DNA polymerase (KlenTaq M747K) is preferred for efficient incorporation of a non-cognate base opposite of an alkylated base and may also be chosen to convert alkylated thiol-bases disclosed below. In addition, Vent exo- is also able to preferentially incorporate T opposite of O6mG even at 37° C. Also Bacillus stearothermophilus DNA polymerase I large fragment (Bst) efficiently prefers incorporating C vs. T opposite of O6mG.
However, it is preferred to use a chemical or photochemical method to convert bases by differential exposure of functional groups. One example of chemical conversion of bases is applied in so-called bisulfite sequencing. The nucleic acid in a sample is exposed to bisulfite to determine its pattern of methylation. Treatment of DNA with bisulfite converts cytosine residues to uracil, but leaves 5-methylcytosine residues unaffected. Thus, bisulfite treatment introduces specific changes in the DNA sequence that depend on the methylation status of individual cytosine residues, yielding single-nucleotide resolution information about the methylation status of a segment of DNA. Surprisingly, it was discovered that it is possible to design a perfectly matched adapter molecule that can be ligated to nucleic acid fragments and treated with bisulfite to yield new sequences useful for generating an asymmetrically tagged nucleic acid fragment. By using 5-methyl C (mC) in one strand of the adapter, the sequence of that strand remains unaltered, whereas all C are converted to U in the other strand. Preferably, an adapter suitable for bisulfite sequencing comprises at least two non-methylated cytosines. By introducing I in the non-5-methyl C strand, preferably opposite A, the lower melting temperature caused by the C to U conversion can be compensated. Alternatively, other modifications for higher melting temperature can be employed as well. Such specifically designed directly convertible adapters provide the advantage that only completely converted nucleic acid fragments are amplified. Thus, incompletely bisulfite-treated nucleic acid fragments that may cause false positives in methylation assignment can be avoided.
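As a rough illustration of how bisulfite treatment can turn a perfectly matched adapter into two distinct strands, the sketch below models the conversion as a string operation. It assumes complete conversion of unmethylated cytosine to uracil and no effect on 5-methylcytosine, as stated above. The symbol `M` for 5-methyl C, the 8-base sequences and the function names are hypothetical.

```python
def bisulfite(seq):
    """Convert unmethylated C to U; 5-methyl-C (written 'M') is left unchanged."""
    return seq.replace("C", "U")


def read_after_pcr(seq):
    """Sequence as it is read after amplification: U reads as T, mC reads as C."""
    return seq.replace("U", "T").replace("M", "C")


# Hypothetical perfectly matched adapter, both strands written 5'->3'.
methylated_strand = "AMGTMMAG"      # every C supplied as 5-methyl-C
unmethylated_strand = "CTGGACGT"    # reverse complement of ACGTCCAG

top = read_after_pcr(bisulfite(methylated_strand))
bottom = read_after_pcr(bisulfite(unmethylated_strand))

print(top)     # ACGTCCAG  (unchanged coding sequence)
print(bottom)  # TTGGATGT  (both C positions now read as T)

# Number of positions at which the converted strand differs from its
# unconverted form: one per unmethylated cytosine.
changed = sum(a != b for a, b in zip(bottom, unmethylated_strand))
print(changed)  # 2
```

With at least two unmethylated cytosines, as preferred above, the two ends of a ligated fragment carry sequences that are no longer complementary to each other at those positions.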
Another more adapter-specific modification for post-ligation base conversion can be made by incorporating 6-thioguanine (6-tG) pairing with C (6-tG·C) in an adapter which can be methylated, preferably after ligation, to form S6-methylthioguanine (me6-tG). Specific 6-tG methylation can be achieved chemically using S-adenosylmethionine or other alkylating agents. The methyl group of me6-tG acts as a protection group that hinders the formation of a hydrogen bond with a cytosine.
As polymerases incorporate T opposite of me6-tG, a specific base exchange can be introduced after conversion (me6-tG·T versus G·C) to generate an asymmetrically tagged nucleic acid fragment according to the invention. Alternatively, the alkylated form of 6-tG (preferably with a cyanoethyl group) is used for coding first and deprotected by NH4OH in the presence of NaSH. However, it is preferred to use a protection group forming a disulfide bond with the 6-thiol group. Preferably, the thiol group is reacted with N-((3-chloropropyl)thio)phthalimide to attach a -SCH2CH2CH2Cl protection group. The protection group can be removed with reducing agents such as TCEP, DTT or mercaptoethanol which are compatible with many enzymes including polymerases.
The commercially available building block 4-thio-uridine (4-tU) can be treated in basically the same manner as described above for 6-tG. Thus, the uridine derivative can also be switched from a C-analogue to a T-analogue and back depending on the reaction conditions. However, 4-tU (and 6-tG) can also be cross-linked with a reactive group upon irradiation in the UVA range (315-400 nm). In the presence of oxygen it can react to form guanine-6-sulfinate and guanine-6-sulfonate or form crosslinks with adjacent or opposite bases. Although crosslinks can be partially avoided by reducing agents, it is preferred to use deprotection of groups forming disulfide bonds to change the coding properties of thiolated bases. Typical protection groups for bases in oligonucleotide synthesis are N-benzoyl (Bz), N-acetyl (Ac), N-isobutyryl (iBu), N-phenoxyacetyl (PAC) and N-tert-butylphenoxyacetyl (tBPAC), of which Ac and iBu are preferred as less interfering in functional base pairing. Many ways for base deprotection are known. Aqueous methylamine effectively cleaves all of these protecting groups from the exocyclic amines. Ethanolic ammonia shows the highest selectivity between standard protecting groups (Ac, Bz, iBu) and fast-deprotecting groups (PAC, tBPAC). In addition, some of these protection groups can be cleaved off enzymatically. For example, N-phenylacetyl-groups can be removed by acetyl esterase or penicillin G acylase.
Some examples for useful base conversions are listed in Table 2 based on naturally occurring examples. However, many protection groups are known which can mask one or more functional groups of a base. Especially oxygen groups may be simply protected by azidomethyl groups that can easily be cleaved off by TCEP. Notably, photochemical cleavage of sensitive protection groups is a favoured method to change the coding properties of a given base. The skilled person would be aware of methods for differentially exposing groups capable of forming hydrogen bonds in a base to alter the coding properties by enzymatic, chemical or photochemical treatment.
In order to increase the efficiency of asymmetric end generation by post-ligation base conversion, the adapter may comprise one or more universal base(s) in the primer binding region before the ligation.
Ligation Methods
The universal base adapter may be blunt ended at the ligatable end. In this case, the nucleic acid fragments to be ligated have to be generated with blunt termini as well. This can be achieved by 3′ polymerase fill in reactions, 3′ single-strand digests or by restriction enzymes that generate blunt ends. The person skilled in the art will be familiar with multiple protocols to generate nucleic acid fragments with blunt ends from nucleic acid samples for subsequent ligation. However, it is preferred to generate nucleic acid fragments with sticky ends that are much more efficient in ligation with complementary overhangs at adapter ends. Single base overhangs can be introduced to blunt ends by using the non-templated base-adding activity of polymerases such as Taq DNA polymerase or Klenow fragment exo-. Polymerases following the A-rule most efficiently add single-base A-overhangs. Such A-tailed nucleic acid fragments can be ligated to adapters with single-base T or U overhangs. However, restriction enzymes can also be used to generate defined or even random overhangs that can be ligated to complementary overhangs at adapters.
Universal base adapter ligation is preferably performed using a ligase such as the T4 DNA ligase. Alternatively, the adapter can be reacted with a site-specific topoisomerase, preferably a Vaccinia virus topoisomerase. In order to generate a universal base adapter with a 3′ covalently linked topoisomerase, it is preferred to introduce an enzyme-specific target site outside the primer-binding region in the adapter. By cleavage at the target site a single-stranded overhang is released and the enzyme becomes covalently linked at the ligatable end at the 3′ end. The methods for generating such topoisomerase adapters for ligation are well known to the skilled person, especially those based on Vaccinia virus topoisomerase. The nucleic acid fragment to be ligated may be blunt-ended or comprise short 3′ overhangs. Importantly, the 5′ end of the nucleic acid fragment has to remain unphosphorylated or otherwise unblocked for ligation to the 3′-end of the ligatable end of the universal base adapter. As the topoisomerase ligates only one of the two strands of the universal base adapters, the other strand may need to be ligated by another method. In order to fill potential gaps between the 5′ end of the first strand of the adapter and the 3′ end of the nucleic acid fragment, a polymerase without strand displacing activity and preferably without 3′ exonuclease activity may be used. A non-limiting example of a suitable DNA polymerase for filling such gaps is T4 DNA Polymerase. The remaining nick can be sealed by a ligase provided that the 3′ end of the nucleic acid fragment is unphosphorylated and the 5′ end of the adapter is phosphorylated.
Alternatively, the base conversion is performed with an adapter requiring only one adapter strand to be ligated as disclosed above for post-ligation base conversion methods. This allows skipping the enzymatic filling and nick sealing reactions.
Transposases have been applied previously for adapter ligation by strand invasion. Especially useful transposases are the Tn5 transposase and the bacteriophage Mu transposase which are commercially available for ligating tags to nucleic acid fragments for amplification and/or sequencing.
Transposases recognise specific double-stranded target sites such as 5′-AGATGTGTATAAGAGACAG-3′ for Tn5 transposase that are bound to the enzyme. The adapter for transposase ligation comprises the double stranded target site and a 5′ extension thereto comprising a primer-binding region. In the case of Tn5 adapter, it is preferred that the 5′ extension comprises a sequence which can be converted to yield 5′-GCCTCCCTCGCGCCATC-3′ and optionally 5′-GCCTTGCCAGCCCGCTC-3′ by using universal base incorporation and/or post-ligation base conversion. Alternatively, the transposase target-sequence itself can be modified to include universal bases or convertible bases in order to generate asymmetric ends. Known variable bases in the target-sequence can be used to generate a new adapter wherein the ligation site may completely overlap with the primer-binding site.
Another method for ligating adapters to given nucleic acid fragments is to use enzymatic recombination. However, target-sequences for recombinases are non-random and may severely compromise a statistical adapter ligation approach. Instead, it is preferred that a recombination site in an adapter is used for circularisation of ligated nucleic acid fragments for rolling circle amplification or mate-pair sequencing.
As target sites for topoisomerases, transposases and recombinases become directly attached to the nucleic acid fragment by enzymatic ligation, they can also be considered as ligation sites or ligatable ends.
Additional Functional Sequences in an Adapter
Other than the above mentioned primer-binding sites, topoisomerase target sites, transposase target sites and recombination sites, it is preferred that tags or barcodes are introduced as well. Tags or barcodes represent short sequences (less than 50 bases, preferably less than 20 bases) that can be used to assign a given sequence to a specifically tagged sample. This allows for economic multiplexing of samples in sequencing reactions. In some cases, random sequences may be used instead of defined sequence tags as well.
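How a barcode can be used to assign a read to its sample may be sketched as follows. The barcode sequences, sample names and the function `assign_sample` are hypothetical, and the sketch assumes exact matching of a barcode located at the very start of the read; practical pipelines usually also tolerate sequencing errors.

```python
# Hypothetical barcode-to-sample table; the sequences are illustrative only.
BARCODES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}


def assign_sample(read, barcodes=BARCODES):
    """Return (sample, insert) for a read that starts with a known barcode.

    Reads without a recognised barcode are returned as (None, read).
    """
    for tag, sample in barcodes.items():
        if read.startswith(tag):
            return sample, read[len(tag):]
    return None, read


print(assign_sample("ACGTACGGGTTT"))  # ('sample_1', 'GGGTTT')
print(assign_sample("GGGG"))          # (None, 'GGGG')
```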
So-called key sequences may also confer the advantage to evaluate the correctness of library generation and estimation of the correct start of the ligated nucleic acid fragment in sequencing.
For transcription and expression of nucleic acid fragments it is required to have promoter sites for specific RNA polymerase binding and initiation. Preferred RNA polymerases include bacteriophage T7 RNA polymerase, bacteriophage SP6 RNA polymerase and cyanophage Syn5 RNA polymerase and their respective promoter sequences (SEQ ID NO:9-11).
Further Important Amplification Parameters
Since some universal bases and directly convertible bases may lower the annealing temperature in a primer binding site and cannot always be rescued by compensating base modifications in adjacent bases, the annealing temperature of the first primer at the first primer-binding site may be lower than in later amplification cycles. In order to optimise the amplification it is recommended to identify the specific annealing temperature empirically. After conversion, conventional primer annealing temperature calculations may be sufficiently correct. Thus, the sequences of the converted primer-binding sites will have a lower melting temperature when hybridised to each other than with the cognate primer. Preferably, the melting temperature of the converted primer binding sites is at least 2° C. lower, more preferably at least 5° C. lower, and most preferably more than 10° C. lower for the non-cognate versus cognate primer in order to achieve a highly specific priming in amplification and sequencing reactions.
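The preferred melting temperature margins can be expressed as a simple check. The sketch below takes two melting temperatures, however they were obtained, and reports which of the stated preference levels the difference satisfies. The function name, the label "insufficient" and the example temperatures are assumptions made for illustration.

```python
def specificity_tier(tm_cognate_c, tm_noncognate_c):
    """Classify the melting temperature margin (in deg C) between the cognate
    and the non-cognate primer/site pairing according to the stated preferences."""
    delta = tm_cognate_c - tm_noncognate_c
    if delta > 10:
        return "most preferred"
    if delta >= 5:
        return "more preferred"
    if delta >= 2:
        return "preferred"
    return "insufficient"  # label chosen here; not a term from the text


# Illustrative temperatures only.
print(specificity_tier(60.0, 57.0))  # preferred
print(specificity_tier(60.0, 55.0))  # more preferred
print(specificity_tier(60.0, 48.0))  # most preferred
```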
In case only one or more bases are exchanged in the primer binding site, these are preferably exchanged at or close to the 3′ end of the respective primer binding site. Several priming methods are known in the art that can result in specific priming despite small melting temperature differences. Generally, mismatch types within the three bases closest to the 3′ end affect specificities of primers. In the third base, C·A and T·G belong to weak destabilization strength mismatches. The mismatches G·A, T·C, T·T, and C·C located at the fourth base away from the converted site belong to the strong destabilization strength mismatches. According to the combination rules, polymorphic efficiency between T·T (mismatch in 3′ end of primer, strong destabilization strength) and C·A (weak destabilization strength) is typically higher than between A·A (mismatch in 3′ end of primer, medium destabilization strength) and C·A. However, oligonucleotide synthesis may not always be absolutely correct and can produce artefacts that may lead to amplicons without any nucleic acid fragment inserts. Furthermore, in order to make amplification more efficient by preventing end loop formation, it is preferred to add different 5′ extensions to the primers that are not present in the adapter. In addition, mispriming can be avoided by providing a blocking moiety such as a dideoxy-terminated base at the 3′ end of the primer that prevents extension. Only upon correct annealing is the blocking moiety removed so that the primer can be extended on the cognate template. A non-limiting example of such template-specific unblocking is the introduction of cleavable moieties at the base(s) that differ(s) in the primer binding sites. Nucleic acid repair enzymes are especially suitable. For instance, when a single RNA base is placed in the 3′ region of the blocked primer, a thermostable RNAseH (more specifically RNAseH2) can cleave and unblock the primer only if the primer is fully annealed.
In addition, endonuclease IV can cleave 5′ from an internal C3 spacer, and the 3′-exonuclease activity of a thermostable polymerase can cleave off a 3′-C3 blocker when the 3′ ends of primers form a perfect Watson-Crick pairing with the templates. Similar approaches have been established for other modifications as well. However, the proper enzyme should be chosen in order not to cleave any of the modified base(s) in the primer-binding region of the adapter.
Workflow for Adapter Ligation and Conversion
An example workflow for adapters with a single-stranded primer-binding region comprising directly convertible bases comprises several steps starting with a ligation step (see
An example workflow for adapters with a single-stranded primer-binding region comprising universal bases can be performed by using at least two polymerases with different conversion bias (see
By ligating two strands of the adapter to both ends of a double-stranded nucleic acid fragment, fewer steps are necessary for conversion. As disclosed for single-stranded adapter ligation above, different mechanisms for conversion are possible.
For example, a direct conversion method for adapters with a double-stranded primer-binding region comprising directly convertible bases may comprise several steps starting with a ligation step for attaching both strands to the double-stranded nucleic acid fragment (see
An example for an indirect conversion method for adapters with a double-stranded primer-binding region comprising indirectly convertible bases may also comprise several steps starting with a ligation step for attaching both strands to the double-stranded nucleic acid fragment. Depending on the placement of universal base(s), the conversion is not performed by polymerase alone, but also includes conversion by primer hybridisation and elongation (see
In addition to these examples, many combinations are possible such as using both convertible and universal bases and even polymerases with different bias for base incorporation. However, based on the disclosed properties of universal bases and convertible bases and guidance for their incorporation in adapter strands, the person skilled in the art will know how to design convertible adapters to arrive at a protocol for generating asymmetrically tagged nucleic acid fragments.
Applications
The main application for universal base adapters according to the invention is ligation to essentially unknown nucleic acid fragments. Such fragments may be derived from living or dead organisms, or of synthetic origin. Preferred are nucleic acid samples of human origin to be used in diagnostics, theranostics and forensics. Libraries generated by universal base adapter ligations can serve as templates for nucleic acid amplification, transcription and translation. A library can be sub-cloned into a vector or introduced into a genome for in vivo studies or more preferably used for sequencing.
Convertible Adapter Design
A standard adapter for sequencing was chosen as a template to design a convertible adapter according to the invention. The conventional IonTorrent adapter comprises two different double-stranded subunits to generate asymmetrically tailed nucleic acid libraries:
Incorporation of inosine (I) instead of G does not lead to a sequence change of a template in PCR. I can pair with A and therefore it is preferred to exchange G in the Adapter A sequence to I while exchanging the opposite C to A. A ligation with T-A overhangs is preferred over blunt ends due to higher efficiency. The following sequence was designed to meet the requirements for a convertible adapter:
Melting temperatures were calculated based on standard values using the online tool ‘OligoAnalyzer 3.1’ 2016 (http://eu.idtdna.com/calc/analyzer). Surprisingly, the melting temperature of the sequence comprising inosine does not rate much lower than the unmodified sequence comprising G:
The calculated melting temperature of the converted strands is:
Primer A′ can bind to the newly converted primer-binding site:
Primer A and Primer A′ cannot form dimers and can bind specifically to their respective primer binding sites.
The ISP beads may comprise the entire P1 sequence. However, the adapter does not require the entire sequence as it can be added during amplification using a tailed primer. Thus, a truncated sequence Delta5-P1 is proposed:
Thus, a chimeric primer can be used to amplify the convertible I-adapter for merging the sequence with the P1 sequence for ISP beads:
Convertible Adapter Ligation and PCR
Experiments were conducted using a PCR fragment as a ligation template. The above described adapters were ligated to the fragment, and ligation efficiency was evaluated by capillary electrophoresis and PCR and compared with that of the commercially available adapters in the IonXpress kit from Thermo Fisher.
Material and Methods
(1) A-Tailing of Template DNA Fragments
A 5′-phosphorylated 200 bp PCR fragment was used as a template. The A overhangs were generated by using 1 μg template DNA together with 2.5 u DreamTaq DNA Polymerase (Thermo Fisher), 2.5 μl DreamTaq DNA polymerase buffer and 0.02 mM dATP (final concentration), adjusted to 25 μl volume with PCR grade water. The sample was incubated at 72° C. for 20 min. The product was purified using the Agencourt AMPure XP System according to the manufacturer's protocol.
(2) Convertible Adapter Hybridisation
Hybridisation of the convertible IA adapter was performed by adding equimolar amounts of Barcode Adapter oligonucleotide IA+T and Barcode Adapter oligonucleotide IA′. The sample was adjusted to a concentration of 20 pmol/μl and incubated at room temperature for 10 min. The same was performed for Adapter A-T and Adapter P1-T at a concentration of 25 pmol.
(3) Ligation of Adapters to Template DNA
A ligation sample was set up with 2 u T4 DNA ligase (Thermo Fisher), 2 μl 10× T4 ligase buffer, 2 μl PEG, 20 pmol convertible IA adapter (in the control, 25 pmol each of the A and P1 adapters were added instead) and 100 ng A-tailed template DNA. The sample volume was adjusted to 20 μl using PCR-grade water, and the sample was incubated for 1 h at 20° C. and then heated for 10 min at 65° C.
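The molar excess of adapter over template in this ligation can be estimated with a short calculation. The following is a minimal sketch only: it assumes an average molar mass of about 650 g/mol per base pair of double-stranded DNA (a common approximation that is not stated in the text), ignores the single-nucleotide A overhangs, and uses the hypothetical helper names `dsdna_pmol` and `adapter_excess`.

```python
# Assumption: average molar mass of one base pair of double-stranded DNA.
AVG_BP_MASS_G_PER_MOL = 650.0


def dsdna_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA (ng) into an amount in pmol."""
    # ng -> g is a factor of 1e-9 and mol -> pmol a factor of 1e12,
    # which leaves a net factor of 1e3.
    return mass_ng * 1e3 / (length_bp * AVG_BP_MASS_G_PER_MOL)


def adapter_excess(adapter_pmol: float, template_ng: float, template_bp: int):
    """Return (adapters per fragment, adapters per fragment end)."""
    template_pmol = dsdna_pmol(template_ng, template_bp)
    per_fragment = adapter_pmol / template_pmol
    # Each double-stranded fragment offers two ends for ligation.
    return per_fragment, per_fragment / 2


# Values from the ligation sample: 20 pmol adapter, 100 ng of a 200 bp fragment.
per_fragment, per_end = adapter_excess(20.0, 100.0, 200)
print(f"template: {dsdna_pmol(100.0, 200):.2f} pmol")
print(f"adapter excess: {per_fragment:.0f}x per fragment, {per_end:.0f}x per end")
```

Under these assumptions, 100 ng of the 200 bp fragment corresponds to roughly 0.77 pmol, so the 20 pmol of convertible adapter is in an approximately 26-fold molar excess over fragments.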
(4) Purification of Ligation Product
The Agencourt AMPure XP system was applied according to the manufacturer's protocol, and 2 μl of the ligation product was analysed by capillary electrophoresis (Fragment Analyzer, AATI) using the High Sensitivity NGS Fragment Analysis kit according to the manufacturer's protocol.
(5) PCR of Ligation Products and Analysis
1 μl ligation product was used as a template in a PCR sample with 1 u DreamTaq DNA polymerase, 2.5 μl 10× DreamTaq DNA polymerase buffer, 0.2 mM dNTP (final concentration) and 0.4 μM of the respective primers; the volume was adjusted to 25 μl by adding PCR-grade water. The primer pair Primer A and Delta5P1A′ was used for the convertible IA adapter, and the primer pair minA and minP1 for the IonTorrent adapters. The PCR for the convertible adapter was conducted using the following protocol: 95° C. 3 min; 4× [95° C. 30 sec, 56° C. 30 sec, 72° C. 30 sec]; 10× [95° C. 30 sec, 58° C. 30 sec, 72° C. 30 sec]; 72° C. 5 min. The PCR protocol for the IonTorrent adapter was 95° C. 3 min; 14× [95° C. 30 sec, 58° C. 30 sec, 72° C. 30 sec]; 72° C. 5 min. PCR analysis was performed by capillary electrophoresis using the dsDNA 905 Reagent kit according to the manufacturer's protocol.
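The two cycling protocols can be written down as data and compared directly. The sketch below is illustrative only; the class and function names (`Step`, `Stage`, `three_step`, `cycle_count`, `hold_seconds`) are hypothetical and not part of any instrument software, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # programmed hold time in seconds


@dataclass(frozen=True)
class Stage:
    steps: Tuple[Step, ...]
    repeats: int = 1


def three_step(anneal_c: float, repeats: int) -> Stage:
    """Denature at 95 C, anneal at anneal_c, extend at 72 C, 30 s each."""
    return Stage((Step(95, 30), Step(anneal_c, 30), Step(72, 30)), repeats)


# Protocol for the convertible adapter: 4 cycles at 56 C, then 10 at 58 C.
CONVERTIBLE = (
    Stage((Step(95, 180),)),
    three_step(56, 4),
    three_step(58, 10),
    Stage((Step(72, 300),)),
)

# Protocol for the IonTorrent adapters: 14 cycles at 58 C.
ION_TORRENT = (
    Stage((Step(95, 180),)),
    three_step(58, 14),
    Stage((Step(72, 300),)),
)


def cycle_count(program) -> int:
    """Number of amplification cycles (stages with more than one step)."""
    return sum(s.repeats for s in program if len(s.steps) > 1)


def hold_seconds(program) -> int:
    """Sum of programmed hold times, ignoring ramping."""
    return sum(s.repeats * sum(st.seconds for st in s.steps) for s in program)


for name, prog in (("convertible", CONVERTIBLE), ("IonTorrent", ION_TORRENT)):
    print(name, cycle_count(prog), "cycles,", hold_seconds(prog) / 60, "min")
```

Both programs amount to 14 amplification cycles and 29 min of programmed hold time, so the two adapter types are amplified for the same number of cycles; the protocols differ only in the lower annealing temperature of the first four cycles for the convertible adapter.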
Results
According to capillary electrophoresis, the ligation efficiency of the convertible adapter was 3-fold higher than the IonTorrent adapters with T-overhang (see
Efficiency of Convertible Adapters
In order to determine the efficiency of the convertible adapter versus the standard adapter, ligation experiments were performed. The protocol followed steps (1) to (4) of the previous chapter for the convertible adapter and steps (2) to (4) for the standard adapter, with varying PCR fragment concentrations. The absolute concentrations were determined by droplet digital PCR using a QX200™ Droplet Digital™ PCR System (Bio-Rad). The resulting values for the standard (blunt end) adapter and for the convertible adapter with both blunt ends and T-overhangs were compiled in
Preparation of Sequencing Library
Genomic DNA from an E. coli strain was used to prepare a shotgun sequencing library. The workflow comprises typical steps such as genomic DNA fragmentation using ultrasound, end repair and adapter ligation. The convertible adapter requires ligation of both strands, which was facilitated by T/A overhangs to prevent adapter dimer formation. The standard adapter was used without any barcode, as provided by the IonXpressPlus gDNA Fragment Library Preparation Kit. The convertible adapter comprised a barcode for distinction in a sequencing experiment. The individual steps for general DNA preparation and the individual ligation with the different adapter formats are outlined in the materials and methods section below.
Materials and Methods
Oligonucleotides for preparation of convertible adapter IA+T with barcode (* designates phosphorothioate linkages):
Genomic DNA Preparation Steps:
(1) Fragmentation
The input for fragmentation was 3 μg genomic E. coli DH10B DNA. A Bioruptor was used for fragmentation (3×15 min ultrasonic treatment).
(2) Purification
Purification was performed with the PureLink PCR Purification Kit (Thermo Fisher Scientific).
(3) End Repair
End repair was performed on 1 μg fragmented DNA with End-Repair-Enzyme and End-Repair-Buffer; both reagents are part of the IonXpressPlus gDNA Fragment Library Preparation Kit (Thermo Fisher Scientific).
(4) Purification
1.8× volumes AMPureXP beads (Beckman Coulter) were applied for DNA purification and the yield of end-repaired DNA was 970 ng.
Convertible Adapter Library Generation Steps:
(5) A-Tailing
An amount of ~460 ng end-repaired fragmented DNA of step (4) was applied for A-tailing with Klenow Fragment exo− (NEB).
(6) Purification
1.8× volumes AMPureXP beads (Beckman Coulter) were used for purification and the yield of A-tailed DNA was determined as 312.5 ng.
(7) Convertible Adapter Ligation
50 ng A-tailed DNA was mixed with 75 pmol convertible adapter, 0.4 units T4 DNA Ligase, 2 μl reaction buffer, 2 μl PEG and 13.1 μl H2O. The mix was incubated for 20 min @ 25° C.
(8) Size Selection
AMPureXP beads (Beckman Coulter) were used for a two-sided size selection by which small and large fragments were sequentially depleted. The first bead selection was performed with 0.7× volumes and the second bead selection with 0.15× volumes. Elution was performed with 30 μl LowTE buffer.
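The bead volumes for such a two-sided selection follow directly from the ratios. The sketch below assumes that both ratios refer to the original sample volume, so that the second addition raises the cumulative bead ratio to 0.85×; the text does not state this explicitly. The 20 μl sample volume in the usage lines and the function name `two_sided_bead_volumes` are likewise hypothetical.

```python
def two_sided_bead_volumes(sample_ul: float,
                           first_ratio: float = 0.7,
                           second_ratio: float = 0.15):
    """Return (first bead volume, second bead volume, cumulative ratio).

    Assumption: both ratios are expressed relative to the original sample
    volume. The first addition binds large fragments (beads discarded), the
    second addition to the supernatant binds the target range (beads kept).
    """
    first_ul = first_ratio * sample_ul
    second_ul = second_ratio * sample_ul
    return first_ul, second_ul, first_ratio + second_ratio


# Hypothetical 20 ul ligation reaction.
first_ul, second_ul, cumulative = two_sided_bead_volumes(20.0)
print(f"first cut:  add {first_ul:.1f} ul beads, keep supernatant")
print(f"second cut: add {second_ul:.1f} ul beads, keep beads")
print(f"cumulative bead ratio: {cumulative:.2f}x")
```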
(9) Ligation Product Amplification
2 μl ligation product of step (8) was mixed with 2.5 μl 10× DreamTaq Buffer, 0.5 μl dNTPs (10 mM), 1.5 μl SeqPrimerSet (10 μM), 1 unit DreamTaq DNA Polymerase (Thermo Fisher Scientific) and 18.3 μl H2O. Amplification was performed by PCR as follows: 1 min @ 98° C., [15 sec @ 98° C., 15 sec @ 61° C., 15 sec @ 72° C.]×15, 5 min @ 72° C., hold 4° C.
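The final concentrations in this PCR mix can be checked with the usual dilution relation (stock concentration × stock volume = final concentration × final volume). The following sketch assumes that 1 unit of polymerase corresponds to 0.2 μl, i.e. a 5 U/μl enzyme stock; this volume is not given in the text. The names `final_conc`, `components_ul` and `ENZYME_UL` are illustrative.

```python
def final_conc(stock_conc: float, stock_ul: float, total_ul: float) -> float:
    """Final concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * stock_ul / total_ul


# Pipetted volumes from step (9), in microlitres.
components_ul = {
    "ligation product": 2.0,
    "10x DreamTaq buffer": 2.5,
    "dNTPs (10 mM)": 0.5,
    "SeqPrimerSet (10 uM)": 1.5,
    "water": 18.3,
}

# Assumption: 1 unit of polymerase taken from a 5 U/ul stock.
ENZYME_UL = 0.2

total_ul = sum(components_ul.values()) + ENZYME_UL

print(f"total volume: {total_ul:.1f} ul")
print(f"buffer: {final_conc(10.0, 2.5, total_ul):.1f}x")
print(f"dNTPs:  {final_conc(10.0, 0.5, total_ul):.2f} mM")
print(f"primer: {final_conc(10.0, 1.5, total_ul):.2f} uM")
```

With the assumed enzyme volume the listed components add up to 25 μl, giving 1× buffer, 0.2 mM dNTPs and 0.6 μM SeqPrimerSet in the final reaction.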
(10) Purification
For final purification, 1.0× volumes AMPureXP beads (Beckman Coulter) were applied. A total yield of 76 ng library was observed.
Standard Adapter Library Generation Steps:
(11) Library Generation
A total of 50 ng end-repaired fragmented DNA from step (4) was applied for library preparation using the IonXpressPlus gDNA Fragment Library Preparation Kit (Thermo Fisher Scientific). The 50 ng DNA was mixed with 10 μl 10× Ligase Buffer, 2 μl Adapters, 2 μl dNTP Mix, 51 μl Nuclease-free Water, 2 μl DNA Ligase and 8 μl Nick Repair Polymerase and incubated for 15 min @ 25° C. and 5 min @ 72° C.
(12) Size Selection
AMPureXP beads (Beckman Coulter) were used for two-sided size selection. The first bead selection was performed with 0.7× volumes and the second bead selection with 0.15× volumes. Elution was achieved with 30 μl LowTE.
(13) Ligation Product Amplification
2 μl ligation product from step (12) was mixed with 100 μl Platinum PCR SuperMix High Fidelity, 5 μl Library Amplification Primer Mix and 23 μl H2O; all reagents are from the IonXpressPlus gDNA Fragment Library Preparation Kit (Thermo Fisher Scientific). The PCR amplification protocol was: 5 min @ 95° C., [15 sec @ 95° C., 15 sec @ 58° C., 1 min @ 70° C.]×15, hold 4° C.
(14) Purification
1.0× volumes of AMPureXP beads (Beckman Coulter) were applied for purification and the total yield was determined as 9 ng.
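The yields reported in the steps above can be put side by side with a few lines of arithmetic. This is a rough sketch using only the masses stated in the text; the A-tailing input is given as approximately 460 ng, so the corresponding recovery is approximate as well. The helper names are illustrative.

```python
def recovery(output_ng: float, input_ng: float) -> float:
    """Fraction of the input mass recovered after a step."""
    return output_ng / input_ng


def fold_difference(a_ng: float, b_ng: float) -> float:
    """How many times larger yield a is than yield b."""
    return a_ng / b_ng


# Step (4): 1 ug fragmented DNA into end repair, 970 ng recovered.
end_repair = recovery(970.0, 1000.0)

# Step (6): about 460 ng into A-tailing, 312.5 ng recovered.
a_tailing = recovery(312.5, 460.0)

# Steps (10) and (14): final library yields from 50 ng input each.
library_fold = fold_difference(76.0, 9.0)

print(f"end repair recovery: {end_repair:.0%}")
print(f"A-tailing recovery:  {a_tailing:.0%}")
print(f"convertible vs standard library yield: {library_fold:.1f}-fold")
```

On these figures the convertible adapter workflow yields about 8.4 times more library mass than the standard workflow from the same 50 ng input.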
Final Preparation Steps Before Sequencing:
(15) Library Evaluation
The library size and concentration were determined by capillary gel electrophoresis. The concentration of the standard library was 1869.6 pmol/l and that of the convertible adapter library was 18186.791 pmol/l.
(16) Final Sequencing Library Generation Step
Both libraries were mixed in equimolar amounts and diluted to a total of 100 μM for sequencing on an IonTorrent PGM sequencer.
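Equimolar mixing means that each library is pipetted in a volume inversely proportional to its concentration. The sketch below reads the concentrations from step (15) as 1869.6 pmol/l and 18186.791 pmol/l (i.e. it treats the comma in the reported figures as a decimal separator, which is an assumption). The amount of 10 000 amol per library and the function name `volume_for_amount` are purely illustrative.

```python
def volume_for_amount(amount_amol: float, conc_pmol_per_l: float) -> float:
    """Volume in microlitres that contains the requested amount.

    1 pmol/l equals 1 amol/ul, so amol divided by pmol/l gives microlitres.
    """
    return amount_amol / conc_pmol_per_l


# Concentrations from step (15), read with a decimal comma (assumption).
libraries_pmol_per_l = {
    "standard": 1869.6,
    "convertible": 18186.791,
}

# Hypothetical amount of each library to pool.
AMOUNT_AMOL = 10_000.0

volumes_ul = {
    name: volume_for_amount(AMOUNT_AMOL, conc)
    for name, conc in libraries_pmol_per_l.items()
}

for name, vol in volumes_ul.items():
    print(f"{name}: {vol:.2f} ul")

ratio = volumes_ul["standard"] / volumes_ul["convertible"]
print(f"volume ratio standard : convertible = {ratio:.1f} : 1")
```

Because the convertible adapter library is roughly 9.7 times more concentrated, about 9.7 volumes of the standard library are needed per volume of the convertible adapter library.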
Results and Discussion
The size distribution of both library preparations after PCR is shown in
The standard sequencer software sorted the reads according to the adapter used on the basis of the barcode. The reads were mapped against the reference genome E. coli DH10B. It is obvious that the use of the convertible adapter IA+T (
The obtained read lengths were evaluated as well, as shown in
Taken together, the sequencing of the convertible IA+T adapter demonstrates the very high efficiency of library generation and otherwise identical performance with respect to the standard adapter in sequencing experiments.
Library Generation with Universal Bases in Only One Adapter Strand
The power of conversion is sufficient for sequencing library generation even if only one strand is modified by universal bases. As an example, inosines were incorporated in the template (PF_Ad142v2BC4_r) strand, whereas the primer binding (PF_Ad142v2BC4_f) strand was kept identical to the adapter A sequence. This new convertible adapter IN2 comprising a barcode was designed to be ligated to blunt ended fragments.
The PCR of the new IN2 adapter was performed with the standard A primer in combination with the primer binding to the converted sequence (PF_Apl_prim).
The convertible adapters IA and IN2 were compared by ligation to a blunt ended PCR fragment according to the protocol of chapter “Convertible adapter ligation and PCR” steps (2) to (5).
The efficiency of the adapter with one inosine-modified strand is comparable to that of the adapter with inosines in both strands (
Number | Date | Country | Kind |
---|---|---|---|
16188292.3 | Sep 2016 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2017/072862 | 9/12/2017 | WO | 00 |