The present invention relates to nucleic acid chemistry. In particular, the invention relates to methods for sequencing nucleic acids that have an unnatural base pair.
Watson-Crick base pairings, A-T and G-C, are among the most fundamental rules defining not only the central dogma of all living organisms on Earth but also current genetic engineering technology. However, this exclusive base pairing rule limits further advancements in biotechnology, because relying on only a four-letter genetic alphabet restricts the functionalities of nucleic acids and proteins. To overcome this limitation, genetic alphabet expansion of DNA by creating extra artificial base pairs (unnatural base pairs, UBPs) has attracted researchers' attention.
Recently, several types of UBPs that function as a third base pair in replication, transcription and/or translation have been created. Among them, Ds-Px (Ds: 7-(2-thienyl)-imidazo[4,5-b]pyridine and Px: diol-modified 2-nitro-4-propynylpyrrole) pair and P-Z pair have been subjected to an evolutionary engineering method, SELEX (Systematic Evolution of Ligands by EXponential enrichment), to generate unnatural base-containing DNA (UB-DNA) aptamers that specifically bind to target proteins and cells. The hydrophobic Ds bases in UB-DNA aptamers play an important role in augmenting the aptamers' affinities to targets. Semi-synthetic bacteria have also been created by incorporating a series of their UBPs, including 5SICS-NaM. The bacteria with the expanded genetic alphabet can produce proteins containing unnatural amino acids.
These advancements in genetic alphabet expansion technology are rapidly increasing the demand for a DNA sequencing method involving UBPs. In particular, UB-DNA aptamer generation by SELEX requires a sequencing method that can determine the sequence of each aptamer candidate containing UBs in an enriched library, which is a mixture of different sequences obtained after several rounds of selection and amplification procedures in SELEX. Previously, a modified Sanger sequencing method was developed for a single DNA clone containing Ds bases. In the modified Sanger sequencing method, Ds positions appear as gaps in the natural base peak patterns. This sequencing method has been used not only for UB-DNA aptamer generation but also for the creation of semi-synthetic bacteria, to confirm the UB positions. However, to perform this sequencing method, each aptamer candidate clone must be isolated from the enriched library. In other words, to perform the sequencing method in the art, it is necessary to know the Ds positions in advance. If the positions of the Ds bases are not known, the sequencing method in the art would not be able to sequence the UBP-containing DNAs. Therefore, there is a need to provide an alternative method of sequencing UBP-containing DNAs.
In one aspect, there is provided a method of sequencing a nucleic acid containing an unnatural base pair (UBP), comprising performing two or more replacement replication reactions wherein the nucleic acid is replicated using two or more intermediate of the unnatural base pair; sequencing the nucleic acid resulting from the replacement replication reactions; clustering the sequenced nucleic acid and identifying a candidate position of the unnatural base pair; determining a ratio of conversion of the intermediate to each one of a natural base pair at the candidate position of the unnatural base pair; comparing the ratio of conversion of the intermediate to a library of pre-determined conversion rate based on the sequences of one or more natural base pair adjacent to the candidate position of the unnatural base pair; wherein a substantial match of the ratio of conversion of the intermediate to a value in the library of the pre-determined conversion rate confirms the position of the unnatural base pair, thereby determining the sequence of the nucleic acid containing the unnatural base pair.
In some examples, the method comprises two replacement replication reactions.
In some examples, the two replacement replication reactions comprise performing a first replacement replication reaction wherein the nucleic acid is replicated using a first intermediate of the unnatural base pair; and performing a second replacement replication reaction wherein the nucleic acid is replicated using a second intermediate of the unnatural base pair.
In some examples, the two replacement reactions are performed concurrently, sequentially, and/or separately.
In some examples, the first intermediate and the second intermediate are different intermediates of an unnatural base pair.
In some examples, the intermediate of the unnatural base pair is selected from the group consisting of Pa′, Pa, Pn, and Px.
In some examples, the unnatural base pair is composed of a nucleobase selected from the group consisting of:
a Ds derivative:
wherein R and R′ each independently represent any moiety represented by the following formula:
wherein n1=2 to 10; n2=1 or 3; n3=1, 6, or 9; n4=1 or 3; n5=3 or 6; R1=Phe (phenylalanine), Tyr (tyrosine), Trp (tryptophan), His (histidine), Ser (serine), or Lys (lysine); and R2, R3, and R4=Leu (leucine), Leu, and Leu, respectively, or Trp, Phe, and Pro (proline), respectively.
In some examples, the natural base pair is composed of a nucleobase selected from the group consisting of A, G, C, U, and T.
In some examples, the nucleic acid is a DNA strand.
In some examples, the library of pre-determined conversion rate comprises a ratio of the conversion of an unnatural base pair to either one of a natural base pair.
In some examples, the library of pre-determined conversion rate comprises a ratio of the conversion of an unnatural base pair to either one of a natural base pair based on the sequence of one or more adjacent base pair.
In some examples, the replacement replication reaction further comprises replicating the nucleic acid using natural base pairs.
In some examples, the replacement replication reaction is a replacement polymerase chain reaction (PCR).
In some examples, the replacement replication reaction comprises
performing a first nucleic acid replication reaction using a first replication substrate containing an intermediate of the unnatural base pair to thereby replace the unnatural base pair with the intermediate of the unnatural base pair; and
performing a second nucleic acid replication reaction using a second replication substrate containing natural base pair to thereby replace the intermediate of the unnatural base pair with a natural base pair.
In some examples, the replacement replication reaction further comprises
replicating or amplification of the nucleic acid from the second nucleic acid replication reaction to thereby have a plurality of nucleic acid with natural base pair resulting from the second nucleic acid replication reaction.
In some examples, the sequencing is performed using a deep sequencing method.
In some examples, the identifying the candidate position of the unnatural base pair comprises aligning the sequenced nucleic acid and determining a position that contains varying nucleobase.
In some examples, the ratio of conversion of the intermediate to each one of a natural base pair at the candidate position of the unnatural base pair is calculated using the formula:
% rA (at position i)=CR(A,i)=S(A,i)/[S(A,i)+S(G,i)+S(C,i)+S(T,i)]×100
where S(n, i) is the number of reads having natural base n at position i.
In some examples, the substantial match of the ratio of conversion of the intermediate is a value that is within about 10% of the value in the library of the pre-determined conversion rate.
In another aspect, there is provided an apparatus for performing the method of any one of the preceding claims.
Exemplary embodiments of the invention will be better understood and readily apparent to one of ordinary skill in the art from the following written description, by way of example only, and in conjunction with the drawings, in which:
The creation of unnatural base pairs (UBPs) has rapidly advanced the genetic alphabet expansion technology of DNA, requiring a new sequencing method for UB-containing DNAs with five or more letters. The hydrophobic UBP, Ds-Px, exhibits high fidelity in PCR and has been applied to DNA aptamer generation involving Ds as a fifth base. The present disclosure describes a sequencing method for UBP (such as Ds-Px)-containing DNAs, in which the UBP (such as Ds-Px) bases are replaced with natural bases by PCR using intermediate UB substrates (replacement PCR) for conventional deep sequencing. The inventors of the present disclosure found that the composition rates (i.e. conversion rates) of the natural bases converted from the UBs (such as Ds) significantly varied (or is unique) depending on the sequence contexts around the UB (such as Ds) and one or more different intermediate substrates. Using the finding that the composition rate or conversion rate of natural bases converted from UBs (such as Ds) varies (or is unique) to the sequence context around the UB, the inventors of the present disclosure developed an encyclopedia (or library) of the natural-base composition (or conversion) rates corresponding to all of the sequence contexts for each replacement PCR method using different intermediate substrates. The inventors found that using the encyclopedia/library, the UBPs positions in DNAs can be determined by comparing the natural-base composition/conversion rates in both the actual and encyclopedia data (i.e. library data), at each position of the DNAs obtained by deep sequencing after replacement PCR.
Therefore, in one aspect, there is provided a method of sequencing a nucleic acid containing an unnatural base pair (UBP), comprising performing two or more replacement replication reactions wherein the nucleic acid is replicated using two or more intermediate of the unnatural base pair; sequencing the nucleic acid resulting from the replacement replication reactions; clustering the sequenced nucleic acid and identifying a candidate position of the unnatural base pair; determining a ratio of conversion of the intermediate to each one of a natural base pair at the candidate position of the unnatural base pair; comparing the ratio of conversion of the intermediate to a library of pre-determined conversion/composition rate based on the sequences of one or more natural base pair adjacent to the candidate position of the unnatural base pair; wherein a substantial match of the ratio of conversion of the intermediate to a value in the library of the pre-determined conversion/composition rate confirms the position of the unnatural base pair, thereby determining the sequence of the nucleic acid containing the unnatural base pair.
In some examples, the method further comprises a second replacement replication reaction wherein the nucleic acid is replicated using a second intermediate of the unnatural base pair. In some examples, the method may comprise two replacement replication reactions. In such examples, the two replacement replication reactions may comprise performing a first replacement replication reaction wherein the nucleic acid is replicated using a first intermediate of the unnatural base pair; and performing a second replacement replication reaction wherein the nucleic acid is replicated using a second intermediate of the unnatural base pair. As such, in some examples, the two replacement reactions may be performed concurrently, sequentially, and/or separately.
In some examples, the method of sequencing a nucleic acid containing an unnatural base pair (UBP) of the present disclosure may comprise performing a first replacement replication reaction wherein the nucleic acid is replicated using a first intermediate of the unnatural nucleobase; performing a second replacement replication reaction wherein the nucleic acid is replicated using a second intermediate of the unnatural nucleobase; sequencing the nucleic acid resulting from the first and second replacement replication reactions; clustering the sequenced nucleic acid and identifying a candidate position of the unnatural nucleobase; determining a first ratio of conversion of the first intermediate to each nucleobase of a natural nucleobase at the candidate position of the unnatural nucleobase; determining a second ratio of conversion of the second intermediate to each nucleobase of a natural nucleobase at the candidate position of the unnatural nucleobase; comparing the first ratio and the second ratio to a library of pre-determined composition rate based on the sequences of the natural nucleobases adjacent to the candidate position of the unnatural nucleobase; wherein a substantial match of the first ratio and the second ratio to the pre-determined composition rate confirms the position of the unnatural base pair, thereby determining the sequence of the nucleic acid containing the unnatural base pair.
In some examples, the present disclosure also provides a method of identifying the position of an unnatural base pair (UBP) in a nucleic acid sequence, comprising the steps as described above. For example, the method may comprise performing a first replacement replication reaction wherein the nucleic acid is replicated on a first template comprising a first intermediate of the unnatural base pair; performing a second replacement replication reaction wherein the nucleic acid is replicated on a second template comprising a second intermediate of the unnatural base pair; sequencing the nucleic acid resulting from the first and second replacement replication reactions; clustering the sequenced nucleic acid and identifying a candidate position of the unnatural base pair; determining a first ratio of conversion of the first intermediate to each base of a natural base pair at the candidate position of the unnatural base pair; determining a second ratio of conversion of the second intermediate to each base of a natural base pair at the candidate position of the unnatural base pair; comparing the first ratio and the second ratio to a library of pre-determined composition rate based on the sequences of the natural base pair adjacent to the candidate position of the unnatural base pair; wherein a substantial match of the first ratio and the second ratio to the pre-determined composition rate confirms the position of the unnatural base pair, thereby identifying the position of the unnatural base pair.
Alternatively, the method as described herein may comprise three, four, five, or more replacement replication reactions wherein the nucleic acid is replicated using a third intermediate, a fourth intermediate, a fifth intermediate, or further intermediates of the unnatural base pair.
The use of the intermediate substrate of the unnatural base pair was found to be useful by the inventors of the present disclosure. For example, when replacement PCR is performed without an intermediate substrate of the unnatural base pair, the replacement PCR was found to have greatly reduced conversion efficiency (see
To provide an additional parameter that can be utilized to determine the sequence of a nucleic acid containing an unnatural base pair, in some examples, the one or more intermediates may be different intermediates of the same unnatural base pair. For example, the first intermediate and the second intermediate are different intermediates of an unnatural base pair. In some examples, where the unnatural base pair is composed of an unnatural base 7-(2-thienyl)imidazo[4,5-b]pyridin-3-yl group (i.e. Ds), the intermediate of the unnatural base may include, but is not limited to, Pa′, Pa, Pn, Px, and the like. The intermediates are as follows:
wherein R may be any one of the following functional groups:
where R may be any one of:
or
a Pn derivative, such as:
where R represents any moiety represented by the following formula:
wherein n1=1 or 3; n2=2 to 10; n3=1, 6, or 9; n4=1 or 2; n5=3 or 6; R1=Phe, Tyr, Trp, His, Ser, or Lys; and R2, R3, and R4=Leu, Leu, and Leu, respectively, or Trp, Phe, and Pro, respectively; or
a Pa derivative such as
wherein R represents any moiety represented by the following formula:
wherein n1=1 or 3; n2=2 to 10; n3=1, 6, or 9; n4=1 or 3; n5=3 or 6; R1=Phe, Tyr, Trp, His, Ser, or Lys; and R2, R3, and R4=Leu, Leu, and Leu, respectively, or Trp, Phe, and Pro, respectively.
As would be appreciated by the person skilled in the art, Pn is R═H (no propynyl group/triple bond), 2-nitropyrrole; and wherein, Px is used for the derivatives with the triple bond.
In some examples, the intermediate may be provided as substrates suitable for replacement replication reaction (for example replacement PCR). In some examples, the intermediate may be a triphosphate substrate of an unnatural base pair. In some examples, the intermediate may be provided as substrates such as, but is not limited to, dPa′TP, dPaTP, dPnTP and/or dPxTP. In some examples, the first intermediate and the second intermediate are not the same intermediate of the unnatural base pair. In some examples, one of the first or second intermediate may be dPa′TP. In some examples, one of the first or second intermediate may be dPxTP. When the first intermediate is dPa′TP, the second intermediate will be dPxTP, and vice versa.
As used herein, the term “unnatural base pair” refers to a nucleic acid base pair composed of artificially made or non-standard pair of nucleobases. Thus, in some examples, the unnatural base pair is composed of a nucleobase (or an unnatural base) such as, but is not limited to:
a Ds derivative, such as:
wherein R and R′ each independently represent any moiety represented by the following formula:
wherein n1=2 to 10; n2=1 or 3; n3=1, 6, or 9; n4=1 or 3; n5=3 or 6; R1=Phe (phenylalanine), Tyr (tyrosine), Trp (tryptophan), His (histidine), Ser (serine), or Lys (lysine); and R2, R3, and R4=Leu (leucine), Leu, and Leu, respectively, or Trp, Phe, and Pro (proline), respectively.
However, it would be understood by the person skilled in the art that the method as described herein may be used on any unnatural base pairs known in the art, provided the intermediate of the unnatural base pairs is known.
In some examples, the unnatural base pair may be a Ds-Px pair as follows:
In contrast to the term “unnatural base pair”, as used herein, the term “natural base pair” refers to a nucleic acid base pair composed of a standard or naturally occurring pair of nucleobases such as adenine (A), guanine (G), thymine (T), uracil (U), and cytosine (C). Thus, in some examples, the natural base pair may be composed of a nucleobase selected from the group consisting of A, G, C, U, and T.
In some examples, the nucleic acid as described herein includes nucleic acid sequences that comprises one or more natural base pair and one or more unnatural base pair. In some examples, the nucleic acid described herein includes nucleic acids with no more than 20% unnatural base pairs, or no more than 15% unnatural base pairs, or no more than 14% unnatural base pairs, or no more than 13% unnatural base pairs, or no more than 12% unnatural base pairs, or no more than 11% unnatural base pairs, or no more than 10% unnatural base pairs, or no more than 9% unnatural base pairs, or no more than 8% unnatural base pairs, or no more than 7% unnatural base pairs, or no more than 6% unnatural base pairs, or no more than 5% unnatural base pairs, or no more than 4% unnatural base pairs, or no more than 3% unnatural base pairs, or no more than 2% unnatural base pairs, or no more than 1% unnatural base pairs.
In some examples, the nucleic acid having a template of 5′-N+2N+1XYN−1N−2-3′ may include no more than 20% unnatural base pairs, or no more than 15% unnatural base pairs, or no more than 14% unnatural base pairs, or no more than 13% unnatural base pairs, or no more than 12% unnatural base pairs, or no more than 11% unnatural base pairs, or no more than 10% unnatural base pairs, or no more than 9% unnatural base pairs, or no more than 8% unnatural base pairs, or no more than 7% unnatural base pairs, or no more than 6% unnatural base pairs, or no more than 5% unnatural base pairs, or no more than 4% unnatural base pairs, or no more than 3% unnatural base pairs, or no more than 2% unnatural base pairs, or no more than 1% unnatural base pairs.
In some examples, the nucleic acid having a template of 5′-N+3N+2N+1XYN−1N−2N−3-3′ may include no more than 15% unnatural base pairs, or no more than 14% unnatural base pairs, or no more than 13% unnatural base pairs, or no more than 12% unnatural base pairs, or no more than 11% unnatural base pairs, or no more than 10% unnatural base pairs, or no more than 9% unnatural base pairs, or no more than 8% unnatural base pairs, or no more than 7% unnatural base pairs, or no more than 6% unnatural base pairs, or no more than 5% unnatural base pairs, or no more than 4% unnatural base pairs, or no more than 3% unnatural base pairs, or no more than 2% unnatural base pairs, or no more than 1% unnatural base pairs.
It is believed that the method as presently disclosed may be used for the sequencing of either DNA and/or RNA strand. Thus, the method of the present disclosure may be performed on nucleic acid that is a DNA and/or RNA strand. In some examples, the nucleic acid may be a DNA and/or RNA strand. In some examples, the nucleic acid is a DNA strand. When the nucleic acid is a DNA strand, the natural base pair is composed of natural nucleobases such as A, G, C, and T. In some examples, the natural base pair may be as follows:
The inventors of the present disclosure found that the ratio of the conversion/composition of an unnatural base pair to either one of a natural base pair varies (and is unique) depending on the sequence of the natural base pair immediately adjacent to the position of the unnatural base pair. Thus, the variation and the uniqueness of the ratio of the conversion can be used as a reference when determining the presence or absence of an unnatural base pair.
As used herein, the terms “composition rate” and “conversion rate” are used interchangeably to refer to the probability (or rate) of an unnatural base pair being replaced (in a replacement PCR) by one of the four natural nucleobases, in the context of (or depending on) the sequence of the one or more natural nucleobases immediately adjacent to the position of the unnatural base pair.
As exemplified in the Experimental section below and in
In some examples, the library of pre-determined conversion/composition rate may be generated by (1) providing a plurality of template nucleic acids containing natural nucleobase (i.e. natural-base) randomized sequences and an unnatural base pair (such as a Ds); (2) performing a replacement replication reaction on the plurality of template nucleic acids with one intermediate of the unnatural base pair (or nucleobase); (3) performing a further replacement replication reaction on the nucleic acid from (2) with natural base pairs (or nucleobases) to thereby have a plurality of nucleic acids with no unnatural base pair (or nucleobase); (4) sequencing the resulting nucleic acid from (3); (5) clustering the sequences of the nucleic acid obtained from the sequencing step and/or identifying the position of the unnatural base pair (or nucleobase); and (6) determining a ratio (or rate or probability) of conversion of the unnatural base pair (or nucleobase) to each of the natural base pairs (or nucleobases); wherein the ratio is a value point (data point) in the library of pre-determined conversion/composition rate that is unique to the sequence of the template nucleic acid. The value point/ratio/rate/data point in the library of each template nucleic acid sequence serves as a unique identification point of the nucleic acid sequence that contains the unnatural base pair (or nucleobase). In order to build the library, it would be advantageous if the sequence of the plurality of template nucleic acids in (1) is known or pre-determined or pre-designed. In some examples, the plurality of template nucleic acids may be in the format of 5′-N+1XYN−1-3′, 5′-N+2N+1XYN−1N−2-3′, 5′-N+3N+2N+1XYN−1N−2N−3-3′, 5′-N+MN+(M−1) . . . N+2N+1XYN−1N−2 . . . N−(M−1)N−M-3′, and the like, wherein X is an unnatural nucleobase (for example a Ds), N is independently any one of A, G, C, or U/T, Y is an integer having a value of 1 to 3, and M is an integer having a value of 1 to 50.
In some examples, M may be 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40.
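The template formats above can be illustrated with a short sketch that enumerates every flanking context for a small M. It assumes a single unnatural base per template (written here as the placeholder character "X") and T rather than U; the function name enumerate_templates is hypothetical and is not part of the disclosed method.

```python
from itertools import product

NATURAL_BASES = "AGCT"  # assumption: DNA templates, so T rather than U


def enumerate_templates(m, unnatural="X"):
    """Yield every 5'->3' template with m natural bases on each side of one unnatural base.

    "X" is only a placeholder character for the unnatural base (for example Ds).
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    for flank in product(NATURAL_BASES, repeat=2 * m):
        five_prime = "".join(flank[:m])   # N+m ... N+1
        three_prime = "".join(flank[m:])  # N-1 ... N-m
        yield five_prime + unnatural + three_prime


templates = list(enumerate_templates(1))
print(len(templates), templates[0], templates[-1])
```

Under these assumptions the number of distinct contexts is 4^(2m), that is, 16 for m=1 and 256 for m=2.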
Thus, the library of pre-determined conversion/composition rates includes the conversion rate of an unnatural base pair to either one of a natural base pair based on the sequence of one or more natural base pair immediately adjacent to the position of the unnatural base pair. In some examples, the library of pre-determined conversion/composition rate comprises a ratio of the conversion of an unnatural base pair to either one of a natural base pair based on the sequence of one, or two, or three, or four, or five, or six, or seven, or eight, or nine, or ten natural base pair (immediately) adjacent to the unnatural base pair. In some examples, the library of pre-determined conversion/composition rates may include the conversion rate of 5′-N+1XYN−1-3′, the conversion rate of 5′-N+2N+1XYN−1N−2-3′, the conversion rate of 5′-N+3N+2N+1XYN−1N−2N−3- 3′, the conversion rate of 5′-N+MN+(M−1) . . . N+2N+1XYN−1N−2 . . . N−(M−1)N−M-3′, and the like, wherein X is an unnatural nucleobase (for example a Ds), N is independently any one of A, G, C, or U/T, Y is an integer having a value of 1 to 3, and M is an integer having a value of 1 to 50. In some examples, M may be 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40.
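One way to picture such a library is as a lookup table keyed by the intermediate substrate and the flanking natural bases. The sketch below is illustrative only: the dictionary layout, the function names flanking_context and expected_rate, and the numerical values are all assumptions (the values are invented placeholders, not measured rates).

```python
# Hypothetical library layout: (intermediate, 5' flank, 3' flank) -> expected % rA.
# The numbers are invented placeholders, NOT measured conversion rates.
EXAMPLE_LIBRARY = {
    ("dPa'TP", "CG", "TT"): 60.0,
    ("dPxTP", "CG", "TT"): 30.0,
}


def flanking_context(sequence, i, m):
    """Return the m bases 5' and the m bases 3' of 0-based position i (sequence written 5'->3')."""
    if m < 1 or i - m < 0 or i + m >= len(sequence):
        raise ValueError("position i lacks m flanking bases on both sides")
    return sequence[i - m:i], sequence[i + 1:i + 1 + m]


def expected_rate(library, intermediate, sequence, i, m=2):
    """Look up the pre-determined % rA for the context around position i, or None if absent."""
    five_prime, three_prime = flanking_context(sequence, i, m)
    return library.get((intermediate, five_prime, three_prime))


# "N" marks the candidate position in this toy consensus sequence.
print(expected_rate(EXAMPLE_LIBRARY, "dPxTP", "ACGNTTG", 3))
```

Keeping one entry per intermediate reflects the statement above that the rates depend on both the sequence context and the intermediate substrate used.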
In some examples, the library of pre-determined composition rate comprises a ratio or the probability of the conversion of an unnatural nucleobase to either one of a natural nucleobase depending on the sequence of one or more adjacent nucleobase. In some examples, the composition rate may be calculated using the following formula:
CR(n, i)=% rN (at position i)=S(n, i)/[S(A, i)+S(G, i)+S(C, i)+S(T, i)]×100
where S(n, i) is the number of reads having natural base n at position i, and CR(n, i) is the composition rate to natural base n at position i.
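As a minimal sketch of the composition rate formula, the following computes CR(n, i) for each natural base from a set of aligned reads. The function name composition_rates, the use of 0-based positions, and the toy reads are assumptions made for illustration.

```python
from collections import Counter

NATURAL_BASES = ("A", "G", "C", "T")


def composition_rates(reads, i):
    """Return CR(n, i) in percent for each natural base n at 0-based position i.

    S(n, i) is counted as the number of reads carrying base n at position i;
    reads that do not cover position i or carry another symbol are ignored.
    """
    counts = Counter(
        read[i] for read in reads if i < len(read) and read[i] in NATURAL_BASES
    )
    total = sum(counts[base] for base in NATURAL_BASES)
    if total == 0:
        raise ValueError("no read has a natural base at position i")
    return {base: 100.0 * counts[base] / total for base in NATURAL_BASES}


# Toy reads after replacement PCR; position 2 is where the unnatural base was replaced.
reads = ["ACGAT", "ACGAT", "ACTAT", "ACAAT"]
rates = composition_rates(reads, 2)
print(rates)
```

In this toy input, % rA at position 2 is 25 because one of the four reads carries A there.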
In some examples, the replacement replication reaction further comprises replicating the nucleic acid using natural base pairs.
In some examples, the replacement replication reaction may be a replacement polymerase chain reaction (PCR). In some examples, where the nucleic acid is an RNA strand, the replacement replication reaction may include a reverse transcription followed by a replacement polymerase chain reaction (PCR). In some examples, where the nucleic acid is a strand of RNA, reverse transcription may be included, and primer extension may also be utilised.
As illustrated in
For avoidance of doubt, if two replacement replication reactions are performed, the replacement replication reactions may include the following steps (a) performing a first nucleic acid replication reaction using a first replication substrate containing a first intermediate of the unnatural base pair to thereby replace the unnatural base pair with the first intermediate of the unnatural base pair; (b) performing a second nucleic acid replication reaction using a second replication substrate containing natural base pair to thereby replace the first intermediate of the unnatural base pair with a natural base pair, (c) performing a third nucleic acid replication reaction using a third replication substrate containing a second intermediate of the unnatural base pair to thereby replace the unnatural base pair with the second intermediate of the unnatural base pair; (d) performing a fourth nucleic acid replication reaction using a fourth replication substrate containing natural base pair to thereby replace the second intermediate of the unnatural base pair with a natural base pair. It would be understood that steps (a) to (b) and (c) to (d) are sequential steps. That is, step (a) is to be followed by step (b) and step (c) is to be followed by step (d). However, (a) to (b) and (c) to (d) can be performed separately, concurrently or together. That is, (a) to (b) can be performed at the same time but in a different reaction as (c) to (d).
In some examples, the replacement replication reaction may further comprise replicating or amplification of the nucleic acid from the second nucleic acid replication reaction to thereby have a plurality of nucleic acid with natural base pair resulting from the second nucleic acid replication reaction. This replicating or amplification step is to assist the sequencing of the nucleic acid that has been processed through the replacement PCR.
In some examples, the sequencing may be performed using any high-throughput sequencing method known in the art. For example, the sequencing may be performed using a deep sequencing method or any type of conventional next-generation sequencing capable of handling enormous numbers of reads without a cloning process.
In some examples, the identifying the candidate position of the unnatural base pair may comprise aligning the sequenced nucleic acid and determining a position that contains varying nucleobases. As would be understood by the person skilled in the art, the process of clustering and/or alignment of the sequenced nucleic acids to identify the candidate position of the unnatural base may be performed using a data processing device, such as a data processor.
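The idea of flagging positions that contain varying nucleobases can be sketched as follows. This is only an illustration: it assumes the reads of one cluster are already aligned to equal length, and the min_minor_fraction threshold (used to ignore rare sequencing errors) is a hypothetical parameter that does not come from the disclosure.

```python
from collections import Counter


def candidate_positions(aligned_reads, min_minor_fraction=0.05):
    """Return 0-based positions at which the aligned reads of one cluster disagree.

    A position is reported when the fraction of reads that do NOT carry the
    most common base is at least min_minor_fraction (an assumed cut-off).
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    candidates = []
    for i in range(length):
        counts = Counter(read[i] for read in aligned_reads)
        major_count = counts.most_common(1)[0][1]
        minor_fraction = 1.0 - major_count / len(aligned_reads)
        if minor_fraction >= min_minor_fraction:
            candidates.append(i)
    return candidates


cluster = ["ACGAT", "ACGAT", "ACTAT", "ACAAT"]
print(candidate_positions(cluster))
```

Each reported position would then be passed on to the conversion ratio calculation and the library comparison.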
In some examples, the ratio of conversion of the intermediate to each one of a natural base pair at the candidate position of the unnatural base pair is calculated using the formula:
% rA (at position i)=CR(A,i)=S(A,i)/[S(A,i)+S(G,i)+S(C,i)+S(T,i)]×100
where S(n, i) is the number of reads having natural base n at position i.
In some examples, a substantial match of the ratio of conversion of the intermediate would result in about 70% or more detection sensitivity, or about 80% or more detection sensitivity, or about 85% or more detection sensitivity, or about 90% or more detection sensitivity, or about 91% or more detection sensitivity, or about 92% or more detection sensitivity, or about 93% or more detection sensitivity, or about 94% or more detection sensitivity, or about 95% or more detection sensitivity, or about 96% or more detection sensitivity, or about 97% or more detection sensitivity, or about 98% or more detection sensitivity, or about 99% or more detection sensitivity. In some examples, the substantial match of the ratio of conversion of the intermediate is a value that is not more than (or less than) about 1%, or not more than (or less than) about 2%, or not more than (or less than) about 3%, or not more than (or less than) about 4%, or not more than (or less than) about 5%, or not more than (or less than) about 6%, or not more than (or less than) about 7%, or not more than (or less than) about 8%, or not more than (or less than) about 9%, or not more than (or less than) about 10% of the value in the library of the pre-determined conversion/composition rate. In some examples, the substantial match is calculated based on the % rA difference/deviation. In some examples, the % rA difference/deviation may be calculated based on the difference between the value in the library of a pre-determined conversion/composition rate and the ratio of conversion of the intermediate/actual value from replacement PCR (see for example in
In some examples, wherein a substantial match of the ratio of conversion of the intermediate to a value in the library of the pre-determined conversion/composition rate is not achieved, the position of the unnatural base pair may be determined by comparing the ratio of conversion of a first intermediate with the ratio of conversion of a second intermediate. In such examples, an acceptable deviation/difference of the ratio of conversion of a first intermediate from the ratio of conversion of a second intermediate would result in about 90% or more detection sensitivity, or about 91% or more detection sensitivity, or about 92% or more detection sensitivity, or about 93% or more detection sensitivity, or about 94% or more detection sensitivity, or about 95% or more detection sensitivity, or about 96% or more detection sensitivity, or about 97% or more detection sensitivity, or about 98% or more detection sensitivity, or about 99% or more detection sensitivity. In such examples, a ratio of conversion of a first intermediate that differs from the ratio of conversion of a second intermediate indicates and/or confirms the position of the unnatural base pair. In such examples, the varying ratio of conversion of a first intermediate to the ratio of the second intermediate (i.e. % deviation/difference) is a value that is not more than about 10%, or not more than about 9%, or not more than about 8%, or not more than about 7%, or not more than about 6%, or not more than about 5%, or not more than about 4%, or not more than about 3%, or not more than about 2%, or not more than about 1% of one value to another. In some examples, the varying difference may be calculated using the formula:
VR(i)=|CRp(A,i)−CRq(A,i)|
where CRp(A, i) is the composition rate of a first intermediate to natural base A at position i, CRq(A, i) is the composition rate of a second intermediate to natural base A at position i, and VR(i) is the % deviation/difference at position i.
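As an illustration of the formula above, the following Python sketch evaluates VR(i) for each position measured with both intermediates. The function name and the numerical values are hypothetical and serve only to show the arithmetic; they are not data from the present disclosure.

```python
def varying_difference(cr_p, cr_q):
    """Return VR(i) = |CRp(A, i) - CRq(A, i)| for every position i found in both inputs.

    cr_p, cr_q: dicts mapping a position i to the composition rate (%) of
    natural base A obtained with the first and second intermediate.
    """
    shared_positions = sorted(set(cr_p) & set(cr_q))
    return {i: abs(cr_p[i] - cr_q[i]) for i in shared_positions}


# Hypothetical composition rates (%), for illustration only.
cr_first = {12: 85.0, 27: 99.5}
cr_second = {12: 60.0, 27: 99.0}

vr = varying_difference(cr_first, cr_second)
for position, value in vr.items():
    print(f"position {position}: VR = {value:.1f}%")
```

Positions present in only one of the two inputs are skipped, since VR(i) is defined only where both composition rates are available.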
In another aspect of the present invention, there is provided an apparatus for performing the methods as described herein. For example, the apparatus may include a device for performing the replacement replication reaction (such as a PCR). In some examples, the apparatus may include a device for performing the data clustering, the data point management, and/or data comparison as required in the methods as described herein. In some examples, the apparatus may be an integrated device having all the components required for performing the methods as described herein.
In some examples, there is provided an apparatus for sequencing a nucleic acid containing an unnatural base pair (UBP), wherein the apparatus comprises a system or device configured to perform one or more replacement replication reactions; a system or device configured to sequence the nucleic acid resulting from the replacement replication reaction; a system or device configured to cluster the sequenced nucleic acid; a system or device configured to identify a candidate position of the unnatural base pair; a system or device configured to determine a ratio of conversion of the intermediate to each one of the natural base pairs at the candidate position of the unnatural base pair; a system or device configured to compare the ratio of conversion of the intermediate to a library of pre-determined conversion/composition rates based on the sequences of one or more natural base pairs adjacent to the candidate position of the unnatural base pair; and/or a system or device configured to determine the deviation/difference between the ratio of conversion of the intermediate and a value in the library of the pre-determined conversion/composition rates, wherein the deviation/difference confirms the position of the unnatural base pair, thereby determining the sequence of the nucleic acid containing the unnatural base pair.
It will be appreciated by a person skilled in the art that other variations and/or modifications may be made to the specific embodiments without departing from the scope of the invention as broadly described. For example, in the description herein, features of different exemplary embodiments may be mixed, combined, interchanged, incorporated, adopted, modified, included etc. or the like across different exemplary embodiments. The present embodiments are, therefore, to be considered in all respects to be illustrative and not restrictive.
UB triphosphate substrates (dPxTP (Diol1-dPxTP), dPaTP and dPa′TP) for PCR and dDs-CE-phosphoramidite were chemically synthesized, as described previously (5,8,24,26,27). DNA libraries containing Ds (NDsN2-49 and NDsN3-49, Table 1) were prepared by the conventional phosphoramidite method with an H-8-SE DNA/RNA Synthesizer (K&A Laborgeraete). DNA primers were purchased from Gene Design and Integrated DNA Technologies, or chemically synthesized. DNAs were purified by denaturing gel electrophoresis. Taq DNA polymerase (pol) and AccuPrime Pfx DNA pol were purchased from New England Biolabs and Life Technologies, respectively.
TAGCGCATAGGTGGGATG
GATACAATCCTGATCCAT
ATCCTCACCGATGTACTG
GATACAATCCTGATCCAT
TGACTCGAACGGATTAGT
GACTAC
ATCCGCCATACTTACGTTG
TCCGTGACCNNNNNNNN
GAATCTTAAGTGAAGTCG
Replacement PCR for the Conversion from Ds to Natural Bases
To characterize and optimize the replacement PCR, the present disclosure employed two DNA libraries, NDsN2-49 and NDsN3-49, which contain randomized regions with NNDsNN (where N=A, G, C or T) and NNNDsNNN, respectively. For the demonstration using the actual enriched libraries, the present disclosure used the final round of the DNA libraries for anti-IFNγ aptamer generation (N43Ds-P001 mix, Kimoto et al. (24)) and anti-vWF aptamer generation (N30Ds-S6-006, Matsunaga et al. (12)). The Ds bases in each sequence of the DNA libraries were replaced with natural bases through 12 cycles of PCR amplification without dDsTP, which is two-step cycling [94° C. for 15 sec-65° C. for 3 min 30 sec], after 2 min at 94° C. for the initial denaturation step. PCR (100 μl) was performed by using each library (1 pmol) as the template, with 1 μM of each corresponding primer set (Table 1) and each DNA pol at the manufacturer's recommended concentration (AccuPrime Pfx, 0.05 U/μl; Taq, 0.025 U/μl) in the 1× reaction buffer accompanying each DNA pol. In PCR using AccuPrime Pfx DNA pol, 0.1 mM each dNTP and 0.5 mM MgSO4 were added to the reaction buffer, and the final concentrations of each dNTP and MgSO4 were 0.4 mM and 1.5 mM, respectively. In PCR using Taq DNA pol, 0.3 mM of each dNTP was used for the reaction. As an intermediate UB substrate, dPa′TP, dPxTP or dPaTP was further added (0.05 mM final concentration). The inventors of the present disclosure examined six different conditions by changing the DNA pols and UB substrates: AccuPrime Pfx DNA pol in the absence of UB substrate (cond. 1), in the presence of dPa′TP (cond. 2), dPaTP (cond. 3) or dPxTP (cond. 4) and Taq DNA pol in the absence of UB substrate (cond. 5) or in the presence of dPa′TP (cond. 6).
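The reaction quantities stated above for the AccuPrime Pfx conditions can be checked with simple arithmetic. The Python sketch below uses only the figures given in the text; the variable names are illustrative, and the buffer contribution is merely what the stated added and final concentrations imply, not a separately documented specification.

```python
# Figures stated in the text for a 100 ul AccuPrime Pfx reaction.
volume_ul = 100.0
primer_um = 1.0            # each primer, uM (equivalent to pmol/ul)
template_pmol = 1.0        # library used as template
pol_units_per_ul = 0.05    # AccuPrime Pfx DNA pol
added_mm = {"dNTP (each)": 0.1, "MgSO4": 0.5}
final_mm = {"dNTP (each)": 0.4, "MgSO4": 1.5}

# Absolute amounts per reaction.
primer_pmol = primer_um * volume_ul                 # pmol of each primer
pol_units = pol_units_per_ul * volume_ul            # units of polymerase
template_nm = template_pmol / volume_ul * 1000.0    # pmol/ul is uM; x1000 gives nM

# Concentration implied for the 1x reaction buffer (final minus added).
buffer_mm = {name: round(final_mm[name] - added_mm[name], 6) for name in final_mm}

print(primer_pmol, pol_units, template_nm, buffer_mm)
```

Under these figures, each primer is in large molar excess over the template, and the buffer itself accounts for the remainder of the dNTP and MgSO4 final concentrations.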
The amplified DNAs obtained by replacement PCR were purified with a QIAquick Gel Extraction Kit (QIAGEN) and sequenced with the IonPGM sequencing system (Life Technologies), according to the manufacturers' instructions. Adapter sequences were ligated to the amplified DNAs using an Ion Plus Fragment Library Kit, and emulsion PCR was performed on a Life Technology OneTouch 2 instrument with the Ion PGM Hi-Q or Hi-Q View OT2 Kit. Enriched template beads were loaded on Ion PGM chips and sequenced with an Ion PGM Hi-Q or Hi-Q View Sequencing Kit. The list of the chips used and the obtained sequencing reads are summarized in Table 2.
Sequences were extracted from the deep sequencing data with the following criteria: 5′-(full sequence of the forward primer)-[N bases (N=1-20)]-(complementary sequence of the last six bases of the reverse primer)-3′. The extraction was performed against the complementary sequences as well. The total of both extracted sequences was defined as the “total read counts”. The sequences containing the constant region, 5′-ATGT-(5 bases)-GTCA-3′ for NDsN2-49 and 5′-ATG-(7 bases)-TCA-3′ for NDsN3-49, were retained for further analysis. The composition rates (%) of each natural base converted from Ds (% rN, N=A, T, G, and C) were determined for all of the sequence contexts around Ds (4^4 = 256 sequence contexts in total for NDsN2-49 and 4^6 = 4,096 for NDsN3-49). For easy comparison across samples, the read count for each sequence context was normalized to reads per million (RPM). For NDsN3-49, replacement PCR reactions with AccuPrime Pfx DNA pol and dPa′TP (cond. Pa′, equal to cond. 2) or dPxTP (cond. Px, equal to cond. 4), as well as the following sequence analyses, were performed in triplicate to calculate the average and variability. The averaged % rN values obtained by this sequencing were employed in the encyclopedia data.
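The extraction and counting steps described above can be sketched in Python as follows. This is a minimal illustration, not the analysis code used in the present disclosure. It assumes that the "complementary sequence" of the reverse-primer end appears in a read as its reverse complement, that the constant region may lie anywhere within the extracted stretch, and that RPM is normalized to the total number of extracted reads. The function names, primers, and reads are hypothetical.

```python
import re
from collections import Counter, defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_inserts(reads, fwd_primer, rev_primer, max_len=20):
    """Return the 1..max_len bases lying between the full forward primer and
    the reverse complement of the last six bases of the reverse primer.
    Each read is tried as given and, failing that, as its reverse complement."""
    tail = revcomp(rev_primer[-6:])
    pattern = re.compile(
        re.escape(fwd_primer) + r"([ACGT]{1,%d})" % max_len + re.escape(tail)
    )
    inserts = []
    for read in reads:
        for strand in (read, revcomp(read)):
            match = pattern.search(strand)
            if match:
                inserts.append(match.group(1))
                break
    return inserts


def composition_rates(inserts, left="ATG", right="TCA", flank=3):
    """Group inserts by the bases flanking the former Ds position and return
    (% rN per context, reads per million per context)."""
    core = re.compile(
        left + r"([ACGT]{%d})([ACGT])([ACGT]{%d})" % (flank, flank) + right
    )
    counts = defaultdict(Counter)
    for insert in inserts:
        match = core.search(insert)
        if match:
            context = (match.group(1), match.group(3))
            counts[context][match.group(2)] += 1
    total = len(inserts)
    rates, rpm = {}, {}
    for context, by_base in counts.items():
        n = sum(by_base.values())
        rates[context] = {base: 100.0 * by_base[base] / n for base in "ATGC"}
        rpm[context] = 1e6 * n / total
    return rates, rpm


# Hypothetical primers and reads, for illustration only.
FWD, REV = "AAGGCC", "TTGACCGGTA"
read_a = FWD + "ATGACGATGCTCA" + revcomp(REV[-6:])
read_t = FWD + "ATGACGTTGCTCA" + revcomp(REV[-6:])
reads = [read_a, read_t, revcomp(read_a), "GGGGGGGG"]

inserts = extract_inserts(reads, FWD, REV)
rates, rpm = composition_rates(inserts)
```

The defaults correspond to the NDsN3-49 constant region; for NDsN2-49 the same function would be called with `left="ATGT"`, `right="GTCA"` and `flank=2`.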
At first, the deep sequencing data were obtained using the N43Ds-P001 mix and N30Ds-S6-006 libraries that were isolated by ExSELEX targeting interferon-γ (IFNγ) and von Willebrand factor A1-domain (vWF), respectively. The sequences were extracted with the following criteria: 5′-(full sequence of the forward primer)-[45 bases (N43Ds-P001 mix) or 42 bases (N30Ds-S6-006)]-(complementary sequence of the last six bases of the reverse primer)-3′. Similarly, the complementary sequences were extracted. To simplify the analysis for the N43Ds-P001 mix libraries, only the aptamer sequences containing the two-base tag (2 bases+43 randomized bases) were extracted. Next, the extracted sequences were clustered into 10-20 families based on the sequence similarities, using in-house Perl scripts (clustered into the same family if the mismatch between the sequence and the top sequence is less than six). Analyses of the N43Ds-P001 libraries were performed in triplicate, and those of the N30Ds-S6-006 libraries were performed twice, to confirm the reproducibility. The obtained % rN values were then compared with the values in the encyclopedia.
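The in-house Perl scripts are not reproduced in this disclosure. The family clustering described above could, for example, be sketched in Python as below. The sketch assumes that "mismatch" means the Hamming distance between equal-length sequences and that families are seeded greedily by the most abundant remaining sequence; both are assumptions, and the function names and input sequences are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def cluster_families(seqs, max_mismatch=5, max_families=20):
    """Greedy clustering: the most abundant remaining sequence becomes the top
    sequence of a new family, and every remaining sequence with fewer than six
    mismatches to it (max_mismatch=5) joins that family."""
    counts = Counter(seqs)
    remaining = [seq for seq, _ in counts.most_common()]
    families = []
    while remaining and len(families) < max_families:
        top = remaining[0]
        members = [
            seq for seq in remaining
            if len(seq) == len(top) and hamming(seq, top) <= max_mismatch
        ]
        families.append({"top": top, "members": {s: counts[s] for s in members}})
        member_set = set(members)
        remaining = [seq for seq in remaining if seq not in member_set]
    return families


# Hypothetical extracted sequences, for illustration only.
extracted = ["AAAAAAAAAA"] * 5 + ["AAAAAAAAAT"] * 2 + ["TTTTTTTTTT"] * 3
families = cluster_families(extracted)
for family in families:
    print(family["top"], sum(family["members"].values()))
```

Because the extraction criteria fix the insert length for each library, a position-wise mismatch count is sufficient here; libraries with variable insert lengths would need an alignment-based distance instead.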
The sensitivity and selectivity of the sequencing method in the present disclosure were evaluated by a ROC analysis. The use of % rA of the encyclopedia in the anti-IFNγ aptamer selection (criteria 1, see
The composition rates of the natural bases converted from Ds by replacement PCR greatly depend on the natural base sequence contexts around Ds. To simultaneously determine the natural-base composition rates for all of the sequence contexts, the present study used DNA libraries containing natural-base randomized sequences and Ds (
The amplified double-stranded DNAs after 12 cycles of replacement PCR were subjected to deep sequencing with the IonPGM system. All of the extracted sequences with the correct length were classified into each sequence context around Ds, and the natural-base composition rates at the initial Ds position were determined in each sequence context. The data were then compiled as the encyclopedia, ENBRE (
First, the replacement PCR of the NNDsNN library was examined using AccuPrime Pfx DNA pol without any intermediate UB substrates (
Next, dPa′TP was added as an intermediate substrate for replacement PCR using AccuPrime Pfx DNA pol (
dPaTP (Pa: pyrrole-2-carbaldehyde) and dPxTP were also examined as other UB intermediate substrates for replacement PCR with AccuPrime Pfx DNA pol (
Besides AccuPrime Pfx DNA pol, Taq DNA pol was tested for replacement PCR in the presence and absence of dPa′TP (
Overall, replacement PCR in the presence of dPa′TP using AccuPrime Pfx DNA pol was the best combination for all of the sequence contexts, and the replacement PCR in the presence of dPxTP was the second best (
Based on the above results using the NNDsNN library, two sets of encyclopedias of the natural-base composition rates were prepared for each sequence context in replacement PCR in the presence of either dPa′TP or dPxTP, using NNNDsNNN (4^6 = 4,096 combinations) and AccuPrime Pfx DNA pol, to increase the accuracy of ENBRE (
Furthermore, from the difference in the % rA values between the two replacement PCRs with dPa′TP and dPxTP, the present study could confirm the existence of Ds in each aptamer candidate obtained from the final round of ExSELEX. If the mutation from Ds to natural bases occurred during the ExSELEX procedures, then the differences in the % rA values obtained by the two replacement PCRs would not be observed.
Evaluation of the Sequencing Method Using UB-DNA Aptamer Sequences from Enriched Libraries Obtained by ExSELEX
To verify the accuracy of ENBRE, the sequencing method was tested by using two actual enriched libraries, which were obtained by ExSELEX procedures targeting interferon-γ (IFNγ) and von Willebrand factor A1-domain (vWF). From the libraries, high-affinity Ds-containing DNA aptamers were obtained for both targets. The anti-IFNγ aptamer (KD=38 pM) was obtained as one of the first Ds-containing aptamers, using a predetermined library comprised of ˜20 sub-libraries. The aptamer contained three Ds bases, and two Ds bases were essential for the tight binding to IFNγ. Previously, the Ds positions in the aptamer sequence were determined using the specific barcode that was embedded into each sub-library. The anti-vWF aptamer (KD=75 pM) was obtained by ExSELEX using six different batches (#1-#6) of the chemically synthesized DNA library with randomized sequences including Ds bases. The inventors of the present disclosure previously obtained two aptamer families from libraries #1 and #4 and determined the Ds positions in each aptamer family by modified Sanger sequencing using each aptamer candidate, which was isolated by hybridization with a specific probe from the enriched library.
First, to analyze the sequences of the anti-IFNγ aptamer, replacement PCR was performed in the presence of dPa′TP or dPxTP using the enriched library (N43Ds-P001 mix in Table 1) that was previously obtained after seven rounds of ExSELEX (11) (
Next, the two enriched libraries, #1 and #4, obtained by ExSELEX targeting vWF using the Ds-randomized library (12) were analyzed (
To assess the accuracy of the ENBRE data for DNA sequencing involving Ds bases, the present study broadly explored the % rA values of the sequencing data for the anti-IFNγ aptamer generation, in which a library containing Ds bases at defined positions was used. The differences in the % rA values between the actual data of the enriched library and the ENBRE data were analysed using 20 Ds positions in the top ten families of the anti-IFNγ aptamer sequences (
To develop a sequencing method for Ds-DNA aptamer generation, the replacement PCR method was optimised, and it was found that the two replacement PCR methods using AccuPrime Pfx DNA pol and either dPa′TP or dPxTP as an intermediate substrate efficiently convert Ds to natural bases in the amplified DNAs. The natural-base composition rates converted from Ds significantly varied, depending on the use of the intermediate substrates and the sequence contexts around Ds. Thus, two ENBRE databases were made corresponding to all of the sequence contexts for both dPa′TP- and dPxTP-replacement PCRs. In general, replacement PCR with dPa′TP converts Ds to A>>T>>C≈G in most of the sequence contexts. In contrast, replacement PCR with dPxTP increased the conversion rates from Ds to T, as compared with that with dPa′TP. These differences in the conversion tendencies between the two intermediate substrates increased the accuracy for the determination of the Ds positions in the Ds-DNA aptamer candidate sequences.
This approach enables deep sequencing to identify individual clones containing Ds bases in enriched libraries of different sequences obtained by ExSELEX. The present disclosure has demonstrated the DNA sequencing of Ds-DNA aptamer candidates in the enriched libraries obtained by ExSELEX targeting IFNγ and vWF. This sequencing method could simplify the process and thus shorten the time required for Ds-DNA aptamer generation using libraries with randomized sequences containing Ds. In addition, besides the Ds-Px pair, this method could be applied to other unnatural base pair systems.
This study also provides valuable information about replication fidelity involving UBPs. The replacement PCR in the absence of intermediate UB-substrates greatly reduced the conversion efficiency from Ds to natural bases. This fact confirmed the high fidelity of the Ds-Px pair in replication. In addition, these data are useful for designing an efficient Ds-containing sequence context for replication. For example, the replacement PCR in the absence of intermediate UB-substrates predominantly replaced Ds in the NYDsTN sequence contexts with natural bases, but was not efficient for Ds in the NYDsCN sequence contexts. Since both the NYDsTN and NYDsCN sequence contexts exhibited high efficiency in PCR amplification, the NYDsCN sequence contexts might exhibit the highest efficiency and fidelity in PCR. Furthermore, the present disclosure found that each sequence context yielded varied natural-base composition rates by replacement PCR with dPa′TP. In particular, the NADsAN or NTDsAN sequence contexts tended to increase the misincorporation of dGTP and dCTP opposite Ds. This indicated that the Ds conformation in such sequences might be different from those in other sequences within the polymerase active site. Furthermore, the present disclosure found that Taq DNA pol (family A pol) caused the deletion mutation during replacement PCR, whereas AccuPrime Pfx and Deep Vent DNA pols (family B pols) rarely caused such a mutation during PCR in the presence of dDsTP and dPxTP. Since the Ds-Px pair functions in PCR using family B pol, the results using family A pol could provide an insight into UBP replication, together with the structural data of the ternary complex of KlenTaq DNA pol (family A pol) with a Ds-template/primer duplex bound to dPxTP. These data will be useful for further studies to create improved UBPs with higher fidelity and efficiency.
Number | Date | Country | Kind |
---|---|---|---|
10201900941T | Jan 2019 | SG | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/SG2019/050597 | 12/4/2019 | WO | 00 |