The present invention relates to a method for producing a DNA library that can be used for analyzing a DNA marker, for example, and a method for genomic DNA analysis using such DNA library.
In general, genomic analysis is performed to conduct comprehensive analysis of genetic information contained in the genome, such as nucleotide sequence information. However, an analysis aimed at determination of the nucleotide sequence for whole genome is disadvantageous in terms of the number of processes and the cost. In cases of organisms with large genomic sizes, in addition, genomic analysis based on nucleotide sequence analysis has limitations because of genome complexity.
Patent Literature 1 discloses an amplified fragment length polymorphism (AFLP) marker technique wherein a sample-specific index is incorporated into a restriction-enzyme-treated fragment that had been ligated to an adapter and only a part of the sequence of the restriction-enzyme-treated fragment is to be determined. According to the technique disclosed in Patent Literature 1, the complexity of genomic DNA is reduced by treating genomic DNA with a restriction enzyme, the nucleotide sequence of a target part of the restriction-enzyme-treated fragment is determined, and the target restriction-enzyme-treated fragment is thus determined sufficiently. The technique disclosed in Patent Literature 1, however, requires processes such as treatment of genomic DNA with a restriction enzyme and ligation reaction with the use of an adapter. Thus, it is difficult to achieve a cost reduction.
Meanwhile, Patent Literature 2 discloses that a DNA marker for identification that is highly correlated with the results of taste evaluation was found from among DNA bands obtained by amplifying DNAs extracted from a rice sample via PCR in the presence of adequate primers by the so-called RAPD (randomly amplified polymorphic DNA) technique. The method disclosed in Patent Literature 2 involves the use of a plurality of sequence-tagged sites (STSs, which are primers) identified by particular sequences. According to the method disclosed in Patent Literature 2, a DNA marker for identification amplified with the use of an STS primer is detected via electrophoresis. However, the RAPD technique disclosed in Patent Literature 2 yields significantly poor reproducibility of PCR amplification, and, accordingly, such technique cannot be generally adopted as a DNA marker technique.
Patent Literature 3 discloses a method for producing a genomic library wherein PCR is carried out with the use of a single type of primer designed on the basis of a sequence that appears relatively frequently in the target genome, the entire genomic region is substantially uniformly amplified, and a genomic library can be thus produced. While Patent Literature 3 describes that a genomic library can be produced by conducting PCR with the use of a random primer containing a random sequence, it does not describe any actual procedures or results of experimentation. Accordingly, the method described in Patent Literature 3 is deduced to require nucleotide sequence information of the genome so as to identify sequences that appear frequently in the genome, which would increase the number of procedures and the cost. According to the method described in Patent Literature 3, in addition, the entire genome is to be amplified, and the complexity of genomic DNA cannot be reduced, which is disadvantageous.
For a technique for genome information analysis, such as genetic linkage analysis conducted with the use of a DNA marker, production of a DNA library in a more convenient and highly reproducible manner is desired. As described above, a wide variety of techniques for producing a DNA library are known. To date, however, there have been no techniques known to be sufficient in terms of convenience and/or reproducibility. Under the above circumstances, it is an object of the present invention to provide a method for producing a DNA library with more convenience and higher reproducibility, and it is another object to provide a method for analyzing genomic DNA with the use of such DNA library.
The present inventors have conducted concentrated studies in order to attain the above objects. As a result, they discovered that high reproducibility could be achieved by conducting PCR with the use of a random primer while designating the concentration of such random primer within a designated range in a reaction solution. This has led to the completion of the present invention.
The present invention includes the following.
(1) A method for producing a DNA library, comprising conducting a nucleic acid amplification reaction in a reaction solution containing genomic DNA and a random primer at a high concentration using genomic DNA as a template to obtain DNA fragments.
(2) The method for producing a DNA library according to (1), wherein the reaction solution comprises the random primer at a concentration of 4 to 200 μM.
(3) The method for producing a DNA library according to (1), wherein the reaction solution comprises the random primer at a concentration of 4 to 100 μM.
(4) The method for producing a DNA library according to (1), wherein the random primer comprises 9 to 30 nucleotides.
(5) The method for producing a DNA library according to (1), wherein the DNA fragments each comprise 100 to 500 nucleotides.
(6) A method for analyzing genomic DNA, comprising using a DNA library produced by the method for producing a DNA library according to any one of (1) to (5) as a DNA marker.
(7) The method for analyzing genomic DNA according to (6), which comprises determining the nucleotide sequence of the DNA library produced by the method for producing a DNA library according to any one of (1) to (5) and confirming the presence or absence of the DNA marker based on the nucleotide sequence.
(8) The method for analyzing genomic DNA according to (7), wherein the presence or absence of the DNA marker is confirmed based on the number of reads of the nucleotide sequence of the DNA library in the step of confirming the presence or absence of the DNA marker.
(9) The method for analyzing genomic DNA according to (7), wherein the nucleotide sequence of the DNA library is compared with known sequence information or with the nucleotide sequence of a DNA library produced using genomic DNA from a different organism or tissue, and the presence or absence of the DNA marker is confirmed based on differences in the nucleotide sequences.
(10) The method for analyzing genomic DNA according to (6), which comprises:
a step of preparing a pair of primers for specifically amplifying the DNA marker based on the nucleotide sequence of the DNA marker;
a step of conducting a nucleic acid amplification reaction using genomic DNA extracted from a target organism as a template and the pair of primers; and
a step of confirming the presence or absence of the DNA marker in the genomic DNA based on the results of the nucleic acid amplification reaction.
(11) A method for producing a DNA library, comprising:
a step of conducting a nucleic acid amplification reaction in a first reaction solution comprising genomic DNA and a random primer at a high concentration to obtain first DNA fragments by the nucleic acid amplification reaction using the genomic DNA as a template; and
a step of conducting a nucleic acid amplification reaction in a second reaction solution comprising the obtained first DNA fragments and a nucleotide, as a primer, which has a 3′-end nucleotide sequence having 70% identity to at least a 5′-end nucleotide sequence of the random primer to ligate the nucleotides to the first DNA fragments, thereby obtaining second DNA fragments.
(12) The method for producing a DNA library according to (11), wherein the first reaction solution comprises the random primer at a concentration of 4 to 200 μM.
(13) The method for producing a DNA library according to (11), wherein the first reaction solution comprises the random primer at a concentration of 4 to 100 μM.
(14) The method for producing a DNA library according to (11), wherein the random primer comprises 9 to 30 nucleotides.
(15) The method for producing a DNA library according to (11), wherein the first DNA fragments each comprise 100 to 500 nucleotides.
(16) The method for producing a DNA library according to (11), wherein the primer for amplifying the second DNA fragments comprises a region used for a nucleotide sequencing reaction, or the primer used for a nucleic acid amplification reaction using the second DNA fragments as templates or a nucleic acid amplification reaction to be conducted repeatedly comprises a region used for a nucleotide sequencing reaction.
(17) A method for analyzing a DNA library, comprising a step of determining a nucleotide sequence for a second DNA fragment obtained by the method for producing a DNA library according to any one of (11) to (15) or a DNA fragment obtained using a primer comprising a region complementary to a sequencer primer to be used in a nucleotide sequencing reaction in the method for producing a DNA library according to (16).
(18) A method for analyzing genomic DNA, comprising using a DNA library produced by the method for producing a DNA library according to any one of (11) to (17) as a DNA marker.
(19) The method for analyzing genomic DNA according to (18), which comprises determining the nucleotide sequence of the DNA library produced by the method for producing a DNA library according to any one of (11) to (17) and confirming the presence or absence of the DNA marker based on the nucleotide sequence.
(20) The method for analyzing genomic DNA according to (19), wherein the presence or absence of the DNA marker is confirmed based on the number of reads of the nucleotide sequence of the DNA library in the step of confirming the presence or absence of the DNA marker.
(21) The method for analyzing genomic DNA according to (19), wherein the nucleotide sequence of the DNA library is compared with known sequence information or with the nucleotide sequence of a DNA library produced using genomic DNA from a different organism or tissue, and the presence or absence of the DNA marker is confirmed based on differences in the nucleotide sequences.
(22) The method for analyzing genomic DNA according to (18), which comprises: a step of preparing a pair of primers for specifically amplifying the DNA marker based on the nucleotide sequence of the DNA marker; a step of conducting a nucleic acid amplification reaction using genomic DNA extracted from a target organism as a template and the pair of primers; and a step of confirming the presence or absence of the DNA marker in the genomic DNA based on the results of the nucleic acid amplification reaction.
(23) A DNA library, which is produced by the method for producing a DNA library according to any one of (1) to (5) and (11) to (16).
The present description includes part or all of the contents as disclosed in the descriptions and/or drawings of Japanese Patent Application Nos. 2016-129048, 2016-178528, and 2017-071020, which are priority documents of the present application.
A DNA library can be produced in a very convenient manner by the method for producing a DNA library according to the present invention because the method is based on a nucleic acid amplification method using random primers. In addition, reproducibility of a nucleic acid fragment to be amplified is excellent in the method for producing a DNA library according to the present invention even though the method is a nucleic acid amplification method using random primers. Therefore, according to the method for producing a DNA library of the present invention, the produced DNA library can be used as a DNA marker and thus can be used for genomic DNA analysis such as genetic linkage analysis.
The method for analyzing genomic DNA with the use of a DNA library according to the present invention involves the use of a DNA library produced in a simple manner with excellent reproducibility. Accordingly, genomic DNA can be analyzed in a cost-effective manner with high accuracy.
Hereafter, the present invention is described in detail.
According to the method for producing a DNA library of the present invention, a nucleic acid amplification reaction is conducted in a reaction solution, which is prepared to contain a primer having an arbitrary nucleotide sequence (hereafter, referred to as “random primer”) at a high concentration, and the amplified nucleic acid fragment is determined to be a DNA library. The expression “high concentration” used herein means that the concentration is higher than the primer concentration in a general nucleic acid amplification reaction. Specifically, the method for producing a DNA library of the present invention is characterized in that a random primer is used at a higher concentration than a primer used in a general nucleic acid amplification reaction. As a template contained in a reaction solution, genomic DNA prepared from a target organism for which a DNA library is produced can be used.
In the method for producing a DNA library of the present invention, a target organism species is not particularly limited, and a target organism species can be any organism species such as an animal including a human, a plant, a microorganism, or a virus. In other words, according to the method for producing a DNA library of the present invention, a DNA library can be produced from any organism species.
In the method for producing a DNA library of the present invention, the concentration of a random primer is specified as described above. Thus, a nucleic acid fragment (or nucleic acid fragments) can be amplified with high reproducibility. The term “reproducibility” used herein means an extent of concordance among nucleic acid fragments amplified by a plurality of nucleic acid amplification reactions carried out with the use of the same template and the same random primer. That is, the term “high reproducibility (or the expression “reproducibility is high”)” means that the extent of concordance among nucleic acid fragments amplified by a plurality of nucleic acid amplification reactions carried out with the use of the same template and the same random primer is high.
The extent of reproducibility can be evaluated by, for example, conducting a plurality of nucleic acid amplification reactions with the use of the same template and the same random primer, calculating the Spearman's rank correlation coefficient for the fluorescence unit (FU) obtained as a result of electrophoresis of the resulting amplified fragments, and evaluating the extent of reproducibility on the basis of such coefficient. The Spearman's rank correlation coefficient is generally represented by the symbol ρ. When ρ is greater than 0.9, for example, the reproducibility of the amplification reaction of interest can be evaluated to be sufficient.
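The evaluation described above can be illustrated with a minimal sketch. It assumes that the FU values of two replicate reactions have already been sampled at the same fragment-size positions; the function names and the FU values are hypothetical and serve only as an illustration. The coefficient is computed as the Pearson correlation of the rank-transformed values, with tied values receiving the mean of their ranks.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("the coefficient is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical FU values from two replicate reactions (illustrative only).
replicate_1 = [12.0, 35.5, 80.2, 41.0, 9.8, 3.1]
replicate_2 = [10.5, 33.0, 85.9, 44.2, 11.0, 2.7]

rho = spearman_rho(replicate_1, replicate_2)
is_reproducible = rho > 0.9  # threshold taken from the description above
```

In practice, a statistics library could be used instead of the hand-written ranking; the pure-Python form is chosen here only to keep the sketch self-contained.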
[Random Primer]
A sequence constituting a random primer that can be used in the method for producing a DNA library according to the present invention is not particularly limited. For example, a random primer comprising 9 to 30 nucleotides can be used. In particular, a random primer may be composed of any nucleotide sequence comprising 9 to 30 nucleotides. The nucleotide type (i.e., the sequence type) is not particularly limited, and a random primer may be composed of 1 or more types of nucleotide sequences, preferably 1 to 10,000 types of nucleotide sequences, more preferably 1 to 1,000 types of nucleotide sequences, further preferably 1 to 100 types of nucleotide sequences, and most preferably 1 to 96 types of nucleotide sequences. With the use of nucleotides (or a group of nucleotides) within the range mentioned above for a random primer, an amplified nucleic acid fragment can be obtained with higher reproducibility. When a random primer comprises a plurality of nucleotide sequences, it is not necessary that all nucleotide sequences comprise the same number of nucleotides (9 to 30 nucleotides). A random primer may comprise a plurality of nucleotide sequences composed of different numbers of nucleotides.
In general, in order to obtain a specific amplicon by a nucleic acid amplification reaction, the nucleotide sequence of a primer corresponding to the amplicon is designed. For example, a pair of primers are designed such that the primers sandwich a site corresponding to an amplicon of a template DNA of genomic DNA or the like. In such case, as the primers are designed to be hybridized to a specific region included in a template, they may be referred to as “specific primers.”
Meanwhile, a random primer is different from a primer that is designed to obtain a specific amplicon, and it is designed to obtain a random amplicon but not to be hybridized to a specific region of a template DNA. A random primer may have any nucleotide sequence and can contribute to random amplicon amplification when it is incidentally hybridized to a region included in template DNA.
In other words, a random primer can be regarded as nucleotides involved in random amplicon amplification comprising an arbitrary sequence as described above. Here, such arbitrary sequence is not particularly limited. However, it may be designed as, for example, a nucleotide sequence randomly selected from the group consisting of adenine, guanine, cytosine, and thymine or a specific nucleotide sequence. Examples of a specific nucleotide sequence include a nucleotide sequence including a restriction enzyme recognition sequence or a nucleotide sequence having an adapter sequence used for a next-generation sequencer.
When designing plural types of nucleotides for random primers, it is possible to use a method for designing a plurality of nucleotide sequences having certain lengths by randomly selecting from the group consisting of adenine, guanine, cytosine, and thymine. In addition, when designing different types of nucleotides for random primers, it is also possible to use a method for designing a plurality of nucleotide sequences each comprising a common part consisting of a specific nucleotide sequence and a non-common part consisting of an arbitrary nucleotide sequence. Here, the non-common part may consist of a nucleotide sequence randomly selected from the group consisting of adenine, guanine, cytosine, and thymine or all or one of combinations of four types of nucleotides which are adenine, guanine, cytosine, and thymine. The common part is not particularly limited, and it may consist of any nucleotide sequence. It may consist of, for example, a nucleotide sequence including a restriction enzyme recognition sequence, a nucleotide sequence having an adapter sequence used for a next-generation sequencer, or a nucleotide sequence common in a specific gene family.
When designing plural types of nucleotide sequences having certain lengths by randomly selecting nucleotides from the four types of nucleotides for a plurality of random primers, it is preferable that 30% or more, preferably 50% or more, more preferably 70% or more, and further preferably 90% or more of all such sequences exhibit 70% or less, preferably 60% or less, more preferably 50% or less, and most preferably 40% or less identity. By designing plural types of nucleotide sequences having certain lengths by randomly selecting nucleotides from the four types of nucleotides such that the plurality of random primers exhibit identity within such range, amplified fragments can be obtained over the entire genomic DNA of the target organism species. Thus, uniformity of the amplified fragments can be enhanced.
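The identity criterion above can be checked with a short sketch under two simplifying assumptions that are not taken from the description: identity is counted position by position between primers of equal length without alignment, and the proportion is taken over all primer pairs. The primer sequences and function names are hypothetical.

```python
from itertools import combinations


def positional_identity(a, b):
    """Fraction of positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fraction_of_dissimilar_pairs(primers, max_identity=0.7):
    """Share of primer pairs whose identity is at most max_identity."""
    pairs = list(combinations(primers, 2))
    if not pairs:
        raise ValueError("at least two primers are required")
    dissimilar = sum(positional_identity(a, b) <= max_identity for a, b in pairs)
    return dissimilar / len(pairs)


# Hypothetical 10-nucleotide random primers (illustrative only).
primers = ["ACGTACGTAC", "ACGTACGTAA", "TTGCAGGCTA", "GGATCCTTGA"]

share = fraction_of_dissimilar_pairs(primers, max_identity=0.7)
meets_minimum = share >= 0.3  # "30% or more ... exhibit 70% or less identity"
```

For primers of different lengths, an alignment-based identity would be needed instead of the positional comparison used here.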
When designing a plurality of nucleotide sequences each comprising a common part consisting of a specific nucleotide sequence and a non-common part consisting of an arbitrary nucleotide sequence for a plurality of random primers, it is possible to design, for example, a nucleotide sequence comprising a non-common part consisting of several nucleotides on the 3′ end side and a common part consisting of the remaining nucleotides on the 5′ end side. By allowing a non-common part to consist of n number of nucleotides on the 3′ end side, it is possible to design 4^n (4 to the n-th power) types of random primers. Here, the expression “n number” may refer to 1 to 5, preferably 2 to 4, and more preferably 2 to 3.
For example, it is possible to design, as a random primer comprising a common part and a non-common part, 16 types of random primers in total, each of which has an adapter sequence (common part) used for a next-generation sequencer on the 5′ end side and two nucleotides (non-common part) on the 3′ end side in total. It is possible to design 64 types of random primers in total by setting the number of nucleotides on the 3′ end side to 3 nucleotides (non-common part). The more types of random primers, the more comprehensively the amplified fragments can be obtained throughout the genomic DNA of the target organism species. Therefore, when designing a random primer consisting of a common part and a non-common part, it is preferable that 3 nucleotides exist on the 3′ end side.
However, for example, after designing 64 types of nucleotide sequences each comprising a common part and a non-common part consisting of 3 nucleotides, not more than 63 types of random primers selected from these 64 types of nucleotide sequences may be used. In other words, as compared with the case of using all 64 types of random primers, in the case of using not more than 63 types of random primers, excellent results can be obtained in a nucleic acid amplification reaction or analysis using a next generation sequencer. Specifically, when 64 types of random primers are used, the number of reads of a specific nucleic acid amplification fragment might become remarkably large. In such case, favorable analysis results can be obtained by using the remaining 63 or fewer random primers, that is, the 64 types of random primers excluding one or more random primers involved in the amplification of the specific nucleic acid amplification fragment.
Similarly, in the case of designing 16 types of random primers each comprising a common part and a non-common part of 2 nucleotides, when not more than 15 types of random primers selected from 16 types of random primers are used, favorable analysis results may be obtained in a nucleic acid amplification reaction or analysis using a next generation sequencer.
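The design of random primers comprising a common part and a non-common part can be sketched as follows. The common part used here is an arbitrary placeholder rather than an actual adapter sequence, and the primer chosen for exclusion is likewise hypothetical; which primers to exclude would have to be decided from actual read counts.

```python
from itertools import product


def design_primer_set(common_part, n):
    """All 4**n primers: common part on the 5' side, n variable nucleotides on the 3' side."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [common_part + "".join(tail) for tail in product("ACGT", repeat=n)]


def drop_primers(primers, overrepresented):
    """Return the primer set without the primers listed in overrepresented."""
    dropped = set(overrepresented)
    return [p for p in primers if p not in dropped]


COMMON_PART = "ACGTTGCAACGT"  # hypothetical placeholder, not a real adapter sequence

primers_2 = design_primer_set(COMMON_PART, 2)  # 4**2 = 16 types
primers_3 = design_primer_set(COMMON_PART, 3)  # 4**3 = 64 types

# Hypothetical case: one primer is assumed to dominate the read counts.
reduced_set = drop_primers(primers_3, [COMMON_PART + "GGG"])
```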
Nucleotides constituting a random primer are preferably designed such that the G-C content is 5% to 95%, more preferably 10% to 90%, further preferably 15% to 80%, and most preferably 20% to 70%. With the use of a set of nucleotides having a G-C content within the above range as a random primer, amplified nucleic acid fragments can be obtained with enhanced reproducibility. The G-C content is the percentage of guanine and cytosine contained in the whole nucleotide chain.
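As a minimal sketch, the G-C content defined above can be computed and compared with one of the stated ranges as follows. The function names and the example sequences are hypothetical.

```python
def gc_content_percent(seq):
    """Percentage of guanine and cytosine in the whole nucleotide chain."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_within(seq, low=20.0, high=70.0):
    """True if the G-C content lies in [low, high]; defaults use the narrowest stated range."""
    return low <= gc_content_percent(seq) <= high


example_gc = gc_content_percent("ACGTACGTAC")  # 5 of 10 nucleotides are G or C
```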
Further, nucleotides constituting a random primer are designed such that consecutive nucleotides account for preferably 80% or less, more preferably 70% or less, further preferably 60% or less, and most preferably 50% or less with respect to the entire sequence length. Alternatively, nucleotides constituting a random primer are designed such that the number of consecutive nucleotides is preferably 8 or less, more preferably 7 or less, further preferably 6 or less, and most preferably 5 or less. An amplified nucleic acid fragment can be obtained with enhanced reproducibility with the use of a set of nucleotides constituting a random primer, for which the number of consecutive nucleotides falls within the above range.
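The check on consecutive nucleotides can be sketched as below. The sketch assumes that "consecutive nucleotides" refers to an uninterrupted run of the same nucleotide, which is an interpretation rather than a definition given in the description; the function names and sequences are hypothetical.

```python
from itertools import groupby


def longest_run(seq):
    """Length of the longest uninterrupted run of one nucleotide."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return max(len(list(group)) for _, group in groupby(seq))


def run_is_acceptable(seq, max_run=5, max_fraction=0.5):
    """Apply both criteria: absolute run length and run length relative to the sequence."""
    run = longest_run(seq)
    return run <= max_run and run / len(seq) <= max_fraction


run_example = longest_run("ACGTTTTTTA")  # six consecutive T nucleotides
```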
In addition, it is preferable that nucleotides constituting a random primer be designed not to constitute a complementary region of 6 or more, more preferably 5 or more, and further preferably 4 or more nucleotides in a molecule. When the nucleotides are designed not to constitute a complementary region within the above range, double strand formation occurring in a molecule can be prevented, and amplified nucleic acid fragments can be obtained with enhanced reproducibility.
Further, when plural types of nucleotides are designed for a random primer, in particular, it is preferable that a plurality of nucleotides be designed not to constitute a complementary region of 6 or more, more preferably 5 or more, and further preferably 4 or more nucleotides among the plurality of nucleotide sequences. When plural types of nucleotide sequences are designed in this manner, double strand formation occurring between nucleotide sequences can be prevented, and amplified nucleic acid fragments can be obtained with enhanced reproducibility.
When plural types of nucleotides are designed for random primers, it is preferable that the nucleotides be designed not to constitute a complementary sequence of 6 or more, more preferably 5 or more, and further preferably 4 or more nucleotides at the 3′ end side. When they are designed not to form a complementary sequence within the above range at the 3′ end side, double strand formation occurring between nucleotide sequences can be prevented, and amplified nucleic acid fragments can be obtained with enhanced reproducibility.
The terms “complementary region” and “complementary sequence” refer to, for example, a region and a sequence exhibiting 80% to 100% identity (e.g., a region and a sequence each comprising 5 nucleotides in which 4 or 5 nucleotides are complementary to each other) or a region and a sequence exhibiting 90% to 100% identity (e.g., a region and a sequence each comprising 5 nucleotides in which 5 nucleotides are complementary to each other).
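A simple screen for complementary regions can be sketched as follows. For brevity it detects only fully complementary stretches of a given length, not the partially complementary ones that the definition above also covers, and it does not distinguish overlapping from non-overlapping stretches. The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def has_complementary_region(a, b, k):
    """True if some k-mer of a is fully complementary (antiparallel) to a k-mer of b.

    Passing the same sequence as a and b screens a single primer against itself.
    """
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return any(reverse_complement(a[i:i + k]) in kmers_b
               for i in range(len(a) - k + 1))


def three_prime_ends_pair(a, b, k):
    """True if the last k nucleotides of a and b are fully complementary to each other."""
    return len(a) >= k and len(b) >= k and reverse_complement(a[-k:]) == b[-k:]


self_hit = has_complementary_region("ACGTGGAATTCC", "ACGTGGAATTCC", 6)
pair_hit = three_prime_ends_pair("TTTTACGG", "GGGGCCGT", 4)
```

A primer set could be screened by applying `has_complementary_region` to every primer against itself and to every pair of primers, with k set to 6, 5, or 4 depending on the preference level adopted.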
Further, nucleotides constituting a random primer are preferably designed to have a Tm value suitable for thermal cycle conditions (in particular, an annealing temperature) in a nucleic acid amplification reaction. A Tm value can be calculated by a conventional method, such as the nearest neighbor base pair approach, the Wallace method, or the GC % method, although a method of calculation is not particularly limited thereto. Specifically, nucleotides used for a random primer are preferably designed to have a Tm value of 10° C. to 85° C., more preferably 12° C. to 75° C., further preferably 14° C. to 70° C., and most preferably 16° C. to 65° C. By designing Tm values for nucleotides within the above range, amplified nucleic acid fragments can be obtained with enhanced reproducibility under given thermal cycle conditions (in particular, at a given annealing temperature) in a nucleic acid amplification reaction.
Furthermore, when different types of nucleotides constituting a random primer are designed, in particular, a variation for Tm among a plurality of nucleotides is preferably 50° C. or less, more preferably 45° C. or less, further preferably 40° C. or less, and most preferably 35° C. or less. When the nucleotides are designed such that a variation for Tm among a plurality of nucleotides falls within the above range, amplified nucleic acid fragments can be obtained with enhanced reproducibility under given thermal cycle conditions (in particular, at a given annealing temperature) in a nucleic acid amplification reaction.
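As an illustration of the Tm criteria, the sketch below uses the Wallace method, in which Tm (°C) is estimated as 2 × (number of A and T) + 4 × (number of G and C). It is a rough estimate suited to short oligonucleotides, and it is used here only because it needs no thermodynamic parameters. The "variation" among primers is assumed to mean the difference between the highest and lowest Tm; the function names and sequences are hypothetical.

```python
def tm_wallace(seq):
    """Wallace-rule estimate of Tm in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2.0 * at + 4.0 * gc


def tm_spread(primers):
    """Difference between the highest and lowest estimated Tm in a primer set."""
    values = [tm_wallace(p) for p in primers]
    return max(values) - min(values)


# Hypothetical 10-nucleotide primers (illustrative only).
primers = ["ACGTACGTAC", "GGGGCCCCGG", "ATATATATAT"]

tm_values = [tm_wallace(p) for p in primers]
spread = tm_spread(primers)
spread_ok = spread <= 35.0  # narrowest variation stated in the description
```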
[Nucleic Acid Amplification Reaction]
According to the method for producing a DNA library of the present invention, many amplification fragments are obtained via a nucleic acid amplification reaction conducted with the use of the random primer described above and genomic DNA as a template. In particular, in such a nucleic acid amplification reaction, the concentration of a random primer in a reaction solution is set higher than the primer concentration in a usual nucleic acid amplification reaction. Thus, many amplification fragments can be obtained using genomic DNA as a template while achieving high reproducibility. The many amplification fragments thus obtained can be used as a DNA library that can be applied to genotyping and the like.
A nucleic acid amplification reaction is a reaction for synthesizing amplification fragments in a reaction solution containing genomic DNA as a template, the above-mentioned random primers, DNA polymerase, deoxynucleoside triphosphate as a substrate (i.e., dNTP, which is a mixture of dATP, dCTP, dTTP, and dGTP), and a buffer under given thermal cycle conditions. As it is necessary to add Mg2+ at a given concentration to a reaction solution in a nucleic acid amplification reaction, the buffer of the above composition contains MgCl2. When the buffer does not contain MgCl2, MgCl2 is further added to the above composition.
In particular, in a nucleic acid amplification reaction, it is preferable to adequately set the concentration of a random primer in accordance with the nucleotide length of the random primer. When different types of nucleotides constitute random primers with different nucleotide lengths, the average of nucleotide lengths of random primers may be set as the nucleotide length (the average may be a simple average or a weighted average taking the amount of each nucleotide into account).
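The two kinds of average mentioned above can be sketched as follows. The primer lengths and relative amounts are hypothetical, and the function names are chosen for illustration only.

```python
def simple_mean_length(lengths):
    """Unweighted average of the primer lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    return sum(lengths) / len(lengths)


def weighted_mean_length(lengths, amounts):
    """Average of the primer lengths weighted by the amount of each primer."""
    if len(lengths) != len(amounts) or not lengths:
        raise ValueError("lengths and amounts must be non-empty and of equal size")
    total = sum(amounts)
    if total <= 0:
        raise ValueError("total amount must be positive")
    return sum(n * w for n, w in zip(lengths, amounts)) / total


# Hypothetical mixture: 10-, 12-, and 14-nucleotide primers in a 1:1:2 ratio.
lengths = [10, 12, 14]
amounts = [1.0, 1.0, 2.0]

simple_avg = simple_mean_length(lengths)
weighted_avg = weighted_mean_length(lengths, amounts)
```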
Specifically, a nucleic acid amplification reaction is conducted using a random primer comprising 9 to 30 nucleotides at a random primer concentration of 4 to 200 μM and preferably at 4 to 100 μM. Under such conditions, many amplified fragments, and in particular, many amplified fragments comprising 100 to 500 nucleotides, can be obtained via a nucleic acid amplification reaction while achieving high reproducibility.
More specifically, when a random primer comprises 9 to 10 nucleotides, the random primer concentration is preferably 40 to 60 μM. When a random primer comprises 10 to 14 nucleotides, it is preferable that the random primer concentration satisfy 100 μM or less and y > 3E+08x^−6.974, provided that the concentration of the random primer (μM) is represented by “y” and the nucleotide length of the random primer is represented by “x.” When a random primer comprises 14 to 18 nucleotides, the random primer concentration is preferably 4 to 100 μM. When a random primer comprises 18 to 28 nucleotides, the random primer concentration preferably satisfies 4 μM or more and y < 8E+08x^−5.533. When a random primer comprises 28 to 29 nucleotides, the random primer concentration is preferably 6 to 10 μM. By setting the random primer concentration in accordance with the nucleotide length of a random primer as described above, many amplified fragments can be obtained with improved certainty while achieving high reproducibility.
As described in the Examples below, the above inequalities (y > 3E+08x^−6.974 and y < 8E+08x^−5.533) were developed, as a result of thorough inspection of the correlation between the random primer length and the random primer concentration, to represent the random primer concentration at which many DNA fragments comprising 100 to 500 nucleotides can be obtained with favorable reproducibility.
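The length-dependent concentration ranges can be sketched as a lookup. The sketch takes x as the random primer length in nucleotides and y as the random primer concentration in μM, which is the reading under which the two power-law thresholds line up with the fixed ranges given for the neighbouring lengths (about 32 μM at 10 nucleotides and about 8 μM at 28 nucleotides); this reading is an assumption of the sketch. The handling of the shared endpoints (10, 14, 18, and 28 nucleotides) and the use of inclusive comparisons are arbitrary choices made here, and the function names are hypothetical.

```python
def preferred_concentration_range(length):
    """Return (low, high) in micromolar for a random primer of the given length.

    Assumption: in the power-law bounds, x is the primer length and y is the
    primer concentration in micromolar.
    """
    if 9 <= length <= 10:
        return (40.0, 60.0)
    if 10 < length < 14:
        return (3e8 * length ** -6.974, 100.0)  # lower bound from y > 3E+08x^-6.974
    if 14 <= length <= 18:
        return (4.0, 100.0)
    if 18 < length < 28:
        return (4.0, 8e8 * length ** -5.533)  # upper bound from y < 8E+08x^-5.533
    if 28 <= length <= 29:
        return (6.0, 10.0)
    raise ValueError("no length-specific range is stated for this primer length")


def concentration_is_preferred(length, concentration_um):
    """Inclusive check of a concentration against the range for the given length."""
    low, high = preferred_concentration_range(length)
    return low <= concentration_um <= high


low_12, high_12 = preferred_concentration_range(12)
low_20, high_20 = preferred_concentration_range(20)
```

Because an average length may be used for mixed primer sets, the function accepts non-integer lengths as well.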
The amount of genomic DNA as a template in a nucleic acid amplification reaction is not particularly limited. However, it is preferably 0.1 to 1000 ng, more preferably 1 to 500 ng, further preferably 5 to 200 ng, and most preferably 10 to 100 ng, when the amount of the reaction solution is 50 μl. By setting the amount of genomic DNA as a template within the above range, many amplified fragments can be obtained without inhibiting the amplification reaction with a random primer, while achieving high reproducibility.
Genomic DNA can be prepared in accordance with a conventional technique without particular limitations. With the use of a commercially available kit, genomic DNA can be easily prepared from a target organism species. Genomic DNA extracted from an organism in accordance with a conventional technique or with the use of a commercially available kit may be used as is, genomic DNA extracted from an organism and purified may be used, or genomic DNA subjected to restriction enzyme treatment or ultrasonic treatment may be used.
DNA polymerase used in a nucleic acid amplification reaction is not particularly limited, and an enzyme having DNA polymerase activity under thermal cycle conditions for a nucleic acid amplification reaction can be used. Specifically, heat-stable DNA polymerase used for a general nucleic acid amplification reaction can be used. Examples of DNA polymerase include thermophilic bacteria-derived DNA polymerase such as Taq DNA polymerase, and hyperthermophilic Archaea-derived DNA polymerase such as KOD DNA polymerase or Pfu DNA polymerase. In a nucleic acid amplification reaction, it is particularly preferable to use Pfu DNA polymerase as DNA polymerase in combination with the random primer described above. With the use of such DNA polymerases, many amplified fragments can be obtained with improved certainty while achieving high reproducibility.
In a nucleic acid amplification reaction, the concentration of deoxynucleoside triphosphate as a substrate (i.e., dNTP, which is a mixture of dATP, dCTP, dTTP, and dGTP) is not particularly limited, and it can be 5 μM to 0.6 mM, preferably 10 μM to 0.4 mM, and more preferably 20 μM to 0.2 mM. By setting the concentration of dNTP as a substrate within such range, errors caused by incorrect incorporation by DNA polymerase can be prevented, and many amplified fragments can be obtained while achieving high reproducibility.
A buffer used in a nucleic acid amplification reaction is not particularly limited. For example, a solution comprising MgCl2 as described above, Tris-HCl (pH 8.3), and KCl can be used. The concentration of Mg2+ is not particularly limited. For example, it can be 0.1 to 4.0 mM, preferably 0.2 to 3.0 mM, more preferably 0.3 to 2.0 mM, and further preferably 0.5 to 1.5 mM. By designating the concentration of Mg2+ in the reaction solution within such range, many amplified fragments can be obtained while achieving high reproducibility.
Thermal cycling conditions of a nucleic acid amplification reaction are not particularly limited, and a common thermal cycle can be adopted. A specific example of a thermal cycle comprises a first step of thermal denaturation in which genomic DNA as a template is dissociated into single strands, a cycle comprising thermal denaturation, annealing, and extension repeated a plurality of times (e.g., 20 to 40 times), a step of extension for a given period of time according to need, and the final step of storage.
Thermal denaturation can be performed at, for example, 93° C. to 99° C., preferably 95° C. to 98° C., and more preferably 97° C. to 98° C. Annealing can be performed at, for example, 30° C. to 70° C., preferably 35° C. to 68° C., and more preferably 37° C. to 65° C., although it varies depending on the Tm value of a random primer. Extension can be performed at, for example, 70° C. to 76° C., preferably 71° C. to 75° C., and more preferably 72° C. to 74° C. Storage can be performed at, for example, 4° C.
The first step of thermal denaturation can be performed within the temperature range described above for a period of, for example, 5 seconds to 10 minutes, preferably 10 seconds to 5 minutes, and more preferably 30 seconds to 2 minutes. In the cycle comprising “thermal denaturation, annealing, and extension,” thermal denaturation can be carried out within the temperature range described above for a period of, for example, 2 seconds to 5 minutes, preferably 5 seconds to 2 minutes, and more preferably 10 seconds to 1 minute. In the cycle comprising “thermal denaturation, annealing, and extension,” annealing can be carried out within the temperature range described above for a period of, for example, 1 second to 3 minutes, preferably 3 seconds to 2 minutes, and more preferably 5 seconds to 1 minute. In the cycle comprising “thermal denaturation, annealing, and extension,” extension can be carried out within the temperature range described above for a period of, for example, 1 second to 3 minutes, preferably 3 seconds to 2 minutes, and more preferably 5 seconds to 1 minute.
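The temperature ranges and the cycle structure described above can be sketched as a simple check. In the following Python sketch, the function and constant names are hypothetical, only the broadest ranges quoted above are encoded, and ramp times between steps are ignored, so the result is the programmed hold time rather than the actual run time of an instrument.

```python
# Broadest temperature ranges (deg C) quoted in the text for each step.
BROAD_RANGES_C = {
    "denaturation": (93.0, 99.0),
    "annealing": (30.0, 70.0),
    "extension": (70.0, 76.0),
}


def within_broad_ranges(denaturation_c, annealing_c, extension_c):
    """Check each step temperature against the broadest quoted range."""
    temps = {
        "denaturation": denaturation_c,
        "annealing": annealing_c,
        "extension": extension_c,
    }
    return all(low <= temps[step] <= high
               for step, (low, high) in BROAD_RANGES_C.items())


def total_hold_seconds(initial_denaturation_s, cycles, denaturation_s,
                       annealing_s, extension_s, final_extension_s=0):
    """Sum of programmed hold times; ramping between steps is ignored."""
    per_cycle = denaturation_s + annealing_s + extension_s
    return initial_denaturation_s + cycles * per_cycle + final_extension_s


# Example: 98 C for 2 min, then 30 cycles of 98 C/10 s, 50 C/15 s, 72 C/20 s.
print(within_broad_ranges(98, 50, 72), total_hold_seconds(120, 30, 10, 15, 20))
```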
According to the method for producing a DNA library of the present invention, amplified fragments may be obtained by a nucleic acid amplification reaction that employs a hot start method. The hot start method is intended to prevent mis-priming or non-specific amplification caused by primer-dimer formation prior to the cycle comprising “thermal denaturation, annealing, and extension.” The hot start method involves the use of an enzyme in which DNA polymerase activity has been suppressed by binding an anti-DNA polymerase antibody thereto or by chemical modification thereof. Thus, DNA polymerase activity can be suppressed and a non-specific reaction prior to the thermal cycle can be prevented. According to the hot start method, the temperature is set high in the first thermal cycle, DNA polymerase activity is thus recovered, and the subsequent nucleic acid amplification reaction is then allowed to proceed.
As described above, many amplified fragments can be obtained with the use of genomic DNA as a template and a random primer by conducting a nucleic acid amplification reaction with the use of a random primer comprising 9 to 30 nucleotides and setting the concentration thereof to 4 to 200 μM in a reaction solution. With the use of the random primer comprising 9 to 30 nucleotides while setting the concentration thereof to 4 to 200 μM in a reaction solution, a nucleic acid amplification reaction can be performed with very high reproducibility. According to the nucleic acid amplification reaction, specifically, many amplified fragments can be obtained while achieving very high reproducibility. Therefore, the thus obtained many amplified fragments can be used for a DNA library in genetic analysis targeting genomic DNA.
By performing a nucleic acid amplification reaction with the use of the random primer comprising 9 to 30 nucleotides and setting the concentration thereof in a reaction solution to 4 to 200 μM, in particular, many amplified fragments comprising about 100 to 500 nucleotides can be obtained with the use of genomic DNA as a template. Such many amplified fragments comprising about 100 to 500 nucleotides are suitable for mass analysis of nucleotide sequences with the use of, for example, a next-generation sequencer, and highly accurate sequence information can thus be obtained. According to the present invention, a DNA library including DNA fragments comprising about 100 to 500 nucleotides can be produced.
By performing a nucleic acid amplification reaction with the use of the random primer comprising 9 to 30 nucleotides and setting the concentration thereof to 4 to 200 μM in a reaction solution, in particular, amplified fragments can be obtained uniformly across genomic DNA. In other words, DNA fragments are amplified in a distributed manner across the genome but not in a localized manner in a specific region of genomic DNA in a nucleic acid amplification reaction with the use of such random primer. That is, according to the present invention, a DNA library can be produced uniformly across the entire genome.
After performing the nucleic acid amplification reaction using the above-mentioned random primer, restriction enzyme treatment, size selection treatment, sequence capture treatment, and the like can be performed on the obtained amplified fragments. By carrying out restriction enzyme treatment, size selection treatment, and sequence capture treatment on the amplified fragments, specific amplified fragments (a fragment having a specific restriction enzyme site, an amplified fragment with a specific size range, and an amplified fragment having a specific sequence) can be obtained from among the obtained amplified fragments. Then, specific amplified fragments obtained by these treatments can be used for a DNA library.
[Method of Genomic DNA Analysis]
With the use of the DNA library produced in the manner described above, genomic DNA analysis such as genotyping can be performed. Such DNA library has very high reproducibility, the size thereof is suitable for a next-generation sequencer, and it has uniformity across the entire genome. Accordingly, the DNA library can be used as a DNA marker (also referred to as “genetic marker” or “gene marker”). The term “DNA marker” refers to a wide range of characteristic nucleotide sequences present in genomic DNA. In addition, a DNA marker may be especially a nucleotide sequence on the genome serving as a marker associated with genetic traits. A DNA marker can be used for, for example, genotype identification, linkage mapping, gene mapping, breeding comprising a step of selection with the use of a marker, back crossing using a marker, quantitative trait locus mapping, bulked segregant analysis, variety identification, or linkage disequilibrium mapping.
For example, the nucleotide sequence of a DNA library prepared as described above is determined using a next generation sequencer or the like, and the presence or absence of a DNA marker can be confirmed based on the obtained nucleotide sequence.
As an example, the presence or absence of a DNA marker can be confirmed from the number of reads of the obtained nucleotide sequence. While a next-generation sequencer is not particularly limited, such sequencer is also referred to as a “second-generation sequencer,” and such sequencer is an apparatus for nucleotide sequencing that allows simultaneous determination of nucleotide sequences of several tens of millions of DNA fragments. The sequencing principle of a next-generation sequencer is not particularly limited. For example, sequencing can be carried out in accordance with a method in which sequencing is carried out while amplifying and synthesizing target DNA on flow cells by bridge PCR method and the sequencing-by-synthesis method, or in accordance with a method in which sequencing is carried out by emulsion PCR and the pyrosequencing method for assaying the amount of pyrophosphoric acids released upon DNA synthesis. More specific examples of next-generation sequencers include MiniSeq, MiSeq, NextSeq, HiSeq, and HiSeq X Series (Illumina, Inc.) and Roche 454 GS FLX sequencers (Roche).
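Confirmation of the presence or absence of a DNA marker from the number of reads can be sketched as follows. This Python sketch is illustrative only: the marker identifiers, the read counts, and the threshold `min_reads` are hypothetical, and an actual threshold would have to be chosen according to sequencing depth.

```python
def marker_presence(read_counts, min_reads=10):
    """Map each candidate marker to True (present) or False (absent).

    read_counts: dict of marker identifier -> number of reads.
    min_reads: hypothetical threshold, not a value taken from the text.
    """
    return {marker: count >= min_reads
            for marker, count in read_counts.items()}


# Hypothetical read counts for two candidate markers in one sample.
print(marker_presence({"marker_1": 250, "marker_2": 0}))
```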
In another example, the presence or absence of a DNA marker can be confirmed by comparing the nucleotide sequence obtained for the DNA library prepared as described above with the reference nucleotide sequence. Here, the reference nucleotide sequence means a known sequence as a reference, and it can be, for example, a known sequence stored in a database. That is, a DNA library is prepared as described above for a given organism, its nucleotide sequence is determined, and the nucleotide sequence of the DNA library is compared with the reference nucleotide sequence. A nucleotide sequence that differs from the reference nucleotide sequence can be designated as a DNA marker (a characteristic nucleotide sequence existing in the genomic DNA) related to the organism. For each specified DNA marker, the relevance to the genetic trait (phenotype) can be determined by further analysis according to a conventional method. In other words, a DNA marker related to a phenotype (sometimes referred to as a “selective marker”) can be identified from among the DNA markers identified as described above.
Furthermore, in another example, the presence or absence of a DNA marker can be confirmed by comparing the nucleotide sequence obtained for the DNA library prepared as described above with the nucleotide sequence of a DNA library prepared as described above using genomic DNA from a different organism or tissue. In other words, a DNA library is prepared as described above for each of two or more organisms or two different tissues, the nucleotide sequences thereof are determined, and the nucleotide sequences of the DNA libraries are compared with each other. Then, a nucleotide sequence that differs between the DNA libraries can be designated as a DNA marker (a characteristic nucleotide sequence existing in the genomic DNA) related to the sampled organism or tissue. For each specified DNA marker, the relevance to the genetic trait (phenotype) can be determined by further analysis according to a conventional method. In other words, a DNA marker related to a phenotype (sometimes referred to as a “selective marker”) can be identified from among the DNA markers identified as described above.
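The comparisons described above, either against a reference nucleotide sequence or between two DNA libraries, can be sketched as a set comparison. The following Python sketch assumes, for simplicity, that sequences are compared by exact string identity; an actual analysis would normally rely on alignment and would tolerate sequencing errors. The function name and the example sequences are hypothetical.

```python
def differing_sequences(library_a, library_b):
    """Return sequences found in only one of the two collections.

    Either argument may be the read patterns of a DNA library or a set of
    reference sequences; comparison is by exact string identity (assumption).
    """
    a, b = set(library_a), set(library_b)
    return {"only_in_a": a - b, "only_in_b": b - a}


# Hypothetical sequences from two samples.
result = differing_sequences(["ACGTAC", "GGCATT"], ["ACGTAC", "TTAGCA"])
print(result)
```

Sequences reported in either set are candidates for DNA markers, and the relevance of each candidate to a phenotype still has to be determined by further analysis.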
As an aside, it is also possible to design a pair of primers which specifically amplify the DNA marker based on the obtained nucleotide sequence. It is also possible to confirm the presence or absence of the DNA marker in the extracted genomic DNA by performing a nucleic acid amplification reaction using a pair of designed primers and genomic DNA extracted from a target organism as a template.
Alternatively, DNA libraries prepared as described above can be used for metagenomic analysis for examining a wide variety of microorganisms and the like, genome mutation analysis of somatic cells of tumor tissue or the like, genotyping using microarrays, determination and analysis of ploidy, calculation and analysis of the number of chromosomes, analysis of the increase and decrease of chromosomes, analysis of partial insertion/deletion/replication/translocation of chromosomes, analysis of contamination with foreign genome, parentage discrimination analysis, and testing and analysis of crossed seed purity.
[Application to Next Generation Sequencing Technology]
As described above, by conducting a nucleic acid amplification reaction with a random primer contained at a high concentration in a reaction solution, it is possible to obtain many amplified fragments with favorable reproducibility using genomic DNA as a template. Since each obtained amplified fragment has, at both ends thereof, nucleotide sequences that are the same as that of the random primer, it can be easily applied to next-generation sequencing technology by utilizing these nucleotide sequences.
Specifically, as described above, a nucleic acid amplification reaction is conducted in a reaction solution (first reaction solution) containing genomic DNA and a random primer at a high concentration to obtain many amplified fragments (first DNA fragments) using the genomic DNA as a template. Next, a nucleic acid amplification reaction is conducted in a reaction solution (second reaction solution) containing the obtained many amplified fragments (first DNA fragments) and a primer designed based on the nucleotide sequence of the random primer (referred to as “next generation sequencer primer”). A next generation sequencer primer to be used herein is a nucleotide sequence including a region used for a nucleotide sequencing reaction. More specifically, for example, the next-generation sequencer primer may be a nucleotide sequence having a region necessary for a nucleotide sequencing reaction (sequence reaction) by a next-generation sequencer, in which the nucleotide sequence at the 3′ end of the primer is a nucleotide sequence having 70% or more identity, preferably 80% or more identity, more preferably 90% or more identity, still more preferably 95% or more identity, further preferably 97% or more identity, and most preferably 100% identity to the nucleotide sequence on the 5′ end side of the first DNA fragment.
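The identity criterion for the 3′ end of a next-generation sequencer primer can be sketched as follows. This Python sketch assumes an ungapped, position-by-position comparison over a region of fixed length; the function names and all sequences are hypothetical and do not correspond to any actual adapter or random primer.

```python
def percent_identity(seq_a, seq_b):
    """Ungapped percent identity between two sequences of equal length."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def three_prime_identity(primer, first_fragment, region_len):
    """Compare the last region_len bases of the primer with the first
    region_len bases (5' end side) of the first DNA fragment."""
    return percent_identity(primer[-region_len:], first_fragment[:region_len])


# Hypothetical sequences: a 10-base region with one mismatch.
primer = "GGGGGGGGGG" + "TAAGAGACAG"
fragment = "TAAGAGACAC" + "CTTTTGCA"
print(three_prime_identity(primer, fragment, 10))
```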
Here, the “region used for a nucleotide sequencing reaction” included in a next-generation sequencer primer is not particularly limited because it varies depending on the type of next-generation sequencer. However, in the case of conducting a nucleotide sequencing reaction using a next-generation sequencer with a sequence primer, such region may be, for example, a nucleotide sequence complementary to the nucleotide sequence of the sequence primer. In a case in which a sequencing reaction is conducted by a next-generation sequencer using capture beads bound to given DNA, the “region used for a nucleotide sequencing reaction” refers to a nucleotide sequence complementary to the nucleotide sequence of the DNA bound to capture beads. Further, in a case in which a next-generation sequencer reads a sequence based on a current change when a DNA chain having a terminal hairpin loop passes through a protein having nano-sized pores, the “region used for a nucleotide sequencing reaction” may be a nucleotide sequence complementary to the nucleotide sequence forming the hairpin loop.
By designing the nucleotide sequence at the 3′ end of a next-generation sequencer primer as described above, the next-generation sequencer primer can be hybridized to the 3′ end of the first DNA fragment under stringent conditions, and the second DNA fragment can be amplified using the first DNA fragment as a template. Stringent conditions mean conditions under which a so-called specific hybrid is formed while a nonspecific hybrid is not formed. For example, such conditions can be appropriately determined with reference to Molecular Cloning: A Laboratory Manual (Third Edition). Specifically, stringency can be determined by setting the temperature and the salt concentration in a solution upon Southern hybridization, and the temperature and the salt concentration in a solution in the washing step of Southern hybridization. More specifically, for example, the sodium concentration is set to 25 to 500 mM and preferably 25 to 300 mM and the temperature is set to 42° C. to 68° C. and preferably 42° C. to 65° C. under stringent conditions. More specifically, the sodium concentration is 5×SSC (83 mM NaCl, 83 mM sodium citrate) and the temperature is 42° C.
In particular, when different types of random primers are used to obtain first DNA fragments, next-generation sequencer primers may be prepared to correspond to all or some of the random primers.
For example, in a case in which a set of different types of random primers (each having an arbitrary 3′-end sequence of several nucleotides) each comprising a common nucleotide sequence except several nucleotides (e.g., about 1 to 3 nucleotides) at the 3′ end is used, all of the obtained many first DNA fragments have a common 5′-end sequence. Accordingly, the 3′-end nucleotide sequence of a next generation sequencer primer is designated to be a nucleotide sequence having 70% or more identity to the 5′-end nucleotide sequence common to the first DNA fragments. By designing next-generation sequencer primers as described above, it is possible to obtain next generation sequencer primers corresponding to all random primers. By using such next generation sequencer primers, it is possible to amplify second DNA fragments using all of the first DNA fragments as templates.
Similarly, even in a case in which a set of different types of random primers (each having an arbitrary 3′-end sequence of several nucleotides) each comprising a common nucleotide sequence except several nucleotides (e.g., about 1 to 3 nucleotides) at the 3′ end is used, it is also possible to obtain second DNA fragments using some of the obtained many first DNA fragments as templates. Specifically, the 3′-end nucleotide sequence of a next generation sequencer primer is designated to be a nucleotide sequence having 70% or more identity to the 5′-end nucleotide sequence common to the first DNA fragments and the sequence comprising several nucleotides following the nucleotide sequence (corresponding to several nucleotides (arbitrary sequence) at the 3′ end of the random primer) such that second DNA fragments can be amplified using some of the first DNA fragments as templates.
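The relationship between a set of random primers sharing a common sequence and the 3′ end of a next-generation sequencer primer can be sketched as follows. This Python sketch assumes exact matching (100% identity) for simplicity, although identity of 70% or more is permitted above. The common sequence, the fragment bodies, and the function names are hypothetical.

```python
from itertools import product


def primer_set(common, n_arbitrary):
    """All random primers made of a common sequence followed by
    n_arbitrary arbitrary nucleotides at the 3' end."""
    return [common + "".join(tail)
            for tail in product("ACGT", repeat=n_arbitrary)]


def usable_templates(primer_3prime_end, first_fragments):
    """First DNA fragments whose 5' end exactly matches the 3'-end
    sequence of the next-generation sequencer primer (assumption)."""
    return [f for f in first_fragments if f.startswith(primer_3prime_end)]


common = "TAAGAGACAG"                    # hypothetical common sequence
primers = primer_set(common, 2)          # 4^2 = 16 random primers
fragments = [p + "GATTACA" for p in primers]   # hypothetical fragments

# 3' end equal to the common sequence: all first DNA fragments are templates.
print(len(usable_templates(common, fragments)))
# 3' end extended by two arbitrary bases: only some fragments are templates.
print(len(usable_templates(common + "TG", fragments)))
```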
Meanwhile, in a case in which first DNA fragments are obtained using different types of random primers each consisting of an arbitrary nucleotide sequence, it is possible to obtain second DNA fragments using different types of next-generation sequencer primers such that the second DNA fragments correspond to all of the first DNA fragments, or it is also possible to obtain second DNA fragments using different types of next-generation sequencer primers such that the second DNA fragments correspond to some of the first DNA fragments.
As described above, the second DNA fragments amplified using next-generation sequencer primers have a region necessary for a nucleotide sequencing reaction (sequence reaction) by a next-generation sequencer, which is included in the next-generation sequencer primers. The region necessary for a sequence reaction is not particularly limited as it varies depending on a next generation sequencer. For example, when a next-generation sequencer primer is used in a next-generation sequencer based on the principle that sequencing is carried out while amplifying and synthesizing target DNA on flow cells by bridge PCR method and the sequencing-by-synthesis method, the next-generation sequencer primer needs to contain a region necessary for bridge PCR and a region necessary for the sequencing-by-synthesis method. The region necessary for bridge PCR is a region that is hybridized to an oligonucleotide immobilized on flow cells and has a length of 9 nucleotides including the 5′ end of the next generation sequencer primer. In addition, a region necessary for the sequencing-by-synthesis method is a region to which a sequence primer used in a sequence reaction is hybridized, and is a region in the middle of the next generation sequencer primer.
In addition, a next-generation sequencer may be an Ion Torrent sequencer. In the case of using the Ion Torrent sequencer, a next-generation sequencer primer has a so-called ion adapter on the 5′ end side and binds to a particle for conducting emulsion PCR. In addition, in the Ion Torrent sequencer, particles coated with a template amplified by emulsion PCR are placed on an ion chip and subjected to a sequence reaction.
Here, a nucleic acid amplification reaction using a next-generation sequencer primer and a second reaction solution containing the first DNA fragments is not particularly limited, and conventional conditions for a nucleic acid amplification reaction can be applied. That is, the conditions in [Nucleic acid amplification reaction] described above can be used. For example, the second reaction solution contains first DNA fragments as templates, the above-described next-generation sequencer primer, DNA polymerase, deoxynucleoside triphosphate as a substrate (i.e., dNTP, which is a mixture of dATP, dCTP, dTTP, and dGTP), and a buffer.
In particular, the concentration of the next-generation sequencer primer can be set to 0.01 to 5.0 μM, preferably 0.1 to 2.5 μM, and most preferably 0.3 to 0.7 μM.
While the amount of the first DNA fragments serving as templates in a nucleic acid amplification reaction is not particularly limited, it is preferably 0.1 to 1000 ng, more preferably 1 to 500 ng, further preferably 5 to 200 ng, and most preferably 10 to 100 ng when the amount of the reaction solution is 50 μl.
A method for preparing first DNA fragments as templates is not particularly limited. In the method, the reaction solution obtained after the completion of the nucleic acid amplification reaction using the above-described random primers may be used as is, or the reaction solution may be used after purifying the first DNA fragments therefrom.
Regarding the type of DNA polymerase, the concentration of deoxynucleoside triphosphate as a substrate (dNTP, i.e., a mixture of dATP, dCTP, dTTP and dGTP), the buffer composition, and temperature cycle conditions used for the nucleic acid amplification reaction, the conditions in [Nucleic acid amplification reaction] described above can be used. In addition, in a nucleic acid amplification reaction using next-generation sequencer primers, a hot start method may be employed, or amplified fragments may be obtained by a nucleic acid amplification reaction.
As described above, by using the first DNA fragments obtained using random primers as templates and using the second DNA fragments amplified using next-generation sequencer primers, it is possible to readily prepare a DNA library that can be applied to a next-generation sequencer.
In the above examples, a DNA library is prepared using the first DNA fragments obtained using random primers as templates and amplifying the second DNA fragments using next-generation sequencer primers. However, the scope of the present invention is not limited to such examples. For example, the DNA library according to the present invention may be prepared by amplifying second DNA fragments using first DNA fragments obtained using random primers as templates and further obtaining third DNA fragments using the second DNA fragments as templates and next-generation sequencer primers, thereby obtaining a DNA library of the third DNA fragments applicable to a next-generation sequencer.
Similarly, in order to prepare a DNA library applicable to a next-generation sequencer, after a nucleic acid amplification reaction using second DNA fragments as templates, a nucleic acid amplification reaction is repeatedly conducted using the obtained DNA fragments as templates, and next-generation sequencer primers are used for the final nucleic acid amplification reaction. In such case, the number of nucleic acid amplification reactions to be repeated is not particularly limited, but it is 2 to 10 times, preferably 2 to 5 times, and more preferably 2 to 3 times.
Hereafter, the present invention is described in greater detail with reference to the Examples below, although the scope of the present invention is not limited to these Examples.
In this Example, a DNA library was prepared via PCR using genomic DNAs extracted from various types of organism species as templates and various sets of random primers in accordance with the flow chart shown in
In this Example, genomic DNAs were extracted from the sugarcane varieties NiF8 and Ni9, 22 hybrid progeny lines thereof, and the rice variety Nipponbare using the DNeasy Plant Mini Kit (QIAGEN), and the extracted genomic DNAs were purified. The purified genomic DNAs were used as NiF8-derived genomic DNA, Ni9-derived genomic DNA, genomic DNAs from 22 hybrid progeny lines, and Nipponbare-derived genomic DNA, respectively. In this Example, Human Genomic DNA was purchased as human DNA from TakaraBio and used as human-derived genomic DNA.
In order to design random primers, the GC content was set between 20% and 70%, and the number of consecutive identical nucleotides was adjusted to 5 or less. The nucleotide length was set at 16 levels (i.e., 8, 9, 10, 11, 12, 14, 16, 18, 20, 22, 24, 26, 28, 29, 30, and 35 nucleotides). For each nucleotide length, 96 types of nucleotide sequences were designed, and a set of 96 types of random primers was prepared for each nucleotide length. Concerning 10-nucleotide primers, 6 sets (each comprising 96 types of random primers) were designed (these 6 sets are referred to as 10-nucleotide primer A to 10-nucleotide primer F). In this Example, specifically, 21 different sets of random primers were prepared.
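The design criteria above can be sketched as a simple filter. This Python sketch assumes that the limit of 5 applies to runs of the same nucleotide and that the GC content limits are inclusive; neither assumption is confirmed by the text. The function names and test sequences are hypothetical.

```python
def gc_percent(seq):
    """GC content of a sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq):
    """Length of the longest run of one repeated nucleotide."""
    longest = current = 0
    previous = None
    for base in seq.upper():
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def passes_design_rules(seq, gc_min=20.0, gc_max=70.0, max_run=5):
    """GC content within [gc_min, gc_max] and no run longer than max_run."""
    return gc_min <= gc_percent(seq) <= gc_max and longest_run(seq) <= max_run


for candidate in ("ATGCATGCAT", "AAAAAAGCGC", "ATATATATAT"):
    print(candidate, passes_design_rules(candidate))
```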
Tables 1 to 21 show nucleotide sequences of random primers contained in these 21 different sets of random primers.
To the genomic DNA described in 2. above (30 ng, NiF8-derived genomic DNA), random primers (final concentration: 0.6 μM; 10-nucleotide primer A), a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added, and a reaction solution was prepared while adjusting the final reaction volume to 50 μl. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. In this Example, numerous nucleic acid fragments obtained via PCR using random primers, including the standard PCR described above, are referred to as a DNA library.
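The final concentrations above can be converted into pipetting volumes with the dilution relation C1·V1 = C2·V2. In the following Python sketch, the stock concentrations are hypothetical (they are not given in this Example), and the function name is likewise an assumption made for illustration.

```python
def stock_volume_ul(final_conc, stock_conc, total_volume_ul=50.0):
    """Volume of stock giving final_conc in total_volume_ul (C1*V1 = C2*V2).

    Both concentrations must be expressed in the same unit.
    """
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated")
    return final_conc * total_volume_ul / stock_conc


# (final concentration, HYPOTHETICAL stock concentration), same unit per row.
components = {
    "random primer (uM)": (0.6, 10.0),
    "dNTP mixture (mM)": (0.2, 2.5),
    "MgCl2 (mM)": (1.0, 25.0),
}
volumes = {name: stock_volume_ul(final, stock)
           for name, (final, stock) in components.items()}
print(volumes)
```

The remainder of the 50 μl is made up by the template, the DNA polymerase, the buffer, and water.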
The DNA library obtained in 3.1.2 above was purified with the use of the MinElute PCR Purification Kit (QIAGEN) and subjected to electrophoresis with the use of the Agilent 2100 bioanalyzer (Agilent Technologies) to obtain a fluorescence unit (FU).
To the genomic DNA described in 2. above (30 ng, NiF8-derived genomic DNA), random primers (final concentration: 0.6 μM, 10-nucleotide primer A), a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added, and a reaction solution was prepared while adjusting the final reaction volume to 50 μl. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, different annealing temperatures for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. In this Example, 37° C., 40° C., and 45° C. were examined as annealing temperatures. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3.
To the genomic DNA described in 2. above (30 ng, NiF8-derived genomic DNA), random primers (final concentration: 0.6 μM, 10-nucleotide primer A), a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 2.5 units or 12.5 units of DNA polymerase (PrimeSTAR, TAKARA) were added, and a reaction solution was prepared while adjusting the final reaction volume to 50 μl. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3.
To the genomic DNA described in 2. above (30 ng, NiF8-derived genomic DNA), random primers (final concentration: 0.6 μM, 10-nucleotide primer A), a 0.2 mM dNTP mixture, MgCl2 at a given concentration, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added, and a reaction solution was prepared while adjusting the final reaction volume to 50 μl. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. In this Example, MgCl2 concentrations two-, three-, and four-fold the usual concentration were examined. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3.
To the genomic DNA described in 2. above (30 ng, NiF8-derived genomic DNA), random primers (final concentration: 0.6 μM), a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added, and a reaction solution was prepared while adjusting the final reaction volume to 50 μl. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. In this Example, primers having 8 nucleotides (Table 7), 9 nucleotides (Table 8), 11 nucleotides (Table 9), 12 nucleotides (Table 10), 14 nucleotides (Table 11), 16 nucleotides (Table 12), 18 nucleotides (Table 13), and 20 nucleotides (Table 14) were examined as random primers. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3.
To the genomic DNA described in 2. above (30 ng, NiF8-derived genomic DNA), random primers at a given concentration (10-nucleotide primer A), a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added, and a reaction solution was prepared while adjusting the final reaction volume to 50 μl. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. In this Example, 2, 4, 6, 8, 10, 20, 40, 60, 100, 200, 300, 400, 500, 600, 700, 800, 900, and 1000 μM were examined as random primer concentrations. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3. Also, in this experiment, the reproducibility of the repeated data was evaluated on the basis of Spearman's rank correlation (ρ>0.9).
To the genomic DNA described in 2. above (30 ng, NiF8-derived genomic DNA), random primers (final concentration: 60 μM, 10-nucleotide primer A), a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added, and a reaction solution was prepared while adjusting the final reaction volume to 50 μl. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3.
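Evaluation of reproducibility by Spearman's rank correlation can be sketched as follows. This Python sketch computes the coefficient as the Pearson correlation of tie-averaged ranks, which is the standard definition; the function names and the fluorescence unit values are hypothetical, and how the repeated electrophoresis data were paired in this Example is not specified here.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical fluorescence units from two repeated runs.
run_1 = [12.0, 35.5, 80.2, 41.0, 5.3]
run_2 = [10.8, 37.1, 77.9, 45.2, 6.0]
print(spearman_rho(run_1, run_2) > 0.9)
```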
From the DNA library obtained in 3.2.1, a sequence library for MiSeq analysis was prepared using the KAPA Library Preparation Kit (Roche).
With the use of the MiSeq Reagent Kit V2 500 Cycle (Illumina), the sequence library for MiSeq analysis obtained in 3.2.2 was analyzed via 100 base paired-end sequencing.
Random primer sequence information was deleted from the read data obtained in 3.2.3, and the read patterns were identified. The number of reads was counted for each read pattern, the numbers of reads of the repeated analyses were compared, and the reproducibility was evaluated using the correlation coefficient.
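The read-data analysis above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: it assumes the random primer occupies the first 10 nucleotides of every read, treats identical trimmed sequences as one read pattern, and uses a Pearson coefficient because the text says only "correlation coefficient". All names are hypothetical.

```python
from collections import Counter

PRIMER_LENGTH = 10  # assumption: primer-derived bases are the first 10 nt of a read


def read_patterns(reads, primer_length=PRIMER_LENGTH):
    """Trim the primer-derived prefix and count identical remaining sequences."""
    return Counter(r[primer_length:] for r in reads if len(r) > primer_length)


def paired_counts(counts_a, counts_b):
    """Align two replicates on the union of read patterns (missing pattern = 0 reads)."""
    patterns = sorted(set(counts_a) | set(counts_b))
    return patterns, [counts_a[p] for p in patterns], [counts_b[p] for p in patterns]


def pearson(x, y):
    """Pearson correlation of two equally long count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


# Hypothetical reads: a 10-nt primer-derived prefix followed by the insert.
rep1 = ["A" * 10 + "ACGT", "C" * 10 + "ACGT", "G" * 10 + "TTTT"]
rep2 = ["A" * 10 + "ACGT", "T" * 10 + "ACGT", "C" * 10 + "ACGT"]

patterns, n1, n2 = paired_counts(read_patterns(rep1), read_patterns(rep2))
print(patterns, n1, n2, pearson(n1, n2))
```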
To the genomic DNA described in 2, above (30 ng, Nipponbare-derived genomic DNA), random primers (final concentration: 60 μM, 10-nucleotide primer A), a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added, and a reaction solution was prepared while adjusting the final reaction level to 50 μl. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3.
Preparation of a sequence library using the DNA library prepared from Nipponbare-derived genomic DNA, MiSeq analysis, and analysis of the read data were performed in accordance with the methods described in 3.2.2, 3.2.3, and 3.2.4, respectively.
The read patterns obtained in 3.3.2 were mapped to the genomic information of Nipponbare (NC_008394 to NC_008405) using bowtie2, and the genomic positions of the read patterns were identified.
On the basis of the positional information of the read patterns identified in 3.3.3, the sequences of random primers were compared with the genome sequences to which such random primers would anneal, and the number of mismatches was determined.
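The mismatch comparison can be sketched as a position-by-position count over the primer and the genomic sequence at the annealing site. The helper names and the example sequences are hypothetical, and the sketch assumes the genomic site has already been extracted in the primer's orientation.

```python
from collections import Counter


def count_mismatches(primer, site):
    """Number of positions at which the primer differs from the genomic site."""
    if len(primer) != len(site):
        raise ValueError("primer and genomic site must have the same length")
    return sum(1 for p, g in zip(primer.upper(), site.upper()) if p != g)


def mismatch_distribution(pairs):
    """Tally how many (primer, site) pairs show 0, 1, 2, ... mismatches."""
    return Counter(count_mismatches(primer, site) for primer, site in pairs)


# Hypothetical primer/site pairs.
pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),
    ("ACGTACGTAC", "ACGTTCGTAA"),
]
print(mismatch_distribution(pairs))
```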
To the genomic DNA described in 2, above (30 ng, NiF8-derived genomic DNA, Ni9-derived genomic DNA, hybrid progeny-derived genomic DNA, or Nipponbare-derived genomic DNA), random primers (final concentration: 60 μM, 10-nucleotide primer A), a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added, and a reaction solution was prepared while adjusting the final reaction level to 50 μl. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3.
Analysis of the DNA libraries prepared in 3.4.1 was consigned to TakaraBio under conditions in which the number of samples was 16 per lane via 100 base paired-end sequencing, and the read data were obtained.
Random primer sequence information was deleted from the read data obtained in 3.4.2, and the read patterns were identified. The number of reads was counted for each read pattern.
On the basis of the read patterns and the number of reads obtained as a result of the analysis conducted in 3.4.3, polymorphisms peculiar to NiF8 and Ni9 were detected, and the read patterns thereof were designated as markers. On the basis of the number of reads, the genotypes of the 22 hybrid progeny lines were identified. The accuracy for genotype identification was evaluated on the basis of the reproducibility attained by the repeated data concerning the 22 hybrid progeny lines.
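One way to picture the marker detection and genotype identification described above is the following sketch. It is an assumption-laden illustration: the description gives no read-count thresholds, so the `min_reads` value, the presence/absence rule, and all function names are hypothetical.

```python
def parent_specific_markers(nif8_counts, ni9_counts, min_reads=10):
    """Read patterns present (>= min_reads) in one parent and absent (0 reads) in the other."""
    patterns = set(nif8_counts) | set(ni9_counts)
    nif8_markers = {p for p in patterns
                    if nif8_counts.get(p, 0) >= min_reads and ni9_counts.get(p, 0) == 0}
    ni9_markers = {p for p in patterns
                   if ni9_counts.get(p, 0) >= min_reads and nif8_counts.get(p, 0) == 0}
    return nif8_markers, ni9_markers


def call_genotypes(progeny_counts, markers, min_reads=10):
    """Presence/absence call per marker from a progeny line's read counts."""
    return {m: progeny_counts.get(m, 0) >= min_reads for m in markers}


def concordance(calls_a, calls_b):
    """Fraction of markers with identical calls in two repeated analyses."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no markers in common")
    return sum(calls_a[m] == calls_b[m] for m in shared) / len(shared)


# Hypothetical read counts keyed by read-pattern identifier.
nif8 = {"P1": 20, "P2": 15}
ni9 = {"P2": 12, "P3": 30}
nif8_markers, ni9_markers = parent_specific_markers(nif8, ni9)
markers = nif8_markers | ni9_markers

rep1 = call_genotypes({"P1": 11, "P3": 2}, markers)
rep2 = call_genotypes({"P1": 14, "P3": 25}, markers)
print(nif8_markers, ni9_markers, concordance(rep1, rep2))
```

The concordance between repeated calls corresponds to the reproducibility figure used to judge genotyping accuracy; a real pipeline would likely normalize read counts per sample before applying any threshold.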
3.5 Experiment for Confirmation with PCR Marker
Primers were designed for a total of 6 markers (i.e., 3 NiF8 markers and 3 Ni9 markers) among the markers identified in 3.4.4 based on the marker sequence information obtained via paired-end sequencing (Table 22).
With the use of the TaKaRa Multiplex PCR Assay Kit Ver. 2 (TAKARA) and the genomic DNA described in 2, above (15 ng, NiF8-derived genomic DNA, Ni9-derived genomic DNA, or hybrid progeny-derived genomic DNA) as a template, 1.25 μl of Multiplex PCR enzyme mix, 12.5 μl of 2× Multiplex PCR buffer, and the 0.4 μM primer designed in 3.5.1 were added, and a reaction solution was prepared while adjusting the final reaction level to 25 μl. PCR was carried out under thermal cycle conditions comprising 94° C. for 1 minute, 30 cycles of 94° C. for 30 seconds, 60° C. for 30 seconds, and 72° C. for 30 seconds, and retention at 72° C. for 10 minutes, followed by storage at 4° C. The amplified DNA fragment was subjected to electrophoresis with the use of TapeStation (Agilent Technologies).
On the basis of the results of electrophoresis obtained in 3.5.2, the genotype of the marker was identified on the basis of the presence or absence of a band, and the results were compared with the number of reads of the marker.
To the genomic DNA described in 2, above (30 ng, NiF8-derived genomic DNA), random primers having given lengths (final concentration: 10 μM), a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added, and a reaction solution was prepared while adjusting the final reaction level to 50 μl. In this experiment, 9 nucleotides (Table 8), 10 nucleotides (Table 1, 10-nucleotide primer A), 11 nucleotides (Table 9), 12 nucleotides (Table 10), 14 nucleotides (Table 11), 16 nucleotides (Table 12), 18 nucleotides (Table 13), and 20 nucleotides (Table 14) were examined as random primer lengths. PCR was carried out under thermal cycling conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. In the reaction system using random primers each comprising 10 or more nucleotides, PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3.
To the genomic DNA described in 2, above (30 ng, NiF8-derived genomic DNA), random primers of a given length were added to a given concentration therein, a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added thereto, and a reaction solution was prepared while adjusting the final reaction level to 50 μl. In this experiment, random primers comprising 8 to 35 nucleotides shown in Tables 1 to 21 were examined, and the random primer concentration from 0.6 to 300 μM was examined.
In the reaction system using random primers comprising 8 nucleotides and 9 nucleotides, PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 37° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. In the reaction system using a random primer of 10 or more nucleotides, PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3. Also, the reproducibility of the repeated data was evaluated on the basis of the Spearman's rank correlation (ρ>0.9).
To the genomic DNA described in 2, above (30 ng, NiF8-derived genomic DNA), 1, 2, 3, 12, 24, or 48 types of random primers selected from the 96 types of random primers comprising 10 nucleotides (10-nucleotide primer A) shown in Table 1 were added to the final concentration of 60 μM therein, a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added thereto, and a reaction solution was prepared while adjusting the final reaction level to 50 μl. In this experiment, as the 1, 2, 3, 12, 24, or 48 types of random primers, random primers were selected successively from No. 1 shown in Table 1, and the selected primers were then examined. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3. Also, the reproducibility of the repeated data was evaluated on the basis of the Spearman's rank correlation (ρ>0.9).
To the genomic DNA described in 2, above (30 ng, NiF8-derived genomic DNA), a set of primers selected from the 5 sets of random primers shown in Tables 2 to 6 was added to the final concentration of 60 μM therein, a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added thereto, and a reaction solution was prepared while adjusting the final reaction level to 50 μl. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3. Also, the reproducibility of the repeated data was evaluated on the basis of the Spearman's rank correlation (ρ>0.9).
To the genomic DNA described in 2, above (30 ng, human-derived genomic DNA), random primers (final concentration: 60 μM, 10-nucleotide primer A), a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added, and a reaction solution was prepared while adjusting the final reaction level to 50 μl. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. The DNA library obtained in this experiment was subjected to purification and electrophoresis in the same manner as in 3.1.3. Also, the reproducibility of the repeated data was evaluated on the basis of the Spearman's rank correlation (ρ>0.9).
When PCR was conducted with the use of random primers in accordance with conventional PCR conditions (3.1.2 described above), the amplified DNA library size was as large as 2 kbp or more, but amplification of the DNA library of a target size (i.e., 100-bp to 500-bp) was not observed (
The correlation between the DNA library size and each of the annealing temperature (3.1.4 above), the enzyme amount (3.1.5 above), the MgCl2 concentration (3.1.6 above), the primer length (3.1.7 above), and the primer concentration (3.1.8 above), which are considered to affect PCR specificity, was examined.
The results of experiment described in 3.1.8 are summarized in Table 23.
With the use of random primers comprising 10 nucleotides, as shown in
In order to confirm the reproducibility for DNA library production, as described in 3.2 above, the DNA library amplified with the use of the genomic DNA extracted from NiF8 as a template and random primers was analyzed with the use of a next-generation sequencer (MiSeq), and the results are shown in
As described in 3.3 above, a DNA library was prepared with the use of genomic DNA extracted from the rice variety Nipponbare, the genomic information of which has been disclosed, as a template, and random primers and subjected to electrophoresis, and the results are shown in
As described in 3.3.3, the obtained read pattern was mapped to the genomic information of Nipponbare. As a result, DNA fragments were found to be evenly amplified throughout the genome at intervals of 6.2 kbp (
As described in 3.4, DNA libraries of the sugarcane varieties NiF8 and Ni9 and 22 hybrid progeny lines were produced with the use of random primers, the resulting DNA libraries were analyzed with the next-generation sequencer (HiSeq), the polymorphisms of the parent varieties were detected, and the genotypes of the hybrid progenies were identified on the basis of the read data. Table 24 shows the results.
As shown in Table 24, 8,683 markers for NiF8 and 11,655 markers for Ni9; that is, a total of 20,338 markers, were produced. In addition, reproducibility for genotype identification of hybrid progeny lines was as high as 99.97%. This indicates that the accuracy for genotype identification is very high. In particular, sugarcane is polyploid (8x+n), the number of chromosomes is as large as 100 to 130, and the genome size is as large as 10 Gbp, which is at least 3 times greater than that of humans. Accordingly, it is very difficult to identify the genotype throughout the genomic DNA. As described above, numerous markers can be produced with the use of random primers, and the sugarcane genotype can thus be identified with high accuracy.
4.5 Experiment for Confirmation with PCR Marker
As described in 3.5 above, the sugarcane varieties NiF8 and Ni9 and 22 hybrid progeny lines were subjected to PCR with the use of the primers shown in Table 22, genotypes were identified via electrophoresis, and the results were compared with the number of reads.
As shown in
As described in 3.6.1, the results of DNA library production with the use of random primers comprising 9 nucleotides (Table 8), 10 nucleotides (Table 1, 10-nucleotide primer A), 11 nucleotides (Table 9), 12 nucleotides (Table 10), 14 nucleotides (Table 11), 16 nucleotides (Table 12), 18 nucleotides (Table 13), and 20 nucleotides (Table 14) are shown in
When random primers were used at a high concentration of 10.0 μM, which is 13.3 times greater than the usual level, as shown in
In order to elucidate the correlation between the concentration and the length of random primers, as described in 3.6.2 above, PCR was carried out with the use of random primers comprising 8 to 35 nucleotides at the concentration of 0.6 to 300 μM, so as to produce a DNA library. The results are shown in Table 26.
As shown in Table 26, it was found that a low-molecular-weight (100 to 500 nucleotides) DNA fragment could be amplified with high reproducibility with the use of random primers comprising 9 to 30 nucleotides at 4.0 to 200 μM. In particular, it was confirmed that low-molecular-weight (100 to 500 nucleotides) DNA fragments could be amplified assuredly with high reproducibility with the use of random primers comprising 9 to 30 nucleotides at 4.0 to 100 μM.
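The two ranges stated above can be expressed as a simple check, shown here as a minimal sketch. The function name and the returned labels are hypothetical. The sketch encodes only the rectangular ranges quoted in this paragraph (9 to 30 nucleotides, 4.0 to 200 μM and 4.0 to 100 μM); it does not reproduce the individual results in Table 26, so a condition inside a range is not guaranteed to have been tested.

```python
def classify_primer_condition(length_nt, concentration_um):
    """Classify a primer length / concentration pair against the stated ranges.

    Returns "assured" for 9-30 nt at 4.0-100 uM, "reproducible" for
    9-30 nt at more than 100 and up to 200 uM, and "outside" otherwise.
    """
    if not 9 <= length_nt <= 30:
        return "outside"
    if 4.0 <= concentration_um <= 100.0:
        return "assured"
    if 4.0 <= concentration_um <= 200.0:
        return "reproducible"
    return "outside"


for length_nt, conc in [(10, 60.0), (10, 150.0), (8, 60.0), (10, 0.6)]:
    print(length_nt, conc, classify_primer_condition(length_nt, conc))
```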
The results shown in Table 26 are examined in greater detail. As a result, the correlation between the length and the concentration of random primers is found to be preferably within a range surrounded by a frame as shown in
By prescribing the number of nucleotides and the concentration of random primers within given ranges as described above, it was found that low-molecular-weight (100 to 500 nucleotides) DNA fragments could be amplified with high reproducibility. For example, the accuracy of the data obtained via analysis of high-molecular-weight DNA fragments with the use of a next-generation sequencer is known to deteriorate to a significant extent. As described in this Example, the number of nucleotides and the concentration of random primers may be prescribed within given ranges, so that a DNA library with a molecular size suitable for analysis with a next-generation sequencer can be produced with satisfactory reproducibility, and such DNA library can be suitable for marker analysis with the use of a next-generation sequencer.
As described in 3.7 above, 1, 2, 3, 12, 24, or 48 types of random primers (concentration: 60 μM) were used to produce a DNA library, and the results are shown in
As shown in
As described in 3.8 above, DNA libraries were produced with the use of sets of random primers shown in Tables 2 to 6 (i.e., 10-nucleotide primer B, 10-nucleotide primer C, 10-nucleotide primer D, 10-nucleotide primer E, and 10-nucleotide primer F), and the results are shown in
As shown in
As described in 3.9 above, a DNA library was produced with the use of human-derived genomic DNA and random primers at a final concentration of 60 μM (10-nucleotide primer A), and the results are shown in
In this Example, first DNA fragments were prepared by PCR using genomic DNA as a template and random primers according to the schematic diagrams shown in
In this Example, genomic DNAs were extracted from the sugarcane variety NiF8 and the rice variety Nipponbare using the DNeasy Plant Mini Kit (QIAGEN), and the extracted genomic DNAs were purified. The purified genomic DNAs were used as NiF8-derived genomic DNA and Nipponbare-derived genomic DNA, respectively.
In this Example, random primers were designed based on 3′-end 10 nucleotides of the next-generation sequencer adapter (Nextera adapter, Illumina, Inc.). Specifically, in this Example, GTTACACACG (SEQ ID NO: 2041, 10-nucleotide G) was used as a random primer. In addition, next-generation sequencer primers were designed based on the sequence information on the Nextera adapter of Illumina, Inc. in the above manner (Table 29).
A dNTP mixture at a final concentration of 0.2 mM, MgCl2 at a final concentration of 1.0 mM, and DNA Polymerase (TAKARA, PrimeSTAR) at a final concentration of 1.25 units, and a random primer (10-nucleotide G) at a final concentration of 60 μM were added to NiF8-derived genomic DNA (30 ng) described in 2, above. A DNA library (first DNA fragments) was prepared by PCR (treatment at 98° C. for 2 minutes, reaction for 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, and storage at 4° C.) in a final reaction volume of 50 μl.
The DNA library obtained in 3.1.2 above was purified with the use of the MinElute PCR Purification Kit (QIAGEN) and subjected to electrophoresis with the use of the Agilent 2100 bioanalyzer (Agilent Technologies) to obtain a fluorescence unit (FU). Also, the reproducibility of the repeated data was evaluated on the basis of the Spearman's rank correlation (ρ>0.9).
A dNTP mixture at a final concentration of 0.2 mM, MgCl2 at a final concentration of 1.0 mM, DNA Polymerase (TAKARA, PrimeSTAR) at a final concentration of 1.25 units, and a next-generation sequencer primer at a final concentration of 0.5 μM were added to the first DNA fragment (100 ng) purified in 3.1.3 above. A next-generation sequencer DNA library (second DNA fragments) was prepared by PCR (treatment at 95° C. for 2 minutes, reaction for 25 cycles of 98° C. for 15 seconds, 55° C. for 15 seconds, and 72° C. for 20 seconds, treatment at 72° C. for 1 minute, and storage at 4° C.) in a final reaction volume of 50 μl. The DNA library for a next-generation sequencer was subjected to purification and electrophoresis in the same manner as in 3.1.3.
The next-generation sequencer DNA library (a second DNA fragment) in 3.1.4 above was analyzed by MiSeq via 100 base paired-end sequencing using MiSeq Reagent Kit V2 500 Cycle (Illumina).
The read patterns were identified from the read data obtained in 3.1.5. The number of reads was counted for each read pattern, the numbers of reads of the repeated analyses were compared, and the reproducibility was evaluated using the correlation coefficient.
In this Example, random primers were designed based on the 10 nucleotides at the 3′ end of the next-generation sequencer adapter (Nextera adapter, Illumina, Inc.). That is, in this Example, 16 types of nucleotide sequences, each prepared by adding an arbitrary nucleotide sequence of 2 nucleotides to the 3′ end of the sequence of 10 nucleotides positioned at the 3′ end of the Nextera adapter so as to result in a full length of 12 nucleotides, were designed as random primers (Table 30, 12-nucleotide B).
In addition, in this Example, a next-generation sequencer primer was designed based on the sequence information on the Nextera adapter of Illumina, Inc. in the same manner as in 3.1.1.
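The count of 16 primer types follows from appending every possible 2-nucleotide sequence (4 × 4 combinations) to a fixed 10-nucleotide sequence. The sketch below enumerates them. The function name is hypothetical, and the 10-nucleotide input is simply the primer sequence quoted earlier in this description, used here as an illustrative input; Table 30 remains the authoritative list of the primers actually used.

```python
from itertools import product


def extended_primers(base, extra=2, alphabet="ACGT"):
    """Append every possible `extra`-nucleotide tail to the 3' end of `base`."""
    return [base + "".join(tail) for tail in product(alphabet, repeat=extra)]


# Illustrative input only (assumption): a 10-nucleotide sequence quoted in the text.
primers = extended_primers("GTTACACACG")
print(len(primers))   # 16 sequences, one per 2-nucleotide tail
print(primers[0], primers[-1])
```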
A dNTP mixture at a final concentration of 0.2 mM, MgCl2 at a final concentration of 1.0 mM, and DNA Polymerase (TAKARA, PrimeSTAR) at a final concentration of 1.25 units, and a random primer (12-nucleotide B) at a concentration of 40 μM were added to Nipponbare-derived genomic DNA (30 ng) described in 2, above. A DNA library (first DNA fragments) was prepared by PCR (treatment at 98° C. for 2 minutes, reaction for 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, 72° C. for 20 seconds, and storage at 4° C.) in a final reaction volume of 50 μl.
The DNA library obtained in 3.2.2 above was purified with the use of the MinElute PCR Purification Kit (QIAGEN) and subjected to electrophoresis with the use of the Agilent 2100 bioanalyzer (Agilent Technologies) to obtain a fluorescence unit (FU). Also, the reproducibility of the repeated data was evaluated on the basis of the Spearman's rank correlation (ρ>0.9).
A dNTP mixture at a final concentration of 0.2 mM, MgCl2 at a final concentration of 1.0 mM, DNA Polymerase (TAKARA, PrimeSTAR) at a final concentration of 1.25 units, and a next-generation sequencer primer at a concentration of 0.5 μM were added to the first DNA fragment (100 ng) purified in 3.2.3 above. A next-generation sequencer DNA library (second DNA fragments) was prepared by PCR (treatment at 95° C. for 2 minutes, reaction for 25 cycles of 98° C. for 15 seconds, 55° C. for 15 seconds, and 72° C. for 20 seconds, treatment at 72° C. for 1 minute, and storage at 4° C.) in a final reaction volume of 50 μl. Purification of the DNA library for next-generation sequencers and electrophoresis were conducted in the same manner as in 3.1.3.
The next-generation sequencer DNA library (second DNA fragment) in 3.2.4 above was analyzed by MiSeq via 100 base paired-end sequencing using MiSeq Reagent Kit V2 500 Cycle (Illumina).
The read patterns in 3.2.5 were mapped to the genomic information of Nipponbare (NC_008394 to NC_008405) using bowtie2, and the degree of consistency between the random primer sequence and genomic DNA was confirmed. The read patterns were identified from the read data obtained in 3.2.5. The number of reads was counted for each read pattern, the numbers of reads of the repeated analyses were compared, and the reproducibility was evaluated using the correlation coefficient.
4. Results and Examination
4.1 Results of examination of the sugarcane variety NiF8
Next,
In addition, as a result of analysis of the DNA library (second DNA fragments) by next-generation sequencer MiSeq, 3.5-Gbp read data and 3.6-Gbp read data were obtained. The values indicating accuracy of MiSeq data (>=Q30) were 93.3% and 93.1%. Since the values recommended by the manufacturer were 3.0 Gbp or more for read data and 85.0% or more for >=Q30, the next-generation sequencer DNA library (second DNA fragments) prepared in this Example was considered to be applicable to next-generation sequencer analysis. In order to confirm reproducibility, the numbers of reads of the repeated analyses were compared for 34,613 read patterns obtained by MiSeq.
As described above, a DNA library (first DNA fragments) was obtained by conducting PCR using, at a high concentration, a random primer comprising the 10 nucleotides at the 3′ end of a next-generation sequencer adapter (Nextera Adaptor, Illumina, Inc.), and then PCR was conducted using a next-generation sequencer primer comprising the sequence of Nextera Adaptor. Accordingly, it was possible to conveniently produce a next-generation sequencer DNA library (second DNA fragments) comprising many fragments with favorable reproducibility.
Next,
In addition, as a result of analysis of the obtained DNA library (second DNA fragments) by next-generation sequencer MiSeq, 4.0-Gbp read data and 3.8-Gbp read data were obtained. The values indicating accuracy of MiSeq data (>=Q30) were 94.0% and 95.3%. As in the case of 4.1.1, in view of the above results, the next-generation sequencer DNA library (second DNA fragments) prepared in this Example was considered to be applicable to next-generation sequencer analysis.
As described above, a DNA library (first DNA fragments) was obtained by conducting PCR using, at high concentrations, 16 types of random primers having a full length of 12 nucleotides, obtained by adding an arbitrary sequence of 2 nucleotides to the 3′ end of the 10 nucleotides positioned at the 3′ end of a next-generation sequencer adapter (Nextera Adaptor, Illumina, Inc.), and then PCR was conducted using a primer comprising the sequence of Nextera Adaptor. Accordingly, it was possible to conveniently produce a next-generation sequencer DNA library (second DNA fragments) comprising many fragments with favorable reproducibility.
In this Example, genomic DNA was extracted from the rice variety Nipponbare using the DNeasy Plant Mini kit (QIAGEN), and the extracted genomic DNA was purified. The purified genomic DNA was used as Nipponbare-derived genomic DNA.
To the genomic DNA described in 1.1 above (30 ng, Nipponbare-derived genomic DNA), random primers (final concentration: 60 μM, 10-nucleotide primer A), a 0.2 mM dNTP mixture, 1.0 mM MgCl2, and 1.25 units of DNA polymerase (PrimeSTAR, TAKARA) were added, and a reaction solution was prepared while adjusting the final reaction level to 50 μl. PCR was carried out under thermal cycle conditions comprising 98° C. for 2 minutes and 30 cycles of 98° C. for 10 seconds, 50° C. for 15 seconds, and 72° C. for 20 seconds, followed by storage at 4° C. The DNA library obtained in this experiment was purified by the MinElute PCR Purification Kit (QIAGEN).
From the DNA library obtained in 1.2, a sequence library for MiSeq analysis was prepared using the KAPA Library Preparation Kit (Roche).
With the use of the MiSeq Reagent Kit V2 500 Cycle (Illumina), the sequence library for MiSeq analysis obtained in 1.3 was analyzed via 100 base paired-end sequencing.
Random primer sequence information was deleted from the read data obtained in 1.4, and nucleotide sequence information of each read was identified. Mapping of nucleotide sequence information of each read on genomic information of rice Kasalath (kasalath_genome) was conducted by bowtie2, and single nucleotide polymorphism (SNP) and insertion or deletion mutation (InDel) were identified as markers for each chromosome.
Table 31 shows the results of mapping of nucleotide sequence information of the DNA library prepared using random primers based on the genomic DNA from the rice variety Nipponbare on the genomic information of rice Kasalath.
As shown in Table 31, it was possible to identify 2,694 to 5,579 SNPs (3,812.6 SNPs on average, 45,751 SNPs in total) for each chromosome. As shown in Table 31, it was also possible to identify 227 to 569 insertion/deletion mutations (InDels) (349.3 InDels on average, 4,191 InDels in total) for each chromosome. The above results revealed that it is possible to identify a DNA marker as a characteristic nucleotide sequence present in the genome of a test organism by comparing nucleotide sequence information on a DNA library prepared using random primers and known nucleotide sequence information in the manner shown in this Example.
All publications, patents and patent applications cited in the present description are incorporated herein by reference in their entirety.
Number | Date | Country | Kind |
---|---|---|---|
2016-129048 | Jun 2016 | JP | national |
2016-178528 | Sep 2016 | JP | national |
2017-071020 | Mar 2017 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2017/013965 | 4/3/2017 | WO | 00 |