This application claims priority from prior Japanese Patent Application No. 2015-199342, filed on Oct. 7, 2015, entitled “Method for detecting rare mutation, detection device and computed program”, the entire contents of which are incorporated herein by reference.
The present invention relates to a method for detecting a rare mutation.
It was long assumed that an individual has a single genome sequence. However, research using next-generation sequencers has revealed that an individual carries many genomic DNA molecules with slightly different nucleotide sequences. This is because variation in the nucleotide sequence arises at a constant frequency during the development of reproductive cells, and also during cell division and chromosomal replication. It is known that variation of the genome sequence generated in this way can also be one of the causes of the onset of diseases.
Cancer is said to be developed by gradual generation of variation in the nucleotide sequence of oncogene and antioncogene. It is known that an individual cancer cell does not have a single genome sequence, but has various variations, by analyzing genomic DNA obtained from a tumor tissue by a next-generation sequencer. Shimizu T. et al., Accumulation of Somatic Mutations in TP53 in Gastric Epithelium With Helicobacter pylori Infection, Gastroenterology, 2014, vol. 147, No. 2, p. 407-417 discloses that whole exome sequencing and deep sequencing are performed for genomic DNA in a tumor tissue of stomach and a non-tumor tissue of stomach, and a somatic mutation is accumulated in various genes of gastric cancer tissue in which inflammation is caused.
When variation recognized at very low frequency in genomic DNA is detected by analysis of nucleotide sequence (hereinafter, also referred to as “sequencing”), a sufficient amount of genomic DNA is usually used as a template such that a genomic DNA molecule having the variation is surely contained in a sample.
For example, about 5 μg of fragmented DNA is used as a template for DNA sequencing in Shimizu T. et al., Accumulation of Somatic Mutations in TP53 in Gastric Epithelium With Helicobacter pylori Infection, Gastroenterology, 2014, vol. 147, No. 2, p. 407-417. However, with current technology, errors occur at a certain frequency during nucleic acid amplification of the template DNA and during sequencing, so variation derived from such errors may be contained in the analyzed nucleotide sequence of the genomic DNA. It is therefore difficult to distinguish whether variation of genomic DNA detected by sequencing is a mutation or variation due to an error.
The present inventors have surprisingly found that it is possible to distinguish whether variation detected in a template DNA is mutation or variation due to an error, by sequencing using DNA in an amount much less than usual as a template. This finding has led to the completion of the present invention.
The scope of the present invention is defined solely by the appended claims, and is not affected to any degree by the statements within this summary.
The present invention provides a method for detecting a rare mutation. The method comprises the steps of: preparing a sample comprising not more than 1,000 copies of template DNA; amplifying the template DNA to prepare a library, and analyzing a nucleotide sequence of the library; calculating a ratio of variants in a base at a predetermined position, from the analysis result; comparing the calculated ratio of variants with a predetermined cut-off value; and determining that the sample has a rare mutation in the base at the predetermined position when the calculated ratio of variants is not less than the predetermined cut-off value.
The present invention further provides another method for detecting a rare mutation. The method comprises: dividing a sample comprising template DNA to prepare a plurality of aliquots each comprising not more than 1,000 copies of template DNA; amplifying the template DNA in a first aliquot to prepare a library, and analyzing a nucleotide sequence of the library; calculating a ratio of variants in a base at a predetermined position, from the analysis result; comparing the calculated ratio of variants with a predetermined cut-off value; executing the amplification and analysis step, the calculation step, and the comparison step using other aliquots; and determining that the sample has a rare mutation in the base at the predetermined position when the calculated ratio of variants in at least one of the aliquots is not less than the predetermined cut-off value.
The present invention provides another method for detecting a rare mutation. The method comprises the steps of: dividing a sample comprising template DNA to prepare a plurality of aliquots each comprising not more than 1,000 copies of template DNA; amplifying the template DNA in a first aliquot to prepare a library, and analyzing a nucleotide sequence of the library; calculating a ratio of variants in a base at a predetermined position, from the analysis result; comparing the calculated ratio of variants with a predetermined cut-off value; determining that the sample has a rare mutation in the base at the predetermined position when the calculated ratio of variants in the first aliquot is not less than the predetermined cut-off value; executing the amplification and analysis step, the calculation step, the comparison step and the determination step using a second aliquot when the calculated ratio of variants in the first aliquot is less than the predetermined cut-off value, and determining that the sample has a rare mutation in the base at the predetermined position when the calculated ratio of variants in the second aliquot is not less than the predetermined cut-off value.
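The two aliquot-based procedures above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the `analyze_aliquot` callback, and the cut-off value used in the usage note are hypothetical, and extending the sequential procedure beyond a second aliquot is an assumption made for generality.

```python
from typing import Callable, Iterable, Optional, Sequence


def detect_in_any_aliquot(ratios: Iterable[float], cutoff: float) -> bool:
    """All aliquots are analyzed; a rare mutation is called when the ratio of
    variants in at least one aliquot is not less than the cut-off value."""
    return any(ratio >= cutoff for ratio in ratios)


def detect_sequentially(
    analyze_aliquot: Callable[[str], float],
    aliquots: Sequence[str],
    cutoff: float,
) -> Optional[int]:
    """Aliquots are analyzed one at a time; analysis stops at the first
    aliquot whose ratio of variants is not less than the cut-off value.

    Returns the 1-based index of that aliquot, or None if no aliquot
    reaches the cut-off value.
    """
    for index, aliquot in enumerate(aliquots, start=1):
        # analyze_aliquot stands in for amplification, sequencing and the
        # calculation of the ratio of variants (hypothetical callback).
        if analyze_aliquot(aliquot) >= cutoff:
            return index
    return None
```

For example, with illustrative ratios of 0.1%, 0.4% and 2% and a placeholder cut-off value of 0.32%, the sequential procedure would stop at the second aliquot.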
In this embodiment, a “rare mutation” refers to variation of a base in a nucleic acid, generated in a living body, and means variation satisfying the following two conditions:
The variation of the base may be any of substitution, insertion, and deletion, and is preferably substitution. In this embodiment, a base different from the original base at a predetermined position of a template DNA or of a read described below is also called a “variant”. The variant may be derived from a mutation, or may be derived from variation due to an error that occurred in nucleic acid amplification or sequencing.
In this embodiment, SNP (single nucleotide polymorphism) is not included in rare mutations. This is because, although an SNP is variation of genomic DNA recognized to appear at a frequency of $1\times10^{-3}$/base or less, it is a type of genetic polymorphism: in a sample containing the DNA molecules of an individual, DNA molecules having the SNP are found at a ratio of 50% or 100% (either or both of the maternal allele and the paternal allele), which distinguishes it from mutation.
A rare mutation may be generated in a living body by various causes. For example, when cells are exposed to a mutagen or a substance having a risk of causing variation, variation may be generated in the DNA of some of the cells. Such variation is also included in the “rare mutation” when the above conditions are satisfied. In diseases such as cancer, it is known that variation is likely to occur in DNA. In the canceration process, alongside the mutation that is the main cause of the disease (also referred to as a driver mutation), mutation that does not become a cause of the disease may also be generated, and such mutation is generally called a passenger mutation. The passenger mutation in a non-cancerous tissue is generally said to appear randomly at various positions on DNA at a frequency of $1\times10^{-3}$/base or less, and may be included in the “rare mutation”.
In the method for detecting a rare mutation of this embodiment (hereinafter also referred to simply as “detection method”), the lower limit of the frequency of rare mutations is theoretically not particularly limited. In this embodiment, as long as at least one rare mutation may be contained in not more than 1,000 copies of template DNA, it is possible to detect even a rare mutation recognized at a frequency of $1\times10^{-4}$/base or less, $1\times10^{-5}$/base or less, or $1\times10^{-6}$/base or less. For example, in the case where a rare mutation with an appearance frequency of $1\times10^{-6}$/base or less is detected, by analyzing a region of 10,000 bases for 100 copies of genomic DNA, one rare mutation may be theoretically contained in the analyzed region of 100 copies of genomic DNA ($1\times10^{-6}\times10000\times100=1$).
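The arithmetic in the example above can be reproduced with a short sketch. The function name is hypothetical; the three inputs are the appearance frequency, the length of the analyzed region, and the copy number given in the text.

```python
def expected_rare_mutations(frequency_per_base: float,
                            region_bases: int,
                            copies: int) -> float:
    """Expected number of rare mutations contained in the analyzed region,
    taken over all template copies (frequency x region length x copies)."""
    return frequency_per_base * region_bases * copies


# Values from the text: 1e-6 /base, 10,000 bases, 100 copies of genomic DNA.
print(expected_rare_mutations(1e-6, 10_000, 100))
```

Under these inputs the product is one expected rare mutation, up to floating-point rounding.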
Hereinbelow, the principle of the detection method of this embodiment will be described with reference to
The right side in
The above point will be more specifically described. With reference to
The principle of the detection method of this embodiment will be described with reference to
The above point will be more specifically described. With reference to
When the method of
According to the method shown in
Each step of the detection method of this embodiment will be described below. In the detection method of this embodiment, first, a sample containing not more than 1,000 copies of template DNA is prepared.
The template DNA is not particularly limited, as long as it is DNA that may contain a rare mutation, and is preferably genomic DNA. The origin of the template DNA is not particularly limited, and may be any species of animals, plant, and microorganisms. Among them, genomic DNA of an organism in which the entire sequence of genomic DNA is analyzed is preferred, and human genomic DNA is particularly preferred. Human genomic DNA can be extracted, for example, from a biological sample. Examples of the biological sample include cells, tissues, body fluids, urine, feces, and the like. Examples of the body fluids include blood, serum, plasma, lymph, bone marrow fluid, ascites, amniotic fluid, semen, nipple discharge, and the like. DNA extracted from an FFPE (formalin-fixed paraffin-embedded) sample of tissue may be used.
The DNA extraction method is not particularly limited. When genomic DNA is extracted from a biological sample, it can be extracted by a known method in the art such as phenol/chloroform method. A commercially available DNA extraction kit and the like may be used. The fragmentation, size selection, terminal smoothing and the like of the extracted template DNA may be performed, as necessary.
In this embodiment, the lower limit of the copy number of the template DNA is at least 10 copies, preferably 30 copies, and more preferably 50 copies. The upper limit of the copy number of the template DNA is usually 1,000 copies, preferably 500 copies, and more preferably 200 copies. In this embodiment, when the copy number of the template DNA is in the range of 10 copies or more and 1,000 copies or less, it is possible to distinguish the ratio of variants derived from a rare mutation and the ratio of variants derived from an error due to nucleic acid amplification and sequencing. Particularly preferably, the copy number of the template DNA is 100 copies.
The means of adjusting the copy number of the template DNA in the sample to 1,000 copies or less is not particularly limited. It is known in the art that 1 ng of genomic DNA corresponds to 300 copies. Accordingly, the concentration of the genomic DNA extracted from the biological sample is measured by a spectrophotometer, and a sample containing not more than 1,000 copies, i.e., not more than 3.33 ng of the genomic DNA may be prepared by dilution based on the concentration. A predetermined gene in the template DNA may be quantitatively determined by real-time PCR, and the copy number of the template DNA may be determined from the quantitative result. As the predetermined gene to be quantitatively determined by real-time PCR, a gene present in any molecule of the template DNA is suitable. Examples of the gene include, in human genomic DNA, ALB, GAPDH, KCNA1, ARHGEF4, RAPGEFL1, and the like. Real-time PCR is particularly preferable since the accurate copy number of template DNA can be determined.
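The dilution based on a measured concentration can be sketched as below, using the conversion stated above (1 ng of genomic DNA corresponds to 300 copies). The function names and the stock concentration in the usage note are hypothetical.

```python
COPIES_PER_NG = 300  # conversion stated in the text for genomic DNA


def mass_ng_for_copies(copies: float) -> float:
    """Mass of genomic DNA (ng) corresponding to a given copy number."""
    return copies / COPIES_PER_NG


def stock_volume_ul(stock_ng_per_ul: float, target_copies: float) -> float:
    """Volume of stock solution (uL) that contains the target copy number,
    given the stock concentration measured by a spectrophotometer."""
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    return mass_ng_for_copies(target_copies) / stock_ng_per_ul


# 1,000 copies correspond to about 3.33 ng, as stated in the text.
print(round(mass_ng_for_copies(1000), 2))
```

For instance, with an assumed stock of 10 ng/μL, 100 copies would correspond to roughly 0.033 μL of stock, which in practice implies an intermediate dilution step.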
In the detection method of this embodiment, the template DNA contained in the sample is amplified to prepare a library, and sequencing of this library is performed.
The amplification of the template DNA is preferably performed by a PCR-based method. A primer pair capable of amplifying a region to be analyzed in the template DNA is designed, and the template DNA is amplified by the PCR method using this primer pair, whereby an amplicon can be obtained. Alternatively, the region to be analyzed may be enriched from the fragmented genomic DNA by a sequence capture method, and an amplicon may be obtained using this region as template DNA.
The region to be analyzed can be determined from an arbitrary site in the template DNA. For example, in the case of genomic DNA, the region to be analyzed may be any of exon, intron, or a region containing both of them. Alternatively, the template DNA is previously subjected to sequencing, and based on the result, a region capable of ensuring a high number of reads or a region having less sequencing error may be selected as the region to be analyzed.
The lower limit of the length of the region to be analyzed (hereinafter, also referred to as “sequencing length”) is at least 1,000 bases, preferably 5,000 bases, and more preferably 10,000 bases, from the viewpoint of detecting mutation with a low appearance frequency. The upper limit of the sequencing length is theoretically not particularly limited. However, the longer the sequencing length is, the more the cost of sequencing increases. In this embodiment, the upper limit of the sequencing length is preferably 1,000,000 bases, and more preferably 100,000 bases.
The primer used in the amplification of the template DNA may have an additional sequence such as an adaptor sequence or a bar code sequence, a labeling substance or the like, depending on the kind of the sequencer to be used. The number of primer pairs is determined by the desired sequencing length and the average length of the amplicon described below. One forward primer and one reverse primer are counted as one primer pair. The number of primer pairs can be determined based on the following expression.
(Sequencing length)=(Average length of amplicon)×(Number of primer pairs)
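Solving the expression above for the number of primer pairs gives the following sketch. Rounding up to a whole number of pairs is an assumption added here, and the amplicon lengths in the usage lines are illustrative values, not figures from the text.

```python
import math


def primer_pairs_needed(sequencing_length: int, average_amplicon_length: float) -> int:
    """Number of primer pairs so that
    (sequencing length) = (average amplicon length) x (number of primer pairs),
    rounded up to a whole number of pairs (assumption)."""
    if average_amplicon_length <= 0:
        raise ValueError("average amplicon length must be positive")
    return math.ceil(sequencing_length / average_amplicon_length)


print(primer_pairs_needed(10_000, 100))  # illustrative 100-base amplicons
print(primer_pairs_needed(10_000, 150))  # illustrative 150-base amplicons
```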
When a plurality of primer pairs is used, it is preferred that multiplex PCR can be performed with these primer pairs. This makes it possible to simultaneously amplify a plurality of regions in the template DNA. In this case, it is preferred to add a different bar code sequence to each primer pair. This makes it possible to distinguish the amplicons of each primer pair. A primer set for multiplex PCR supplied with a commercially available kit such as an exome sequencing kit may be used.
The average length of the amplicon can be determined depending on the performance of the sequencer to be used, and should be usually at least 50 bp. The upper limit of the average length of the amplicon is theoretically not particularly limited. However, the length in which sequencing can be stably performed by the sequencer is preferred.
In the amplification of the template DNA by PCR, it is preferred to minimize the number of PCR cycles in the range where the number of reads necessary for sequencing is obtained, in order to suppress an error due to amplification. In this embodiment, the number of cycles should be determined, for example, from the range of 10 cycles or more and 25 cycles or less. It is considered in the art that, even when variation due to an error is introduced at a predetermined position of one molecule (amplified product) in PCR cycle, the probability that variation due to an error is simultaneously introduced also at the same position of other molecule is low. Accordingly, in the detection method of this embodiment, the ratio of variants derived from a rare mutation is higher than the ratio of variants derived from an error during nucleic acid amplification, so that both can be distinguished from each other.
A polymerase used in the amplification of the template DNA can be properly selected from known heat-resistant polymerases used in PCR. Among them, a heat-resistant polymerase suitable for multiplex PCR and having less PCR error is desirable. A buffer suitable for the selected polymerase should be used in the amplification reaction.
In this embodiment, the nucleotide sequence should be analyzed by a sequencing method known in the art for the library as described above. The sequencing method is not particularly limited, but the analysis by a next-generation sequencer is preferred. The “next-generation sequencer” is a term used as compared to a “first-generation sequencer” that is a sequencer by capillary electrophoresis using Sanger's method, and means a device that determines nucleotide sequences by treating several tens of millions to several hundred millions of DNA fragments simultaneously in parallel. In this embodiment, the next-generation sequencer is not particularly limited, but examples thereof include HiSeq 2500 (Illumina, Inc.), MiSeq (Illumina, Inc.), Ion Proton (Thermo Fisher Scientific Inc.), Ion PGM (Thermo Fisher Scientific Inc.), and the like.
In this embodiment, in order to enhance the reliability of the determination result described below, it is desirable that the number of reads having variation derived from a rare mutation is 10 or more. For that purpose, the number of reads of sequencing is preferably 10 times or more the copy number of the template DNA, for the region amplified with each primer pair. On the other hand, amplification efficiency may differ among a plurality of primer pairs, and thus the number of amplicons may differ according to the amplified site. Therefore, the number of reads of sequencing also varies according to the amplified site. For example, in the analysis by Ion Proton sequencer (Thermo Fisher Scientific Inc.), it is known that, when the average number of reads is 5,000, the actual number of reads varies from about 2,000 to 20,000 according to the amplified site. Therefore, in this embodiment, it is preferred that the average number of reads of sequencing is, for example, 25 times or more, and preferably 50 times or more the copy number of the template DNA. The number of reads is counted digitally by a next-generation sequencer. The average number of reads can be calculated by dividing the total number of reads by the number of primer pairs.
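The read-depth guidance above can be checked with a small sketch. The function names are hypothetical, and the total read count in the usage lines is an illustrative value chosen to give the average of 5,000 reads mentioned in the text.

```python
def average_reads(total_reads: int, primer_pairs: int) -> float:
    """Average number of reads = total number of reads / number of primer pairs."""
    if primer_pairs <= 0:
        raise ValueError("number of primer pairs must be positive")
    return total_reads / primer_pairs


def meets_read_depth(avg_reads: float, template_copies: int, fold: int = 25) -> bool:
    """True when the average number of reads is at least `fold` times the
    copy number of the template DNA (25-fold, preferably 50-fold, in the text)."""
    return avg_reads >= fold * template_copies


avg = average_reads(500_000, 100)       # illustrative totals
print(avg, meets_read_depth(avg, 100))  # 100 template copies, 25-fold criterion
```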
As for a species in which genome sequence has been already decoded, the genome sequence is generally available as a reference sequence in the art. In this embodiment, when the template DNA is derived from the species in which genome sequence has been already decoded, it is preferred to find variation by comparing the analyzed nucleotide sequence with the reference sequence. In the analysis by a next-generation sequencer, the presence or absence of variation can be detected in every read.
In this embodiment, the ratio of variants in a base at a predetermined position is calculated, based on the analysis result of the nucleotide sequences. As the predetermined position, a position is preferred where variation found by the comparison with the reference sequence is present. The ratio of variants in the base at this position is obtained, whereby whether the found variation is derived from a rare mutation or derived from an error can be determined. The ratio of variants in a base at a predetermined position is calculated by the following expression.
(Ratio of variants in base at predetermined position)=(Number of reads having variation in base at predetermined position)/(Number of reads containing base at predetermined position)
In the above expression, “Number of reads containing base at predetermined position” is a sum of the number of reads having variation in the base at the predetermined position and the number of reads having no variation in the base at the predetermined position. As shown in
In this embodiment, the ratio of variants is preferably calculated for each one base in the region to be analyzed. In the region to be analyzed, when a plurality of variations is present in the positions being different from each other, the ratio of variants is calculated for the base at the position where each variation is present.
In this embodiment, the calculated ratio of variants is compared with a predetermined cut-off value, and whether or not the sample has a rare mutation in the base at the predetermined position is determined, based on the result. Specifically, when the calculated ratio of variants is not less than the predetermined cut-off value, it is determined that the sample has a rare mutation in the base at the predetermined position. On the other hand, when the calculated ratio of variants is lower than the predetermined cut-off value, it is determined that the sample has no rare mutation in the base at the predetermined position. When it is determined that the sample has no rare mutation in the base at the predetermined position, it may be determined that the variation in the base at this position is derived from an error.
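The calculation of the ratio of variants and the comparison with the cut-off value can be sketched as follows. The function names and the read counts are hypothetical, and the cut-off value passed in the usage line is only a placeholder.

```python
def variant_ratio(variant_reads: int, non_variant_reads: int) -> float:
    """Ratio of variants in the base at a predetermined position:
    reads having variation / all reads containing that base."""
    total = variant_reads + non_variant_reads
    if total == 0:
        raise ValueError("no reads contain the base at this position")
    return variant_reads / total


def has_rare_mutation(ratio: float, cutoff: float) -> bool:
    """A rare mutation is determined to be present when the ratio of
    variants is not less than the predetermined cut-off value."""
    return ratio >= cutoff


ratio = variant_ratio(variant_reads=50, non_variant_reads=4950)
print(ratio, has_rare_mutation(ratio, cutoff=0.0032))  # placeholder cut-off
```

A ratio below the cut-off value would instead be treated as variation that may be derived from an error.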
In this embodiment, the predetermined cut-off value may be the ratio of variants derived from an error. The distribution of an error due to nucleic acid amplification and sequencing is considered to follow the Poisson distribution that is a distribution of random events at a low frequency. Therefore, the predetermined cut-off value can be determined from the Poisson probability obtained from the Poisson distribution based on the Phred scores of the analyzed nucleotide sequence and the number of reads. The predetermined cut-off value may be set for each one base in the region to be analyzed, but it is preferred to set a single cut-off value based on the average value of the Phred scores of the analyzed nucleotide sequence and the average number of reads because of convenience.
The “Phred” refers to a base calling program used in a DNA sequencer, and is known in the art. Phred performs base calling (determination of bases) from the trace data (graph images such as waveform data of signals obtained from the sequencing reaction) acquired by a DNA sequencer. At this time, a Phred score (also called a “Phred quality score”) is calculated for each called base. The Phred score is an index representing the accuracy of the nucleotide sequence analyzed by a sequencer, and is widely used in the art. The relationship between the Phred score (or the average value thereof) and the frequency of errors in the analyzed nucleotide sequence is represented by the following expression.
(Frequency of errors) = $10^{-a/10}$ (/base)
wherein a is a Phred score or an average value thereof.
For example, when the Phred score of one base is 20, the frequency of errors in the base is $1\times10^{-2}$/base, and when the Phred score is 30, the frequency of errors in the base is $1\times10^{-3}$/base. The average value of the Phred score can represent the frequency of errors in the analyzed nucleotide sequence. For example, when the average value of the Phred score is 20, an error occurs once per 100 bases ($1\times10^{-2}$/base), and when the average value of the Phred score is 30, an error occurs once per 1,000 bases ($1\times10^{-3}$/base).
The Phred score of each base is automatically calculated by a next-generation sequencer. The average value of the Phred score can be calculated by dividing the sum of the Phred scores of the analyzed nucleotide sequence by the number of the analyzed bases. The Phred score differs depending on the sequencer to be used. For example, in the case of Ion Proton sequencer used in the examples, the average value of the Phred scores of the analyzed nucleotide sequence is about 25.
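The relationship between the Phred score and the frequency of errors, and the averaging described above, can be sketched as below. The function names and the short score list are hypothetical.

```python
from typing import Sequence


def error_frequency(phred: float) -> float:
    """Frequency of errors per base for a Phred score a: 10 ** (-a / 10)."""
    return 10 ** (-phred / 10)


def average_phred(scores: Sequence[float]) -> float:
    """Average Phred score: sum of the scores / number of analyzed bases."""
    if not scores:
        raise ValueError("no bases were analyzed")
    return sum(scores) / len(scores)


print(error_frequency(20), error_frequency(30))
print(average_phred([20, 30, 25]))  # illustrative per-base scores
```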
In this embodiment, it is preferred to set, as the predetermined cut-off value, the ratio of variants when the expected value of the number of variations due to an error in the sequencing length is 1 or less. The ratio of such variants is calculated from the Poisson probability obtained from the Poisson distribution based on the average value of the Phred scores of the analyzed nucleotide sequence and the average number of reads, and the sequencing length. The calculation example of the predetermined cut-off value will be described below.
As for 100 copies of genomic DNA, the nucleotide sequence was analyzed by a next-generation sequencer. In this analysis, the sequencing length was 10,000 bases, the average value of the Phred score was 30, and the average number of reads was 5,000. The frequency of errors in the sequencing length is $1\times10^{-3}$/base ($10^{-30/10}=1\times10^{-3}$) since the average value of the Phred score is 30. Since the average number of reads is 5,000, the average of the Poisson distribution is 5 ($5000\times1\times10^{-3}=5$). That is, the number of reads having variation due to an error per 5,000 reads is 5 on average. The relationship among the average of the Poisson distribution, the average number of reads and the average value of the Phred scores is represented by the following expression.
(Average of Poisson distribution) = (Average number of reads) × $10^{-a/10}$
wherein a is an average value of the Phred scores.
Subsequently, the distribution of probability (Poisson distribution) will be determined when the number of reads (the number of events) having variation due to an error per 5,000 reads is k. The probability P(k) is calculated by the following expression (0!=1).
$P(k) = e^{-\lambda}\,\lambda^{k}/k!$
wherein λ is the average of the Poisson distribution, and k is the number of events.
The Poisson distribution may be calculated using spreadsheet software capable of performing statistical processing. Examples of such spreadsheet software include Excel (registered trademark) (Microsoft Corporation) and the like. Specifically, a table of the Poisson probability for 0 to 50 events is prepared in Excel (registered trademark), with the average of the Poisson distribution set to 5, the number of events set to 0 to 50, and the functional form set to FALSE (that is, the probability of each individual number of events rather than the cumulative probability). In this example, the theoretical upper limit of the number of events is the average number of reads itself (i.e., 5,000). However, since the frequency of occurrence of errors is low, the Poisson probability may usually be calculated with the upper limit of the number of events set to 1/50 or less of the average number of reads. Moreover, the expected value of the number of variations due to an error in the sequencing length was calculated based on the following expression.
(Expected value of number of variations due to error)=(Sequencing length)×(Poisson probability)
The number of events (the number of reads having variation) was 0 to 2 and 16 to 50 when the calculated expected value was 1 or less, namely, the number of variations due to an error in 10,000 bases was 1 or less. The expected value when the number of events was 0 to 2 was apparently 1 or less, but it is highly likely to underestimate the occurrence of error. Herein, 16 was used as the number of events when the expected value was 1 or less, for calculating the lowest predetermined cut-off value. $P(16)=4.91\times10^{-5}$, and the expected value is 0.491 ($4.91\times10^{-5}\times10000=0.491$). The ratio of variants derived from an error at this time is 0.32%, since 16 errors are present in the 5,000 reads ($(16/5000)\times100=0.32$). Accordingly, 0.32% can be set as the predetermined cut-off value.
In the case where the Phred score is a relatively low value (e.g., 27 or less), the number of events (referred to as “k′”) when the calculated expected value is 1 or less can take the low value (or group of low values) and the high value (or group of high values), in 0 or more, as the example described above. When using a low value or a value selected from the group of low values as k′, the ratio of variants derived from an error is underestimated. Accordingly, in this embodiment, it is desirable to use a high value or a value selected from the group of high values as k′. When the lowest value among the group of high values is used as k′, the lowest predetermined cut-off value can be calculated.
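The cut-off calculation described above can be sketched in a few lines instead of a spreadsheet table. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the search is restricted to numbers of events at or above the Poisson average so that only the group of high values is considered, returning the lowest value in that group.

```python
import math
from typing import Tuple


def poisson_probability(k: int, lam: float) -> float:
    """P(k) = exp(-lam) * lam**k / k!, evaluated in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def lowest_cutoff(avg_reads: float, avg_phred: float,
                  sequencing_length: int) -> Tuple[int, float]:
    """Return (k', cut-off ratio), where k' is the lowest number of events at
    or above the Poisson average for which
    (sequencing length) x P(k') is 1 or less."""
    lam = avg_reads * 10 ** (-avg_phred / 10)  # average of the Poisson distribution
    if lam <= 0:
        raise ValueError("average of the Poisson distribution must be positive")
    k = math.ceil(lam)  # start in the high-value group (assumption)
    # P(k) decreases monotonically for k >= lam, so this loop terminates.
    while sequencing_length * poisson_probability(k, lam) > 1:
        k += 1
    return k, k / avg_reads


k_prime, cutoff = lowest_cutoff(avg_reads=5000, avg_phred=30, sequencing_length=10_000)
print(k_prime, f"{cutoff:.2%}")
```

With the inputs of the worked example (average number of reads 5,000, average Phred score 30, sequencing length 10,000 bases), the search stops at 16 events, which corresponds to a cut-off value of 0.32%.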
When the average number of reads and the average value of Phred score obtained from the used next-generation sequencer are stable between analyses to some extent, the predetermined cut-off value may not be calculated each time the detection method of this embodiment is carried out. That is, a fixed value may be used as the predetermined cut-off value. The fixed value can be calculated from the average number of reads and the average value of Phred score empirically obtained by the used next-generation sequencer as described above.
As described above, in this embodiment, when the ratio of variants in the base at the predetermined position is not less than the predetermined cut-off value, it is determined that the sample has a rare mutation in the base at the predetermined position. However, when the ratio of variants in the base at the predetermined position is too high, the variation in the base at the predetermined position may not be a rare mutation. For example, when the variation in the template DNA is an SNP, the ratio of variants in the base at the position of the SNP is theoretically 50% or 100%. SNP is one type of genetic polymorphism, and is desirably distinguished from the rare mutation to be detected in this disclosure. Accordingly, in this embodiment, the ratio of variants in a base determined to have a rare mutation is preferably 10% or less.
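One way to combine the cut-off value with the preferred upper limit of 10% is a three-way classification, sketched below. The function name, the label strings, and the treatment of ratios above 10% as a possible polymorphism are assumptions made for illustration; the cut-off value in the usage lines is a placeholder.

```python
def classify_variant(ratio: float, cutoff: float, upper_limit: float = 0.10) -> str:
    """Classify the ratio of variants in a base at a predetermined position.

    below the cut-off value        -> variation likely derived from an error
    cut-off value to upper limit   -> rare mutation
    above the upper limit          -> possibly a polymorphism such as SNP
    """
    if ratio < cutoff:
        return "error"
    if ratio <= upper_limit:
        return "rare mutation"
    return "possible polymorphism"


for r in (0.001, 0.01, 0.5):
    print(r, classify_variant(r, cutoff=0.0032))
```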
The scope of this disclosure also includes a rare mutation detection device (hereinafter, also referred to as “detection device”). The scope of this disclosure also includes a computer program for enabling a computer to execute detection of a rare mutation (hereinafter, also referred to as “computer program”).
Hereinbelow, an example of the detection device will be described with reference to a figure. However, this embodiment is not limited only to a configuration shown in this example.
When a library prepared by a nucleic acid amplification reaction using a sample containing not more than 1,000 copies of template DNA is set in the sequencer 20, the sequencer 20 executes analysis of the nucleotide sequence of the library, and acquires information such as the analyzed nucleotide sequence, and the Phred score, number of reads and sequencing length of each base, and the obtained various information is transmitted to the detection device 30 as analysis data. A format of the analysis data is not particularly limited, and may be a format corresponding to the used sequencer. Examples of such a format include FASTA format and the like.
The detection device 30 receives the analysis data from the sequencer 20. A processor (CPU) of the detection device 30 executes a computer program for detection of a rare mutation, the program being installed on hard disk 313 (refer to
With reference to
The CPU 310 can execute programs stored in the ROM 311 or the hard disk 313 and programs loaded in the RAM 312. The CPU 310 calculates the ratio of variants in a base at a predetermined position, and reads out a predetermined cut-off value stored in the ROM 311 or the hard disk 313, to determine the presence or absence of a rare mutation in the base at the predetermined position. The CPU 310 outputs a determination result and allows the display unit 302 to display the result.
The ROM 311 is configured by mask ROM, PROM, EPROM, EEPROM, or the like. The ROM 311 records the computer programs to be executed by the CPU 310 and the data used in executing the computer programs as described above. The ROM 311 may record the predetermined cut-off value. The ROM 311 may record the expression for calculating the average number of reads, the expression for calculating the average value of Phred scores, the expression for calculating the Poisson distribution, the reference sequence, and the like.
The RAM 312 is configured by SRAM, DRAM, or the like. The RAM 312 is used to read out the programs recorded on the ROM 311 and the hard disk 313. In executing these programs, the RAM 312 is used as a work region of the CPU 310.
Programs to be executed by the CPU 310, such as an operating system and an application program (the computer program of this embodiment), are installed on the hard disk 313, together with the data used in executing the programs. The hard disk 313 may record the predetermined cut-off value. The hard disk 313 may record the expression for calculating the average number of reads, the expression for calculating the average value of Phred scores, the expression for calculating the Poisson distribution, the reference sequence, and the like.
The input/output interface 314 is configured, for example, by a serial interface such as USB, IEEE 1394 or RS-232C; a parallel interface such as SCSI, IDE or IEEE 1284; and an analog interface including a D/A or A/D converter. The input/output interface 314 is connected to the input unit 301 including a keyboard and a mouse. An operator can input various commands and data into the computer body 300 by the input unit 301.
The reading device 315 is configured by a flexible disk drive, CD-ROM drive, DVD-ROM drive, or the like. The reading device 315 can read programs or data recorded on a portable recording medium 40.
The communication interface 316 is, for example, Ethernet (registered trademark) interface, or the like. The computer body 300 can transmit print data to a printer by the communication interface 316.
The image output interface 317 is connected to the display unit 302 configured by LCD, CRT, or the like. This makes it possible to output to the display unit 302 a video signal corresponding to image data provided from the CPU 310. The display unit 302 displays an image (screen) according to the input video signal.
With reference to
In Step S101, the CPU 310 acquires analysis data from the sequencer 20, and stores the analyzed nucleotide sequence and the number of reads in the hard disk 313. In Step S102, the CPU 310 calculates the ratio of variants in the base at the predetermined position based on the stored number of reads, and stores it in the hard disk 313. The base at the predetermined position is preferably at a position where variation is present with respect to the reference sequence. The ratio of variants is calculated in the same manner as stated in the detection method of this embodiment. In Step S103, the CPU 310 compares the calculated ratio of variants with the predetermined cut-off value stored in the hard disk 313. When the calculated ratio of variants is equal to or higher than the predetermined cut-off value, the processing proceeds to Step S104, and a determination result showing that a rare mutation is present in the base at the predetermined position is stored in the hard disk 313. On the other hand, when the calculated ratio of variants is lower than the predetermined cut-off value, the processing proceeds to Step S105, and a determination result showing that a rare mutation is absent in the base at the predetermined position is stored in the hard disk 313. In Step S106, the CPU 310 outputs the determination result, causing the display unit 302 to display the result and a printer to print it.
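The comparison in Steps S102 to S105 can be sketched as follows. This is only an illustrative sketch, not the program of this embodiment: the function names are hypothetical, and the ratio of variants is assumed to be the number of reads carrying the variation divided by the total number of reads at that position, expressed as a percentage.

```python
def variant_ratio(variant_reads: int, total_reads: int) -> float:
    """Ratio of variants (%) at one position (assumed: variant reads / total reads x 100)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads * 100.0


def has_rare_mutation(variant_reads: int, total_reads: int, cutoff_percent: float) -> bool:
    """Steps S103 to S105: present when the ratio is equal to or higher than the cut-off."""
    return variant_ratio(variant_reads, total_reads) >= cutoff_percent


# Hypothetical read counts for two positions, with a cut-off of 0.66%.
print(has_rare_mutation(50, 5000, 0.66))  # ratio 1.0%
print(has_rare_mutation(20, 5000, 0.66))  # ratio 0.4%
```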
With reference to
In Step S201, the CPU 310 acquires analysis data from the sequencer 20, and stores the analyzed nucleotide sequence, the number of reads and the Phred score of each base in the hard disk 313. In Step S202, in the same manner as in Step S102 described above, the ratio of variants in the base at the predetermined position is calculated based on the stored number of reads, and is stored in the hard disk 313. In Step S203, the CPU 310 calculates the average number of reads based on the stored number of reads, calculates the average value of the Phred scores based on the stored Phred scores, and stores these values in the hard disk 313. These values are calculated in the same manner as stated in the detection method of this embodiment. In Step S204, the CPU 310 calculates, based on the stored average number of reads and average value of the Phred scores, the ratio of variants at which the expected value of the number of variations due to an error in the sequencing length is 1 or less, and stores this value in the hard disk 313 as the predetermined cut-off value. This predetermined cut-off value is calculated in the same manner as stated in the detection method of this embodiment. In Step S205, the CPU 310 compares the calculated ratio of variants with the calculated predetermined cut-off value. When the calculated ratio of variants is equal to or higher than the predetermined cut-off value, the processing proceeds to Step S206, and a determination result showing that a rare mutation is present in the base at the predetermined position is stored in the hard disk 313. On the other hand, when the calculated ratio of variants is lower than the predetermined cut-off value, the processing proceeds to Step S207, and a determination result showing that a rare mutation is absent in the base at the predetermined position is stored in the hard disk 313. In Step S208, the CPU 310 outputs the determination result, causing the display unit 302 to display the result and a printer to print it.
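Step S203 and the quantity that feeds Step S204 might be sketched as below. The function name and the input lists are hypothetical. The conversion from a Phred score Q to an error probability uses the standard definition p = 10^(−Q/10), and the expected number of erroneous reads per position is assumed to be the average number of reads multiplied by that probability.

```python
from statistics import mean


def summarize_run(read_counts, phred_scores):
    """Return (average reads, average Phred score, expected erroneous reads per position)."""
    avg_reads = mean(read_counts)          # Step S203: average number of reads
    avg_phred = mean(phred_scores)         # Step S203: average value of the Phred scores
    error_rate = 10 ** (-avg_phred / 10)   # Phred definition: Q = -10 * log10(p)
    return avg_reads, avg_phred, avg_reads * error_rate


# Hypothetical per-position values.
avg_reads, avg_phred, expected_errors = summarize_run([4000, 6000], [20, 30])
print(avg_reads, avg_phred, round(expected_errors, 1))
```

The third value is the mean of the Poisson distribution from which the cut-off value of Step S204 would then be derived.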
When dividing a sample to prepare a plurality of aliquots, the preparation of the plurality of aliquots can be also automatically performed by a device. When the detection method of this embodiment is performed using a first aliquot, and a rare mutation is not detected, the detection using a second aliquot may be automatically performed. The sequencer 20 and the detection device 30 may be configured such that the analysis of aliquots is automatically repeated until a rare mutation is detected.
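The automatic repetition over aliquots could, for illustration, be organized as a simple loop. This is a sketch under assumptions: `detect` stands in for one full run of the detection method on an aliquot, and the function name is hypothetical.

```python
from typing import Callable, Iterable, Optional, TypeVar

A = TypeVar("A")


def first_positive_aliquot(aliquots: Iterable[A], detect: Callable[[A], bool]) -> Optional[int]:
    """Analyze aliquots in order and return the index of the first one in which
    a rare mutation is detected, or None if none is detected."""
    for index, aliquot in enumerate(aliquots):
        if detect(aliquot):
            return index  # stop as soon as a rare mutation is detected
    return None


# Hypothetical example: each aliquot is represented by its ratio of variants (%).
found = first_positive_aliquot([0.2, 0.4, 1.0, 2.0], lambda ratio: ratio >= 0.66)
print(found)
```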
This disclosure will be described in more detail by examples hereinbelow. However, this disclosure is not limited to these examples.
In Example 1, N-nitroso-N-methylurea (hereinafter referred to as “MNU”), a mutagen, was administered to cultured cells to induce point mutations in genomic DNA. Then, mutations were detected by the detection method of this embodiment, and the appearance frequency of the mutations was calculated. This analysis was independently performed three times.
Human TK6 lymphoblasts (hereinafter referred to as “TK6 cells”) were obtained from American Type Culture Collection. On day 0, 1×10⁵ TK6 cells were seeded on a 10 cm plate. On day 1, the TK6 cells were exposed to MNU (Sigma) at a concentration of 0, 0.1, 0.3, 1, 3, 10 or 30 μM for 24 hours. On day 7, the number of cells was counted, and the cells were collected. Then, genomic DNA was extracted by the phenol/chloroform method.
The copy number of the extracted genomic DNA was determined quantitatively by real-time PCR using SYBR (registered trademark) green I (BioWhittaker Molecular Applications) and iCycler Thermal Cycler (Bio-Rad Laboratories, Inc.). The genes measured and the sequences of the primers are shown in Table 1. In the table, “F” means a forward primer, and “R” means a reverse primer. Each sample was measured using three kinds of primers. The average value of the three copy numbers thus obtained was defined as the DNA copy number of the sample.
A sample containing 100 copies of genomic DNA was prepared, based on the measurement result of the copy number. A library for sequencing was prepared by amplification with multiplex PCR, using the 100 copies of genomic DNA in the sample as a template. For the preparation of this library, Ion AmpliSeq Library Kit 2.0 (Thermo Fisher Scientific Inc.) was used. The specific operation was performed in accordance with the instructions attached to the kit. In the multiplex PCR, 291 primer pairs (sequence numbers 7 to 588: sequences represented by odd sequence numbers are each a sequence of a forward primer, and sequences represented by even sequence numbers are each a sequence of a reverse primer) were used. As a result, 291 regions in 55 cancer-related genes on the genomic DNA were amplified at the same time. These primer pairs cover 48,587 bp. A bar code sequence corresponding to each sample is added by the kit to the amplicons in the library. The resulting library was subjected to sequencing by Ion PI Chip and Ion Proton sequencer (Thermo Fisher Scientific Inc.). The acquired nucleotide sequence data was mapped to the human reference genome hg19 using Ion Suite 4.0 (Thermo Fisher Scientific Inc.) to determine a nucleotide sequence. The average number of reads of sequencing was 5,000. Among the analyzed 48,587 bases, 15,724 bases were selected. This region was selected because its average number of reads in the three independent analyses was 2,500 or more in untreated TK6 cells, and because it did not contain variation with a ratio of variants of 0.2% or more in the untreated TK6 cells.
When there is one variation in the 100 copies of genomic DNA, the ratio of variants is theoretically 1%. This ratio is considered to be higher than the ratio of variants derived from an error due to PCR and sequencing described above. The ratio of variants derived from an error was calculated as follows. The average value of the Phred scores of the nucleotide sequence analyzed by Ion Proton sequencer was 25. Accordingly, the frequency of errors is 3.16×10⁻³/base (10^(−25/10) = 3.16×10⁻³). Since the average number of reads is 5,000, the average of the Poisson distribution is 15.8 (5000 × 3.16×10⁻³ = 15.8). Moreover, using the number of reads having an error in the 5,000 reads as the number of events of the Poisson probability, a table of the Poisson probability was formed by spreadsheet program Excel (registered trademark) (Microsoft) (average of the Poisson distribution: 15.8, the number of events: 0 to 60, functional form: FALSE). Then, the expected value of the number of variations due to the error in the region selected above was calculated from the product of the Poisson probability in each of the number of events and the length (15,724 bases) of the selected region. The number of events (the number of reads having variation) was 33 when the resulting expected value was 1 or less, namely, when the number of variations due to the error in the 15,724 bases was 1 or less. In this case, the ratio of variants derived from the error is 0.66% ((33/5000) × 100 = 0.66). Accordingly, in the analyzed nucleotide sequence, variation with a ratio of variants of higher than 0.66% is considered to be a somatic mutation induced by MNU, not variation due to the error. In Example 1, variation with a ratio of variants of 0.8 to 10% was detected as a somatic mutation induced by MNU. Then, the frequency of the detected variations was calculated as the number of variations in 1,572,400 bases (15,724 bases × 100 copies).
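The calculation above can be reproduced with a short script. This is an illustrative sketch rather than the spreadsheet actually used: the function names are hypothetical, the Poisson probability is the non-cumulative probability mass (corresponding to the functional form FALSE), and the search is assumed to start at the mean of the distribution, because the probability is also small for event counts far below the mean. The script uses the unrounded mean (about 15.81) instead of 15.8.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Non-cumulative Poisson probability of exactly k events with mean lam."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def error_cutoff(avg_reads: int, avg_phred: float, region_length: int, max_events: int = 60):
    """Return (reads, ratio in %) for the smallest event count at or above the mean
    whose expected number of error-derived variations in the region is 1 or less."""
    error_rate = 10 ** (-avg_phred / 10)   # errors per base from the Phred score
    lam = avg_reads * error_rate           # mean of the Poisson distribution
    for k in range(math.ceil(lam), max_events + 1):
        if poisson_pmf(k, lam) * region_length <= 1:
            return k, k / avg_reads * 100
    raise ValueError("no cut-off found within max_events")


reads, ratio = error_cutoff(avg_reads=5000, avg_phred=25, region_length=15724)
print(reads, round(ratio, 2))  # 33 0.66
```

With 32 events the expected value is still above 1 (about 1.9), and with 33 events it falls to about 0.9, which agrees with the cut-off of 0.66% stated above.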
The result of three times of analysis independently performed is shown in
In Example 2, using esophageal mucosa collected from a donor as a specimen, a point mutation in those genomic DNA was detected by the detection method of this embodiment, and the appearance frequency was calculated.
291 specimens of esophageal mucosa were collected from adults who underwent cancer screening inspection between September, 2008 and April, 2013, using an endoscope. From a donor of each specimen, history information regarding risk factors for esophageal carcinogenesis of alcohol drinking, betel quid chewing, and cigarette smoking (hereinafter also referred to as “ABC”) was obtained by interview (refer to Y. C. Lee et al., Cancer Prev Res (Phila), 2011, vol. 4, p. 1982 to 1992). 93 specimens were classified into the following three groups according to the risk of cancer.
Group 1: Normal esophageal mucosa obtained from healthy subjects not exposed to ABC (30 specimens)
Group 2: Normal esophageal mucosa obtained from healthy subjects exposed to ABC (32 specimens)
Group 3: Noncancerous esophageal mucosa obtained from patients with esophageal squamous cell carcinoma (31 specimens)
Genomic DNA was extracted from each specimen by the phenol/chloroform method. The copy number of the resulting genomic DNA was quantitatively determined in the same manner as in Example 1, and a sample containing 100 copies of genomic DNA was prepared.
As to the sample containing 100 copies of genomic DNA prepared from each specimen, a library for sequencing was prepared in the same manner as in Example 1, and subjected to sequencing by Ion PI Chip and Ion Proton sequencer (Thermo Fisher Scientific Inc.). Then, the variation in the genomic DNA was detected in distinction from the variation derived from an error, and the appearance frequency of variations was calculated in the same manner as in Example 1.
The appearance frequency of variations in each group is shown in
Number | Date | Country | Kind |
---|---|---|---|
2015-199342 | Oct 2015 | JP | national |