The present invention relates to a method, kit, and system for analyzing genes. In particular, the present invention relates to a technique for detecting gene polymorphisms contained in a target gene region.
DNA is a polymer molecule that carries genetic information, and DNA sequencing techniques have evolved enormously as essential tools for life science since the Sanger method was developed as a method for analyzing DNA sequences. The “dideoxy method”, a technique developed by Sanger et al., is a sequencing method that uses synthetic reactions that stop at the positions at which dideoxynucleotides (ddATP, ddGTP, ddCTP, ddTTP) are incorporated when they are added to a DNA synthesis reaction solution at low concentration as terminators.
The initial DNA sequencing technique labels the four types of dideoxynucleotides with their corresponding radioisotopes; induces a DNA synthetic reaction separately in four vessels, each filled with the corresponding radioactively labeled dideoxynucleotide; separates the products of each DNA synthetic reaction according to DNA fragment length by acrylamide gel electrophoresis in individual lanes; and determines the nucleotide sequence by detecting, using autoradiography, the positions at which the radioactive isotopes are found.
Thereafter, a nucleotide identification method using four kinds of fluorescent dyes, each corresponding to one of the four types of dideoxynucleotides, was developed to detect the four types of nucleotides together in a mixture, making it possible to analyze a DNA sequence in a single lane of an acrylamide gel. Moreover, capillary electrophoresis was developed as a successor to acrylamide gel electrophoresis. A DNA sequencer using capillary electrophoresis is a system that analyzes one DNA sample labeled with four fluorescent dyes in a single capillary using the Sanger method. Such a DNA sequencer, which enables a plurality of samples to be analyzed quickly and simultaneously in continuous automatic operation, has contributed greatly to large-scale gene sequencing projects such as the Human Genome Project, the completion of which was reported in 2003, and is the most widely used type of sequencer at present.
The principle by which the DNA sequencer determines the gene sequences of target samples involves the steps of separating the DNA fragments according to fragment length by electrophoresis and detecting the fluorescently labeled molecules at the separated positions. At each coordinate position at which a signal peak is detected, the nucleotide is deduced by majority rule, based on the strengths of the obtained fluorescent signals or the areas of the peaks.
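The majority-rule deduction described above can be illustrated with a minimal sketch. The function name, the data layout, and the signal values below are hypothetical and chosen only for illustration; they do not describe the processing of any particular sequencer.

```python
# Hypothetical illustration of majority-rule base calling.
# The signal values and the data layout are assumptions, not measured data.
BASES = ("A", "G", "C", "T")


def call_bases(peaks):
    """Return the base with the strongest signal at each peak position.

    `peaks` is a list of dicts, one per coordinate position, mapping a
    nucleotide type to its fluorescent signal strength (or peak area).
    """
    return "".join(max(BASES, key=lambda b: peak.get(b, 0.0)) for peak in peaks)


example_peaks = [
    {"A": 980.0, "G": 12.0, "C": 8.0, "T": 15.0},
    {"A": 10.0, "G": 25.0, "C": 870.0, "T": 9.0},
    # Two comparable peaks at one position, as at a polymorphic site.
    {"A": 14.0, "G": 640.0, "C": 11.0, "T": 590.0},
]

print(call_bases(example_peaks))
```

At the third position the weaker of the two signals is discarded, which shows why a simple majority rule alone does not reveal a polymorphism.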
It is known that the genomes of most living organisms, including humans, who are the most important target of gene sequence analysis, are diploid and that their genome sequences contain mutations called single nucleotide polymorphisms. These nucleotide polymorphisms, also called germ line polymorphisms, are transmitted from parent to offspring and exist in individuals or cells in a one to one ratio. When a region in which these nucleotide polymorphisms exist is analyzed by a DNA sequencer, two kinds of fluorescent peaks are detected simultaneously at the positions at which the nucleotide polymorphisms are found.
As mentioned before, since germ line polymorphisms exist in a one to one ratio, the two kinds of fluorescent signals are, in principle, also detected in a one to one ratio at the positions at which germ line polymorphisms are found. However, the signal strengths may not show the one to one relationship at some nucleotide positions due to a difference in luminescent efficiency between fluorescent substances or a difference in incorporation efficiency at the positions at which polymorphisms are found; accordingly, the conventional DNA sequencing techniques have the disadvantage that the polymorphisms are difficult to detect.
To address this problem, a specialized method for analyzing the obtained fluorescent signals to detect these polymorphisms has been developed (Japanese Unexamined Patent Application Publication No. 2002-05508). Furthermore, a mobility shift may occur between the peak positions of chromatograms when a difference in luminescent efficiency between fluorescent substances affects the mobility during electrophoresis. A method for determining polymorphisms whose peak positions have shifted has also been developed (Japanese Unexamined Patent Application Publication No. 2003-270206).
Japanese Unexamined Patent Application Publication No. 2002-05508 and Japanese Unexamined Patent Application Publication No. 2003-270206 disclose methods for enhancing the accuracy and sensitivity of polymorphism determination by reference to existing chromatogram data in a target gene region. These methods provide effective tools for determining gene sequences, including germ line polymorphisms, with a high degree of accuracy by analyzing existing data from the sequencer.
With the recent evolution of genome analysis techniques, the DNA sequence of the entire human genome was reported in 2003, and since then drug development taking advantage of gene information has been actively advanced. In particular, medical insurance has come to be applied to genetic testing conducted on individual patients with cancers, which may be caused by genetic abnormalities, in order to select therapeutic medicines and determine the dosages of these medicines.
Genetic disruptions induced by diseases such as cancers are called somatic mutations, in distinction from the aforementioned germ line polymorphisms. Somatic mutations, which are genetic abnormalities that occur after birth, are characterized in that they are not transmitted from parent to offspring, the positions on the genome at which the mutations have occurred cannot be estimated, and the existence ratio of the polymorphisms in the body or in tissues cannot be estimated. The inability to estimate the existence ratio of somatic mutations is a major problem in detecting these polymorphisms. For example, cancer tissue excised from a cancer patient contains both cancer cells and normal cells, and diversity in genetic abnormality is observed among cancer cells, leading to a low existence ratio of cells having polymorphisms in the target region of the tissue. For this reason, detection of somatic mutations is more difficult than that of germ line polymorphisms.
Currently, quantitative polymerase chain reaction (PCR) systems and DNA sequencers are used for detecting somatic mutations. The quantitative PCR system has the great advantage of high-sensitivity detection. However, it also has the disadvantage that a specific detection probe is necessary for each target polymorphism, so that detection reactions must be conducted using separate probes. In particular, for cancer cells, it cannot be estimated at present what types of genetic abnormalities may occur or where; accordingly, a quantitative PCR system, which requires probe design specific to the target polymorphism, is not suitable for exhaustive detection of somatic mutations. Even if various combinations of detection probes were available, it would be practically difficult to detect somatic mutations using all of these probes because of limits on testing cost and on the amount of analyte sample.
On the other hand, a capillary electrophoresis-based DNA sequencer, the most widely used sequencing system at present, is able to determine 500-700 bp. For this reason, this type of sequencer has advantages over the quantitative PCR system in that it is capable of i) examining a larger region in initial testing, and ii) determining new somatic mutations in the aforementioned nucleotide sequence region. In addition, the quantitative PCR system determines gene sequences based on the relative intensity between signal values for the target polymorphism, while the DNA sequencer can verify that the genetic polymorphism detected using target polymorphism information is truly derived from the target gene, resulting in more reliable measurement results than those of the quantitative PCR system. High reliability of measurement is an important feature in medical diagnosis, in which accurate determination is needed. However, the DNA sequencer, a system specialized for gene sequencing, has the major problem of insufficient sensitivity to detect minute somatic mutations.
In contrast, the capillary electrophoresis-based DNA sequencer is capable of determining nucleotide sequences using four kinds of fluorescent labeling substances, each corresponding to one of the four types of nucleotides. Generally, a fluorescent substance emits light across a wide range of wavelengths rather than at a single wavelength.
According to one aspect of the present invention, to solve at least one of the aforementioned problems, nucleic acid samples are labeled for each nucleotide type; the labeled samples are electrophoresed in a separate flow channel for each nucleotide type; and genetic mutations are detected based on chromatogram data obtained from the labeled signal for each nucleotide type, for the individual nucleic acid samples separated by electrophoresis in the corresponding one of the plurality of flow channels.
The present invention enables the somatic mutations existing in the target gene region to be detected with high sensitivity. Problems and configurations other than those mentioned above will be explained using the following embodiments.
Hereinafter, an embodiment of the present invention will be explained by reference to the accompanying drawings. It should be noted that the embodiments of the present invention include, but are not limited to, those explained below.
First, the method for detecting DNA molecules according to an embodiment of the present invention will be explained by reference to
A primer having a sequence complementary to part of the template DNA and DNA synthetase, as well as dNTP and ddNTP as reactive substrates, are added to a solution containing the template DNA sample (1a) to induce a labeling reaction by the Sanger method. According to the embodiment of the present invention, a pair of the template DNA sample and the primer is labeled with four different reactive solutions (1b-1e) corresponding to the individual target nucleotide types (A, G, C, T), so that these target nucleotide types (A, G, C, T) are separated by electrophoresis in the separate flow channels (1f-1i) and measured.
According to the embodiment of the present invention, either i) the dye primer method, which labels the primer, or ii) the dye terminator method, which labels the ddNTPs, may be used for labeling synthetic DNA molecules. Moreover, any of fluorescent dyes, chemical luminescent substances, and radioactive isotopes may be used as labeling substances. The sequencing kits commercially available today label the four types of ddNTPs with different fluorescent substances; they may be applied to the labeled DNA sample containing four types of ddNTPs labeled with different fluorescent substances according to the embodiment of the present invention. Either the dye primer method or the dye terminator method may be used as the labeling method, because the four types of ddNTPs labeled with a single kind of fluorescent substance are analyzed by electrophoresis in physically separate flow channels.
In particular, with the dye terminator method, when ddATP, ddGTP, ddCTP, and ddTTP are labeled with different fluorescent substances, differences in chemical structure among these fluorescent substances affect the incorporation efficiency during the labeling reaction. For this reason, labeling the four types of nucleotides with a single kind of fluorescent substance is useful in detecting the existence ratio correctly. Moreover, the differences among the fluorescent substances also affect the mobility of DNA fragments during electrophoresis; accordingly, labeling the four types of ddNTPs with a single fluorescent substance is also useful in correcting the mobility of the DNA fragments.
Second, the labeled DNA fragments are separated according to DNA fragment length by electrophoresis in the separate flow channels (1f-1i), one for each of the four nucleotide types. Since shorter DNA fragments migrate first during electrophoresis, the signal strengths are measured over time using a measuring apparatus (1k) at a detection part (1j), so that the signals corresponding to the existing nucleotides are measured in the order of the nucleotide sequence in the target samples. In addition to capillary-type flow channels, a micro-channel fabricated using a technique called Micro-Electro-Mechanical Systems (MEMS) may be used to separate the labeled DNA fragments by electrophoresis and to detect the signals.
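As a rough illustration of how signals measured over time in four separate flow channels relate to the nucleotide sequence, the following sketch merges per-channel peak lists in order of migration time. It assumes, purely for illustration, that the migration times of the four channels are already comparable (that is, that mobility has been corrected); the function name and all values are hypothetical.

```python
# Hypothetical sketch: merge peaks from four single-nucleotide flow channels.
# Assumption: migration times are already aligned across the channels.


def merge_channels(channel_peaks):
    """Return (time, base, strength) tuples ordered by migration time.

    `channel_peaks` maps a nucleotide type to the list of
    (migration_time, signal_strength) peaks measured in its flow channel.
    """
    events = [
        (time, base, strength)
        for base, peaks in channel_peaks.items()
        for time, strength in peaks
    ]
    # Shorter fragments migrate first, so earlier peaks come first in the sequence.
    events.sort(key=lambda event: event[0])
    return events


channels = {
    "A": [(10.0, 950.0), (40.0, 930.0)],
    "G": [(20.0, 900.0)],
    "C": [(30.0, 880.0)],
    "T": [],
}

sequence = "".join(base for _, base, _ in merge_channels(channels))
print(sequence)
```

Because each channel carries only one nucleotide type, a weak peak in one channel is not masked by a strong peak of another nucleotide type at the same position.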
The method for analyzing the DNA fragments using four separate flow channels is exemplified in
Moreover, for example, flow channels may be sequentially added depending on any other application, as explained later by reference to
Next, an example of the configuration of the entire system according to the embodiment of the present invention will be explained by reference to
Next, the function parts contained in the data analyzer (8c) shown in
Initially, the measuring apparatuses (8a, 8f) separate DNA fragments based on their fragment length by electrophoresis, under the separation measurement conditions received from the control systems (8b, 8g), in the separate flow channel for each of nucleotide types shown in
Then, the signal values measured at the measuring apparatus (8a) are transmitted to the data analyzer (8c). The data analyzer (8c) records the measured values, once received, in a measured signal value storage (9i). Then, using the method described later by reference to
Then, a result output (9i) receives data detected at the main/mutation peak detection part (9o), records the data in a sequence/detected mutation storage (9g), and outputs the data to a display (9f). In turn, the display (9f) displays information on the nucleotide sequence and mutations, as well as the chromatograms, as described later by reference to
Then, the peak detection function, which is performed at the peak detection part (9k) described in
The methods for correcting the mobility using, as input information, the signal values measured in each of the flow channels at the measuring apparatuses (8a, 8f) include: i) a method shown in
Method i) shown in
Method ii) shown in
Method iii) shown in
Method iv) shown in
As mentioned above, after the mobility of the measured signals of the target samples is corrected and the data is integrated, the types of the nucleotides showing the signals with the strengths equal to or higher than the threshold value and the coordinates of the signals on the nucleotide sequence are extracted through the process described by reference to
Moreover, it is possible to calculate an index for the existence ratio of mutations using the largest and the smallest of the signal values measured at each coordinate position. In addition, to calculate the percent identity between the nucleotide sequence obtained from the target samples and the reference nucleotide sequence information, the nucleotide type showing the strongest signal among the signals with strengths equal to or higher than the threshold value, which are obtained from the target samples, may be determined and compared with the information on the known reference nucleotide sequence in the target region.
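The two calculations described above can be sketched as follows. The text does not give a formula for the index, so the expression used here, smallest / (largest + smallest) over the signals at or above the threshold, is only one possible choice and is an assumption; the function names, the threshold, and the signal values are likewise hypothetical.

```python
# Hypothetical sketch of an existence-ratio index and percent identity.
# The index formula is an assumption; the source does not specify one.


def bases_above_threshold(signals, threshold):
    """Keep only nucleotide types whose signal is at or above the threshold."""
    return {base: s for base, s in signals.items() if s >= threshold}


def existence_ratio_index(signals, threshold):
    """Assumed index: smallest / (largest + smallest) of the kept signals.

    Returns 0.0 when fewer than two nucleotide types pass the threshold.
    """
    kept = bases_above_threshold(signals, threshold)
    if len(kept) < 2:
        return 0.0
    largest, smallest = max(kept.values()), min(kept.values())
    return smallest / (largest + smallest)


def percent_identity(positions, reference, threshold):
    """Compare the strongest above-threshold base at each position with the reference."""
    if not reference or len(positions) != len(reference):
        raise ValueError("positions and reference must be non-empty and equal in length")
    matches = 0
    for signals, ref_base in zip(positions, reference):
        kept = bases_above_threshold(signals, threshold)
        if kept and max(kept, key=kept.get) == ref_base:
            matches += 1
    return 100.0 * matches / len(reference)


positions = [
    {"A": 950.0, "G": 5.0, "C": 4.0, "T": 6.0},
    {"A": 3.0, "G": 8.0, "C": 900.0, "T": 2.0},
    {"A": 4.0, "G": 900.0, "C": 6.0, "T": 100.0},  # minor T signal
    {"A": 5.0, "G": 7.0, "C": 800.0, "T": 9.0},    # differs from reference T
]

print(existence_ratio_index(positions[2], threshold=50.0))
print(percent_identity(positions, "ACGT", threshold=50.0))
```

A low percent identity under this sketch would suggest that the measured region is not the intended target region, which is the use of the indicator described in the text.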
Next, an example of the method for displaying the results of analysis according to the embodiment of the present invention will be explained by reference to
In the figure, 7i shows an example of a display listing the results of exhaustive detection of the polymorphisms existing in the target region, based on the measurement results obtained by integrating the data from each of the flow channels (7a). As shown in 7i, the calculated existence ratios of the four nucleotide types showing signals with strengths higher than the threshold value are displayed in the form of a list, with the information on the reference nucleotide sequence plotted along the abscissa axis and the nucleotide types along the ordinate axis. As shown in 7f-7h of the integrated chromatogram information (7a), signals detected at the same position along the abscissa axis indicate the existence of polymorphisms at the nucleotide positions corresponding to the signals; accordingly, it is possible to display the existence ratios of the polymorphisms, which are calculated by comparing the nucleotide types of the existing polymorphisms and the strengths of the signals at the same coordinate positions (7o-7q). In addition, the percent identity of the nucleotide sequence obtained from the target samples with the information on the reference nucleotide sequence can be calculated by determining, at each coordinate position, the nucleotide type showing the strongest signal among the signals with strengths equal to or higher than the threshold value, which are obtained from the target samples, and comparing the determined nucleotide type with the known information on the reference nucleotide sequence in the target region (7r). The percent identity with the reference nucleotide sequence information serves as a useful indicator for determining whether the measured DNA region agrees with the intended target region.
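A list display of the kind described above, with the reference nucleotide sequence along the horizontal axis and the nucleotide types along the vertical axis, could be produced along the following lines. This is a sketch under stated assumptions: the existence ratio is taken here as each above-threshold signal divided by the sum of the above-threshold signals at that position, which the text does not specify, and the function names and layout are hypothetical.

```python
# Hypothetical sketch of a list display of existence ratios per position.
# Assumption: ratio = signal / sum of above-threshold signals at the position.
BASES = ("A", "G", "C", "T")


def existence_ratios(signals, threshold):
    """Return the assumed existence ratio per base, or None below the threshold."""
    kept = {base: s for base, s in signals.items() if s >= threshold}
    total = sum(kept.values())
    return {base: (kept[base] / total if base in kept else None) for base in BASES}


def format_ratio_table(reference, positions, threshold):
    """Build a text table: reference bases as columns, nucleotide types as rows."""
    columns = [existence_ratios(signals, threshold) for signals in positions]
    rows = ["    " + " ".join(f"{ref_base:>5}" for ref_base in reference)]
    for base in BASES:
        cells = [
            "    -" if column[base] is None else f"{column[base]:5.2f}"
            for column in columns
        ]
        rows.append(f"{base:<4}" + " ".join(cells))
    return "\n".join(rows)


example_positions = [
    {"A": 950.0, "G": 5.0, "C": 4.0, "T": 6.0},
    {"A": 4.0, "G": 900.0, "C": 6.0, "T": 100.0},
]

print(format_ratio_table("AG", example_positions, threshold=50.0))
```

Positions at which more than one row shows a value correspond to the coordinate positions at which a polymorphism would be reported.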
Alternatively, in addition to displaying the results of measurement as shown in 7i, the system may be configured so that i) the positions at which polymorphisms are detected are extracted or emphasized for display; ii) the positions of known polymorphisms related to diseases are extracted or emphasized for display; or iii) specific coordinate positions specified by a tester are extracted or emphasized for display. The exhaustive information on polymorphisms obtained from the target region shows the existence of minute mutations in somatic cells, which cannot be acquired by the existing DNA sequencers and quantitative PCR apparatuses; accordingly, it is useful in analyzing the correlation between mutations in somatic cells and diseases and in treating these diseases.
If the existence of polymorphisms in the target gene region and their existence ratios serve as indicators for medicine administration or medical treatment, displaying the aforementioned results of exhaustive polymorphism detection on analyzers for medical use is effective, because guidance on whether these medicines can be administered, on the dosages of these medicines, and on the medical treatment is provided directly. Moreover, the results of exhaustive polymorphism detection may be compared with other clinical information and the outcomes of therapy to work out more effective medicine regimens.
Furthermore, a kit for analyzing genes, supplied with reagents for gene mutation detection that label the nucleic acid samples for each nucleotide type, may be used to detect genetic mutations using the aforementioned system; with the kit, electrophoresis is performed in a separate flow channel for each nucleotide type, and genetic mutations are detected for each of the nucleic acid samples based on the chromatogram data obtained from the labeled signal for each nucleotide type from the corresponding separate flow channel.
Thus, according to the embodiment of the present invention, performing electrophoresis and detection for each nucleotide type using separate flow channels allows the polymorphisms in somatic cells existing in the target gene region to be detected with high sensitivity. Additionally, comparison among the detected signal strengths enables the existence ratios of the mutations in somatic cells to be analyzed. The obtained exhaustive information on polymorphisms in the target region shows the existence of minute mutations in somatic cells, which cannot be acquired by the existing DNA sequencers or quantitative PCR systems; it is useful in analyzing the correlation between mutations in somatic cells and diseases and in the medical treatment thereof. Moreover, comparison of the obtained information on the polymorphisms in the somatic cell line against the existing information on the polymorphisms in the somatic cell line, clinical information, and the outcomes of therapy enables more effective medicine regimens to be worked out.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2012/077029 | 10/19/2012 | WO | 00 | 11/27/2013 |