1. Field of the Invention
The present invention relates to a method and apparatus for displaying gene information used for analysis for identifying genes involved in phenotypes, such as an individual's diseases or external features. In particular, the invention relates to a method and apparatus capable of displaying the results of analysis in which, when extracting and detecting a DNA fragment with a gene as the subject of analysis using PCR or electrophoresis, a signal from the subject of analysis and noise signals are clearly distinguished, and corrections are made so as to eliminate the influence of the noise signals on the desired signal.
2. Background Art
Following the completion of the sequencing of the human genome, research is actively underway so as to analyze the function of genes. Among other factors, particular attention is being focused on the automatic determination of genotypes and genotype frequency, which form the basis for a search for genes involved in phenotypes, such as the presence or absence of particular diseases, the extent of efficacy of medication, and the presence or absence of side effects.
Microsatellite
Normally, genomes of living organisms of the same species have substantially identical nucleotide sequences, with different nucleotides located at some sites. For example, at a certain genetic locus, some individuals may have A while other individuals may have T. Such presence of polymorphism in a single nucleotide of a genome among individuals is referred to as a SNP (Single Nucleotide Polymorphism).
There are other cases where one individual has A at a certain genetic locus and other individuals do not. For example, as shown in
The genomes of living organisms have many (tens of thousands or more) sites at which a short nucleotide sequence pattern that is two to six nucleotides long appears repeatedly several to a dozen times. Such a characteristic nucleotide sequence pattern is referred to as a microsatellite. An example of microsatellite that appears in a genome is shown in
As described above, SNPs, single-nucleotide in/del's, and microsatellites, which can vary among individuals, are portions that can be easily distinguished from other nucleotide sequences in a genome, and they can also be easily detected experimentally. In some species of living organisms, the approximate positions of SNPs, single-nucleotide in/del's, and microsatellites in the genome are known, and therefore they can be used as indices of genomic positions. Because of these characteristics, SNPs, single-nucleotide in/del's, and microsatellites with polymorphisms are referred to as DNA markers. In particular, microsatellites with polymorphisms, which include a plurality of nucleotides, contain much more information than SNPs or single-nucleotide in/del's, and therefore they are used frequently as DNA markers. Further, microsatellites with polymorphisms have the added advantage that a plurality of samples can be subjected to experimentation simultaneously in a pooled typing experiment, as will be described later.
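As an illustration of the definition of a microsatellite given above (a unit of two to six nucleotides repeated in tandem), the following minimal sketch scans a nucleotide sequence for such repeats. The function name `find_microsatellites`, its parameters, and the default minimum of three repetitions are assumptions introduced for this example only and are not part of the described method.

```python
import re


def find_microsatellites(sequence, min_unit=2, max_unit=6, min_repeats=3):
    """Return a list of (start, unit, repeats) for tandem repeats in sequence.

    min_repeats is an illustrative threshold, not a value taken from the text.
    """
    # A unit of min_unit..max_unit nucleotides (shortest unit preferred),
    # followed by at least (min_repeats - 1) further copies of the same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        repeats = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeats))
    return hits


# A hypothetical fragment containing the unit "CA" repeated five times.
example = "GGT" + "CA" * 5 + "TT"
print(find_microsatellites(example))
```

Because the unit is matched lazily, a run such as CACACACA is reported with the unit CA rather than CACA.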
As shown in
In the example shown in
PCR, Electrophoresis Experiment, and Pooled Typing Experiment
When a microsatellite with polymorphism is used as a DNA marker, an experiment such as PCR (Polymerase Chain Reaction) or electrophoresis is carried out to extract and detect the sites in the genome where microsatellites appear. PCR is an experimental technique whereby a pair of nucleotide sequences called primer sequences are designated at either end of a microsatellite, and then only those portions between the thus designated nucleotide sequences are repeatedly replicated as DNA fragments so as to obtain a predetermined amount of a sample. Electrophoresis, examples of which include gel electrophoresis and capillary electrophoresis, is an experimental technique involving causing an amplified DNA fragment to electrophorese in an electrically charged migration path so as to separate DNA fragments with different lengths. Thus, electrophoresis is a sample separation technique that takes advantage of the difference in migration speeds in a migration path depending on the length of DNA fragments (the longer the DNA fragment, the smaller its migration speed).
While an experimental technique involving gel electrophoresis has been described above, the same procedure can be performed for capillary electrophoresis. In capillary electrophoresis, samples are caused to migrate through a thin tube filled with gel, and the time it takes for each sample to migrate a predetermined distance (normally to the end of the capillary) is measured so as to determine the length of the DNA fragments. In capillary electrophoresis, instead of scanning the samples in gel for fluorescent signals, samples are generally detected using a fluorescent signal detector fitted at the end of the capillary.
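The determination of a fragment length from a measured migration time can be sketched as follows. The sketch assumes that a size standard (fragments of known length) has been run under the same conditions and that linear interpolation between neighboring standard fragments is adequate; both are simplifying assumptions made for this example, and the function name `estimate_fragment_length` is hypothetical.

```python
from bisect import bisect_left


def estimate_fragment_length(migration_time, ladder):
    """Interpolate a fragment length (in nucleotides) from a migration time.

    ladder: list of (migration_time, known_length) pairs for a size standard.
    Linear interpolation is an assumption of this sketch.
    """
    ladder = sorted(ladder)
    times = [t for t, _ in ladder]
    if not times[0] <= migration_time <= times[-1]:
        raise ValueError("migration time lies outside the size standard")
    i = bisect_left(times, migration_time)
    if times[i] == migration_time:
        return float(ladder[i][1])
    # Interpolate between the two standard fragments that bracket the sample.
    (t0, n0), (t1, n1) = ladder[i - 1], ladder[i]
    return n0 + (n1 - n0) * (migration_time - t0) / (t1 - t0)


# Illustrative values only: times in minutes, lengths in nucleotides.
size_standard = [(10.0, 100), (12.0, 150), (14.0, 200)]
print(estimate_fragment_length(11.0, size_standard))
```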
The experiment performed on a sample from a single individual involving PCR and electrophoresis, as in
Phenomena During an Actual Experiment
The aforementioned experimental results shown in
Gene that Produces Complex Polymorphism
The DNA marker in which a single-nucleotide in/del or a microsatellite with polymorphism appears in a combined manner is referred to as a compound marker. In a compound marker, complex polymorphisms are observed, such as an instance of polymorphism where, when microsatellites with the same number of repetitions exist, the fragment lengths are not necessarily the same.
Generally, there is no way of knowing whether or not a particular DNA fragment is a compound marker, or, if so, what the polymorphism contained in it is like, unless a DNA sequencing experiment is conducted. While the human genome has already been sequenced and made public, the published human genome information does not provide polymorphism information by itself. Further, although many polymorphisms have been reported in papers, many of them are merely based on PCR and electrophoresis experiments and stop short of DNA sequencing experiments; they simply state that “the fragment lengths differ from one individual to another.” Very few of such papers report that a particular DNA fragment is a compound marker.
Noises Produced in PCR and Electrophoresis Experiments
The aforementioned stutter peak becomes an issue when examining the genotype of an individual via an individual typing experiment, or when examining the frequency distribution of alleles in a group of samples via a pooled typing experiment. In an individual typing experiment, a peak that appears at the position corresponding to the nucleotide length of the original DNA fragment (to be hereafter referred to as “a true peak”), which should be observed, must be distinguished from a stutter peak, so that the true peak alone can be adopted as information indicating the genotype of the individual. On the other hand, in a pooled typing experiment, a stutter peak caused by a single allele influences the height of the peaks of the surrounding alleles, which leads to the problem that the results obtained do not reflect the true frequency distribution of the allele.
With reference to
In both individual typing and pooled typing, the influence of stutter peaks must be eliminated if accurate experimental results are to be obtained. Therefore, characteristics of stutter peaks have been widely studied, and the following properties are now known:
Property 1: When the DNA marker, individuals (alleles), and method of experiment are the same, the relative heights of stutter peaks are approximately the same over a plurality of experiments (see Non-patent Document 1).
Property 2: When attention is focused on a single DNA marker and a single individual, the stutter peak is lower than the true peak, and the height of the stutter peak becomes lower as the stutter peak moves away from the true peak (see Non-patent Document 2).
Property 3: When attention is focused on a single DNA marker, there is a linear relationship between the number of repetitions of a unit in a microsatellite and the relative height of the stutter peak (height of stutter peak divided by height of true peak), and the line representing this linear relationship is common to all DNA markers as long as the DNA marker is comprised of a repetition of two nucleotides (see Non-patent Document 3).
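The linear relationship of property 3 can be illustrated by fitting a line r = a×m + b to pairs of repetition count m and relative stutter height r. The following is a minimal sketch using ordinary least squares; the function names and the numerical values are assumptions made for illustration and are not measured data.

```python
def fit_stutter_line(repeats, relative_heights):
    """Least-squares fit of r = a*m + b; returns (a, b)."""
    n = len(repeats)
    if n < 2 or n != len(relative_heights):
        raise ValueError("need at least two (m, r) pairs of equal length")
    mean_m = sum(repeats) / n
    mean_r = sum(relative_heights) / n
    sxx = sum((m - mean_m) ** 2 for m in repeats)
    if sxx == 0:
        raise ValueError("all repetition counts are identical")
    sxy = sum((m - mean_m) * (r - mean_r)
              for m, r in zip(repeats, relative_heights))
    a = sxy / sxx
    b = mean_r - a * mean_m
    return a, b


def predict_relative_height(a, b, m):
    """Relative stutter height (stutter height / true height) for m repeats."""
    return a * m + b


# Illustrative values only, not experimental results.
a, b = fit_stutter_line([10, 15, 20], [0.30, 0.40, 0.50])
print(a, b, predict_relative_height(a, b, 12))
```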
Patent Document 1: JP Patent Application No. 2004-192559
Patent Document 2: JP Patent Application No. 2004-262431
Non-patent Document 1: Perlin, M. W., et al., “Toward Fully Automated Genotyping: Allele Assignment, Pedigree Construction, Phase Determination, and Recombination Detection in Duchenne Muscular Dystrophy,” Am. J. Hum. Genet. 55, 1994, pp. 777-787
Non-patent Document 2: Perlin, M. W., et al., “Toward Fully Automated Genotyping: Genotyping Microsatellite Markers by Deconvolution,” Am. J. Hum. Genet. 57, 1995, pp. 1199-1210
Non-patent Document 3: Lipkin, E., et al., “Quantitative Trait Locus Mapping in Dairy Cattle by Means of Selective Milk DNA Pooling Using Dinucleotide Microsatellite Markers: Analysis of Milk Protein Percentage,” Genetics 149, July 1998, pp. 1557-1567
For both the individual typing experiment and the pooled typing experiment, methods for correcting experimental results taking advantage of properties 1 to 3 have been proposed. In these existing correction methods, a preliminary experiment is conducted in the preparatory stage of correction so as to determine, for a particular DNA marker, a formula for estimating the relative height of a stutter peak based on the length of a fragment. Thereafter, in the case of correction for an individual typing experiment, the true peak is identified by taking into consideration the relative height of each peak. In the case of correction for a pooled typing experiment, a correction process is performed whereby components derived from a stutter peak are subtracted from the waveform that is observed. For the determination of the aforementioned formula for determining the relative height of a stutter peak based on the length of a fragment, the following two methods have been proposed.
In one method, a DNA sequencing experiment is conducted on several to dozens of sample DNA markers so as to determine whether any of the markers is a compound marker, what the polymorphism of the compound marker is like if such marker is indeed a compound marker, and whether or not a microsatellite with polymorphism or a single-nucleotide in/del is included in the compound marker. Because the length of the fragment and the number of repetitions of a unit can be determined for each DNA marker based on the DNA sequencing experiment, the formula for estimating the relative height of a stutter peak based on the length of a fragment can be determined from a line representing the linear relationship between the number of repetitions of a unit and the relative height of a stutter peak that is common to all the DNA markers, and the relationship between the fragment length and the number of repetitions that has been determined by the DNA sequencing experiment.
The other method involves directly determining the relationship between the fragment length of a DNA marker and the relative height of a stutter peak based on an individual typing experiment for each of several to dozens of sample DNA markers. In order to determine the relative height of a stutter peak from a waveform obtained by an individual typing experiment for a particular individual, it is necessary to isolate the true peak and stutter peaks derived from the true peak from other noise peaks. However, in the case of a DNA marker that is heterozygous, two true peaks can appear in close proximity in some cases. In such cases, the resultant waveform is comprised of a complex superposition of the true peak, stutter peaks, and other noise peaks, which cannot be properly isolated. In view of this, it is necessary to prepare a large number of individuals for which individual typing experiments are conducted.
Generally, when correcting the experimental results of an individual typing experiment, the second method is employed. This is because the second method can utilize the experimental results obtained by an individual typing experiment that has already been conducted, so that there is no need to perform an additional preliminary experiment. However, if many heterozygotes in which two true peaks appear in close proximity are included in the sample DNA markers used in an individual typing experiment, a problem arises in that the relative height of a stutter peak cannot be estimated with sufficient accuracy for the above-described reasons.
On the other hand, in a pooled typing experiment, both the first and the second methods are employed. However, as opposed to the case of the individual typing experiment, it is necessary to perform a preliminary experiment (a DNA sequencing experiment or an individual typing experiment) in addition to the pooled typing experiment regardless of which of the two methods is employed.
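The correction for a pooled typing experiment described above, in which components derived from stutter peaks are subtracted from the observed waveform, can be sketched as follows. The sketch makes a simplifying assumption that is not taken from the text: each allele produces a single stutter peak, one repeat unit shorter than the true peak. The function names `remove_stutter` and `allele_frequencies` are hypothetical.

```python
def remove_stutter(observed, relative_height):
    """Subtract stutter components from observed peak heights.

    observed: dict mapping repetition count -> observed peak height.
    relative_height: function m -> (stutter height / true height) for the
        stutter that an allele with m repeats leaves at m - 1 repeats.
    Assumption of this sketch: only the one-unit-shorter stutter is modeled.
    """
    corrected = {}
    # Work from the longest allele down, since it receives no stutter
    # contribution from any longer allele in this model.
    for m in sorted(observed, reverse=True):
        from_longer = relative_height(m + 1) * corrected.get(m + 1, 0.0)
        corrected[m] = max(observed[m] - from_longer, 0.0)
    return corrected


def allele_frequencies(corrected):
    """Normalize corrected heights into an allele frequency distribution."""
    total = sum(corrected.values())
    return {m: h / total for m, h in corrected.items()} if total else {}


# Illustrative waveform: true heights 100, 200, 50 at 10, 11, 12 repeats,
# with a constant relative stutter height of 0.2.
observed = {9: 20.0, 10: 140.0, 11: 210.0, 12: 50.0}
corrected = remove_stutter(observed, lambda m: 0.2)
print(corrected)
print(allele_frequencies(corrected))
```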
Thus, it is an object of the invention, which relates to a method and apparatus for displaying the results of the extraction and analysis of a DNA marker including a microsatellite with polymorphism or a single-nucleotide in/del via PCR and electrophoresis experiments, to provide a method and apparatus whereby experimental results that have already been obtained can be utilized in estimating the relative height of a stutter peak with high accuracy in both an individual typing experiment and a pooled typing experiment without the need to perform an additional preliminary experiment, and whereby experimental results can be displayed in which the influence of stutter peaks has been eliminated on the basis of the results of estimation.
With a view to achieving the foregoing object, the inventors conducted research and analysis on compound markers and have obtained the following information concerning regularity.
Information 1: As a result of progress in research into gene sequence polymorphisms, several DNA markers have been analyzed to determine whether or not they are compound markers and, if so, what the polymorphisms they contain are like, and the relevant information is being accumulated in public databases.
Information 2: Whether or not a single-nucleotide in/del exists can be judged from waveforms that can be obtained from a pooled typing experiment or an individual typing experiment, without performing a DNA sequencing experiment. This is due to the fact that, in a waveform that is obtained from a DNA marker that includes only a microsatellite and that does not include a single-nucleotide in/del, peaks appear at unit-length intervals of a microsatellite, as shown in
Information 3: It can be said empirically that the greater the number of repetitions, the more highly polymorphic the marker. For example, when a microsatellite in which a unit is repeated five times is compared with a microsatellite in which a unit is repeated 20 times, it is empirically known that the latter exhibits greater variety of polymorphisms.
Information 4: There are more DNA markers that are known not to be compound markers than DNA markers that are known to be compound markers. It can be expected, therefore, that of the DNA markers that are not yet known to be either compound markers or not compound markers, there are more DNA markers that are not compound markers than DNA markers that are compound markers.
Information 5: Even for compound markers that include single-nucleotide in/del's, the number of repetitions of microsatellites can be uniquely calculated from the fragment length of the PCR amplification product if the unit length of the microsatellite is 3 nucleotides or longer. An example of a method for such calculation is shown in
Meanwhile, when the DNA marker includes a single-nucleotide in/del and when the published human genome sequence includes a single nucleotide insertion, the fragment length of the PCR amplification product would be either x nucleotides, the number of nucleotides x from which an integral multiple of the unit length is subtracted or to which such integral multiple is added, or, possibly, such numbers of nucleotides from which 1 has been subtracted. The relationship between the fragment length of the PCR amplification product and the number of repetitions of the unit in such cases would be as shown in graph 101. On the contrary, when the DNA marker includes a single-nucleotide in/del and when the published human genome sequence includes a single nucleotide deletion, the fragment length of the PCR amplification product could be x nucleotides, the number of nucleotides x from which an integral multiple of the unit length is subtracted or to which such an integral multiple is added, or, possibly, such numbers of nucleotides to which 1 has been added. The relationship between the fragment length of the PCR amplification product and the number of repetitions of the unit in such cases would be as shown in graph 102. Further, when the DNA marker includes a single-nucleotide in/del but it is not known whether it is an insertion or deletion, the relationship between the fragment length of the PCR amplification product and the number of repetitions of the unit can be predicted to be within the range shown by graph 103. Thus, even for a compound marker that includes a single-nucleotide in/del, a linear relationship between the fragment length of the PCR amplification product and the number of repetitions of the unit can be drawn from the result of an electrophoresis experiment as long as the unit length of the microsatellite is 3 nucleotides or longer.
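The calculation described in information 5 can be sketched as follows. A fragment length is expressed as the reference length x plus an integral multiple of the unit length, plus an offset of -1, 0, or +1 nucleotide. When the unit length is 3 or longer, the multiple and the offset are uniquely determined; when the unit length is 2, offsets of +1 and -1 cannot be told apart. The function name and its parameters are assumptions made for this example.

```python
def repeats_from_fragment_length(fragment_length, ref_length,
                                 ref_repeats, unit_length):
    """Return (repeats, offset) for a PCR amplification product.

    ref_length, ref_repeats: fragment length x and repetition count taken
        from the published genome sequence.
    offset: -1, 0, or +1 nucleotide attributed to a single-nucleotide in/del.
    """
    if unit_length < 3:
        raise ValueError("a unit length below 3 makes the offset ambiguous")
    diff = fragment_length - ref_length
    # diff = k * unit_length + offset with offset in {-1, 0, +1};
    # shifting by 1 maps the offset to {0, 1, 2}, which divmod separates.
    k, shifted = divmod(diff + 1, unit_length)
    if shifted > 2:
        raise ValueError("length is not explained by one single-nucleotide in/del")
    return ref_repeats + k, shifted - 1


# Illustrative values: reference fragment of 150 nucleotides, 10 repeats,
# unit length 3.
print(repeats_from_fragment_length(156, 150, 10, 3))  # two more units
print(repeats_from_fragment_length(155, 150, 10, 3))  # two more units, minus 1
```

An offset of -1 corresponds to the case where the published sequence includes the inserted nucleotide (graph 101), and an offset of +1 to the case where it includes the deletion (graph 102).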
Information 6: For a compound marker that includes a plurality of kinds of microsatellites with the same lengths (including a case where two or more microsatellites have polymorphisms), the relative height of a stutter peak can be calculated by taking advantage of property 3 even if the nucleotide sequence of each individual is not known. For example, assume a case where it has been found that a DNA marker of interest includes two microsatellites whose unit lengths are 2 nucleotides, for which it is not known whether or not the microsatellites have polymorphism, that the first microsatellite from the published human genome sequence is repeated n1 times, that the second microsatellite is repeated n2 times, and that the length of the original DNA marker before amplification is x nucleotides. The linear relationship mentioned with reference to property 3 is assumed to be r=a×m+b where r is the relative height of a stutter peak, m is the number of repetitions of the unit, and a and b are the slope and the intercept, respectively, of the line. Under these assumptions, when the length of a certain PCR amplification product is x+2×n nucleotides, a relationship n1+n2+n=n1′+n2′ holds for the number of repetitions n1′ of the first microsatellite and for the number of repetitions n2′ of the second microsatellite in the PCR amplification product. Accordingly, the relative height of a stutter peak in this allele can be calculated as follows:
Namely, in compound markers that include a plurality of kinds of microsatellites with the same unit lengths, even if it cannot be determined how the number of repetitions of each microsatellite has been increased or decreased (namely, the values of n1′ and n2′) by PCR amplification, the relative height of a stutter peak can be calculated for a particular nucleotide length if only information is available that the sample DNA that has been subjected to PCR amplification is a sample DNA of the published human genome sequence that has been increased or decreased by (unit length×n) nucleotides. This means that the relative height of a stutter peak can be calculated without the need to examine the nucleotide sequence of the PCR amplification product.
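The following minimal sketch illustrates why the individual values of n1′ and n2′ need not be known. It rests on an assumption that is not confirmed by the text: each microsatellite is taken to follow the line r = a×m + b, and the contributions of the two microsatellites are taken to add. Under that assumption the result depends only on the sum n1′ + n2′ = n1 + n2 + n. The function names and numerical values are hypothetical.

```python
def combined_relative_height(n1, n2, n, a, b):
    """Relative stutter height from the reference repeat counts n1, n2 and
    the change n in the total number of units.

    Assumption of this sketch: the two lines a*m + b are simply added.
    """
    return a * (n1 + n2 + n) + 2 * b


def relative_height_for_split(n1_prime, n2_prime, a, b):
    """Same quantity computed from an explicit split (n1', n2')."""
    return (a * n1_prime + b) + (a * n2_prime + b)


# Illustrative values: n1 = 8, n2 = 6, and a product longer by n = 2 units.
a, b = 0.02, 0.1
expected = combined_relative_height(8, 6, 2, a, b)
# Every split with n1' + n2' = 16 gives the same value.
for n1_prime in range(0, 17):
    value = relative_height_for_split(n1_prime, 16 - n1_prime, a, b)
    assert abs(value - expected) < 1e-9
print(expected)
```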
In view of the above information, the inventors have come to the conclusion that, in order to estimate the relative height of a stutter peak accurately so as to correct the experimental results of an individual typing experiment and a pooled typing experiment and to eliminate the need to perform an additional preliminary experiment, the following functions are required.
Function 1: DNA markers that are known to be compound markers and DNA markers that are known not to be compound markers are registered in a database in advance, and the database is referred to when estimating the relative height of a stutter peak. If a particular DNA marker is known to be not a compound marker, the relationship between the fragment length and the number of repetitions can be determined by referring to the sequence information, such as the human genome, without performing any additional preliminary experiment. On the contrary, even if the DNA marker is known to be a compound marker, the relationship between the fragment length and the number of repetitions can be known by comparatively observing the results of DNA sequencing experiments on many individuals, if such results are available. Thus, the first method for the calculation for estimating the relative height of a stutter peak from a fragment length can be utilized, without performing a DNA sequencing experiment as an additional preliminary experiment. This function is based on the aforementioned information 1.
Function 2-1: In a pooled typing experiment, by examining the intervals between peaks of a pooled typing waveform, it is determined whether or not a single-nucleotide in/del is included. This function can be realized by the procedure described with reference to the foregoing information 2.
Function 2-2: In an individual typing experiment, by examining the intervals between peaks of individual typing waveforms, it is determined whether or not a single-nucleotide in/del is included. This function can be realized by the procedure described with reference to the foregoing information 2.
Function 3: By examining, with reference to the published human genome sequence, whether or not a plurality of microsatellites with a large number of repetitions are included, it is estimated whether or not a particular DNA marker is a compound marker. This function is based on the foregoing information 3.
Function 4: DNA markers that cannot be determined either to be compound markers or not by any of function 1, function 2-1, function 2-2, or function 3 are estimated not to be compound markers. This function is based on the foregoing information 4.
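The peak-interval examination of functions 2-1 and 2-2 can be sketched as follows. The sketch assumes that peak positions have already been converted to integer fragment lengths and that noise peaks other than stutter peaks have been removed; these preprocessing steps, and the function name, are assumptions made for this example.

```python
def has_single_nucleotide_indel(peak_positions, unit_length):
    """Judge from peak spacing whether a single-nucleotide in/del is present.

    peak_positions: fragment lengths (integers, in nucleotides) at which
        peaks were observed in a pooled or individual typing waveform.
    Returns True if the peaks do not all lie at unit-length intervals.
    """
    # Peaks of a marker containing only a microsatellite share one residue
    # modulo the unit length; a second residue indicates a shifted peak.
    residues = {position % unit_length for position in peak_positions}
    return len(residues) > 1


# Illustrative peak positions for a microsatellite with a unit length of 2.
print(has_single_nucleotide_indel([150, 152, 154, 156], 2))  # regular spacing
print(has_single_nucleotide_indel([150, 152, 153, 155], 2))  # shifted peaks
```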
With regard to DNA markers that cannot be determined either to be compound markers or not by any of the foregoing functions, the process can continue while presuming that such DNA markers are either compound markers or not. In an individual typing experiment, function 2-2 can be utilized, while in a pooled typing experiment, function 2-1 can be utilized. When a DNA marker is estimated not to be a compound marker by these functions, the relationship between the fragment length and the number of repetitions is estimated without performing an additional preliminary experiment. On the contrary, even when a DNA marker is estimated to be a compound marker, the following functions 5 and 6 can be utilized for many DNA markers.
Function 5: With regard to a compound marker that includes a single-nucleotide in/del and whose unit length is 3 nucleotides or longer, the relative height of a stutter peak is estimated by adjusting a linear regression line common to all of the DNA markers by a single nucleotide. This function is based on the foregoing information 5.
Function 6: With regard to DNA markers that include a plurality of microsatellites with polymorphisms in which unit lengths are the same, the relative height of a stutter peak is estimated by combining a plurality of linear regression lines. This function is based on the foregoing information 6.
Functions 5 and 6 allow the first method for determining the formula for estimating the relative height of a stutter peak based on the length of a fragment to be utilized for many of those DNA markers that are known to be compound markers but for which the experimental results of individual DNA sequencing cannot be utilized, or for many of those DNA markers that are estimated to be compound markers, without performing a DNA sequencing experiment as an additional preliminary experiment.
Function 7: Display Function
The results of estimation of the relative height of a stutter peak using functions 1 to 6 are displayed on a screen. Thus, the user can be shown the results of estimation of the relative height of a stutter peak and the data on which the results are based.
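The order in which functions 1 to 4 are consulted, together with the basis that is shown to the user by function 7, can be sketched as a simple decision cascade. Everything named in the sketch is hypothetical: the function `classify_marker`, the marker identifiers, the dictionary used in place of a database, and in particular the threshold `repeat_threshold`, which stands in for an unspecified criterion of function 3.

```python
def classify_marker(marker_id, known_markers, peak_positions=None,
                    unit_length=None, reference_repeat_counts=None,
                    repeat_threshold=15):
    """Return (is_compound, basis) for a DNA marker.

    known_markers: dict marker_id -> bool, standing in for the database of
        function 1.
    reference_repeat_counts: repetition counts of the microsatellites found
        in the published genome sequence for this marker.
    repeat_threshold: assumed cut-off for "many repetitions" (function 3).
    """
    # Function 1: markers already registered as compound or not compound.
    if marker_id in known_markers:
        return known_markers[marker_id], "function 1: registered in database"
    # Functions 2-1 and 2-2: peaks off the unit-length grid.
    if peak_positions and unit_length:
        if len({p % unit_length for p in peak_positions}) > 1:
            return True, "function 2: peak intervals indicate an in/del"
    # Function 3: a plurality of microsatellites with many repetitions.
    if reference_repeat_counts:
        long_runs = [c for c in reference_repeat_counts if c >= repeat_threshold]
        if len(long_runs) >= 2:
            return True, "function 3: plural highly repeated microsatellites"
    # Function 4: otherwise estimated not to be a compound marker.
    return False, "function 4: estimated not to be a compound marker"


known = {"marker_A": True, "marker_B": False}
print(classify_marker("marker_B", known))
print(classify_marker("marker_C", known, [150, 152, 153], 2))
print(classify_marker("marker_D", known, [150, 152], 2, [20, 5]))
```

Returning the basis alongside the result keeps available the data on which the estimation is based, so that both can be displayed together.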
In order to realize those functions mentioned above, the invention provides an apparatus for displaying the results of analysis of the length of a DNA fragment based on a detection signal obtained from a PCR amplification product of said DNA fragment, comprising:
a compound marker determination unit for determining whether or not said DNA fragment is a compound marker having a plurality of sequence portions with polymorphisms;
a relative height estimation unit for determining, based on the results of estimation made by said compound marker determination unit, whether or not it is possible to estimate the relative relationship between the height of a true peak that corresponds to said detection signal from said PCR amplification product of said DNA fragment and the height of a stutter peak that corresponds to a detection signal from said PCR amplification product in which the number of repetitions of a unit in a microsatellite in said DNA fragment has been increased or decreased; and
a display unit for displaying the results of determination made by said relative height estimation unit.
The apparatus further comprises a means for storing known information about compound markers, wherein said compound marker determination unit determines whether or not said DNA fragment is a compound marker using said known information, and wherein said relative height estimation unit determines whether or not a relative relationship between the height of a true peak and the height of a stutter peak can be estimated using said known information.
The invention further provides an apparatus for displaying the results of analysis of the length of a DNA fragment from a detection signal obtained from a PCR amplification product of said DNA fragment, comprising:
a compound marker determination unit for determining whether or not said DNA fragment is a compound marker having a plurality of sequence portions with polymorphism;
a relative height estimation unit for estimating, based on the results of estimation made by said compound marker determination unit, the relative relationship between the height of a true peak that corresponds to said detection signal from said PCR amplification product of said DNA fragment and the height of a stutter peak that corresponds to a detection signal from said PCR amplification product in which the number of repetitions of a unit in a microsatellite in said DNA fragment has been increased or decreased; and
a display unit for displaying the results of determination made by said relative height estimation unit.
The invention further provides an apparatus for displaying the results of analysis of the length of a DNA fragment from a detection signal obtained from a PCR amplification product of said DNA fragment, comprising:
a compound marker determination unit for determining whether or not said DNA fragment is a compound marker having a plurality of sequence portions with polymorphism;
a relative height estimation unit for determining, based on the results of estimation made by said compound marker determination unit, whether or not it is possible to estimate the relative relationship between the height of a true peak that corresponds to said detection signal from said PCR amplification product of said DNA fragment and the height of a stutter peak that corresponds to a detection signal from said PCR amplification product in which the number of repetitions of a unit in a microsatellite in said DNA fragment has been increased or decreased;
a correction unit for correcting said detection signal from said PCR amplification product of said DNA fragment based on the results of estimation made by said relative height estimation unit; and
a display unit for displaying the results of analysis of the length of said DNA fragment based on a corrected detection signal.
The apparatus further comprises a means for storing known information about compound markers, wherein said compound marker determination unit determines whether or not said DNA fragment is a compound marker using said known information, and wherein said relative height estimation unit determines whether or not a relative relationship between the height of a true peak and the height of a stutter peak can be estimated using said known information.
The compound marker determination unit determines whether or not, based on the intervals of peaks in a waveform of said detection signal of said PCR amplification product of said DNA fragment, said DNA fragment includes a single-nucleotide in/del.
The compound marker determination unit acquires information about the number of repetitions of a unit in a microsatellite included in said DNA fragment by referring to the published genome sequence of said DNA fragment.
The relative height estimation unit, when said DNA fragment includes a single-nucleotide in/del, adjusts the results of estimation based on a linear relationship between the length of said DNA fragment and the sum of the number of repetitions of a unit in each microsatellite included in said DNA fragment by referring to the published genome sequence of said DNA fragment.
The relative height estimation unit, when a plurality of microsatellites are included in said DNA fragment, adjusts the results of estimation based on a linear relationship between the length of said DNA fragment and the number of repetitions of a unit in each microsatellite included in said DNA fragment by referring to the published genome sequence of said DNA fragment.
The display unit displays information on which the estimation made by said relative height estimation unit is based.
The invention further provides a method for displaying the results of analysis of the length of a DNA fragment based on a detection signal obtained from a PCR amplification product of said DNA fragment, comprising the steps of:
determining whether or not said DNA fragment is a compound marker having a plurality of sequence portions with polymorphisms;
determining whether or not it is possible to estimate, based on the results of determination made in the compound marker determination step, a relative relationship between the height of a true peak that corresponds to said detection signal from said PCR amplification product of said DNA fragment and the height of a stutter peak that corresponds to a detection signal from a PCR amplification product in which the number of repetitions of a unit in a microsatellite of said DNA fragment has increased or decreased; and
displaying the results of determination made by the relative height estimation step.
The method further comprises the step of acquiring known information about compound markers prior to the compound marker determination step, wherein it is determined, using said known information, in the compound marker determination step whether or not said DNA fragment is a compound marker, and wherein it is determined in the relative height estimation step whether or not it is possible to estimate the relative relationship between the height of said true peak and the height of a stutter peak using said known information.
The invention further provides a method for displaying the results of analysis of the length of a DNA fragment based on a detection signal obtained from a PCR amplification product of said DNA fragment, comprising the steps of:
determining whether or not said DNA fragment is a compound marker having a plurality of sequence portions with polymorphisms;
estimating, based on the results of determination made by the compound marker determination step, the relative relationship between the height of a true peak that corresponds to said detection signal from said PCR amplification product of said DNA fragment and the height of a stutter peak that corresponds to a detection signal from a PCR amplification product in which the number of repetitions of a unit in a microsatellite of said DNA fragment has increased or decreased; and
displaying the results of estimation made by the relative height estimation step.
The invention further provides a method for displaying the results of analysis of the length of a DNA fragment based on a detection signal obtained from a PCR amplification product of said DNA fragment, comprising the steps of:
determining whether or not said DNA fragment is a compound marker having a plurality of sequence portions with polymorphisms;
estimating, based on the results of determination made by the compound marker determination step, the relative relationship between the height of a true peak that corresponds to said detection signal from said PCR amplification product of said DNA fragment and the height of a stutter peak that corresponds to a detection signal from a PCR amplification product in which the number of repetitions of a unit in a microsatellite of said DNA fragment has increased or decreased;
correcting said detection signal from said PCR amplification product of said DNA fragment based on the results of estimation made by the relative height estimation step; and
displaying the results of analysis of the length of said DNA fragment based on a corrected detection signal.
The method further comprises the step of acquiring known information about compound markers prior to the compound marker determination step,
wherein it is determined, using said known information, in the compound marker determination step whether or not said DNA fragment is a compound marker,
and wherein it is determined in the relative height estimation step whether or not it is possible to estimate the relative relationship between the height of said true peak and the height of a stutter peak using said known information.
The compound marker determination step comprises determining, based on the intervals of peaks in the waveform of said detection signal of said PCR amplification product of said DNA fragment, whether or not said DNA fragment includes a single-nucleotide in/del.
The compound marker determination step comprises acquiring information about the number of repetitions of a unit in a microsatellite included in said DNA fragment by referring to the published genome sequence of said DNA fragment.
The relative height estimation step comprises adjusting, when a single-nucleotide in/del is included in said DNA fragment, the results of estimation by referring to the published genome sequence of said DNA fragment and in accordance with a linear relationship between the length of said DNA fragment and the number of repetitions of a unit in a microsatellite included in said DNA fragment.
The relative height estimation step comprises adjusting, when a plurality of microsatellites are included in said DNA fragment, the results of estimation by referring to the published genome sequence of the DNA fragment and based on a linear relationship between the length of said DNA fragment and the sum of the number of repetitions of a unit in each microsatellite included in said DNA fragment.
The display step comprises displaying information on which the estimation made by the relative height estimation step is based.
The invention further provides a program for causing a computer to carry out any one of the foregoing methods.
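The linear relationship between fragment length and the summed repeat count mentioned above can be illustrated with a minimal sketch. The names `fragment_length`, `total_repeats`, `flank_length`, and `unit_length` are hypothetical and do not come from the specification. The sketch assumes that every microsatellite in the fragment has the same unit length and that the non-repeat (flanking) length has been read from the published genome sequence.

```python
from typing import Sequence


def fragment_length(flank_length: int, unit_length: int,
                    repeat_counts: Sequence[int]) -> int:
    """Fragment length as a linear function of the summed repeat count.

    Assumption: all microsatellites share one unit length, so the length
    depends only on the sum of the repeat counts, not on how the repeats
    are distributed among the microsatellites.
    """
    return flank_length + unit_length * sum(repeat_counts)


def total_repeats(length: int, flank_length: int, unit_length: int) -> int:
    """Invert the linear relationship to recover the summed repeat count."""
    extra = length - flank_length
    if extra < 0 or extra % unit_length != 0:
        raise ValueError("length is not consistent with the linear model")
    return extra // unit_length
```

Under these assumptions, two alleles with repeat counts (10, 5) and (12, 3) have the same fragment length, which is why the estimate is tied to the sum of the repeat counts rather than to each count separately.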
As described above, in accordance with the method and apparatus for displaying gene information according to the invention, when a DNA marker including a microsatellite with polymorphism or a single-nucleotide in/del is extracted and analyzed by PCR and electrophoresis experiments, the experimental results that have already been obtained can be utilized for estimating the relative height of a stutter peak with high accuracy without conducting an additional preliminary experiment, whether in an individual typing experiment or a pooled typing experiment. The experimental results or the like can then be corrected by eliminating the influence of stutter peaks based on the estimation made.
In particular, in accordance with the method and apparatus for displaying gene information according to the invention, by applying a formula for estimating the relative height of a stutter peak based on the length of a fragment, the relative height of a stutter peak can be estimated, without requiring an additional experiment, for the following DNA markers for which it is not yet known whether or not they are compound markers: (1) DNA markers that include both a single-nucleotide in/del and a microsatellite with a unit length of 3 nucleotides or longer; (2) DNA markers that include a plurality of microsatellites with the same unit length; and (3) DNA markers that are actually not compound markers.
Preferred embodiments of the method and apparatus for displaying gene information according to the invention, which utilize a gene frequency estimation system based on published genome sequences, are described below with reference to the attached drawings. FIGS. 1 to 11 show the embodiments of the invention, in which identical reference numerals designate identical elements with similar structures and operations.
Structure of a Gene Information Display System
The program memory 206 includes: a compound marker determination unit 208 for realizing the aforementioned functions 1, 2-1, 2-2, 3, and 4, namely, those functions for examining whether or not a DNA marker is a compound marker; a relative height estimation unit 209 for realizing the aforementioned functions 1, 5, and 6, namely, those functions for estimating the relative height of a stutter peak; an estimation result display unit 210 for displaying the results from the compound marker determination unit 208 and the relative height estimation unit 209; and an estimation result correction unit 211 for correcting waveform data using the results of estimation, if the estimation is possible, and a conventional technique.
The data memory 207 includes marker data 212 including the published human genome sequence data for each DNA marker, pooled typing data 213 including waveform data obtained as a result of a pooled typing experiment, and individual typing data 214 including waveform data obtained as a result of an individual typing experiment.
The data 302 has a NULL value when no calculations have been made. The data 303 has a NULL value when no calculations have been made, or when it is unknown whether or not there is a single-nucleotide in/del. The data 304 has a NULL value when no calculations have been made, or when it is unknown whether or not the DNA marker is a compound marker that includes a plurality of microsatellites with polymorphism. The data 305 holds data in the form of a sequence of a data structure FragmentSizeRepeatNumberData, as will be described below, for DNA markers that are known to be compound markers and for which nucleotide sequence frequency information can be utilized. For other DNA markers, the data 305 has a NULL value.
The data structure FragmentSizeRepeatNumberData [ ] includes, for a number j of fragment lengths that a single DNA marker has as an allele, fragment lengths 307 and a list 308 of structures of microsatellites that correspond to the fragment lengths. The data 308 are stored in the form of sequences of a data structure RepeatNumberData. The data structure RepeatNumberData [ ] includes, regarding the structures of a number k of microsatellites that a single allele has, an intra-group proportion 309 and a microsatellite structure content 310. In the data shown in
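The nesting of these records can be sketched as follows. Only the structure names FragmentSizeRepeatNumberData and RepeatNumberData come from the description; the class `MarkerRecord` and all field names are hypothetical. Representing NULL as `None`, and holding the microsatellite structure content as a string, are likewise assumptions. Data 302 is left out of the sketch because its content is not stated in this passage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RepeatNumberData:
    proportion: float   # intra-group proportion (309)
    structure: str      # microsatellite structure content (310); string form is assumed


@dataclass
class FragmentSizeRepeatNumberData:
    fragment_length: int                # fragment length (307)
    structures: List[RepeatNumberData]  # list of microsatellite structures (308)


@dataclass
class MarkerRecord:
    # None stands in for the NULL value: not yet calculated, or unknown.
    has_single_nt_indel: Optional[bool] = None            # data 303
    has_multiple_microsatellites: Optional[bool] = None   # data 304
    fragment_data: Optional[List[FragmentSizeRepeatNumberData]] = None  # data 305
```

A record for a marker that has not yet been examined is simply `MarkerRecord()`, with every field left as `None`.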
Process Performed by the Gene Information Display System
In the following, a process performed by the gene information display system of the present embodiment, which is configured as described above, is described.
With reference to
Thereafter, it is examined by the compound marker determination unit 208 whether or not the target DNA marker is a compound marker (step 601). This process will be described later with reference to
The relative height of a stutter peak that appears in the waveform data of the target DNA marker is then estimated by the relative height estimation unit 209 (step 602). This process will be described later in detail with reference to
When the estimation impossibility flag is not set to TRUE, the results of estimation are displayed on the screen by the estimation result display unit 210 (step 604). The details of the display process will be described later with reference to
On the other hand, when the estimation impossibility flag is set to TRUE, a message is displayed by the estimation result display unit 210 on the screen to the effect that an additional preliminary experiment is required (step 606). The process of this display will be described later in greater detail with reference to
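The branching among steps 601, 602, 604, and 606 can be sketched as below. This is only an illustration of the control flow. The function `run_display_flow`, the `Estimation` class, and the callables passed in are hypothetical stand-ins for units 208, 209, and 210; the step numbers are taken from the description above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Estimation:
    impossible: bool                         # the estimation impossibility flag
    relative_height: Optional[float] = None  # estimated stutter height, if any


def run_display_flow(marker: Any,
                     check_compound: Callable[[Any], bool],
                     estimate: Callable[[Any, bool], Estimation],
                     show_result: Callable[[Estimation], None],
                     show_message: Callable[[str], None]) -> str:
    is_compound = check_compound(marker)       # step 601
    estimation = estimate(marker, is_compound)  # step 602
    if not estimation.impossible:
        show_result(estimation)                # step 604
        return "displayed"
    # step 606: tell the user that estimation could not be performed
    show_message("An additional preliminary experiment is required.")
    return "needs_experiment"
```

Passing the units in as callables keeps the sketch independent of how each unit is actually implemented.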
Process for Examining Whether or not a Target DNA Marker is a Compound Marker
With reference to a detailed flowchart shown in
Process for Determining the Relationship Between the Fragment Length and the Relative Height of a Stutter Peak
Details of the process for determining the relationship between the fragment length and the relative height of a stutter peak that is performed at step 602 shown in
With reference to an example of a display screen shown in
With reference to an example of a display screen shown in
While the invention has been described in the foregoing only with reference to a stutter peak as noise that arises during the PCR and electrophoresis experiment processes, the invention can also be applied when noise referred to as a +A peak arises. This is because functions 1, 3, 4, 5, 6, and 7, which do not involve waveform data, are not affected by +A peaks. Nor are functions 2-1 and 2-2 affected by +A peaks. As described in Patent Document 1, the way +A peaks appear in waveforms obtained in a single experiment conducted on a single sample (namely, the height of +A peaks relative to the original peaks) is substantially constant. Therefore, when no peaks appear at the unit length intervals, it can be concluded that +A peaks appear and there is no single-nucleotide in/del if the ratio of the heights of two peaks that are spaced apart from one another by the length of a single nucleotide is constant, and that there is a single-nucleotide in/del if the ratio is not constant.
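The constant-ratio test described above can be sketched as follows. The function name `classify_one_nt_spacing`, the input layout, and the tolerance used to decide whether the ratio is "constant" are all assumptions made for illustration; the specification does not state a threshold.

```python
from typing import Sequence, Tuple


def classify_one_nt_spacing(pairs: Sequence[Tuple[float, float]],
                            tolerance: float = 0.1) -> str:
    """Classify peaks spaced one nucleotide apart.

    Each pair is (height of a peak, height of the peak one nucleotide longer).
    Returns "plus_a" when the height ratio is roughly constant across pairs,
    and "indel" otherwise. The tolerance is a hypothetical parameter, applied
    relative to the mean ratio.
    """
    if len(pairs) < 2:
        raise ValueError("at least two peak pairs are needed to compare ratios")
    if any(first <= 0 for first, _ in pairs):
        raise ValueError("peak heights must be positive")
    ratios = [second / first for first, second in pairs]
    mean_ratio = sum(ratios) / len(ratios)
    spread = max(ratios) - min(ratios)
    return "plus_a" if spread <= tolerance * mean_ratio else "indel"
```

For example, pairs whose second peak is always 30% of the first are classified as +A peaks, whereas pairs with ratios of 0.3 and 0.75 are classified as indicating a single-nucleotide in/del.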
While the method and apparatus for displaying gene information according to the invention have been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and details can be made therein without departing from the spirit and scope of the invention.
While the human genome sequence information is currently open to the public, sequencing of the genomes of other animal species has not been completed and their sequence information that is available is limited. It goes without saying, however, that the method and apparatus for displaying gene information according to the invention will be able to utilize sequence information about other animal species when such sequence information is made public in the future.
The method and apparatus for displaying gene information can be realized on a computer having memory means, input means, display means, and so on. Information processing, such as the displaying of the results of a gene analysis experiment, estimation of a noise peak, and correction of experimental results based on the estimated result, can be performed using the aforementioned hardware resources including the memory means, input means, and display means. Thus, the invention can be industrially utilized.
Number | Date | Country | Kind |
---|---|---|---|
2004-353068 | Dec 2004 | JP | national |