The present invention relates to a spectroscopic analysis apparatus, a spectroscopic analysis method, and a program, and more particularly, to a spectroscopic analysis apparatus, a spectroscopic analysis method, and a program that perform an analysis using spectra obtained by dispersing light generated in a sample.
An apparatus for specifying a gene locus is disclosed (Patent literature 1). Patent literature 1 discloses using capillary electrophoresis. Patent literature 1 further discloses performing labeling using fluorescence. Patent literature 1 further discloses using Raman spectrometry.
Patent literature 1: Published Japanese Translation of PCT International Publication for Patent Application, No. 2005-527904
Patent literature 1 discloses a method for performing an analysis using a computer program. There are some cases, however, in which samples cannot be appropriately analyzed by the method disclosed in Patent literature 1.
The present invention aims to provide a spectroscopic analysis apparatus, a spectroscopic analysis method, and a program capable of appropriately analyzing a sample.
A spectroscopic analysis apparatus according to one exemplary aspect includes: a light source that generates light incident on a sample including a plurality of substances labeled by a plurality of labeled substances; a spectrometer that disperses observed light generated in the sample by the light incident on the sample; a detector that detects the observed light dispersed by the spectrometer to output observed spectral data; and a processor that analyzes the plurality of substances included in the sample based on the observed spectral data output from the detector, the processor analyzing the substances included in the sample from the observed spectral data using a generalized inverse of a matrix including, as an element, reference spectrum data set for the plurality of labeled substances.
A spectroscopic analysis method according to one exemplary aspect of the present invention includes: irradiating a sample including a plurality of substances labeled by a plurality of labeled substances with light; dispersing observed light generated in the sample by the light incident on the sample; detecting the observed light that is dispersed to output observed spectral data; obtaining a generalized inverse of a matrix having, as an element, reference spectrum data set for the plurality of labeled substances; and analyzing the substances included in the sample using the generalized inverse and the observed spectral data.
A program according to one exemplary aspect of the present invention is a program for causing a computer to execute a spectroscopic analysis method that analyzes a sample using observed spectral data obtained by performing spectrometry for light generated in the sample, in which: the spectroscopic analysis method obtains a generalized inverse of a matrix of reference spectrum data using, as a matrix, the reference spectrum data set for a plurality of labeled substances that label the plurality of substances included in the sample, and the spectroscopic analysis method analyzes the substances included in the sample using the observed spectral data and the generalized inverse.
According to the present invention, it is possible to provide a spectroscopic analysis apparatus, a spectroscopic analysis method, and a program capable of appropriately analyzing a sample.
With reference to the accompanying drawings, an exemplary embodiment of the present invention will be described. The exemplary embodiment described below is an example of the present invention and the present invention is not limited to the following exemplary embodiment. Throughout the specification and the drawings, the same components are denoted by the same reference symbols.
In this exemplary embodiment, a DNA sequence analysis is performed using a plurality of fluorescent substances having different emission wavelengths. Specifically, DNA is extracted from human cells. DNA fragments are amplified by a polymerase chain reaction (PCR) and are labeled by the fluorescent substances. The fluorescent substance may be, for example, 5-FAM, JOE, NED, and ROX. As a matter of course, the fluorescent substance used for the labeling is not particularly limited. In this example, a plurality of fluorescent substances having different peak wavelengths are used for the labeling. Different bases are labeled by different fluorescent substances.
The fluorescently labeled PCR products are supplied to a capillary and are electrophoresed in a gel. While a voltage is applied, the migration velocity varies depending on the size of the DNA fragments: the migration distance increases as the number of bases decreases. It is therefore possible to separate the DNA fragments by size. When the PCR products in the capillary are irradiated with excitation light emitted from a light source, the fluorescent substances generate fluorescence. This fluorescence is spectroscopically measured to obtain observed spectral data, and the observed spectral data is obtained for each size of the DNA fragments. By analyzing these observed spectral data, it is possible to quantify DNA of a particular sequence and to execute DNA testing.
While the spectroscopic analysis apparatus is used for DNA testing in this exemplary embodiment, its application is not limited to DNA testing. The spectroscopic analysis apparatus according to this exemplary embodiment can be applied to any spectroscopic analysis apparatus that analyzes the spectrum of fluorescence generated from a sample whose substances are labeled with a fluorescent probe. It is possible, for example, to analyze nucleic acids, proteins, and the like. The spectroscopic analysis apparatus may also be used to identify the substances, for example. Further, the substances included in the sample may be labeled by labeled substances other than fluorescent substances. The labeled substances are preferably substances having different peak wavelengths.
With reference to
PCR products including DNA fragments labeled by fluorescent substances are injected into the injection part 11. In this example, the DNA fragments which are the sample are labeled by a plurality of fluorescent substances. For example, fluorescent substances such as 5-FAM, JOE, NED, and ROX are used depending on the base sequence of the DNA fragments. As a matter of course, the type and the number of the fluorescent substances used for the labeling are not particularly limited.
The injection part 11 communicates with the capillary 12 on the microchip 20. Electrodes (not shown) are arranged on both ends of the capillary 12 provided in the microchip 20 and a voltage is applied to the electrodes. The capillary 12 and the injection part 11 are filled with an electrophoresis medium such as agarose gel. Since the electrophoretic velocity decreases as the number of bases of the DNA fragments increases, the DNA fragments are separated by size.
The light source 13 irradiates the medium in the capillary with light. The light source 13 may be, for example, an argon ion laser light source that emits excitation light having a wavelength of 488 nm or 514.5 nm.
The light emitted from the light source 13 is incident on the capillary 12. In this example, 8-lane capillaries 12 are provided in parallel in the microchip 20. When the 8-lane capillaries 12 are irradiated with excitation light, the fluorescent substances that label the DNA fragments in the capillary 12 generate fluorescence. The fluorescence generated by the fluorescent substances is observed light.
The fluorescence generated by the fluorescent substances in the sample is input to the spectrometer 14. The spectrometer 14 includes, for example, a prism or a diffraction grating, and disperses the fluorescence. That is, the fluorescence is spatially dispersed according to the wavelength. The fluorescence spatially dispersed by the spectrometer 14 is input to the detector 15. Accordingly, the fluorescence generated by the fluorescent substances is the observed light detected by the detector 15.
The detector 15 is, for example, a photodetector such as a CCD device, and includes light-receiving elements arranged along a dispersion direction. Accordingly, fluorescence having different wavelengths is detected for each of the light-receiving elements arranged in the dispersion direction. The detector 15 detects the spectra of the fluorescent substances that have labeled the DNA fragments and outputs the observed spectral data to the processor 16. For example, the spectrum having a wavelength region of 640 to 860 nm is detected by the spectrometer 14 and the detector 15. As a matter of course, the wavelength region that can be spectroscopically measured by the spectrometer 14 and the detector 15 is not particularly limited. The wavelength region can be appropriately set according to the excitation light wavelength or the fluorescent substance used as a label.
The detector 15 outputs the light intensity at each observable wavelength to the processor 16 as the observed spectral data. The number of pieces of data included in the observed spectral data varies according to the dispersion performance or the like of the spectrometer 14.
The processor 16 is an information processing device such as a personal computer, and performs processing according to a control program. Specifically, the processor 16 stores an analysis program that analyzes the observed spectral data output from the detector 15. The processor 16 executes processing according to the analysis program. The processor 16 analyzes the plurality of substances included in the sample based on the observed spectral data output from the detector 15. The concentration of the DNA fragments is thus obtained. It is therefore possible to perform DNA testing.
The processing in the processor 16 is one of the characteristics of the spectroscopic analysis method according to this exemplary embodiment. In the following description, the processing in the processor 16 will be described.
In
The reference spectra 51 to 54 of the fluorescent substances are known and are different depending on the fluorescent substance. In short, the reference spectra have different peak wavelengths. For example, the reference spectrum 51 of 5-FAM has a peak wavelength of about 540 nm, the reference spectrum 52 of JOE has a peak wavelength of about 560 nm, the reference spectrum 53 of NED has a peak wavelength of about 580 nm, and the reference spectrum 54 of ROX has a peak wavelength of about 610 nm.
The observed spectrum detected by the detector 15 is obtained by overlapping the reference spectra 51-54 shown in
When the concentration of the fluorescent substances included in the sample is obtained, windows 41 to 44 each having a predetermined wavelength width are normally set. The window 41 is set to a value close to the peak wavelength of the reference spectrum 51 of 5-FAM, the window 42 is set to a value close to the peak wavelength of the reference spectrum 52 of JOE, the window 43 is set to a value close to the peak wavelength of the reference spectrum 53 of NED, and the window 44 is set to a value close to the peak wavelength of the reference spectrum 54 of ROX. The light intensity data of the observed spectral data is accumulated for each of the windows 41 to 44.
The concentration of each fluorescent substance is obtained from the integrated values of the windows 41 to 44. For example, the concentrations of 5-FAM, JOE, NED, and ROX are respectively denoted by b, g, y, and r. Further, the integrated values of the windows 41 to 44 are respectively denoted by I540, I560, I580, and I610. By solving the simultaneous equations with four unknowns shown in the following Expression (1) for b, g, y, and r, the concentrations of the fluorescent substances are obtained.
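$$
\begin{aligned}
I_{540} &= b\,x_{b} + g\,y_{b} + y\,b_{b} + r\,w_{b} \\
I_{560} &= b\,x_{g} + g\,y_{g} + y\,b_{g} + r\,w_{g} \\
I_{580} &= b\,x_{y} + g\,y_{y} + y\,b_{y} + r\,w_{y} \\
I_{610} &= b\,x_{r} + g\,y_{r} + y\,b_{r} + r\,w_{r}
\end{aligned}
\qquad (1)
$$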
Here, the integrated values of the windows 41 to 44 in the reference spectrum 51 are respectively denoted by coefficients xb, xg, xy, and xr. In a similar way, the integrated values of the windows 41 to 44 in the reference spectrum 52 are respectively denoted by coefficients yb, yg, yy, and yr, the integrated values of the windows 41 to 44 in the reference spectrum 53 are respectively denoted by coefficients bb, bg, by, and br, and the integrated values of the windows 41 to 44 in the reference spectrum 54 are respectively denoted by coefficients wb, wg, wy, and wr. In other words, the letter identifies the fluorescent substance and the subscript identifies the window, so that each row of Expression (1) sums the contributions of all four substances to one window. Since the reference spectra 51 to 54 of each fluorescent substance are known, these coefficients are all known. Accordingly, the processor 16 solves the above simultaneous equations for b, g, y, and r, whereby it is possible to obtain the concentration of the fluorescent substances.
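The window-based calculation of Expression (1) can be sketched as a 4-by-4 linear solve. This is only an illustration: the coefficient values below are hypothetical placeholders, not data from the reference spectra 51 to 54, and the function name `solve_window_concentrations` is introduced here for the example.

```python
import numpy as np

# Hypothetical window integrals of the four reference spectra (illustrative only).
# Row = window (41..44, i.e. around 540, 560, 580, 610 nm);
# column = fluorescent substance (5-FAM, JOE, NED, ROX).
COEFFS = np.array([
    [1.00, 0.30, 0.05, 0.01],
    [0.40, 1.00, 0.30, 0.05],
    [0.10, 0.40, 1.00, 0.30],
    [0.02, 0.10, 0.40, 1.00],
])

def solve_window_concentrations(window_integrals, coeffs=COEFFS):
    """Solve the four simultaneous equations of Expression (1) for (b, g, y, r)."""
    return np.linalg.solve(coeffs, np.asarray(window_integrals, dtype=float))

# Synthetic check: build window integrals from known concentrations, then recover them.
true_conc = np.array([2.0, 0.5, 1.0, 0.0])   # assumed b, g, y, r
window_integrals = COEFFS @ true_conc        # I540, I560, I580, I610
conc = solve_window_concentrations(window_integrals)
print(conc)
```

Only four numbers per observed spectrum enter this calculation, which is why the choice of the windows matters so much in the discussion that follows.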
When the windows 41 to 44 are set according to the peak wavelengths of the fluorescent spectra as described above, however, the analysis may not be appropriately performed, because it is difficult to choose suitable windows. When the width of the windows 41 to 44 is narrow, for example, the number of pieces of data to be accumulated becomes small and the noise increases. This is because the noise relative to the signal normally decreases in inverse proportion to the square root of the number of pieces of data accumulated. In other words, while it is advantageous to make the windows 41 to 44 wider in terms of S/N, data of another fluorescent substance is included if the windows 41 to 44 are too wide. It is therefore difficult to set appropriate windows 41 to 44.
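The trade-off can be made concrete with a small numerical sketch. It assumes two Gaussian-shaped spectra with hypothetical peaks at 560 nm and 580 nm, a 1 nm sampling grid, and independent noise of equal variance in every sample; none of these values come from the specification.

```python
import numpy as np

wl = np.arange(500.0, 651.0, 1.0)                   # assumed 1 nm wavelength grid
dye1 = np.exp(-0.5 * ((wl - 560.0) / 15.0) ** 2)    # hypothetical target spectrum
dye2 = np.exp(-0.5 * ((wl - 580.0) / 15.0) ** 2)    # hypothetical neighbouring spectrum

def window_stats(half_width, center=560.0):
    """Return (samples in window, crosstalk ratio, relative noise) for one window."""
    mask = np.abs(wl - center) <= half_width
    n = int(mask.sum())
    # Fraction of the neighbouring dye that leaks into the window, relative to the target.
    crosstalk = dye2[mask].sum() / dye1[mask].sum()
    # Relative noise of the accumulated value scales as 1/sqrt(n) under the stated assumption.
    relative_noise = 1.0 / np.sqrt(n)
    return n, crosstalk, relative_noise

narrow = window_stats(2.0)    # 5 samples
wide = window_stats(20.0)     # 41 samples
print(narrow, wide)
```

Under these assumptions the wide window lowers the relative noise but raises the crosstalk from the neighbouring substance, which is the difficulty described above.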
However, it is possible to make an appropriate analysis by using the spectroscopic analysis method according to this exemplary embodiment. In order to simplify the following description, a case in which the sample is labeled by two fluorescent substances will be described.
It is assumed that two fluorescent substances include reference spectra 61 and 62 as shown in
The processor 16 calculates the generalized inverse of a matrix having, as an element, light intensity data of the reference spectra 61 and 62 set for the plurality of labeled substances. The data of the generalized inverse is shown as generalized inverse data 63 and 64 in the graph shown in
The matrix of the light intensity data at each wavelength included in the observed spectral data is denoted by b. When the observed spectral data includes m pieces of light intensity data (m is an integer larger than 2), for example, the matrix b has m rows and one column. The elements included in the matrix b are denoted by b1, b2, . . . bm.
Further, the matrix of the light intensity data included in the reference spectra 61 and 62 of the two fluorescent substances is denoted by A. The elements of the matrix A are m pieces of light intensity data A11, A21, A31, . . . Am1 included in the reference spectrum 61 and m pieces of light intensity data A12, A22, A32, . . . Am2 included in the reference spectrum 62. The light intensity data A11, A21, A31, . . . Am1 are the elements of the first column and the light intensity data A12, A22, A32, . . . Am2 are the elements of the second column. Since the number of fluorescent substances that label the sample is 2, the matrix A has m rows and two columns. The number of columns of the matrix A increases in accordance with the increase in the number of fluorescent substances to be used. When the sample is labeled by four fluorescent substances corresponding to four bases, for example, the matrix A has m rows and four columns.
Note that the number of pieces of light intensity data of the reference spectra 61 and 62 is the same as the number of pieces of light intensity data included in the observed spectrum. That is, the wavelengths at which the light intensity data is present are the same in the observed spectrum and the reference spectra 61 and 62. As a matter of course, when the number of pieces of data of the reference spectra 61 and 62 is different from the number of pieces of observed spectrum data, the number of pieces of data may be made the same by complementing data.
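One possible way to make the numbers of pieces of data agree is sketched below. It assumes that "complementing" can be done by linear interpolation of the reference spectrum onto the wavelength grid of the observed spectrum; the specification does not state which method is used, and the function name and sample values are hypothetical.

```python
import numpy as np

def resample_reference(ref_wavelengths, ref_intensity, obs_wavelengths):
    """Linearly interpolate a reference spectrum onto the observed wavelength grid."""
    return np.interp(obs_wavelengths, ref_wavelengths, ref_intensity)

# Hypothetical reference spectrum sampled every 20 nm.
ref_wl = np.array([500.0, 520.0, 540.0, 560.0, 580.0])
ref_int = np.array([0.0, 0.5, 1.0, 0.5, 0.0])

# Hypothetical observed grid sampled every 10 nm over the same range (9 points).
obs_wl = np.arange(500.0, 581.0, 10.0)

resampled = resample_reference(ref_wl, ref_int, obs_wl)
print(resampled)
```

After this step each reference spectrum has one value per observed wavelength and can be used directly as a column of the matrix A.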
Further, the matrix of the concentration of the fluorescent substances included in the sample is denoted by x. Since the number of fluorescent substances used for the labeling is two, the matrix x has two rows and one column. The elements included in the matrix x are denoted by x1 and x2. The processor 16 executes processing for obtaining the matrix x.
In each wavelength, the following Expression (2) is established.
$$b_{j} = A_{j1}\,x_{1} + A_{j2}\,x_{2} \qquad (2)$$
Note that j is any integer from 1 to m. From the product of the concentration of the fluorescent substances used for the labeling and the light intensity data of the reference spectrum in one wavelength, the light intensity data of the observed spectrum in this wavelength can be calculated. Since Expression (2) is established for any desired wavelength, when Expression (2) is expressed using the matrix A, the matrix b, and the matrix x, Expression (3) in
In an ideal measurement, Expression (3) in
Since A is not a square matrix, there is no inverse matrix. It is also possible, however, to calculate a generalized inverse (or generalized inverse matrix). By using the generalized inverse, x can be calculated from Expression (3) shown in
The transposed matrix of A is denoted by AT, which has two rows and m columns. As shown in Expression (5) in
$$x = (A^{T}A)^{-1}A^{T}\,b \qquad (6)$$
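Expression (6) can be checked numerically with a short sketch. The two reference spectra are modelled here as Gaussians with hypothetical peaks at 560 nm and 580 nm, and the wavelength grid, noise level, and concentrations are likewise assumptions made only for illustration.

```python
import numpy as np

def gaussian(wl, center, width):
    """Hypothetical Gaussian-shaped reference spectrum."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

wl = np.linspace(500.0, 650.0, 151)            # m = 151 wavelength samples (assumed)
A = np.column_stack([gaussian(wl, 560.0, 15.0),
                     gaussian(wl, 580.0, 15.0)])   # m rows, two columns

x_true = np.array([3.0, 1.5])                  # assumed concentrations x1, x2
rng = np.random.default_rng(0)
b = A @ x_true + rng.normal(0.0, 0.01, size=wl.size)   # observed spectrum with noise

# Expression (6): x = (A^T A)^{-1} A^T b, evaluated by solving the normal equations.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# The same least-squares solution from NumPy's library routine, for comparison.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_lstsq)
```

Every wavelength sample contributes to the estimate, so no window has to be chosen. In practice a routine such as `np.linalg.lstsq` is usually preferred over forming the product of the transposed matrix and A explicitly, because it is less sensitive to rounding when the reference spectra overlap strongly.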
Expression (6) means obtaining the least squares solution that minimizes the error r shown in Expression (4) in
The matrix x can be calculated by multiplying the matrix b of the observed spectrum from the left by $(A^{T}A)^{-1}A^{T}$, which gives the concentrations of the fluorescent substances. Letting $C=(A^{T}A)^{-1}A^{T}$, C is the generalized inverse of A, and x is obtained as the product of C and the matrix b. The elements of the generalized inverse $(A^{T}A)^{-1}A^{T}$ are the generalized inverse data 63 and 64 shown in
It is therefore possible to calculate the concentration of the plurality of fluorescent substances used for the labeling in a simple way. Further, since the windows 41 to 44 are not set as shown in
As described above, the processor 16 analyzes the plurality of substances included in the sample based on the observed spectral data output from the detector 15. Specifically, the processor 16 arranges the data of the reference spectra 61 and 62, which are set for the plurality of labeled substances that label the plurality of substances, as a matrix and obtains the generalized inverse of that matrix. The processor 16 then analyzes the substances included in the sample using the observed spectral data and the generalized inverse. If the generalized inverse of the matrix of the reference spectra is calculated in advance, the processing can be executed in a shorter period of time.
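The benefit of calculating the generalized inverse in advance can be sketched as follows. The helper name `make_unmixer`, the Gaussian reference spectra, and the synthetic concentrations are hypothetical; `np.linalg.pinv` is used as one standard way to obtain a generalized inverse, and it coincides with the expression above when the columns of A are linearly independent.

```python
import numpy as np

def make_unmixer(A):
    """Precompute the generalized inverse of A once and return a function that applies it."""
    C = np.linalg.pinv(A)          # shape (k, m) for A of shape (m, k)

    def unmix(B):
        # B is one observed spectrum of shape (m,) or several spectra as columns, shape (m, n).
        return C @ B

    return unmix

# Hypothetical reference spectra on an assumed wavelength grid.
wl = np.linspace(500.0, 650.0, 151)
A = np.column_stack([np.exp(-0.5 * ((wl - 560.0) / 15.0) ** 2),
                     np.exp(-0.5 * ((wl - 580.0) / 15.0) ** 2)])
unmix = make_unmixer(A)

# Three synthetic, noise-free observed spectra, e.g. for three fragment sizes.
X_true = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 0.5]])   # rows: substances, columns: spectra
B = A @ X_true
X = unmix(B)
print(X)
```

Once C is stored, each additional observed spectrum costs only one small matrix-vector product, which matters when a spectrum is acquired for every fragment size and every lane.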
The analysis can thus use a larger number of pieces of observed spectral data. It is therefore possible to appropriately analyze the sample based on the spectrum of the fluorescence and to perform DNA testing with a small measurement error.
As described above, by electrophoresing the PCR amplified sample, the DNA fragments are separated by size. The DNA fragments in the capillary are irradiated with light to detect the observed spectrum in each size of the DNA fragments. The plurality of observed spectra are subjected to the above processing to calculate the concentration of each base. The distribution of the concentration of the bases is obtained for each size of the DNA fragments. The DNA testing is carried out according to the base sequence of the DNA fragment. It is therefore possible to perform DNA testing with higher accuracy.
The control for analyzing the above sample may be executed by a computer program. The control program described above can be stored and provided to a computer using any type of non-transitory computer readable media. Non-transitory computer readable media include any type of tangible storage media. Examples of non-transitory computer readable media include magnetic storage media (such as flexible disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g. magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memories (such as mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory), etc.). The program may be provided to a computer using any type of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer readable media can provide the program to a computer via a wired communication line (e.g. electric wires, and optical fibers) or a wireless communication line.
Further, the exemplary embodiment of the present invention includes not only the case in which the functions of the above exemplary embodiment are achieved by the computer executing the program that achieves the functions of the above exemplary embodiment but also a case in which this program achieves the functions of the above exemplary embodiment in collaboration with an application software or an operating system (OS) operated on the computer.
While the present invention has been described with reference to the exemplary embodiment, the present invention is not limited to the above exemplary embodiment. Various changes that can be understood by those skilled in the art may be made on the configuration and the details of the present invention within the scope of the present invention.
This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2012-206023, filed on Sep. 19, 2012, the disclosure of which is incorporated herein in its entirety by reference.
The spectroscopic analysis apparatus according to the present invention can be applied to analyze DNA, nucleic acids, proteins, and the like.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2012-206023 | Sep 2012 | JP | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/JP2013/002371 | 4/5/2013 | WO | 00 |