The present invention relates to a device, method, and program for analyzing a mass spectrum measured for samples.
Recently, MALDI-TOF-MS (Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight Mass Spectrometry) has come into wide use. MALDI-TOF-MS is used, for example, to perform mass spectrometric analysis of proteins in blood for the diagnosis of diseases and for the biochemical elucidation of the detailed mechanisms of disease development. Specifically, the mass spectra of proteins that increase in the blood as cancer progresses are measured and analyzed to find a pattern that distinguishes cancer from non-cancer, and this pattern is then used as a reference for judgment.
In MALDI-TOF-MS, the analysis of the peaks of a mass spectrum is important. Conventionally, mass spectra have been analyzed manually by operators. Specifically, in the conventional procedure, a plurality of samples is collected from each of normal healthy persons and patients, and a mass spectrum is measured for each sample. The operator then visually overlays the mass spectra to extract characteristic peaks that differ between the normal healthy persons and the patients. However, human perceptual judgment is variable, so the analysis is poorly reproducible. The analysis is also time-consuming. In particular, when the number of samples is large, the analysis becomes lengthy and inefficient, and its reproducibility suffers further.
Further, a data processing device has been disclosed that calculates the area or height of each peak in the data of two chromatograms and plots the difference in the calculated area or height as a histogram for each peak (Patent Document 1). However, because this device compares only two chromatograms, it is not suitable for analyzing the mass spectra of many diverse biological samples.
[Patent Document 1]
Japanese Unexamined Patent Application Publication No. 9-210983
A biological sample such as blood taken from a living body generally exhibits wide variation in its mass spectrum. Therefore, a comparison of data between normal healthy persons and patients does not always reveal a significant difference. Because the sample is biological, the position, height, area, and so on of the peaks of a mass spectrum can vary even between mass spectra obtained from the same patient, owing to individual variation of the living body itself, changes in health condition, and so on. Further, the presence of isotopes and the coexistence of a plurality of biochemical substances complicate the analysis. In addition, when mass spectra are measured with different mass spectrometers, differences arise in the spectra owing to the device settings.
However, these variation factors are not so severe that a mass spectral peak loses its characteristics entirely. The peak characteristics therefore remain to the extent that a skilled person can distinguish the patterns of a patient from those of a normal healthy person by perceptual judgment.
The present invention has been accomplished to solve the above problems and an object of the present invention is thus to provide a mass spectrum analysis device, analysis method, and analysis program capable of accurately analyzing a mass spectrum for samples.
According to a first aspect of the present invention, there is provided a mass spectrum analysis device for analyzing mass spectra measured for a plurality of samples, including peak position detection means (e.g. a peak position detection unit 14 according to an embodiment of the present invention) for detecting peak positions at which the mass spectrum is at its peak, and coincidence degree calculation means (e.g. a coincidence degree calculation unit 15 according to an embodiment of the present invention) for calculating a coincidence degree of peaks according to the number of peak positions, detected in a plurality of mass spectra, that are contained in a window having a certain width in mass number. This enables accurate analysis of a mass spectrum.
According to a second aspect of the present invention, there is provided the above-described mass spectrum analysis device wherein the number of peaks is weighted according to the position of each peak within the window. This enables accurate analysis of a mass spectrum.
According to a third aspect of the present invention, there is provided the above-described mass spectrum analysis device wherein the plurality of mass spectra are measured for each of two different groups of samples, and the device further includes coincidence degree difference calculation means (e.g. a coincidence degree difference calculation unit 16 according to an embodiment of the present invention) for calculating a difference in coincidence degree between the two different groups. This enables accurate analysis of a mass spectrum.
According to a fourth aspect of the present invention, there is provided a mass spectrum analysis method for analyzing mass spectra measured for a plurality of samples, including a peak position detection step (e.g. a peak position detection step S102 according to an embodiment of the present invention) for detecting peak positions at which the mass spectrum is at its peak, and a coincidence degree calculation step (e.g. a coincidence degree calculation step S103 according to an embodiment of the present invention) for calculating a coincidence degree of peaks according to the number of peak positions, detected in a plurality of mass spectra, that are contained in a window having a certain width in mass number. This enables accurate analysis of a mass spectrum.
According to a fifth aspect of the present invention, there is provided the above-described mass spectrum analysis method wherein the number of peaks is weighted according to the position of each peak within the window. This enables accurate analysis of a mass spectrum.
According to a sixth aspect of the present invention, there is provided the above-described mass spectrum analysis method wherein the plurality of mass spectra are measured for each of two different groups of samples, and the method further includes a coincidence degree difference calculation step (e.g. a coincidence degree difference calculation step S105 according to an embodiment of the present invention) for calculating a difference in coincidence degree between the two different groups. This enables accurate analysis of a mass spectrum.
According to a seventh aspect of the present invention, there is provided a mass spectrum analysis program for analyzing mass spectra measured for a plurality of samples, causing a computer to implement a method including a peak position detection step for detecting peak positions at which the mass spectrum is at its peak, and a coincidence degree calculation step for calculating a coincidence degree of peaks according to the number of peak positions, detected in a plurality of mass spectra, that are contained in a window having a certain width in mass number. This enables accurate analysis of a mass spectrum.
According to an eighth aspect of the present invention, there is provided the above-described mass spectrum analysis program wherein the number of peaks is weighted according to the position of each peak within the window. This enables accurate analysis of a mass spectrum.
According to a ninth aspect of the present invention, there is provided the above-described mass spectrum analysis program wherein the plurality of mass spectra are measured for each of two different groups of samples, and the method further includes a coincidence degree difference calculation step for calculating a difference in coincidence degree between the two different groups. This enables accurate analysis of a mass spectrum.
The present invention provides a mass spectrum analysis device, analysis method, and analysis program capable of accurately analyzing a mass spectrum for samples.
Embodiments of the present invention are described hereinbelow. The explanation provided hereinbelow merely illustrates exemplary embodiments of the present invention, and the present invention is not limited to the below-described embodiments. The description hereinbelow is appropriately shortened and simplified to clarify the explanation. A person skilled in the art will be able to easily change, add, or modify various elements of the below-described embodiments, without departing from the scope of the present invention. In the figures, the identical reference symbols denote identical structural elements and the redundant explanation thereof is omitted.
In order to compare the mass spectra of samples from patients suffering from a particular disease with those of samples from normal healthy persons, the present invention collects biological samples from a plurality of patients and a plurality of normal healthy persons and measures a mass spectrum for each sample. The invention then compares the measured mass spectra of the patients and the normal healthy persons to obtain the characteristic peaks appearing in the mass spectra. In this embodiment, the results of manual analysis by a skilled person are also shown for comparison with the results of the analysis according to the present invention.
A mass spectrum analysis device according to the present invention is described hereinafter with reference to
The analysis device 10 according to the present invention may be a processing unit such as a personal computer, for example, and analyzes the mass spectra measured by the measurement device 20. The measurement device 20 may include a time-of-flight mass spectrometer used for MALDI-TOF-MS (Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight Mass Spectrometry). The measurement device 20 measures the mass spectra of proteins contained in biological samples such as blood, urine, bodily fluids, or cerebrospinal fluid.
The measurement device 20 applies laser light to the sample, vaporizing it and dissociating the proteins into free ions. It then lets the protein ions travel through an electric field in a vacuum and determines the mass number from the time the ions take to reach a detector. Strictly speaking, the "mass number" here is the mass-to-charge ratio (m/z). A time-of-flight mass spectrometer exploits the fact that particles given the same energy by a uniform electric field travel at different velocities depending on their mass. A particle with a high mass number travels slowly and thus takes a long time to reach the detector, whereas a particle with a low mass number travels quickly and takes a short time. The flight time therefore varies with the mass number.
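The flight-time relationship described above can be sketched numerically. Assuming an ideal linear instrument in which every ion of charge z acquires kinetic energy zeU in the electric field and then drifts over a field-free length L, the flight time is t = L·sqrt(m/(2zeU)), so t grows with the square root of m/z. The accelerating voltage (20 kV) and drift length (1 m) are illustrative assumptions, not parameters of the measurement device 20, and effects such as the time spent in the acceleration region are ignored. The function names are hypothetical.

```python
import math

# CODATA values for the physical constants
ATOMIC_MASS_UNIT_KG = 1.66053906660e-27  # 1 Da in kilograms
ELEMENTARY_CHARGE_C = 1.602176634e-19    # elementary charge in coulombs


def flight_time_s(mass_da, charge, voltage_v=20_000.0, drift_length_m=1.0):
    """Ideal flight time of an ion in a linear TOF analyser.

    Energy balance: z*e*U = (1/2)*m*v**2, so v = sqrt(2*z*e*U/m) and t = L/v.
    voltage_v and drift_length_m are illustrative assumptions.
    """
    mass_kg = mass_da * ATOMIC_MASS_UNIT_KG
    charge_c = charge * ELEMENTARY_CHARGE_C
    velocity_m_s = math.sqrt(2.0 * charge_c * voltage_v / mass_kg)
    return drift_length_m / velocity_m_s


def mass_to_charge(time_s, voltage_v=20_000.0, drift_length_m=1.0):
    """Invert t = L*sqrt(m/(2*q*U)) to recover m/z in Da per elementary charge."""
    mass_per_charge_kg_c = 2.0 * voltage_v * (time_s / drift_length_m) ** 2
    return mass_per_charge_kg_c * ELEMENTARY_CHARGE_C / ATOMIC_MASS_UNIT_KG


# A 16000 Da ion takes twice as long as a 4000 Da ion of the same charge,
# because t is proportional to sqrt(m/z).
t_light = flight_time_s(4000.0, 1)
t_heavy = flight_time_s(16000.0, 1)
print(f"{t_light * 1e6:.1f} us vs {t_heavy * 1e6:.1f} us")
```

Because only the ratio m/z enters the formula, an ion of 8000 Da carrying two charges arrives at the same time as a singly charged 4000 Da ion, which is why the measured quantity is m/z rather than mass alone.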
The measurement device 20 measures a mass spectrum based on the current which is detected according to the traveling time. The traveling time corresponds to the mass number and the detected current corresponds to the intensity. The mass spectrum of the proteins existing in the biological sample is thereby measured.
The output from the measurement device 20 is input to the analysis device 10 through the data I/F unit 11. The data I/F unit 11 converts analog data from the measurement device 20 into digital data, for example. Mass spectrum data that can be analyzed in the analysis device 10 is thereby input to the analysis device 10. The analysis device 10 includes a storage device such as a hard disk (not shown) that stores the input mass spectrum data. The stored mass spectrum data is analyzed by the analysis unit 13. The analysis unit 13 includes a processing circuit having a CPU, memory, and so on; it performs prescribed analysis processing on the input mass spectra and outputs the analysis result.
The analysis unit 13 includes the peak position detection unit 14, the coincidence degree calculation unit 15, and the coincidence degree difference calculation unit 16. The peak position detection unit 14 detects the positions of peaks from the input mass spectrum data. Specifically, the peak position detection unit 14 detects the mass number at the top of a peak. The peak position detection unit 14 detects the peak positions for each of the input mass spectra. Normally, a plurality of peak positions are detected from one mass spectrum.
The coincidence degree calculation unit 15 calculates the coincidence degree of the peak positions detected from a plurality of mass spectra. The coincidence degree is a value indicating how closely the peak positions of a plurality of mass spectra coincide. For example, if the peak positions of four out of five mass spectra coincide, the coincidence degree is ⅘ = 0.8. In other words, the coincidence degree indicates, for each mass number, the fraction of all samples in which a peak is recognized. Further, the present invention sets a window having a certain width in mass number for calculating the coincidence degree. If a peak position of a mass spectrum falls within the window, it is counted as a coincident peak. This prevents characteristic peaks from being missed even when a mass spectrum is subject to variation factors, which makes the method suitable for analyzing biological samples with wide variations. The user can adjust the window width through the input unit 12, which has input devices such as a keyboard and a mouse. The peak positions that appear frequently in the target biological samples are thereby obtained.
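A minimal sketch of the window-based count follows. It assumes a rectangular window of full width `width` centred on each scanned mass number, and it assumes that each spectrum is counted at most once, so that the result stays between 0 and 1 and reproduces the ⅘ example. The function names, the dictionary output, and the `pitch` parameter for scanning the window are hypothetical choices for illustration.

```python
from bisect import bisect_left, bisect_right


def coincidence_degree(peak_lists, center, width):
    """Fraction of spectra having at least one peak inside the window.

    peak_lists: one ascending list of detected peak positions (m/z) per spectrum.
    The window is assumed rectangular, spanning center - width/2 .. center + width/2.
    """
    half = width / 2.0
    hits = 0
    for peaks in peak_lists:
        lo = bisect_left(peaks, center - half)   # first peak >= left edge
        hi = bisect_right(peaks, center + half)  # one past last peak <= right edge
        if hi > lo:  # this spectrum has a peak in the window; count it once
            hits += 1
    return hits / len(peak_lists)


def coincidence_profile(peak_lists, start, stop, width, pitch=1.0):
    """Scan the window centre from start to stop (inclusive) in steps of pitch."""
    n_steps = int(round((stop - start) / pitch))
    return {
        start + i * pitch: coincidence_degree(peak_lists, start + i * pitch, width)
        for i in range(n_steps + 1)
    }


# Hypothetical peak lists for five spectra: four have a peak near m/z 4000.
spectra = [[3998.0, 5210.0], [4001.0], [4003.0, 6100.0], [3999.5], [4500.0]]
print(coincidence_degree(spectra, 4000.0, 10.0))  # 0.8
```

Narrowing the window to a width of 2 in the same example leaves only the peaks at 4001.0 and 3999.5 inside it, and the degree drops to 0.4; this is the sensitivity to peak-position scatter that the window width is meant to absorb.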
The coincidence degree difference calculation unit 16 calculates a difference between the two coincidence degrees which are calculated in the coincidence degree calculation unit 15. For example, the mass spectra of a plurality of samples respectively for patients and normal healthy persons are measured. Based on the coincidence degrees of the two groups, the coincidence degree difference calculation unit 16 calculates a difference of the two. It thus calculates a difference in coincidence degree between the group of patients and the group of normal healthy persons. The characteristic peaks which exhibit a difference between the patients and the normal healthy persons are thereby obtained. The display unit 17 includes a monitor such as a liquid crystal display to display analysis results. From the displayed analysis results, a user can be informed of the characteristic peaks which appear in particular disease, for example.
A process of the analysis method is described hereinafter with reference to
Then, the peak positions are detected from each mass spectrum (Step S102). This step performs the same processing on the patients and the normal healthy persons and detects the peak positions for each sample. The peak positions are stored in association with the input group. After detecting the peak positions of all the input mass spectra, a coincidence degree is calculated for each group. Firstly, the coincidence degree of the peaks is calculated for the data of the group A (Step S103). The peak positions which frequently appear in the mass spectra of the normal healthy persons are thereby obtained. Then, the coincidence degree of the peaks is calculated for the data of the group B (Step S104). The peak positions which frequently appear in the mass spectra of the patients are thereby obtained.
After that, a difference in coincidence degree is calculated based on the coincidence degrees of the group A and the group B (Step S105). This step obtains the difference between the coincidence degree of the group A and the coincidence degree of the group B. A peak position at which the difference in coincidence degree is equal to or higher than a prescribed value is determined as a differential peak (Step S106). A user inputs an arbitrary value through the input unit 12, and the differential peaks are displayed as a table on the display unit 17. The value input by the user serves as a threshold, and the peak positions at which the difference in coincidence degree is equal to or higher than the threshold are displayed. Based on these peak positions, a user can determine from a newly measured mass spectrum whether a subject is a patient or a normal healthy person. For example, the difference in coincidence degree is large at a peak position at which a peak appears frequently for patients and rarely for normal healthy persons. The user observes whether or not the new mass spectrum has a peak at such a peak position to determine whether the subject is a patient or a normal healthy person.
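Steps S105 and S106 can be sketched as follows. The coincidence degrees of group A and group B are assumed to be dictionaries keyed by the same scanned mass numbers. The absolute value of the difference is used so that peaks favouring either group are reported (an assumption consistent with the later discussion of differential peaks), and the selected positions are sorted in descending order of difference, as in a results table. The function names are hypothetical, and the numbers in the usage example are made up, not measured data.

```python
def coincidence_difference(coinc_a, coinc_b):
    """Step S105: absolute difference in coincidence degree at each shared m/z."""
    return {mz: abs(coinc_b[mz] - coinc_a[mz]) for mz in coinc_a if mz in coinc_b}


def differential_peaks(coinc_a, coinc_b, threshold):
    """Step S106: m/z values whose difference is >= threshold, largest first."""
    diff = coincidence_difference(coinc_a, coinc_b)
    selected = [(mz, d) for mz, d in diff.items() if d >= threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical coincidence degrees (eight spectra per group), not measured data.
group_a = {4000.0: 0.125, 5210.0: 0.875, 6100.0: 1.0}   # normal healthy persons
group_b = {4000.0: 0.875, 5210.0: 0.25, 6100.0: 0.875}  # patients
print(differential_peaks(group_a, group_b, 0.5))
# [(4000.0, 0.75), (5210.0, 0.625)]
```

The peak at 6100.0 appears in almost every spectrum of both groups, so its difference (0.125) stays below the threshold and it is not reported as a differential peak.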
The analysis processing is described hereinafter using actual mass spectrum data and analysis data.
Measurements are performed twice on each of four test subjects, both during illness and after recovery, giving sixteen mass spectra in total (4 subjects × 2 measurements × 2 conditions). The mass spectra obtained during illness are referred to as patient data, and those obtained after recovery as normal healthy person data. Thus, eight mass spectra each are measured for the patients and the normal healthy persons in this example.
The data shown in
In this embodiment, the following operation is performed to detect peak positions accurately in a noisy mass spectrum. First, the slope of the mass spectrum is obtained by smoothing differentiation. In this example, smoothing is performed by a moving average with a smoothing point count of 70; that is, each smoothed value is the average of the intensity values at 70 consecutive mass numbers. The smoothed values are then differentiated to obtain the slope. For example, the smoothed intensity at mass number 4000 is the average of the intensity values at mass numbers 3966 to 4035. A noisy mass spectrum is thereby smoothed.
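The smoothing step can be sketched as a centred moving average followed by a simple numerical derivative. The sketch assumes one intensity sample per mass number, as in the example, so a 70-point window covers offsets -34 to +35 around each sample (3966 to 4035 around 4000). The text does not say how the ends of the spectrum are handled or which difference formula is used for the slope; truncating the window at the ends and using central differences are assumptions, and the function names are hypothetical.

```python
def moving_average(intensity, n_points=70):
    """Centred moving average over n_points consecutive samples.

    With n_points = 70 the window for sample i spans i-34 .. i+35, so with one
    sample per m/z the value at 4000 averages m/z 3966 .. 4035. Near the ends of
    the spectrum the window is truncated (an assumption).
    """
    left = (n_points - 1) // 2
    right = n_points // 2
    prefix = [0.0]  # prefix[k] = sum of the first k intensities
    for value in intensity:
        prefix.append(prefix[-1] + value)
    n = len(intensity)
    smoothed = []
    for i in range(n):
        lo = max(0, i - left)
        hi = min(n, i + right + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


def slope(values):
    """Numerical derivative per sample: central differences inside,
    one-sided differences at the two ends (requires at least two samples)."""
    n = len(values)
    out = [values[1] - values[0]]
    for i in range(1, n - 1):
        out.append((values[i + 1] - values[i - 1]) / 2.0)
    out.append(values[n - 1] - values[n - 2])
    return out
```

Prefix sums make each window average cost O(1), so smoothing a spectrum with many thousands of points stays cheap even with a wide smoothing window.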
Next, the mass number at which the smoothed intensity reaches a local maximum is obtained from the change in the slope of the smoothed intensity: the point at which the slope changes from positive to negative is a local maximum, and the mass number at this point is obtained. Further, the mass number at which the unsmoothed data reach their greatest value in the vicinity of this local maximum is obtained. The vicinity is set to the same number of points as the smoothing point count. Thus, the mass number at which the unsmoothed intensity is greatest is searched for within the range of 70 mass numbers around the mass number at which the smoothed intensity reaches its maximum. For example, if the smoothed intensity has a local maximum at mass number 4000, the mass number at which the unsmoothed intensity is greatest is found within the range of mass numbers 3966 to 4035. Finally, this greatest value is judged to be a peak only if it exceeds a threshold, and its mass number is taken as the peak position. This enables accurate detection of peak positions even in the presence of heavy noise.
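The peak-picking procedure can be sketched as follows, again assuming one sample per mass number. A local maximum of the smoothed curve is taken where its forward difference changes from positive to zero or negative; the raw intensity is then searched over the same 70-point neighbourhood, and the result is kept only if it exceeds the threshold. The function name, the inline smoothing helper, and the treatment of flat-topped smoothed maxima are illustrative assumptions, not the embodiment's exact implementation.

```python
def detect_peak_positions(mz, intensity, n_points=70, threshold=0.0):
    """Return sorted m/z values of peaks, assuming one sample per m/z step.

    1. Smooth by a centred n_points moving average.
    2. A local maximum of the smoothed curve is where its forward difference
       turns from positive to zero or negative.
    3. Search the raw intensity over the same n_points neighbourhood for its
       greatest value; keep it as a peak only if it exceeds threshold.
    """
    n = len(intensity)
    left, right = (n_points - 1) // 2, n_points // 2

    def window(i):
        # index range [lo, hi) covering offsets -left .. +right, truncated at the ends
        return max(0, i - left), min(n, i + right + 1)

    smoothed = []
    for i in range(n):
        lo, hi = window(i)
        smoothed.append(sum(intensity[lo:hi]) / (hi - lo))

    positions = set()  # a set avoids reporting the same raw maximum twice
    for i in range(1, n - 1):
        rising = smoothed[i] - smoothed[i - 1] > 0
        not_rising_after = smoothed[i + 1] - smoothed[i] <= 0
        if rising and not_rising_after:
            lo, hi = window(i)
            j = max(range(lo, hi), key=lambda k: intensity[k])  # raw maximum nearby
            if intensity[j] > threshold:
                positions.add(mz[j])
    return sorted(positions)
```

Locating the maximum on the smoothed curve and then refining it on the raw data keeps noise from creating spurious maxima while still reporting the peak at the mass number where the measured intensity is actually highest.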
As shown in
In
Human judgment is carried out by stacking the mass spectra vertically. For example, the eight mass spectra after recovery are stacked vertically as shown in
When the mass spectra shown in
The present invention implements the automatic detection of peaks by prescribed processing, thereby providing highly reproducible analysis without dropouts even on the biological sample with lots of variation factors. Further, the present invention implements the analysis in regard to the peak position only, without regard to the peak height or area value, thus being suitable for the biological samples with wide variations such as changes in health condition or individual differences. Furthermore, the present invention enables the reduction of an analysis time even if the number of samples is increased in order to improve the statistical accuracy.
The step of calculating the coincidence degree is described hereinafter with reference to
The peak coincidence degree is a value indicating how closely the peak positions of a plurality of mass spectra coincide. Because there are eight samples during illness in this example, if a peak appears at the same mass number in all eight mass spectra, the coincidence degree at that mass number is 8/8 = 1. On the other hand, if no peak appears at that mass number in any of the eight mass spectra, the coincidence degree at that mass number is 0. If a peak appears at that mass number in one of the eight mass spectra, the coincidence degree is ⅛ = 0.125. The peak coincidence degree is calculated for each mass number.
The present invention calculates the coincidence degree by setting a window 51 having a certain width for the mass number in consideration of variation factors of a mass spectrum. Specifically, in the target mass spectrum, the coincidence degree is calculated based on the number of peaks which fall within the width of the window. For example, referring to the window 51a shown in
As described above, the coincidence degree is calculated based on the number of peaks that fall within the window width. Further, the present invention gives the window the shape of a cosine curve, so that the contribution of each peak in the window depends on its position within the window. Specifically, the peak positions contained in the window width are weighted according to their positions, using a cosine function as the weighting function. The shape of the window is described hereinafter with reference to
As shown in
In the configuration shown in
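A cosine-weighted variant of the coincidence degree can be sketched as follows. The exact cosine curve is defined in the figures, which are not reproduced here, so a raised-cosine weight (1 at the window centre, falling to 0 at the edges) is assumed, as is the rule that each spectrum contributes the weight of its best-placed peak. The helper names are hypothetical.

```python
import math


def cosine_weight(offset, half_width):
    """Assumed raised-cosine window: weight 1 at the centre, 0 at |offset| = half_width."""
    if abs(offset) > half_width:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * offset / half_width))


def weighted_coincidence_degree(peak_lists, center, width):
    """Cosine-weighted coincidence degree at one window position.

    Each spectrum contributes the weight of its best-placed peak (assumption),
    so the result stays between 0 and 1.
    """
    half = width / 2.0
    total = 0.0
    for peaks in peak_lists:
        total += max((cosine_weight(p - center, half) for p in peaks), default=0.0)
    return total / len(peak_lists)


# Hypothetical peaks: one at the centre, one halfway to the edge, one at the edge.
spectra = [[4000.0], [4005.0], [4010.0], [5000.0]]
print(weighted_coincidence_degree(spectra, 4000.0, 20.0))  # 0.375
```

Compared with a rectangular window, which would count the first three spectra fully, a peak at the window edge now contributes nothing and a peak halfway out contributes half, so the coincidence degree changes smoothly as the window is scanned.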
The step of calculating a difference between the coincidence degree after recovery and the coincidence degree during illness is described hereinafter with reference to
Calculating the difference in peak coincidence degree makes it possible to obtain characteristic and differential peak positions. At a mass number at which a peak appears frequently after recovery but rarely during illness, the difference in coincidence degree is large. Likewise, at a mass number at which a peak appears frequently during illness but rarely after recovery, the difference in coincidence degree is large. A peak appearing at such a mass number is a characteristic and differential peak. On the other hand, at a mass number at which a peak appears frequently both during illness and after recovery, the difference in coincidence degree is small. Because a peak appears in most samples at this mass number, a peak appearing in its vicinity is a non-differential peak. At a mass number at which a peak appears rarely both during illness and after recovery, the difference in coincidence degree is also small. Because no peak appears in most samples at this mass number, a peak appearing in its vicinity, if any, is considered to be due to variation factors. The larger the difference in peak coincidence degree, the larger the difference between patients and normal healthy persons in how often a peak appears. Accordingly, the larger the difference in peak coincidence degree, the more likely it is that a characteristic and differential peak exists at that mass number.
Table 1 shows the analysis results of peak positions analyzed according to the present invention.
Table 1 shows, as the automatic detection result, the characteristic and differential peak positions detected by the analysis method according to the present invention. The analysis result may be displayed on the display unit 17 as a table. For example, an arbitrary value may be input through the input unit 12, and the peak positions whose difference in coincidence degree is equal to or higher than the input value may be displayed on the display unit 17. In Table 1, the peak positions are listed from the top in descending order of the difference in coincidence degree. For comparison, Table 1 also shows the characteristic and differential peak positions detected manually by a person.
As shown in Table 1, the manual detection sometimes fails to detect the mass number at which a difference in coincidence degree is large. The present invention obtains the characteristic and differential peak positions as described above, thereby achieving the accurate analysis without dropouts. Further, even when the number of samples is increased in order to reduce statistical errors, the present invention can perform the analysis in a significantly shorter time than the manual detection. As a result of the accurate analysis without dropouts, the peak position which appears frequently in a specific disease can be identified accurately. Using the peak positions, it is possible to accurately determine whether or not another target person suffers from the specific disease.
Further, the present invention allows a user to input various settings through the input unit 12. For example, the window width may be adjusted according to the sample to be analyzed or the disease. The smoothing point count or the threshold in the peak position detection step may also be varied. Further, the window shape may be set arbitrarily, and weights may be assigned with a function other than a cosine function. Allowing the user to input these settings enables analysis suited to various diseases. Furthermore, the scanning pitch of the window is not limited to 1 m/z: a smaller scanning pitch allows more precise analysis, whereas a larger scanning pitch shortens the analysis time. Neither the scanning pitch nor the window width needs to be an integer; non-integer values may be used.
The above-described analysis process and set values are given by way of illustration only, and the present invention is not limited to the above embodiments. Although the above description assumes that intensity data exist at every integer mass number, in practice the intensity data exist at intervals determined by the resolution of the measurement device 20. If the resolution of the measurement device 20 is 0.1, intensity data exist at every 0.1 mass number, and the peak positions are detected at a resolution of 0.1. The analysis according to the present invention is suitable for mass spectra obtained by ionizing proteins in a biological sample by SELDI or MALDI, for example.
The present invention extracts only the information regarding peak positions from a mass spectrum and carries out the analysis based on the peak positions. This enables highly reproducible and accurate analysis without dropouts even if the peak height or area varies owing to a variety of variation factors. Further, the present invention counts the peak positions of a plurality of biological samples that are contained in a window having a certain width in mass number. This enables accurate analysis without dropouts even if the peak positions are not aligned owing to variation factors such as the presence of isotopes.
The mass spectrum analysis device and the mass spectrum analysis method according to the present invention may be implemented not only by an ordinary personal computer (PC) but also by a workstation, a general-purpose machine, an FA (factory automation) computer, or a combination of these. These components, however, are given by way of illustration only, and not all of them are essential to the present invention. Further, the analysis device need not be physically integrated; processing may be performed in parallel by a plurality of terminals.
The present invention may be applied to a mass spectrum analysis device, a mass spectrum analysis method, and a mass spectrum analysis program for analyzing the mass spectrum measured for samples.
Number | Date | Country | Kind |
---|---|---|---|
2004-170244 | Jun 2004 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP05/10471 | 6/8/2005 | WO | 3/9/2007 |