The present invention relates to a mass spectrometer that identifies or discriminates a microorganism, a mass spectrometry method for identifying or discriminating a microorganism, and a non-transitory computer readable medium that stores a mass spectrometry program for identifying or discriminating a microorganism.
A mass spectrometer is used to identify or discriminate samples of various microorganisms. It is possible to detect a marker peak for identifying or discriminating each sample by comparing a plurality of mass spectra obtained with respect to a plurality of samples. A microorganism identification/discrimination system (hereinafter referred to as the MALDI-MS system) using MALDI-MS (Matrix-assisted Laser Desorption Ionization Mass Spectrometry) is excellent in rapidity and cost performance, and has rapidly come into wide use in clinical sites in recent years.
At this time, in the clinical sites, microorganism identification/discrimination using the MALDI-MS system remains at a species level of identification/discrimination. On the other hand, in academic research, it has been reported that a microorganism has been identified or discriminated at a strain level. For example, an article by Yudai Hotta et al., “Classification of the Genus Bacillus Based on MALDI-TOF MS Analysis of Ribosomal Proteins Coded in S10 and spc Operons,” Journal of Agricultural and Food Chemistry, 2011, Vol. 59, No. 10, pp. 5222-5230 describes that a theoretical mass of a protein (mainly a ribosomal protein) that is expressed only in a specific strain is calculated based on gene information. Discrimination of a strain is performed depending on whether there is a peak (marker peak) in a mass-to-charge ratio corresponding to the calculated theoretical mass.
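The marker-peak check described above can be illustrated with a minimal sketch. The peak list, the tolerance `tol_ppm`, the function names, and the assumption of singly protonated ions are hypothetical choices made for illustration and are not taken from the cited article.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da

def theoretical_mz(mass_da, charge=1):
    """m/z of the protonated ion for a protein of the given neutral mass."""
    return (mass_da + charge * PROTON_MASS) / charge

def has_marker_peak(peaks, mass_da, tol_ppm=500.0):
    """Return True if any (m/z, intensity) peak lies within tol_ppm of the expected m/z."""
    target = theoretical_mz(mass_da)
    window = target * tol_ppm * 1e-6
    return any(abs(mz - target) <= window and intensity > 0.0
               for mz, intensity in peaks)

# Hypothetical peak list of a measured spectrum: (m/z, intensity)
peaks = [(5381.4, 120.0), (6411.9, 80.0), (7274.2, 45.0)]

# Hypothetical theoretical masses of strain-specific proteins
print(has_marker_peak(peaks, 6410.6))  # a peak exists near the expected m/z
print(has_marker_peak(peaks, 9000.0))  # no peak near the expected m/z
```

In this sketch, a strain would be discriminated by whether the marker peak is present or absent, which is why the approach is sensitive to peaks that fluctuate in intensity.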
Also in the clinical sites, it is expected that infection routes of microorganisms can be clarified or determination can be made as to whether microorganisms have toxicity by putting the discrimination of strains of microorganisms into practical use. However, it is not easy to discriminate the strains of microorganisms with high accuracy.
An object of the present invention is to provide a mass spectrometer, a mass spectrometry method, and a non-transitory computer readable medium that stores a mass spectrometry program, for enabling higher accuracy in discrimination of the strains of the microorganisms.
The inventors of the present invention have considered producing a discrimination analysis model for discriminating the strains of the microorganisms by performing machine learning using a plurality of mass spectra. As a result of various experiments and considerations, the inventors have found that it is possible to produce a discrimination analysis model that is available for the discrimination of strains by reducing variations in peak intensity of each mass spectrum. Based on this finding, the inventors have conceived of the present invention as described below.
(1) A mass spectrometer according to one aspect of the present invention that discriminates a strain of a microorganism includes a training data acquirer that acquires, as training data, each of a plurality of mass spectral data with respect to a plurality of samples, each sample including a microorganism of which strain is known, an additive, and a matrix mixed with the sample, a model producer that produces a discrimination analysis model for discriminating a strain based on the plurality of training data acquired by the training data acquirer by performing machine learning, a target data acquirer that acquires, as target data, mass spectral data with respect to a sample including a microorganism of which strain is unknown, the additive, and the matrix mixed with the sample, and a discriminator that discriminates the strain of the microorganism corresponding to the target data acquired by the target data acquirer based on the discrimination analysis model for each strain produced by the model producer and the acquired target data.
In this mass spectrometer, each of the plurality of mass spectral data corresponding to the microorganisms, of which strains are known, is acquired as the training data. The sample corresponding to each training data includes the additive and also includes the matrix mixed with the sample. The discrimination analysis model for discriminating a strain based on the acquired plurality of training data is produced by performing the machine learning. Also, the mass spectral data corresponding to the microorganism, of which strain is unknown, is acquired as the target data. The sample corresponding to the target data includes the additive and also includes the matrix mixed with the sample. The strain of the microorganism corresponding to the acquired target data is discriminated based on the produced discrimination analysis model for each strain and the acquired target data.
With this configuration, variations in peak intensity of each training data are reduced. As such, it is possible to produce the discrimination analysis model available for the discrimination of the strain by performing the machine learning on the acquired plurality of training data. Further, similarly to each training data, variations in peak intensity of the target data are reduced. This makes it possible to discriminate the strain of the microorganism corresponding to the target data based on the produced discrimination analysis model and the target data. As a result, accuracy of the discrimination of the strain of the microorganism is improved.
(2) The additive may include at least one of a compound that inhibits alkali metal-added ion detection and a surfactant. In this case, variations in peak intensity of each of the plurality of training data and the target data can be efficiently reduced.
(3) The additive may include a methylenediphosphonic acid or decyl-β-D-maltopyranoside. In this case, the variations in peak intensity of each of the plurality of training data and the target data can be more efficiently reduced.
(4) The model producer may produce the discrimination analysis model by a support vector machine or a neural network. In this case, the discrimination analysis model for discriminating the strain with high accuracy can easily be produced.
(5) The matrix may include a sinapic acid. In this case, each of the plurality of training data and the target data can easily be acquired. Moreover, the variations in peak intensity of each of the plurality of training data and the target data can be efficiently reduced.
(6) A mass spectrometry method according to another aspect of the present invention for discriminating a strain of a microorganism includes acquiring, as training data, each of a plurality of mass spectral data with respect to a plurality of samples, each sample including a microorganism of which strain is known, an additive, and a matrix mixed with the sample, producing a discrimination analysis model for discriminating a strain based on the acquired plurality of training data by performing machine learning, acquiring, as target data, mass spectral data with respect to a sample including a microorganism of which strain is unknown, the additive, and the matrix mixed with the sample, and discriminating the strain of the microorganism corresponding to the acquired target data based on the produced discrimination analysis model for each strain and the acquired target data.
With this mass spectrometry method, it is possible to discriminate the strain of the microorganism corresponding to the target data with high accuracy based on the produced discrimination analysis model and the target data. As a result, the accuracy of discrimination of the strain of the microorganism is improved.
(7) The additive may include at least one of a compound that inhibits alkali metal-added ion detection and a surfactant. In this case, variations in peak intensity of each of the plurality of training data and the target data can be efficiently reduced.
(8) The additive may include a methylenediphosphonic acid or decyl-β-D-maltopyranoside. In this case, the variations in peak intensity of each of the plurality of training data and the target data can be more efficiently reduced.
(9) The producing of the discrimination analysis model may include producing the discrimination analysis model by a support vector machine or a neural network. In this case, the discrimination analysis model for discriminating the strain with high accuracy can easily be produced.
(10) The matrix may include a sinapic acid. In this case, each of the plurality of training data and the target data can easily be acquired. Moreover, the variations in peak intensity of each of the plurality of training data and the target data can be efficiently reduced.
(11) A non-transitory computer readable medium that stores a mass spectrometry program according to still another aspect of the present invention for discriminating a strain of a microorganism executable by a processor, wherein the mass spectrometry program causes the processor to execute processes of acquiring, as training data, each of a plurality of mass spectral data with respect to a plurality of samples, each sample including a microorganism of which strain is known, an additive, and a matrix mixed with the sample, producing a discrimination analysis model for discriminating a strain based on the acquired plurality of training data by performing machine learning, acquiring, as target data, mass spectral data with respect to a sample including a microorganism of which strain is unknown, the additive, and the matrix mixed with the sample, and discriminating the strain of the microorganism corresponding to the acquired target data based on the produced discrimination analysis model for each strain and the acquired target data.
With this mass spectrometry program, it is possible to discriminate the strain of the microorganism corresponding to the target data with high accuracy based on the produced discrimination analysis model and the target data. As a result, the accuracy of discrimination of the strain of the microorganism is improved.
Other features, elements, characteristics, and advantages of the present invention will become more apparent from the following description of preferred embodiments of the present invention with reference to the attached drawings.
A mass spectrometer, a mass spectrometry method, and a non-transitory computer readable medium that stores a strain discrimination program (mass spectrometry program) according to an embodiment of the present invention will be described in detail below with reference to the drawings.
The processor 10 is constituted by a CPU (Central Processing Unit) 11, a RAM (Random Access Memory) 12, a ROM (Read Only Memory) 13, a storage device 14, an operator 15, a display 16, and an input/output I/F (interface) 17. The CPU 11, the RAM 12, the ROM 13, the storage device 14, the operator 15, the display 16, and the input/output I/F 17 are connected to a bus 18. The CPU 11, the RAM 12, and the ROM 13 constitute a strain discriminator 30.
The RAM 12 is used as a workspace of the CPU 11. The ROM 13 stores a system program. The storage device 14 includes a storage medium such as a hard disk or a semiconductor memory and stores a strain discrimination program. The CPU 11 executes the strain discrimination program stored in the storage device 14, so that strain discrimination processing is performed as described below.
The operator 15 is an input device such as a keyboard, a mouse or a touch panel. A user can give a predetermined instruction to the analyzer 20 or the strain discriminator 30 by operating the operator 15. The display 16 is a display device such as a liquid crystal display device and displays results of strain discrimination performed by the strain discriminator 30. The input/output I/F 17 is connected to the analyzer 20.
The analyzer 20 produces mass spectral data indicating mass spectra of various samples of microorganisms using MALDI (Matrix-assisted Laser Desorption Ionization). The samples include a sample of which strain is known (hereinafter referred to as training sample) and a sample to be discriminated of which strain is unknown (hereinafter referred to as target sample). A matrix is mixed in each of the training sample and the target sample. Each of the training sample and the target sample includes a predetermined additive.
The matrix includes a sinapic acid, for example. The additive includes at least one of a compound that inhibits detection of alkali metal-added ions and a surfactant. More specifically, the compound inhibiting the detection of the alkali metal-added ions includes a methylenediphosphonic acid (MDPNA). The surfactant includes decyl-β-D-maltopyranoside (DMP). Thus, variations in peak intensity of the produced mass spectral data can be reduced.
The strain discriminator 30 produces a discrimination analysis model based on a plurality of mass spectral data each corresponding to a plurality of the training samples. The strain discriminator 30 discriminates a strain of the target sample based on the produced discrimination analysis model. An operation of the strain discriminator 30 will be described below.
The training data acquirer 31 acquires a plurality of mass spectral data (hereinafter referred to as training data) each corresponding to the plurality of training samples produced by the analyzer 20. The user can instruct the analyzer 20 to apply a plurality of desired training data to the training data acquirer 31 by operating the operator 15. While the training data acquirer 31 acquires the plurality of training data directly from the analyzer 20 in the example of
The strain information acquirer 32 acquires from the operator 15 strain information indicating a strain of each of the plurality of training samples corresponding to the plurality of training data acquired by the training data acquirer 31. The user can provide the strain information acquirer 32 with the strain information of each of the plurality of training samples corresponding to the plurality of training data by operating the operator 15.
When training data is produced by the analyzer 20, the user may register strain information corresponding to the training data in the analyzer 20. In this case, each training data and strain information corresponding to the training data can be treated integrally in such a manner that the training data and the corresponding strain information are linked to each other. Thus, when training data is acquired by the training data acquirer 31, strain information corresponding to the training data is automatically acquired from the analyzer 20 or the storage device 14 by the strain information acquirer 32.
The model producer 33 classifies the plurality of training data acquired by the training data acquirer 31 for each strain, based on the strain information acquired by the strain information acquirer 32. Also, the model producer 33 performs machine learning (supervised learning) using the plurality of training data classified into the same strain, thereby to produce, as a discrimination analysis model, a pattern of a mass spectrum for discriminating the strain. The discrimination analysis model is preferably produced by a support vector machine (SVM) or a neural network (NN).
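As an illustration of the supervised learning performed by the model producer 33, the following is a minimal sketch that produces one model per strain with a linear support vector machine trained by subgradient descent. The embodiment does not specify the kernel, the optimizer, or the input representation, so the linear kernel, the one-versus-rest arrangement, the hyperparameters, the function names, and the toy intensity vectors are all assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularized hinge loss (labels are +1/-1)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:          # sample lies inside the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def produce_models(training_data, strain_info):
    """Produce one (w, b) model per strain, each separating that strain from the rest."""
    models = {}
    for strain in sorted(set(strain_info)):
        labels = [1 if s == strain else -1 for s in strain_info]
        models[strain] = train_linear_svm(training_data, labels)
    return models

def discriminate_strain(models, spectrum):
    """Return the strain whose model gives the largest decision value."""
    def decision(model):
        w, b = model
        return sum(wj * xj for wj, xj in zip(w, spectrum)) + b
    return max(models, key=lambda s: decision(models[s]))

# Hypothetical training data: peak intensities at three m/z bins
training_data = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.1, 1.0, 0.0], [0.0, 0.9, 0.2]]
strain_info = ["A", "A", "B", "B"]

models = produce_models(training_data, strain_info)
print(discriminate_strain(models, [0.95, 0.15, 0.05]))
```

A practical implementation would more likely rely on an established machine learning library and on real spectra with many more bins; the sketch only shows how training data grouped by strain information lead to a model for each strain.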
The left column of
While a target of the discrimination analysis models is a sequential waveform in the examples of
The target data acquirer 34 acquires mass spectral data (hereinafter referred to as target data) corresponding to the target sample produced by the analyzer 20. The user can instruct the analyzer 20 to provide the target data acquirer 34 with desired target data by operating the operator 15. While the target data acquirer 34 acquires the target data directly from the analyzer 20 in the example of
The discriminator 35 discriminates a strain of the target sample based on the discrimination analysis model produced by the model producer 33 and the target data acquired by the target data acquirer 34. More specifically, the discriminator 35 performs pattern authentication between the mass spectrum based on the target data and each of the discrimination analysis models corresponding to the plurality of strains. A strain that corresponds to a discrimination analysis model that has the highest degree of coincidence with the mass spectrum is discriminated as the strain of the target sample. The discriminator 35 allows the display 16 to display the discriminated strain.
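The selection of the model having the highest degree of coincidence can be sketched as follows. The embodiment does not define how the degree of coincidence is computed, so the cosine similarity used here is an assumption, as are the function names and the toy patterns.

```python
import math

def cosine_similarity(a, b):
    """Assumed degree of coincidence between two intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def discriminate(target, models):
    """Compare the target spectrum with every strain pattern and pick the best match."""
    scores = {strain: cosine_similarity(target, pattern)
              for strain, pattern in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical per-strain patterns and target spectrum (intensities at four m/z bins)
models = {
    "strain_1": [1.0, 0.2, 0.0, 0.5],
    "strain_2": [0.1, 1.0, 0.4, 0.0],
    "strain_3": [0.0, 0.3, 1.0, 0.6],
}
target = [0.9, 0.25, 0.05, 0.45]

best, scores = discriminate(target, models)
print(best)
```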
First of all, the training data acquirer 31 acquires training data from the analyzer 20 (step S1). In the present embodiment, each training data and strain information corresponding to the training data are registered in the analyzer 20 in such a manner that these data are linked to each other. As such, the strain information acquirer 32 acquires strain information from the analyzer 20 in step S1.
Next, the training data acquirer 31 determines whether an end of acquisition of the training data is instructed (step S2). The user can instruct the training data acquirer 31 to end the acquisition of the training data by operating the operator 15. If the end of the acquisition of the training data has not been instructed, the training data acquirer 31 returns to the step S1. The steps S1 and S2 are repeated until the end of the acquisition of the training data is instructed. Accordingly, the plurality of training data are acquired.
If the end of the acquisition of the training data has been instructed, the model producer 33 produces a discrimination analysis model based on the training data and the strain information acquired in the step S1 (step S3). In the case where a plurality of sets of training data and strain information are acquired for each of the plurality of strains in the step S1, the model producer 33 produces a discrimination analysis model for each strain. The target data acquirer 34 acquires target data from the analyzer 20 (step S4). The step S4 may be executed simultaneously with the step S3 or may be executed at a time point before the step S3.
The discriminator 35 performs pattern authentication between the discrimination analysis models produced in the step S3 and the mass spectrum based on the target data acquired in the step S4 (step S5). After that, the discriminator 35 determines whether the pattern authentication has been performed on all of the discrimination analysis models produced in the step S3 (step S6). If the pattern authentication has not been performed on all of the discrimination analysis models, the discriminator 35 returns to the step S5. The steps S5 and S6 are repeated until the pattern authentication is performed on all of the discrimination analysis models.
If the pattern authentication has been performed on all of the discrimination analysis models, the discriminator 35 discriminates the strain of the target sample based on a result of the authentication in the step S5 (step S7). Finally, the discriminator 35 allows the display 16 to display the strain discriminated in the step S7 (step S8) and ends the strain discrimination processing.
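The flow of the steps S1 to S8 can be summarized in a short sketch. The mean spectrum used as the model and the negative squared distance used as the matching score are simple stand-ins chosen for brevity, not the SVM or neural network of the embodiment, and all names and data are hypothetical.

```python
def mean_spectrum(spectra):
    """Stand-in model: average intensity vector of the spectra of one strain."""
    return [sum(col) / len(spectra) for col in zip(*spectra)]

def match_score(target, model):
    """Stand-in degree of coincidence: negative squared Euclidean distance."""
    return -sum((t - m) ** 2 for t, m in zip(target, model))

def strain_discrimination(training_source, target):
    # Steps S1 and S2: acquire training data with strain information until the source ends
    by_strain = {}
    for spectrum, strain in training_source:
        by_strain.setdefault(strain, []).append(spectrum)
    # Step S3: produce a model for each strain
    models = {strain: mean_spectrum(spectra) for strain, spectra in by_strain.items()}
    # Step S4: the target data is passed in as `target`
    # Steps S5 and S6: match the target against every model
    scores = {strain: match_score(target, model) for strain, model in models.items()}
    # Step S7: the strain with the best score is the discriminated strain
    discriminated = max(scores, key=scores.get)
    # Step S8: display the result
    print("Discriminated strain:", discriminated)
    return discriminated

training_source = [([1.0, 0.0], "A"), ([0.8, 0.2], "A"),
                   ([0.0, 1.0], "B"), ([0.2, 0.8], "B")]
result = strain_discrimination(training_source, [0.7, 0.3])
```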
In the mass spectrometer 100 according to the present embodiment, each of the plurality of mass spectral data corresponding to the microorganisms, of which strains are known, is acquired as the training data by the training data acquirer 31. The sample corresponding to each training data includes the additive and also the matrix mixed with the sample. The discrimination analysis models for discriminating the strains based on the plurality of training data acquired by the training data acquirer 31 are produced by the model producer 33 by performing the machine learning.
Moreover, the mass spectral data corresponding to the microorganism, of which strain is unknown, is acquired as the target data by the target data acquirer 34. The sample corresponding to the target data includes the additive and also the matrix mixed with the sample. The strain of the microorganism corresponding to the target data acquired by the target data acquirer 34 is discriminated by the discriminator 35 based on the discrimination analysis model for each strain produced by the model producer 33 and the acquired target data.
With this configuration, variations in peak intensity of each training data are reduced. As such, it is possible to produce the discrimination analysis model available for the strain discrimination by performing the machine learning on the acquired plurality of training data. In addition, variations in peak intensity of the target data are reduced similarly to each training data. This makes it possible to discriminate with high accuracy the strain of the microorganism corresponding to the target data based on the produced discrimination analysis model and the target data. As a result, the accuracy of discrimination of the strains of the microorganisms is improved.
As a technique for discrimination of the strains of the microorganisms, determination of marker peaks is considered as described in the article by Yudai Hotta et al., “Classification of the Genus Bacillus Based on MALDI-TOF MS Analysis of Ribosomal Proteins Coded in S10 and spc Operons,” Journal of Agricultural and Food Chemistry, 2011, Vol. 59, No. 10, pp. 5222-5230.
However, as shown in
As another technique for discrimination of the strains of the microorganisms, principal component analysis is considered. More specifically, a plurality of samples of microorganisms classified into any of first to sixth strains were prepared, and a mass spectrum for each sample was measured. Also, a vector composed of a row of peak intensities was produced for each sample, and the principal component analysis was performed using the produced plurality of vectors as inputs. An arithmetic operation method for the principal component analysis is well known and therefore will not be described herein.
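For reference, the component analysis described above, read here as standard principal component analysis (PCA), can be sketched for the first component only. The power iteration, the starting vector, the function name, and the toy peak-intensity vectors are assumptions made for illustration and do not reproduce the experiment.

```python
import math

def first_principal_component(vectors, iterations=200):
    """Return the first principal axis and the score of each sample along it."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Sample covariance matrix of the peak intensities
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration from an asymmetric start vector to avoid accidental orthogonality
    axis = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        axis = [x / norm for x in nxt]
    scores = [sum(r[j] * axis[j] for j in range(d)) for r in centered]
    return axis, scores

# Hypothetical peak-intensity vectors: two samples each of two strains
vectors = [[1.0, 0.1, 0.5], [0.9, 0.2, 0.5], [0.1, 1.0, 0.5], [0.2, 0.9, 0.5]]
axis, scores = first_principal_component(vectors)
print(axis)
print(scores)
```

In this toy data the two strains fall on opposite sides of the first principal axis; with real spectra whose peak intensities vary, such a clean separation is not guaranteed.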
As shown in
In an inventive example shown below, the strains of samples were discriminated with use of the discrimination analysis model produced by the SVM based on the aforementioned embodiment. On the other hand, in a comparative example, the strains of samples were discriminated with use of a linear model produced by a general linear discrimination method. An incorrect discrimination rate in each of the inventive example and the comparative example was evaluated by each of holdout validation and cross validation. Details thereof are described below.
(a) Holdout Validation
Mass spectral data with respect to each of 205 samples of which strains are known (hereinafter referred to simply as "data") was produced over two days. More specifically, 107 data were produced on the first day, and 98 data were produced on the second day. A plurality of combinations of training data and target data were defined with use of part or all of the produced 205 data.
In the inventive example, a strain of each target data in the first combination was discriminated based on the discrimination analysis model produced by the SVM using the training data in the first combination. Similarly, a strain of each target data in the second combination was discriminated based on the discrimination analysis model produced by the SVM using the training data in the second combination. A strain of each target data in the third combination was discriminated based on the discrimination analysis model produced by the SVM using the training data in the third combination.
In the production of the aforementioned 205 data, a matrix was mixed with every sample and an additive was blended in every sample. In the case where no matrix was mixed with the samples or in the case where no additive was blended in the samples, noise components in the data were increased, and variations in peak intensity became larger, and thus, it was impossible to produce a discrimination analysis model.
In the comparative example, a strain of each target data in the first combination was discriminated based on the linear model produced by the linear discrimination method using the training data in the first combination. Similarly, a strain of each target data in the second combination was discriminated based on the linear model produced by the linear discrimination method using the training data in the second combination. A strain of each target data in the third combination was discriminated based on the linear model produced by the linear discrimination method using the training data in the third combination.
Further, the incorrect discrimination rates in the inventive example and the comparative example were evaluated by holdout validation.
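A minimal sketch of how an incorrect discrimination rate could be computed by holdout validation is given below. The split by measurement day, the nearest-centroid stand-in model, the function names, and the toy data are assumptions and do not reproduce the combinations or the results of the experiment.

```python
def incorrect_discrimination_rate(predicted, actual):
    """Fraction of target data whose discriminated strain differs from the known strain."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)

def fit_centroids(train):
    """Stand-in model: mean intensity vector per strain."""
    groups = {}
    for spectrum, strain in train:
        groups.setdefault(strain, []).append(spectrum)
    return {s: [sum(col) / len(v) for col in zip(*v)] for s, v in groups.items()}

def predict_strain(centroids, spectrum):
    """Return the strain whose centroid is closest to the spectrum."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(spectrum, c))
    return min(centroids, key=lambda s: sq_dist(centroids[s]))

def holdout_validation(train, test):
    """Train once on the training data and evaluate once on the held-out target data."""
    model = fit_centroids(train)
    predicted = [predict_strain(model, x) for x, _ in test]
    return incorrect_discrimination_rate(predicted, [s for _, s in test])

# Hypothetical data: first-day data as training data, second-day data as target data
day1 = [([1.0, 0.0], "A"), ([0.9, 0.1], "A"), ([0.0, 1.0], "B"), ([0.1, 0.9], "B")]
day2 = [([0.8, 0.2], "A"), ([0.2, 0.8], "B"), ([0.7, 0.3], "A"), ([0.6, 0.4], "B")]

rate = holdout_validation(day1, day2)
print(rate)
```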
(b) Cross Validation
In the cross validation, the random selection of the training data as described above is repeated a plurality of times. Thus, both the target data and the training data change each time the selection is performed.
In the inventive example, each time the training data in the fourth combination was selected, a strain of each target data in the fourth combination was discriminated based on the discrimination analysis model produced by the SVM using the selected training data. Similarly, each time the training data in the fifth combination was selected, a strain of each target data in the fifth combination was discriminated based on the discrimination analysis model produced by the SVM using the selected training data. Each time the training data in the sixth combination was selected, a strain of each target data in the sixth combination was discriminated based on the discrimination analysis model produced by the SVM using the selected training data.
In the comparative example, each time the training data in the fourth combination was selected, a strain of each target data in the fourth combination was discriminated based on the linear model produced by the linear discrimination method using the selected training data. Similarly, each time the training data in the fifth combination was selected, a strain of each target data in the fifth combination was discriminated based on the linear model produced by the linear discrimination method using the selected training data. Each time the training data in the sixth combination was selected, a strain of each target data in the sixth combination was discriminated based on the linear model produced by the linear discrimination method using the selected training data.
Moreover, average incorrect discrimination rates in the inventive example and the comparative example were evaluated by the cross validation.
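The repeated random selection and the averaging of the incorrect discrimination rate can be sketched as follows. The selection of a fixed number of training data per strain, the number of repetitions, the seed, the nearest-neighbor stand-in model, and the toy data are assumptions made for illustration.

```python
import random

def nearest_neighbor_predict(train, spectrum):
    """Stand-in model: return the strain of the closest training spectrum."""
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(spectrum, item[0]))
    return min(train, key=sq_dist)[1]

def average_incorrect_rate(data, per_strain, repeats=10, seed=0):
    """Repeat a random training/target split and average the incorrect discrimination rate."""
    rng = random.Random(seed)
    by_strain = {}
    for item in data:
        by_strain.setdefault(item[1], []).append(item)
    rates = []
    for _ in range(repeats):
        train, test = [], []
        for items in by_strain.values():
            # Randomly select `per_strain` training data of this strain; the rest are targets
            picked = set(rng.sample(range(len(items)), per_strain))
            for i, item in enumerate(items):
                (train if i in picked else test).append(item)
        wrong = sum(1 for x, s in test if nearest_neighbor_predict(train, x) != s)
        rates.append(wrong / len(test))
    return sum(rates) / len(rates)

# Hypothetical data: four spectra each of two well-separated strains
data = [([1.0, 0.0], "A"), ([0.9, 0.1], "A"), ([0.8, 0.2], "A"), ([0.95, 0.05], "A"),
        ([0.0, 1.0], "B"), ([0.1, 0.9], "B"), ([0.2, 0.8], "B"), ([0.05, 0.95], "B")]

average_rate = average_incorrect_rate(data, per_strain=2)
print(average_rate)
```

Fixing the seed keeps the sequence of random selections reproducible, so that two models can be compared on identical splits.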
While preferred embodiments of the present invention have been described above, it is to be understood that variations and modifications will be apparent to those skilled in the art without departing from the scope and spirit of the present invention. The scope of the present invention, therefore, is to be determined solely by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
2018-191764 | Oct 2018 | JP | national |