The present invention relates to a non-transitory computer-readable storage medium and the like storing a program for estimating a gate region in flow cytometry.
Flow cytometry (FCM) is a technique that enables measurement of multiple feature quantities for each single cell. In the flow cytometry, a suspension in which cells are suspended is prepared and injected into a measurement instrument so as to make the cells flow in a line. Light is directed to the cells flowing one by one to thereby produce scattered light and fluorescent light, which provides indexes such as the size of the cell, the internal complexity of the cell, the cellular composition and the like. The flow cytometry is used for a cellular immunological test in a medical field, for example.
In the cellular immunological test, a laboratory analyzes multiple index values obtained by the flow cytometry and returns the analysis results as a test result to the laboratory that requested the analysis. The analysis techniques include gating as one example. Gating is a technique for selecting only a specific population from the obtained data and analyzing the selected population. Conventionally, a population to be analyzed is specified by a tester, i.e., a person who conducts the test, drawing an oval or a polygon (referred to as a gate) in a two-dimensional scatter diagram. Such gate setting greatly depends on the experience and knowledge of the tester. Thus, it is difficult for a tester with less experience and less knowledge to appropriately perform gate setting.
In contrast thereto, a technique of automating gate setting has been proposed (Japanese Patent No. 6480918 and Japanese Patent No. 5047803, etc.). However, since the conventional technique is a setting method using cellular density information or a rule-based setting method, it does not fully utilize the experience and knowledge that have been accumulated by the tester.
The present disclosure is made in view of such circumstances. The object thereof is to provide a gate region estimation program and the like that estimate a gate region using a learning model.
According to the present disclosure, there is provided a gate region estimation program causing a computer to execute processing of: acquiring a group of scatter diagrams including a plurality of scatter diagrams each different in a measurement item that are obtained from measurements by flow cytometry; inputting the acquired group of scatter diagrams to a learning model trained based on teaching data including a group of scatter diagrams and a gate region; and outputting an estimated gate region obtained from the learning model.
The present disclosure enables gate setting like a gate setting performed by an experienced tester.
The above and further objects and features will more fully be apparent from the following detailed description with accompanying drawings.
The following embodiments will be described with reference to drawings. The following description is made while taking CD45 gating in a Leukemia, Lymphoma Analysis (LLA) test as an example. The procedure of the LLA test will first be described. The LLA test roughly includes five processes. These five processes are: 1. dispensing; 2. performing pretreatment; 3. measuring and drawing; 4. analyzing; and 5. reporting.
The dispensing process is for dividing one specimen (hereinafter referred to as “ID”). In the LLA test, one ID is divided into ten at the maximum for running a test. Each of the divided specimens is denoted as SEQ. The divided ten specimens are denoted as SEQ1, SEQ2, . . . SEQ10. In the pretreatment process, the SEQs are subjected to a process common to the SEQs, e.g., adjustment of the cellular density, and are individually labeled with surface markers. SEQ1 is assumed as a negative control. The negative control means that a test is performed on a subject already known to have a negative result under the same condition as that for a subject desired to be validated. Alternatively, the negative control means the subject of such a test. In the test, the result for the subject desired to be validated and the result for the negative control are compared, whereby the test result is analyzed based on a relative difference between them.
In the measuring and drawing process, measurement is performed on the ten SEQs by a flow cytometer to obtain fluorescence values. For individual cells in each SEQ, information consisting of five items including a measurement value can be acquired. The five items are FSC, SSC, FL1, FL2 and FL3. FSC indicates a measurement value of forward scattered light, that is, scattered light detected in the forward direction with respect to the optical axis of a laser beam. Since FSC is approximately proportional to the surface area or the size of a cell, it is an index value indicating the size of a cell. SSC indicates a measurement value of side scattered light. The side scattered light is light detected at a 90° angle with respect to the optical axis of a laser beam and is mostly light scattered by materials within the cell. Since SSC is approximately proportional to the granularity or the internal composition of a cell, it is an index value of the granularity or the internal composition of a cell. FL stands for fluorescence but here denotes the multiple fluorescent detectors provided in a flow cytometer. The number indicates the order of each fluorescent detector. FL1 indicates the first fluorescent detector but here represents an item to which marker information of each SEQ is set as a marker. FL2 indicates the second fluorescent detector but here likewise represents an item to which marker information of each SEQ is set as a marker. FL3 indicates the third fluorescent detector but here means the name of an item to which the marker information of CD45 is set.
The flow cytometer creates two scatter diagrams for each SEQ and displays them on the display or the like. For example, one of the scatter diagrams is graphed with SSC on the one axis and FL3 on the other axis. The other one of the scatter diagrams is graphed with SSC on the one axis and FSC on the other axis.
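As a purely illustrative sketch of how such a pair of scatter diagrams could be derived from per-cell measurement values, the following Python code bins the values into count grids. The function name `scatter_to_grid`, the 8 x 8 resolution, the 0 to 1024 value range and the sample cells are assumptions introduced here for illustration and are not taken from the embodiment.

```python
def scatter_to_grid(xs, ys, bins=8, lo=0.0, hi=1024.0):
    """Bin paired measurements into a bins x bins count grid.

    Rows index the vertical-axis bin and columns the horizontal-axis bin.
    """
    grid = [[0] * bins for _ in range(bins)]
    width = (hi - lo) / bins
    for x, y in zip(xs, ys):
        if not (lo <= x < hi and lo <= y < hi):
            continue  # skip off-scale events (an assumption of this sketch)
        col = int((x - lo) / width)
        row = int((y - lo) / width)
        grid[row][col] += 1
    return grid


# Hypothetical per-cell measurement values for one SEQ.
cells = [
    {"FSC": 100.0, "SSC": 200.0, "FL3": 900.0},
    {"FSC": 120.0, "SSC": 210.0, "FL3": 910.0},
    {"FSC": 600.0, "SSC": 700.0, "FL3": 300.0},
]
ssc = [c["SSC"] for c in cells]
ssc_fl3 = scatter_to_grid(ssc, [c["FL3"] for c in cells])  # SSC vs FL3
ssc_fsc = scatter_to_grid(ssc, [c["FSC"] for c in cells])  # SSC vs FSC
```

Each grid may then be treated as a small grayscale image in which a pixel value is the number of cells falling in that bin.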
In the analyzing process, the tester estimates a disease based on the appearance of the scatter diagrams and creates gates useful for specifying the disease on the scatter diagrams. The tester then creates, for each SEQ, an FL1-FL2 scatter diagram consisting only of the cells existing in the gate region and observes a reaction to each of the markers for each SEQ. In the reporting process, the tester selects the two gates that are particularly useful for reporting and creates a report.
The following describes a mode in which gate setting conventionally performed by the tester in the analyzing process is performed by a learning model.
The learning server 3 is composed of a server computer, a workstation or the like. The learning server 3 is not an indispensable component in the test system. The learning server 3 supplements the flow cytometer 10 and stores measurement data and a learning model as a backup. Moreover, in place of the flow cytometer 10, the learning server 3 may generate a learning model and retrain the learning model. In this case, the learning server 3 transmits parameters and the like for characterizing the learning model to the flow cytometer 10. Note that the function of the learning server 3 may be provided using a cloud service and a cloud storage.
The control unit 11 has one or more arithmetic processing devices such as a central processing unit (CPU), a micro-processing unit (MPU), a graphics processing unit (GPU) and the like. The control unit 11 performs various information processing, control processing and the like related to the flow cytometer 10 by reading out and executing an operating system (OS) (not illustrated) and a control program 1P (gate region estimation program) that are stored in the auxiliary storage 13. Furthermore, the control unit 11 includes functional parts such as an acquisition unit and an output unit.
The main storage 12 is a static random access memory (SRAM), a dynamic random access memory (DRAM), a flash memory or the like. The main storage 12 mainly temporarily stores data necessary for the control unit 11 to execute arithmetic processing.
The auxiliary storage 13 is a hard disk, a solid state drive (SSD) or the like and stores the control program 1P and various databases (DB) necessary for the control unit 11 to execute processing. The auxiliary storage 13 stores a measurement value DB 131, a feature information DB 132, a gate DB 133, an alternative positive rate DB 135 and a regression model 134. The alternative positive rate DB 135 is not indispensable in the present embodiment. The auxiliary storage 13 may be an external storage device connected to the flow cytometer 10. The various DBs stored in the auxiliary storage 13 may be stored in a database server or a cloud storage that is connected over the network N.
The input unit 14 is a keyboard and a mouse. The display unit 15 includes a liquid crystal display panel or the like. The display unit 15 displays various information such as information for measurement, measurement results, gate information and the like.
The display unit 15 may be a touch panel display integrated with the input unit 14. Note that information to be displayed on the display unit 15 may be displayed on an external display device for the flow cytometer 10.
The communication unit 16 communicates with the learning server 3 over the network N. Moreover, the control unit 11 may download the control program 1P from another computer over the network N or the like using the communication unit 16 and store it in the auxiliary storage 13.
The reading unit 17 reads a portable storage medium 1a including a CD (compact disc)-ROM and a DVD (digital versatile disc)-ROM. The control unit 11 may read the control program 1P from the portable storage medium 1a via the reading unit 17 and store it in the auxiliary storage 13. Alternatively, the control unit 11 may download the control program 1P from another computer over the network N or the like and store it in the auxiliary storage 13. Alternatively, the control unit 11 may read the control program 1P from a semiconductor memory 1b.
The databases stored in the auxiliary storage 13 will now be described.
The receipt date column stores a date when a request for a test is received. The test number column stores a test number issued when a test is run. The test date column stores a date when a test is run. The chart number column stores a chart number corresponding to the request for the test. The name column stores a name of a subject who provides a specimen. The gender column stores a gender of the subject. For example, if the subject is a man, the gender column stores M while if the subject is a woman, the gender column stores F. The age column stores an age of the subject. The specimen taking date column stores a date when a specimen was taken from the subject. In the data part 1312, each column stores a measurement value for each cell concerning the measurement item. Each row stores measurement values for each cell concerning the respective measurement items.
In the flow cytometer 10 according to the present embodiment, the processing unit 1 performs deep learning for the appropriate feature quantities of a gate on the scatter diagram image created based on the measurement results obtained by the measurement unit 2. Such deep learning allows the processing unit 1 to generate the regression model 134 to which multiple scatter diagram images (a group of scatter diagrams) are input and from which gate information is output. The multiple scatter diagram images are images of multiple scatter diagrams each being different in an item of at least one of the axes. In this example, the multiple scatter diagram images are two scatter diagram images composed of an image of a scatter diagram graphed with SSC on the horizontal axis and FL3 on the vertical axis and an image of a scatter diagram graphed with SSC on the horizontal axis and FSC on the vertical axis. Three or more scatter diagram images may be input to the regression model 134. The neural network is a Convolutional Neural Network (CNN), for example. The regression model 134 includes multiple feature extractors for learning feature quantities of the respective scatter diagram images, a connector for connecting the feature quantities output from the respective feature extractors, and multiple predictors for predicting and outputting items of the gate information (center x coordinate, center y coordinate, major axis, minor axis and angle of the inclination) based on the connected feature quantities. Note that, instead of the scatter diagram images, a collection of the measurement values on which the scatter diagrams are based may be input to the regression model 134.
Each of the feature extractors includes an input layer and an intermediate layer. The input layer has multiple neurons that accept inputs of the pixel values of the respective pixels included in the scatter diagram image, and passes on the input pixel values to the intermediate layer. The intermediate layer has multiple neurons and extracts feature quantities from the scatter diagram image, and passes on the feature quantities to an output layer.
In the case where the feature extractor is CNN, for example, the intermediate layer is composed of alternate layers of a convolution layer that convolves the pixel values of the respective pixels input from the input layer and a pooling layer that maps the pixel values convolved in the convolution layer. The intermediate layer finally extracts image feature quantities while compressing the image information. Instead of preparing feature extractors for respective ones of scatter diagram images to be input, one feature extractor may receive inputs of multiple scatter diagram images.
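The alternation of a convolution layer and a pooling layer described above may be sketched as follows. This is a minimal illustration in plain Python under several assumptions that are not part of the embodiment: a single 2 x 2 kernel with fixed weights, no padding, a stride of one for the convolution, and 2 x 2 max pooling. The function names and the sample image are hypothetical.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image without padding (stride 1).

    As is usual for CNNs, the kernel is applied without flipping.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5 x 5 scatter diagram image (cell counts per pixel).
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 1, 0],
    [0, 2, 4, 2, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 0, 0, 0],
]
kernel = [[1, 1], [1, 1]]        # illustrative fixed weights
convolved = conv2d(image, kernel)   # 4 x 4 feature map
features = max_pool(convolved)      # 2 x 2 compressed feature map
```

In an actual CNN the kernel weights would be learned and many kernels and layers would be stacked; the sketch only shows how the image information is compressed from 5 x 5 to 2 x 2.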
Though the following description is made assuming that the regression model 134 is CNN in the present embodiment, the regression model 134 may be any trained model constructed by another learning algorithm such as a neural network other than CNN, Bayesian Network, Decision Tree or the like without being limited to CNN.
The processing unit 1 performs training using teaching data including multiple scatter diagram images and correct answer values of the gate information corresponding to the scatter diagrams that are associated with each other. As illustrated in
The processing unit 1 inputs two scatter diagram images as teaching data to the respective different feature extractors. The feature quantities output from the respective feature extractors are connected by the connector. The connection by the connector includes a method of simply connecting the feature quantities (Concatenate), a method of summing up values indicating the feature quantities (ADD) and a method of selecting the maximum feature quantity (Maxpool).
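The three connection methods may be illustrated on two short feature vectors as follows. The function names and the sample values are hypothetical, and the interpretation of Maxpool as an element-wise maximum over the two feature vectors is an assumption of this sketch.

```python
def connect_concatenate(a, b):
    """Concatenate: place the two feature vectors end to end."""
    return list(a) + list(b)


def connect_add(a, b):
    """ADD: sum the feature values position by position."""
    if len(a) != len(b):
        raise ValueError("ADD requires feature vectors of equal length")
    return [x + y for x, y in zip(a, b)]


def connect_maxpool(a, b):
    """Maxpool (assumed element-wise): keep the larger value per position."""
    if len(a) != len(b):
        raise ValueError("Maxpool requires feature vectors of equal length")
    return [max(x, y) for x, y in zip(a, b)]


# Hypothetical outputs of the two feature extractors.
features_ssc_fl3 = [1.0, 5.0]
features_ssc_fsc = [3.0, 4.0]

concatenated = connect_concatenate(features_ssc_fl3, features_ssc_fsc)
summed = connect_add(features_ssc_fl3, features_ssc_fsc)
pooled = connect_maxpool(features_ssc_fl3, features_ssc_fsc)
```

Concatenation preserves all feature quantities at the cost of a longer vector, whereas the other two methods keep the vector length unchanged.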
The respective predictors output gate information as prediction results based on the connected feature quantities. A combination of values output from the respective predictors is a set of gate information. Multiple sets of gate information may be output. In this case, predictors in number corresponding to the multiple sets are provided. For example, if the gate information with the highest priority and the gate information with the second highest priority are output, five to ten predictors in
The processing unit 1 compares the gate information obtained from the predictors with the information labeled on the scatter diagram image in the teaching data, that is, the correct answer values to optimize parameters used in the arithmetic processing at the feature extractors and the predictors so that the output values from the predictors approximate the correct answer values. The parameters include, for example, weights (coupling coefficient) between neurons, a coefficient of an activation function used in each neuron and the like. Any method of optimizing parameters may be employed. For example, the processing unit 1 optimizes various parameters by using backpropagation. The processing unit 1 performs the above-mentioned processing on data for each test included in the teaching data to generate the regression model 134.
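The principle of adjusting parameters so that the output values approach the correct answer values may be sketched with a single linear predictor trained by gradient descent on a mean squared error, which corresponds to the one-layer case of backpropagation. The sample feature quantities, the target values, the learning rate and the number of iterations below are assumptions chosen for illustration; the regression model 134 itself is not reproduced here.

```python
def predict(weights, bias, features):
    """Linear predictor: weighted sum of the feature quantities plus a bias."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def mean_squared_error(weights, bias, samples):
    """Average squared difference between predictions and correct answers."""
    return sum((predict(weights, bias, f) - t) ** 2
               for f, t in samples) / len(samples)


def gradient_step(weights, bias, samples, learning_rate):
    """Move the parameters one step against the gradient of the error."""
    n = len(samples)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for features, target in samples:
        error = predict(weights, bias, features) - target
        for i, f in enumerate(features):
            grad_w[i] += 2.0 * error * f / n
        grad_b += 2.0 * error / n
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - learning_rate * grad_b


# Hypothetical teaching data: (connected feature quantities, correct center x).
samples = [([0.0, 1.0], 2.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)]
weights, bias = [0.0, 0.0], 0.0
initial_loss = mean_squared_error(weights, bias, samples)
for _ in range(500):
    weights, bias = gradient_step(weights, bias, samples, learning_rate=0.1)
final_loss = mean_squared_error(weights, bias, samples)
```

In a multi-layer network the same error gradient is propagated backward through the predictors and the feature extractors.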
Next, the processing performed by the control unit 11 of the processing unit 1 will be described.
Next, gate setting using the regression model 134 will be described.
A gate is set to the scatter diagram displayed on the display unit 15 based on the gate information.
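One possible way to decide which cells fall inside an oval gate described by the five items of gate information is sketched below. It assumes, for illustration only, that the major axis and the minor axis are full axis lengths, that the angle of the inclination is given in degrees counterclockwise from the horizontal axis, and that cell positions use the same coordinate system as the gate. The function name is hypothetical.

```python
import math


def in_ellipse_gate(x, y, cx, cy, major, minor, angle_deg):
    """Return True if the point (x, y) lies inside or on the oval gate."""
    theta = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    # Rotate the point into the coordinate frame aligned with the oval.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    # Standard ellipse inequality with semi-axes major/2 and minor/2.
    return (u / (major / 2.0)) ** 2 + (v / (minor / 2.0)) ** 2 <= 1.0


# Hypothetical gate: center (10, 15), major axis 20, minor axis 10.
inside_flat = in_ellipse_gate(19, 15, 10, 15, 20, 10, 0)      # along major axis
outside_flat = in_ellipse_gate(10, 21, 10, 15, 20, 10, 0)     # beyond minor axis
inside_rotated = in_ellipse_gate(10, 24, 10, 15, 20, 10, 90)  # gate turned 90 degrees
```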
It is noted that such retraining processing may be performed by the learning server 3, not by the flow cytometer 10. In this case, the parameters of the regression model 134 updated as a result of retraining are transmitted from the learning server 3 to the flow cytometer 10, and the flow cytometer 10 updates the regression model 134 that is stored therein. Moreover, the retraining processing may be executed every time update gate information is generated, may be executed at a predetermined interval such as a daily batch, or may be executed after a predetermined number of pieces of update gate information have been generated.
Though described is an example in which a single numerical value (center x coordinate, center y coordinate, major axis, minor axis or angle of the inclination) is output from each of the multiple output layers of the regression model 134, a set of numerical data, not limited to a single value, may be output. Five dimensional data including a center x coordinate, a center y coordinate, a major axis, a minor axis and an angle of the inclination may be output. For example, sets of values (10, 15, 20, 10, 15), (5, 15, 25, 5, 20), (10, 15, . . . ) . . . are assigned to the respective nodes included in the output layer, and the nodes may output probabilities with respect to the sets of values.
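The variant in which output nodes give probabilities for sets of values may be sketched as follows. The two candidate sets are the complete ones given in the example above; the raw node outputs and the use of a softmax function to convert them into probabilities are assumptions of this sketch.

```python
import math


def softmax(scores):
    """Convert raw node outputs into probabilities that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Candidate gate sets: (center x, center y, major axis, minor axis, angle).
candidates = [(10, 15, 20, 10, 15), (5, 15, 25, 5, 20)]
raw_outputs = [0.2, 1.5]  # hypothetical raw outputs of the two nodes

probabilities = softmax(raw_outputs)
best_index = max(range(len(candidates)), key=lambda i: probabilities[i])
best_gate = candidates[best_index]
```

The set with the highest probability may be adopted as the estimated gate, and the remaining sets may be presented as lower-priority alternatives.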
Modification
Though the gate information that is input to and output from a learning model is a numerical value in the above description, it may be an image. The training and estimation in this case are described below. U-NET, a model for semantic segmentation, is employed as the learning model. U-NET is a type of Fully Convolutional Network (FCN) and includes an encoder that performs downsampling and a decoder that performs upsampling. U-NET is a neural network composed of only convolutional layers and pooling layers without provision of a fully connected layer. Upon training, multiple scatter diagram images are input to the U-NET. The U-NET outputs images each divided into a gate region and a non-gate region, and is trained such that the gate region indicated in the output image approaches the correct answer. In the case where a gate region is estimated after the training, two scatter diagram images are input to the U-NET. A scatter diagram image on which a gate region is represented can be obtained as an output. Edge extraction is performed on the obtained image to detect the contour of an oval representing the gate. The center coordinates (CX, CY), the major axis DX, the minor axis DY and a rotation angle ANG of the oval are evaluated from the detected contour. Then, cells included within the gate are specified. The specification can be achieved by using a known algorithm for determining whether a point is inside or outside of a polygon. The number of gate regions to be trained and output may be more than one.
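One known algorithm for determining whether a point is inside or outside of a polygon is the ray casting (even-odd) method, sketched below. The embodiment does not state which algorithm is used, so this choice, the function name and the sample square are assumptions; a detected contour would be approximated by its vertex list.

```python
def point_in_polygon(x, y, vertices):
    """Even-odd ray casting: count edge crossings of a ray going right.

    vertices is a list of (x, y) tuples in order around the polygon.
    """
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing toggles the state
    return inside


# Hypothetical gate contour approximated by a square.
contour = [(0, 0), (4, 0), (4, 4), (0, 4)]
cell_inside = point_in_polygon(2, 2, contour)
cell_outside = point_in_polygon(5, 2, contour)
```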
In the present embodiment, even a less experienced tester can perform gate setting for indicating a population of cells important for specifying a disease. In addition, an experienced tester can perform gate setting based on the gate setting proposed by the regression model 134 unlike the conventional method, which can shorten his/her working hours.
In the present embodiment, an alternative positive rate is included as an input to the regression model 134. In flow cytometry, the feature quantity is first detected by reaction with a fluorescent marker added to cells. The measurement value obtained by a marker is a relative value and it is necessary to decide a threshold to judge positivity or negativity when used. The threshold is decided by observing the populations within the gate from a negative control specimen. The threshold is evaluated from the negative specimen, so that for subdivided specimens having been added with the marker and measured, the positive rate of the marker can be obtained. When conventionally performing a gate setting, the tester modifies a gate while viewing the positive rate (the rate of positive cells) within the gate. Thus, even in the case where gate setting is performed by using the regression model 134 as well, the positive rate is possibly highly useful. Since the positive rate, however, is an index that can be calculated after gate setting is performed, it cannot be obtained before gate setting. Hence, an index that can be calculated even when gate setting has not been performed yet and that is considered to be effective for gate setting like the positive rate is introduced. This index is called an alternative positive rate.
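The reason the positive rate is available only after gate setting can be seen from a short sketch: the rate is computed over the cells selected by the gate. The function name, the rectangular stand-in for a gate, the threshold value and the sample cells below are hypothetical and serve only to illustrate the dependency.

```python
def positive_rate(cells, in_gate, marker, threshold):
    """Fraction of gated cells whose marker value exceeds the threshold."""
    gated = [c for c in cells if in_gate(c)]
    if not gated:
        return 0.0  # no cells in the gate; defined as zero in this sketch
    positives = sum(1 for c in gated if c[marker] > threshold)
    return positives / len(gated)


def example_gate(cell):
    """Hypothetical rectangular gate on the SSC-FL3 scatter diagram."""
    return cell["SSC"] < 200 and cell["FL3"] > 700


# Hypothetical measured cells of one SEQ.
cells = [
    {"SSC": 100, "FL3": 800, "FL1": 30},
    {"SSC": 120, "FL3": 820, "FL1": 300},
    {"SSC": 110, "FL3": 790, "FL1": 250},
    {"SSC": 700, "FL3": 200, "FL1": 400},
]
# The threshold of 100 stands for a value decided from the negative control.
rate = positive_rate(cells, example_gate, "FL1", threshold=100)
```

Changing the gate changes the set of gated cells and therefore the rate, which is why an index that does not require a gate is introduced.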
The alternative positive rate can be calculated as described below. The cell populations in a specimen each have a different threshold for separating positivity and negativity. The cell populations are thus subdivided into small populations, and a threshold is set for each of the small populations. In the present embodiment, a three-dimensional automatic clustering method, namely k-means, is applied to a scatter diagram of SEQ1 with FSC, SSC and FL3 on the axes to thereby create n small populations. Here, n is a natural number and is equal to 10 in this example.
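A minimal k-means sketch on three-dimensional (FSC, SSC, FL3) points is given below. It uses two small populations instead of ten to keep the example short; the random initialization with a fixed seed, the fixed number of iterations and the sample points are assumptions, and a production implementation would normally rely on an established library.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def kmeans(points, k, iterations=20, seed=0):
    """Return (centers, labels) after a fixed number of update rounds."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Hypothetical (FSC, SSC, FL3) values of SEQ1 cells.
points = [(1, 1, 1), (1, 2, 1), (2, 1, 1),
          (10, 10, 10), (10, 11, 10), (11, 10, 10)]
centers, labels = kmeans(points, k=2)
```

The resulting central points are the values that are later reused to classify the cells of SEQ2 and the subsequent SEQs.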
APR for SEQ1 is as follows:
It is noted that since SEQ1 is a negative specimen, there are few cells in the partitions except for the lower left partition. With respect to SEQ2 and thereafter, the central points for the respective small populations of SEQ1 are reflected on each of the SEQs. For each of the SEQs, cells are classified into ten small populations based on their closest central points. The threshold obtained for SEQ1 is applied to each of the small populations to generate four partitions. As in SEQ1, the numbers of cells for the respective four partitions are evaluated for each of the small populations.
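The per-SEQ calculation described above may be sketched as follows. Several points are assumptions of this sketch rather than statements of the embodiment: the four partitions are taken to be formed by one FL1 threshold and one FL2 threshold per small population, with FL1 on the horizontal axis and FL2 on the vertical axis, and each value is expressed as a ratio to the total number of cells in the SEQ. The function names, the two central points, the thresholds and the sample cells are hypothetical.

```python
def nearest_center(point, centers):
    """Index of the SEQ1 central point closest to the cell."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])),
    )


def alternative_positive_rate(cells, centers, thresholds):
    """Return one dict of partition ratios per small population.

    thresholds[j] is the (FL1, FL2) threshold pair of small population j.
    Keys: UL = upper left, UR = upper right, LL = lower left, LR = lower right.
    """
    counts = [{"UL": 0, "UR": 0, "LL": 0, "LR": 0} for _ in centers]
    for cell in cells:
        j = nearest_center((cell["FSC"], cell["SSC"], cell["FL3"]), centers)
        fl1_threshold, fl2_threshold = thresholds[j]
        vertical = "U" if cell["FL2"] > fl2_threshold else "L"
        horizontal = "R" if cell["FL1"] > fl1_threshold else "L"
        counts[j][vertical + horizontal] += 1
    total = len(cells) or 1  # avoid division by zero for an empty SEQ
    return [{key: n / total for key, n in row.items()} for row in counts]


# Hypothetical central points and thresholds obtained from SEQ1.
centers = [(100, 100, 100), (500, 500, 500)]
thresholds = [(50, 50), (80, 80)]
# Hypothetical cells of a later SEQ.
cells = [
    {"FSC": 110, "SSC": 90, "FL3": 100, "FL1": 10, "FL2": 10},
    {"FSC": 95, "SSC": 105, "FL3": 110, "FL1": 10, "FL2": 70},
    {"FSC": 490, "SSC": 510, "FL3": 500, "FL1": 90, "FL2": 90},
    {"FSC": 505, "SSC": 495, "FL3": 480, "FL1": 20, "FL2": 30},
]
apr = alternative_positive_rate(cells, centers, thresholds)
```

With ten small populations the result would be a 10 x 4 array of values per SEQ.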
Comparing APR for SEQ2 with APR for SEQ1, the value for the upper left partition has increased from 0.001 to 0.057. This shows the presence of a cell population reacting with the SEQ2 marker in the specimen.
Likewise, APR is calculated for SEQ3 to SEQ10. The following describes a calculation example of APR for each of the SEQs.
In the present embodiment, the APR evaluated from the measurement values is included as the teaching data for training the regression model 134.
Two of the feature extractors accept the respective scatter diagram images. The remaining feature extractor accepts APR.
A connector connects feature quantities extracted from the three feature extractors. Predictors predict and output items of the gate information (center x coordinate, center y coordinate, major axis, minor axis and angle of the inclination) based on the connected feature quantities. The processing unit 1 compares the gate information obtained from the predictors with the information labeled on the scatter diagram image as the teaching data, that is, the correct answer values. The processing unit 1 then optimizes parameters used in the arithmetic processing at the feature extractors and the predictors so that the output values from the predictors approximate the correct answer values. The rest of the matters are similar to those of Embodiment 1. It is noted that APR may be input to the connector without going through the feature extractors. Furthermore, sets of values are assigned to the respective nodes included in the output layer, and the nodes may be configured to output probabilities for the sets of values.
The processing restarts from step S4 shown in
Next, gate setting using the regression model 134 will be described.
In the present embodiment, the alternative positive rate is included as the teaching data for the regression model 134. The alternative positive rate is included when gate information is estimated by the regression model 134 as well. Thus, improvement of the accuracy of the gate information output from the regression model 134 can be expected.
In the present embodiment as well, the modification of Embodiment 1 can be applied. Multiple scatter diagram images and APR are input to the U-NET. The U-NET outputs images each divided into a gate region and a non-gate region, and is trained so that the gate region indicated in the output image approaches the correct answer. In the case where the gate region is estimated after training, two scatter diagram images and APR are input to the U-NET. A scatter diagram image on which a gate region is represented can be obtained as an output. The rest of the processing is similar to the above description.
While the description is made taking CD45 gating in an LLA test as an example in the above-described embodiment, a similar procedure is executable even for CD45 gating in a Malignant Lymphoma Analysis (MLA) test. The regression model employed in CD45 gating in the MLA test is provided separately from the regression model 134 for the LLA test and is stored in the auxiliary storage 13. A column indicating the content of the test is added to each of the measurement value DB 131, the feature information DB 132, the gate DB 133 and the alternative positive rate DB 135 so that LLA data and MLA data can be distinguished. When training and prediction of a gate are performed as well, the tester designates the content of the test with the input unit 14.
It is to be noted that, as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.
The technical features (constituent features) in the embodiments can be combined with each other, and the combination can form a new technical feature. It is to be understood that the embodiments disclosed here are illustrative in all respects and not restrictive. The scope of the present invention is defined by the appended claims, and all changes that fall within the meanings and the bounds of the claims, or equivalence of such meanings and bounds, are intended to be embraced by the claims.
Number | Date | Country | Kind |
---|---|---|---|
2019-159937 | Sep 2019 | JP | national |
This nonprovisional application is a National Stage of International Application No. PCT/JP2020/032979, which was filed on Sep. 1, 2020, and which claims priority to Japanese Patent Application No. 2019-159937, which was filed in Japan on Sep. 2, 2019, and which are both herein incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/032979 | 9/1/2020 | WO |