The present application claims priority from Japanese patent application JP 2022-015123 filed on Feb. 2, 2022, the content of which is hereby incorporated by reference into this application.
The present invention relates to a collation apparatus, a collation method, and a collation program for collating data.
A cardiotocogram (CTG) is a waveform showing temporal changes in a fetal heart rate and uterine contraction (tocogram) obtained from a fetal heart rate monitor and an external tocodynamometer, respectively, and is used to evaluate the well-being of a fetus. The CTG is an essential test for evaluating the fetus during delivery. The CTG is useful for early detection of low oxygen and acidosis of the fetus that may occur during delivery, and for reducing hypoxic encephalopathy and cerebral palsy.
A doctor evaluates the CTG according to a level classification based on a baseline of the fetal heart rate and a pattern of bradycardia. However, the waveform of the CTG is complicated, and its interpretation depends on the experience of the medical worker who makes the determination. Ramanujam, E., et al. “Prediction of Fetal Distress Using Linear and Non-linear Features of CTG Signals.” International Conference On Computational Vision and Bio Inspired Computing. Springer, Cham, 2019 (Non-Patent Literature 1) discloses a technique for predicting a fetal state based on a fetal heart rate signal.
However, a prediction method presented in Non-Patent Literature 1 described above is a technique of predicting a fetal state by extracting feature data from waveforms of the fetal heart rate and the uterine contraction using annotated data of a doctor, and the method requires well-organized annotated data. Since an annotation of data depends on experiences of the doctor, there is a variation in reliability. In addition, the annotation performed by the doctor can be only classified into several categories (for example, safety or danger), and a pH value cannot be predicted according to the annotation alone.
An object of the invention is to improve reliability of an annotation.
A collation apparatus according to an aspect of the invention disclosed in the present application includes: a grouping processing unit configured to group, based on each explanatory variable of a sample group, the sample group into a first group indicating a first classification and a second group indicating a second classification having a lower evaluation than the first classification; a collation unit configured to collate the classification of the first group and the second group that are obtained by the grouping processing unit with a classification identified by each objective variable of the sample group; and an output unit configured to output a collation result obtained by the collation unit.
According to a representative embodiment of the invention, reliability of an annotation is improved. Problems, configurations, and effects other than those described above are made clear according to the following description of the embodiments.
Each piece of sample data is clustered into a safety group in which the fetal state is safe and a danger group in which the fetal state is dangerous. Clustering is executed in unsupervised learning that does not use the correct data or supervised learning that uses the correct data. Here, the unsupervised learning that does not use the correct data will be described as an example. In the unsupervised learning that does not use the correct data, a sample data group is clustered into two groups, in which a group having a larger number of pieces of sample data is set as the safety group and a group having a smaller number of pieces of sample data is set as the danger group.
pH is an umbilical arterial blood gas analysis value immediately after delivery serving as an index of low oxygen or acidosis of a fetus. For example, a pH value is set to 7.2 as a reference. When the pH > 7.2, the fetus is safe, and when the pH ≤ 7.2, the fetus is dangerous. The pH value is an actual measurement value immediately after delivery in the case of the delivered sample, and is a predicted value before delivery in the case of the pre-delivery sample.
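The reference rule above can be sketched as follows. This is a minimal illustration only; the function name `classify_ph` and the constant name `PH_REFERENCE` are assumptions introduced here, and only the reference value 7.2 and the two inequalities come from the description.

```python
PH_REFERENCE = 7.2  # reference value given as an example in the description


def classify_ph(ph: float, reference: float = PH_REFERENCE) -> str:
    """Return "safety" when pH > reference, and "danger" when pH <= reference."""
    return "safety" if ph > reference else "danger"


if __name__ == "__main__":
    for value in (7.31, 7.20, 7.05):
        print(value, classify_ph(value))
```

The same rule would apply to either an actual measurement value (delivered sample) or a predicted value (pre-delivery sample); the sketch does not distinguish between the two.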
If a collation result 101 indicates that the sample data belongs to the safety group and the pH value is a safety value (both safe), the sample data is stored in the learning DB 100 as a learning target, and an annotation indicating “safety” is attached to the sample data. The annotation indicating “safety” is displayed to a user.
If a collation result 102 indicates that the sample data belongs to the safety group and the pH value is a danger value (contradiction), the sample data is set as a non-learning target in the learning DB 100 so as not to be used for learning, and a case indicating that an attribute “safety” of the sample and the danger value of the pH are “contradictory” is displayed to a user.
If a collation result 103 indicates that the sample data belongs to the danger group and the pH value is the safety value (contradiction), the sample data is set as the non-learning target in the learning DB 100 so as not to be used for learning, and a case indicating that an attribute “danger” of the sample and the safety value of the pH are “contradictory” is displayed to a user.
If a collation result 104 indicates that the sample data belongs to the danger group and the pH value is the danger value (both dangerous), the sample data is set as the non-learning target in the learning DB 100 so as not to be used for learning, and an annotation indicating “danger” is attached to the sample data. The annotation indicating “danger” is displayed to a user.
In this way, the collation results 101 to 104 indicating “safety”, “contradiction”, and “danger” are automatically attached as annotations. Accordingly, reliability of the annotations can be improved. In addition, since the sample data to which the collation results 102 to 104 indicating “contradiction” or “danger” are attached as annotations serves as the non-learning target, the sample data to which the collation result 101 indicating “safety” is attached as an annotation remains. Therefore, automatic organization of the learning DB 100 can be implemented. In addition, by generating a learning model using the remaining sample data group, accuracy of a predicted value of the pH can be improved.
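The four collation results 101 to 104 can be summarized as a small decision table. The following is a sketch under stated assumptions: the names `CollationOutcome` and `collate` are hypothetical, and the pH reference of 7.2 is the example value from the description. The mapping of each result to an annotation and to the learning or non-learning target follows the text above.

```python
from dataclasses import dataclass


@dataclass
class CollationOutcome:
    result_id: int             # 101 to 104, as numbered in the description
    annotation: str            # "safety", "contradiction", or "danger"
    non_learning_target: bool  # True when the sample is excluded from learning


def collate(group: str, ph: float, reference: float = 7.2) -> CollationOutcome:
    """Collate the belonging group of a sample with the classification of its pH."""
    ph_class = "safety" if ph > reference else "danger"
    if group == "safety" and ph_class == "safety":
        return CollationOutcome(101, "safety", False)        # both safe
    if group == "safety" and ph_class == "danger":
        return CollationOutcome(102, "contradiction", True)  # contradiction
    if group == "danger" and ph_class == "safety":
        return CollationOutcome(103, "contradiction", True)  # contradiction
    if group == "danger" and ph_class == "danger":
        return CollationOutcome(104, "danger", True)         # both dangerous
    raise ValueError(f"unknown group: {group!r}")


if __name__ == "__main__":
    print(collate("safety", 7.30))
    print(collate("danger", 7.30))
```

Only the outcome with result 101 remains a learning target, which matches the statement that the sample data annotated as “safety” is what remains in the learning DB 100.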
The learning DB 100 includes, as fields, sample IDs 301, feature data 302, pHs 303, belonging groups 304, annotations 305, and non-learning target information 306. A combination of values of the fields 301 to 306 in the same row defines an entry indicating sample data of one sample. Although the sample represents a pregnant woman, when a heart rate and labor intensity of the same pregnant woman are measured at different dates and times, samples will be different. In
Each of the sample IDs 301 is identification information that uniquely identifies a sample. The feature data 302 is data indicating features of the sample identified by each of the sample IDs 301. The feature data 302 includes factors F1 to Fn (n is an integer of 1 or more) including basic factors such as age, the number of weeks of pregnancy, development delay of a fetus, the number of fetuses, a delivery method, medication information, and smoking for a pregnant woman, and measurement factors such as measurement results obtained by a measurement apparatus that measures a heart rate and labor intensity.
The pHs 303 are the umbilical arterial blood gas analysis values immediately after delivery, and predicted values 331 and actual measurement values 332 are stored therein. Before delivery, only the predicted values 331 are calculated and stored by a calculation unit 703 to be described later. After delivery, the umbilical arterial blood gas analysis values are measured and stored as the actual measurement values 332.
Each of the belonging groups 304 is a group to which the sample identified by the sample ID 301 belongs. The group includes the safety group and the danger group. As described with reference to
On the other hand, the belonging groups 304 of the sample data may be determined by the supervised learning. In this case, a pH value of 7.2 is used as a reference for the actual measurement values 332, a correct label indicating safety is attached when the actual measurement value 332 is pH > 7.2, and a correct label indicating danger is attached when the actual measurement value 332 is pH ≤ 7.2.
Each of the annotations 305 is an evaluation index for a fetus of a sample, and is attached by a doctor or attached as a collation result by a collation unit 704 to be described later.
The non-learning target information 306 is information for setting the sample data to be the non-learning target. “0” is the default value and indicates the learning target, and “1” indicates the non-learning target. It is noted that the collation apparatus may delete an entry of sample data serving as the non-learning target, without using the non-learning target information 306.
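One entry of the learning DB 100 can be pictured as a record with the fields 301 to 306. The sketch below is illustrative only: the class name `SampleEntry`, the attribute names, the sample IDs, and the helper `learning_targets` are assumptions made here, and the description does not specify how the learning DB 100 is actually stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SampleEntry:
    sample_id: str                                            # sample ID 301
    features: Dict[str, float] = field(default_factory=dict)  # feature data 302 (F1 to Fn)
    ph_predicted: Optional[float] = None                      # predicted value 331
    ph_measured: Optional[float] = None                       # actual measurement value 332
    belonging_group: Optional[str] = None                     # belonging group 304
    annotation: Optional[str] = None                          # annotation 305
    non_learning_target: int = 0                              # 306: 0 = learning target, 1 = non-learning target


def learning_targets(entries: List[SampleEntry]) -> List[SampleEntry]:
    """Keep only the entries whose non-learning target information is 0."""
    return [e for e in entries if e.non_learning_target == 0]


if __name__ == "__main__":
    db = [
        SampleEntry("S001", {"F1": 31.0}, ph_measured=7.31),
        SampleEntry("S002", {"F1": 46.0}, ph_measured=7.28, non_learning_target=1),
    ]
    print([e.sample_id for e in learning_targets(db)])
```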
The basic information DB 600 stores data used for cleansing the sample data. Specifically, for example, the basic information DB 600 includes conditions 601 related to a basic factor group and data quality conditions 602. The conditions 601 related to the basic factor group define, for each basic factor of a pregnant woman who is a sample, such as the age, the number of weeks of pregnancy, the delivery method, the number of fetuses, the development delay of the fetus, the medication information, and the smoking, a condition under which the sample data is set to the non-learning target. For example, in the case of age, sample data of a pregnant woman who is 45 years old or older is set to the non-learning target. In addition, in the case of the number of weeks of pregnancy, sample data of less than 27 weeks or 43 weeks or more is set to the non-learning target. In this way, a condition serving as the non-learning target is set for each factor.
The data quality conditions 602 are conditions related to quality of values of sample data, which are different from the conditions 601 related to the basic factor group. For example, sample data with a missing value of 20% or more is set to the non-learning target. It is noted that the non-learning target may be, as long as the sample data is not used for learning, a state in which the sample data itself is simply left in the learning DB 100 or a state in which the sample data is deleted from the learning DB 100. It is noted that even if the sample data is deleted from the learning DB 100, the sample data may remain in the storage device 202.
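The example conditions given above (age of 45 or older, fewer than 27 or at least 43 weeks of pregnancy, and a missing-value ratio of 20% or more) can be sketched as a single check. The function name `cleansing_reasons`, its parameters, and the reason strings are hypothetical; the description names further basic factors for which no concrete thresholds are given, so they are not modeled here.

```python
from typing import List, Optional, Sequence


def cleansing_reasons(
    age: int,
    weeks_of_pregnancy: int,
    measurements: Sequence[Optional[float]],
) -> List[str]:
    """Return the conditions that make a sample a non-learning target (empty if none)."""
    reasons: List[str] = []
    # Conditions 601 related to the basic factor group (example thresholds from the text).
    if age >= 45:
        reasons.append("age is 45 or older")
    if weeks_of_pregnancy < 27 or weeks_of_pregnancy >= 43:
        reasons.append("weeks of pregnancy is less than 27 or 43 or more")
    # Data quality condition 602: 20% or more of the values are missing (None).
    if measurements:
        missing = sum(1 for v in measurements if v is None)
        if missing / len(measurements) >= 0.20:
            reasons.append("missing values are 20% or more")
    return reasons


if __name__ == "__main__":
    print(cleansing_reasons(30, 38, [140.0, 138.0, 142.0]))
    print(cleansing_reasons(46, 26, [140.0, None, 142.0]))
```

An empty list corresponds to a sample that satisfies all conditions and stays a learning target; a non-empty list gives the reasons that could be shown to the user.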
The preprocessing unit 701 refers to the basic information DB 600, cleanses the sample data group in the learning DB 100, and sets unnecessary sample data to the non-learning target.
The grouping processing unit 702 groups the sample data group in the learning DB 100 into the safety group and the danger group. Specifically, for example, as described above, the grouping processing unit 702 executes, by the unsupervised learning, grouping between the sample data having short distances between the feature data 302 (specifically, for example, a measurement factor group) of the sample data, and executes grouping until the sample data finally converge into two groups. The final two groups are the safety group and the danger group. The group having a larger number of pieces of sample data is the safety group, and the group having a smaller number of pieces of sample data is the danger group.
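The unsupervised grouping described above, merging nearby sample data until two groups remain and then naming the larger group the safety group, can be sketched with a simple agglomerative procedure. This is only one possible reading: the description does not name a specific clustering algorithm or distance measure, so the Euclidean distance, the average-linkage rule, and the function name `group_into_two` are assumptions. The feature values in the usage example are made up for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence


def group_into_two(features: Sequence[Sequence[float]]) -> List[str]:
    """Merge the closest groups until two remain; the larger is "safety", the smaller "danger"."""
    if len(features) < 2:
        raise ValueError("at least two samples are required")
    clusters = [[i] for i in range(len(features))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average Euclidean distance between all pairs of members (assumed linkage rule).
        total = sum(math.dist(features[i], features[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 2:
        # combinations() yields index pairs with x < y, so deleting y keeps x valid.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    # On a tie in size, the first of the two groups is treated as the safety group.
    larger, smaller = sorted(clusters, key=len, reverse=True)
    labels = [""] * len(features)
    for i in larger:
        labels[i] = "safety"
    for i in smaller:
        labels[i] = "danger"
    return labels


if __name__ == "__main__":
    # Hypothetical two-factor measurement data for six samples.
    data = [(140, 10), (142, 12), (138, 11), (141, 9), (90, 40), (95, 42)]
    print(group_into_two(data))
```

The pairwise search is quadratic per merge, which is acceptable for a sketch but would likely be replaced by a library implementation for a large sample data group.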
The grouping processing unit 702 classifies a combination of the feature data 302 (specifically, the measurement factor group) and a correct label (safety if pH > 7.2, and danger if pH ≤ 7.2) based on the actual measurement values 332 of the pHs 303 as training data into the safety group and the danger group by the supervised learning. Since the actual measurement values 332 of the pHs 303 are used, sample data to be grouped serves as sample data of a delivered pregnant woman.
The calculation unit 703 calculates the predicted values 331 of the pHs 303 by inputting a specific measurement factor group (a factor group obtained as medical knowledge) of sample data (prediction target sample data) for which the predicted values 331 of the pHs 303 have not been calculated to a learning model generated by the learning unit 706.
As shown in
Each of the annotations 305 is an annotation performed by a doctor before the collation executed by the collation unit 704, and becomes a collation result obtained by the collation unit 704 after the collation executed by the collation unit 704.
The output unit 705 outputs a preprocessing result obtained by the preprocessing unit 701 and the collation result obtained by the collation unit 704 in a displayable manner. Specifically, for example, the output unit 705 displays the preprocessing result or the collation result on a display apparatus which is an example of the output device 204, or transmits the preprocessing result or the collation result to another computer via the communication IF 205 to display the preprocessing result or the collation result on the other computer.
The learning unit 706 generates a multiple regression model as the learning model using the factor group obtained as medical knowledge of the sample data in the feature data 302 and the actual measurement values 332 of the pHs 303.
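A multiple regression model of this kind expresses the pH as an intercept plus a weighted sum of the selected factors. The sketch below fits such a model by ordinary least squares through the normal equations and then predicts a pH from factor values. The fitting method is an assumption, since the description states only that a multiple regression model is generated; the function names `fit_multiple_regression` and `predict_ph` and the numbers in the usage example are hypothetical.

```python
from typing import List, Sequence


def fit_multiple_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares; returns [intercept, b1, ..., bn]."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend 1.0 for the intercept
    k = len(rows[0])
    # Augmented matrix of the normal equations (A^T A) beta = A^T y.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("factors are linearly dependent or there are too few samples")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict_ph(coefficients: Sequence[float], factors: Sequence[float]) -> float:
    """Predicted pH = intercept + sum of coefficient * factor value."""
    return coefficients[0] + sum(c * f for c, f in zip(coefficients[1:], factors))


if __name__ == "__main__":
    # Hypothetical factor values and measured pH values for five delivered samples.
    X = [(10, 1), (20, 3), (30, 2), (40, 5), (15, 4)]
    y = [7.08, 7.14, 7.26, 7.30, 7.07]
    model = fit_multiple_regression(X, y)
    print(model, predict_ph(model, (25, 2)))
```

In terms of the description, fitting would correspond to the learning unit 706 using the actual measurement values 332, and prediction to the calculation unit 703 producing the predicted values 331.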
Next, an example of a collation processing procedure performed by the collation apparatus 200 will be described for each function. The collation apparatus 200 executes both learning of the learning model and prediction of the pHs 303 using the learning model. The description of each function below therefore indicates whether the process applies to learning or to prediction.
Then, the preprocessing unit 701 outputs a selection screen in the displayable manner (step S804). Specifically, for example, the preprocessing unit 701 executes displaying on a display apparatus, which is an example of the output device 204, or executes displaying on another computer operated by a user.
The first radio button 901 is a selection button for a user to set the preprocessed selected sample data to the learning target, and a reason thereof is displayed as the first character string 910. “All conditions” of the first character string 910 are the conditions 601 related to the basic factor group and the data quality conditions 602 in the basic information DB 600. That is, it is indicated that the selected sample data is the learning target.
The second radio button 902 is a selection button for a user to set the preprocessed selected sample data to the non-learning target, and a reason thereof is displayed as the second character string 920. The second character string 920 is, for example, a condition corresponding to the selected sample data among the conditions 601 related to the basic factor group and the data quality conditions 602 in the basic information DB 600.
The execution button 903 is a button for checking selection of either the first radio button 901 or the second radio button 902, and the preprocessing unit 701 receives a signal indicating whether the selected sample data is selected as the learning target or the non-learning target.
The first radio button 1001 is a selection button for a user to set the preprocessed selected sample data to the learning target, and a reason thereof is displayed as the first character string 1010. “All conditions” of the first character string 1010 are the conditions 601 related to the basic factor group and the data quality conditions 602 in the basic information DB 600. That is, it is indicated that the selected sample data is the learning target.
The second radio button 1002 is a selection button for a user to set the preprocessed selected sample data to the non-learning target and for prompting the user to recheck mounting of the measurement apparatus, and content for prompting the rechecking is displayed as the second character string 1020. The execution button 1003 is a button for checking selection of either the first radio button 1001 or the second radio button 1002, and the preprocessing unit 701 receives a signal indicating whether the selected sample data is selected as the learning target or the non-learning target.
Referring back to
Referring back to
The grouping processing unit 702 executes a grouping process on the sample data group in the learning DB 100 using the factor selected in step S1101 as the feature data 302 (step S1102). Sample data to be subjected to the grouping process (step S1102) is sample data having the actual measurement values 332 of the pHs 303 and having the non-learning target information 306 of “0” (learning target). Accordingly, since defective data is not applied to the grouping process (step S1102), accuracy of the grouping process (step S1102) during learning is improved.
During learning, in the grouping process (step S1102), the grouping processing unit 702 may execute the supervised learning using selection factors of the sample data and correct data (a correct label obtained based on the actual measurement values 332 of the pHs 303) as a learning data set, or may execute the unsupervised learning using only the selection factors of the sample data. In any case, the grouping processing unit 702 groups the sample data group into the safety group and the danger group, and sets the belonging groups 304.
Next, as shown in
Returning to
Sample data to be subjected to the grouping process (step S1702) is sample data having no actual measurement values 332 of the pHs 303 and having the non-learning target information 306 of “0” (learning target). Accordingly, since defective data is not applied to the grouping process (step S1702), accuracy of the grouping process (step S1702) during prediction is improved.
During prediction, in the grouping process (step S1702), the grouping processing unit 702 may execute the unsupervised learning using only the selection factors of the sample data, or may execute the supervised learning using the selection factors of the sample data and the correct data as the learning data set.
In the case of the supervised learning, the correct data is, for example, a classification of the annotations 305 attached in step S1104, in which a correct label “safety” is given to sample data annotated as both safe, and a correct label “danger” is given to sample data annotated as contradiction or both dangerous. In any case, the grouping processing unit 702 groups the sample data group into the safety group and the danger group, and sets the belonging groups 304. By executing the supervised learning using the correct data, the collation result 101 attached as the annotations 305 in step S1104 can be reflected in the learning model.
Next, the calculation unit 703 calculates the predicted values 331 of the pHs 303 of the sample data by inputting values of the selection factors of the sample data to the learning model, and stores the predicted values 331 in the learning DB 100 (step S1703).
Next, as shown in
Specifically, for example, the collation unit 704 updates the non-learning target information 306 to “1” for the sample data of the collation results 101 to 104. Then, the collation unit 704 outputs the collation results 101 to 104 in the displayable manner as shown in
It is noted that even in a sample having no actual measurement values 332 of the pHs 303 before delivery, the actual measurement values 332 of the pHs 303 are obtained after delivery. In this case, the collation apparatus can update the annotations 305 by executing the collation process during learning shown in
As described above, since the collation results 101 to 104 indicating “safety”, “contradiction”, and “danger” are automatically attached as the annotations 305, reliability of the annotations 305 can be improved. In addition, since the sample data to which the collation results 102 to 104 indicating “contradiction” or “danger” are attached as the annotations 305 serves as the non-learning target, the sample data to which the collation result 101 indicating “safety” is attached as the annotation 305 remains. Therefore, the automatic organization of the learning DB 100 can be implemented. In addition, by generating the learning model using the remaining sample data group, accuracy of the predicted values 331 of the pHs 303 can be improved. Therefore, it is possible to predict a fetal state without increasing data processing cost.
In the above-described embodiment, two groups of safety and danger are used. However, a classification of groups is not limited to safety and danger, and may be a classification of evaluation indicating a degree such as accuracy or reliability of a sample. In addition, in the above-described embodiment, the sample is a pregnant woman. However, the sample is not limited to a pregnant woman, and various measurement targets such as a person and an apparatus may be used as the sample.
The invention is not limited to the above embodiment and includes various modifications and equivalent configurations within the spirit of the appended claims. For example, the above-mentioned embodiment is described in detail in order to make the invention easy to understand, and the invention is not necessarily limited to those including all the configurations described above. In addition, a part of the configurations according to one embodiment may be replaced with configurations according to another embodiment. In addition, the configurations according to one embodiment may be added to the configurations according to another embodiment. Furthermore, a part of the configurations according to each embodiment may be added to, deleted from, or replaced with another configuration.
A part or all of the configurations, functions, processing units, processing methods described above and the like may be implemented by hardware by, for example, designing with an integrated circuit, or may be implemented by software by a processor interpreting and executing a program for implementing each function.
Information of a program, a table, and a file for implementing each function can be stored in a storage apparatus such as a memory, a hard disk, and a solid state drive (SSD), or a recording medium such as an integrated circuit (IC) card, an SD card, and a digital versatile disc (DVD).
Control lines and information lines indicate what is considered necessary for description, and not all control lines and information lines necessary for implementation are necessarily shown. It may be considered that almost all the configurations are actually connected to each other.
Number | Date | Country | Kind |
---|---|---|---|
2022-015123 | Feb 2022 | JP | national |