The present invention relates to a symptom detection program, a symptom detection method, and a symptom detection device.
Typically, it has been known that a specialized doctor diagnoses, from computed tomography (CT), blood tests, or the like, a major neurocognitive disorder that makes a person unable to perform basic actions such as eating meals or bathing, or a mild cognitive impairment that makes a person unable to perform complicated actions such as shopping or household matters although the basic actions can still be performed.
Japanese Laid-open Patent Publication No. 2022-61587 is disclosed as related art.
According to an aspect of the embodiments, a non-transitory computer-readable recording medium stores a symptom detection program for causing a computer to execute processing, the processing including acquiring video data that includes a face of a patient who is executing a specific task, detecting an occurrence intensity of each action unit included in the face of the patient by analyzing the acquired video data, and detecting a symptom related to a major neurocognitive disorder of the patient based on a temporal change in the occurrence intensity of each of a plurality of the detected action units.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Diagnosis of an early-stage major neurocognitive disorder or a mild cognitive impairment is difficult because the symptoms are less likely to appear in typical CT scans, blood tests, or the like. For example, at the time of emergency diagnosis such as emergency transport or a nighttime outpatient visit, a doctor other than a specialist may make the diagnosis, which increases the possibility of an erroneous diagnosis or the like.
In one aspect, an object is to provide a symptom detection program, a symptom detection method, and a symptom detection device that can detect a symptom related to a major neurocognitive disorder early.
Hereinafter, embodiments of a symptom detection program, a symptom detection method, and a symptom detection device according to the present invention will be described in detail with reference to the drawings. Note that the present invention is not limited by these embodiments. In addition, the embodiments may be appropriately combined with each other in a range without inconsistency.
In medical fields, a specialized doctor diagnoses, from CT, blood tests, or the like, a major neurocognitive disorder that makes a person unable to perform basic actions such as eating meals or bathing, or a mild cognitive impairment that makes a person unable to perform complicated actions such as shopping or household matters although the basic actions can still be performed.
Therefore, the symptom detection device 10 according to the first embodiment realizes early detection of the major neurocognitive disorder or the mild cognitive impairment, using video data captured while the patient executes a specific task (application) that applies a load to a cognitive function in order to examine that cognitive function. Note that, in the present embodiment, an example will be described in which the symptom detection device 10 executes both the specific task and the symptom detection. However, the specific task and the symptom detection may be executed by separate devices.
Specifically, the symptom detection device 10 generates each machine learning model used to detect the symptoms of the major neurocognitive disorder or the mild cognitive impairment, in a training phase. For example, as illustrated in
More specifically, the symptom detection device 10 generates the first machine learning model by inputting, into the first machine learning model, training data that uses image data in which a face of the patient is imaged as an explanatory variable and an occurrence intensity (value) of each action unit (AU) as an objective variable, and by training a parameter of the first machine learning model so as to minimize an error between an output result of the first machine learning model and the objective variable.
Furthermore, the symptom detection device 10 generates the second machine learning model by inputting, into the second machine learning model, training data whose explanatory variables include the temporal change in the occurrence intensity of each AU while the patient is executing the specific task and the score that is an execution result of the specific task, and whose objective variable is whether or not the mild cognitive impairment occurs, and by training a parameter of the second machine learning model so as to minimize an error between an output result of the second machine learning model and the objective variable.
Thereafter, in a detection phase, the symptom detection device 10 detects whether or not a symptom related to the major neurocognitive disorder occurs, using the video data captured while the patient executes the specific task and each trained machine learning model.
For example, as illustrated in
In this way, by using the AUs, the symptom detection device 10 can capture a fine change in a facial expression with little individual difference and can detect the mild cognitive impairment early. Note that, here, the mild cognitive impairment has been described as an example of the symptom related to the major neurocognitive disorder of the patient. However, the present embodiment is not limited to this and can be similarly applied to other symptoms of the major neurocognitive disorder or the like by setting the objective variable accordingly.
The communication unit 11 is a processing unit that controls communication with other devices and is implemented by, for example, a communication interface or the like. For example, the communication unit 11 receives the video data or the score of the specific task to be described later, and transmits a processing result of the control unit 30, which will be described later, to a destination specified in advance.
The display unit 12 is a processing unit that displays and outputs various types of information, and is implemented by, for example, a display, a touch panel, or the like. For example, the display unit 12 outputs the specific task and receives an answer to the specific task.
The imaging unit 13 is a processing unit that captures a video and acquires video data and is implemented by, for example, a camera or the like. For example, the imaging unit 13 captures a video including the face of the patient while the patient is executing the specific task and stores the video in the storage unit 20 as the video data.
The storage unit 20 is a processing unit that stores various types of data, programs executed by the control unit 30, and the like and, for example, is implemented by a memory, a hard disk, or the like. The storage unit 20 stores a training data DB 21, a video data DB 22, a first machine learning model 23, and a second machine learning model 24.
The training data DB 21 is a database that stores various types of training data used to generate the first machine learning model 23 and the second machine learning model 24. The training data stored here can include supervised training data to which correct answer information is added and unsupervised training data to which the correct answer information is not added.
The video data DB 22 is a database that stores the video data captured by the imaging unit 13. For example, the video data DB 22 stores the video data including the face of the patient while the patient is executing the specific task, for each patient. Note that the video data includes a plurality of time-series frames. A frame number is given to each frame in a time-series ascending order. One frame is image data of a still image captured by the imaging unit 13 at a certain timing.
The first machine learning model 23 is a machine learning model that outputs the occurrence intensity of each AU according to an input of each frame (image data) included in the video data. Specifically, the first machine learning model 23 estimates AUs, which are a method for decomposing and quantifying a facial expression based on portions of the face and facial expression muscles. In response to an input of the image data, the first machine learning model 23 outputs a facial expression recognition result such as "AU 1:2, AU 2:5, AU 3:1, . . . ", which expresses the occurrence intensity (for example, a five-step evaluation) of each of the AUs from AU 1 to AU 28 that are set to specify the facial expression. For example, various algorithms such as a neural network or a random forest can be adopted for the first machine learning model 23.
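For illustration purposes only, the following sketch shows one way the interface of such a first machine learning model could look, assuming a scikit-learn RandomForestRegressor over flattened face crops; the class name AUIntensityEstimator, the 1-to-5 rounding, and the input representation are assumptions for this sketch and not part of the embodiment.

```python
# Illustrative sketch (assumptions, not the embodiment's implementation):
# one face image in, one occurrence intensity per AU (AU 1 to AU 28) out.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

NUM_AUS = 28  # AU 1 ... AU 28

class AUIntensityEstimator:
    def __init__(self):
        # Multi-output regression: one intensity value per AU.
        self.model = RandomForestRegressor(n_estimators=100)

    def fit(self, face_images, au_intensities):
        # face_images: (n_samples, n_pixels) flattened face crops
        # au_intensities: (n_samples, NUM_AUS) labeled intensities on a 1-5 scale
        self.model.fit(face_images, au_intensities)
        return self

    def predict_frame(self, face_image):
        # Returns, e.g., {"AU 1": 2, "AU 2": 5, ...} for a single frame.
        raw = self.model.predict(face_image.reshape(1, -1))[0]
        intensities = np.clip(np.rint(raw), 1, 5).astype(int)
        return {f"AU {i + 1}": int(v) for i, v in enumerate(intensities)}
```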
The second machine learning model 24 is a machine learning model that outputs whether or not the mild cognitive impairment occurs, according to an input of the feature amount. For example, the second machine learning model 24 outputs a detection result including whether or not the mild cognitive impairment occurs, according to the input of the feature amount including the temporal change (change pattern) in the occurrence intensity of each AU and the score of the specific task. For example, for the second machine learning model 24, various algorithms such as a neural network or a random forest can be adopted.
The control unit 30 is a processing unit that takes overall control of the symptom detection device 10 and is implemented by, for example, a processor or the like. This control unit 30 includes a preprocessing unit 40 and an operation processing unit 50. Note that the preprocessing unit 40 and the operation processing unit 50 are implemented by an electronic circuit included in a processor, a process executed by a processor, or the like.
The preprocessing unit 40 is a processing unit that generates each model, using the training data stored in the storage unit 20, prior to an operation for detecting the symptom related to the major neurocognitive disorder. The preprocessing unit 40 includes a first training unit 41 and a second training unit 42.
The first training unit 41 is a processing unit that generates the first machine learning model 23, by performing training using the training data. Specifically, the first training unit 41 generates the first machine learning model 23, through supervised training using the training data with the correct answer information (label).
Here, the generation of the first machine learning model 23 will be described with reference to
As illustrated in
In training data generation processing, the first training unit 41 acquires the image data captured by the RGB camera 25a and a result of the motion capture by the IR camera 25b. Then, the first training unit 41 generates an AU occurrence intensity 121 and image data 122 obtained by deleting the markers from the captured image data through image processing. For example, the occurrence intensity 121 may be data in which the occurrence intensity of each AU is expressed on a five-step scale from A to E and annotated as "AU 1:2, AU 2:5, AU 3:1, . . . ".
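As a hedged illustration of the marker-deletion step only, markers of a known color could be masked and inpainted with OpenCV as in the sketch below; the marker color, the tolerance, and the use of cv2.inpaint are assumptions for this example rather than the image processing actually used.

```python
# Illustrative sketch (assumed marker color and radius): delete markers from a
# captured RGB frame by masking marker-colored pixels and inpainting the mask.
import cv2
import numpy as np

def remove_markers(frame_bgr, marker_bgr=(0, 255, 0), tolerance=40):
    lower = np.clip(np.array(marker_bgr) - tolerance, 0, 255).astype(np.uint8)
    upper = np.clip(np.array(marker_bgr) + tolerance, 0, 255).astype(np.uint8)
    mask = cv2.inRange(frame_bgr, lower, upper)         # marker pixels -> 255
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8))  # cover marker edges too
    return cv2.inpaint(frame_bgr, mask, 3, cv2.INPAINT_TELEA)
```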
In machine learning processing, the first training unit 41 performs machine learning, using the image data 122 and the AU occurrence intensity 121 output from the training data generation processing, and generates the first machine learning model 23 used to estimate the AU occurrence intensity from the image data. The first training unit 41 can use the AU occurrence intensity as a label.
Here, camera arrangement will be described with reference to
Furthermore, a plurality of markers is attached to the face of the subject to be imaged so as to cover AU 1 to AU 28. The positions of the markers change according to a change in the facial expression of the subject. For example, a marker 401 is arranged near the root of the eyebrow. In addition, a marker 402 and a marker 403 are arranged near the nasolabial line. The markers may be arranged on the skin corresponding to the movements of one or more AUs and facial expression muscles. Furthermore, the markers may be arranged so as to exclude positions on the skin where the texture change is large due to wrinkles or the like.
Moreover, the subject wears an instrument 25c to which a reference point marker is attached outside the contour of the face. It is assumed that the position of the reference point marker attached to the instrument 25c does not change even when the facial expression of the subject changes. Accordingly, the first training unit 41 can detect a positional change of the markers attached to the face, based on a change in the relative position from the reference point marker. Furthermore, by setting the number of reference point markers to three or more, the first training unit 41 can specify the position of a marker in a three-dimensional space.
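Purely as an illustration of measuring marker movement relative to the reference point markers (the coordinate construction below is an assumption, not the embodiment's method), three reference markers can define a head-fixed coordinate frame in which the face-marker positions are expressed, so that rigid head motion is canceled out.

```python
# Illustrative sketch: express a face-marker position in a head-fixed frame
# defined by three reference point markers (r0, r1, r2) on the instrument 25c,
# so that only movement caused by the facial expression remains.
import numpy as np

def head_fixed_coords(marker, r0, r1, r2):
    marker, r0, r1, r2 = map(np.asarray, (marker, r0, r1, r2))
    x = r1 - r0
    x = x / np.linalg.norm(x)            # first axis: from r0 toward r1
    n = np.cross(r1 - r0, r2 - r0)
    z = n / np.linalg.norm(n)            # normal of the reference-marker plane
    y = np.cross(z, x)                   # completes an orthonormal frame
    basis = np.stack([x, y, z], axis=1)  # columns are the frame axes
    return basis.T @ (marker - r0)       # marker coordinates in that frame
```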
The instrument 25c is, for example, a headband. In addition, the instrument 25c may be a virtual reality (VR) headset, a mask made of a hard material, or the like. In that case, the first training unit 41 can use a rigid surface of the instrument 25c as the reference point marker.
Note that the subject changes his or her facial expression while the IR camera 25b and the RGB camera 25a perform imaging. This makes it possible to acquire, as images, how the facial expression changes as time passes. In addition, the RGB camera 25a may capture a moving image. The moving image may be regarded as a plurality of still images arranged in time series. Furthermore, the subject may change the facial expression freely or may change the facial expression according to a predefined scenario.
Note that the AU occurrence intensity can be determined according to a marker movement amount. Specifically, the first training unit 41 can determine the occurrence intensity based on the marker movement amount, which is calculated from the distance between a position preset as a determination criterion and the current position of the marker.
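For illustration only, the sketch below maps a marker movement amount to a five-step occurrence intensity with a simple threshold table; the millimeter thresholds are assumed values and not the determination criterion of the embodiment.

```python
# Illustrative sketch: quantize the marker movement amount (distance from a
# preset reference position) into a five-step AU occurrence intensity.
# The threshold values are assumptions for this example.
import numpy as np

THRESHOLDS_MM = [0.5, 1.5, 3.0, 5.0]  # boundaries between intensities 1..5

def movement_to_intensity(reference_pos, current_pos):
    movement = np.linalg.norm(np.asarray(current_pos) - np.asarray(reference_pos))
    # searchsorted returns 0..4, so the resulting intensity falls in 1..5.
    return int(np.searchsorted(THRESHOLDS_MM, movement)) + 1
```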
Here, a movement of the marker will be described with reference to
In this way, the first training unit 41 specifies image data in which a certain facial expression of the subject is imaged and the intensity of each marker at the time of that facial expression, and generates training data having "image data" as an explanatory variable and "the intensity of each marker" as an objective variable. Then, the first training unit 41 generates the first machine learning model 23 through supervised training using the generated training data. For example, the first machine learning model 23 is a neural network. The first training unit 41 changes the parameters of the neural network by performing machine learning of the first machine learning model 23. The first training unit 41 inputs the explanatory variable into the neural network, and generates a machine learning model whose neural network parameters have been changed so as to reduce an error between an output result of the neural network and the correct answer data that is the objective variable.
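A minimal PyTorch-style sketch of such supervised training is shown below for illustration; the small convolutional architecture, the mean-squared-error loss, and the Adam optimizer are assumptions for this sketch, not the actual configuration of the first machine learning model 23.

```python
# Illustrative sketch: supervised training that changes network parameters so as
# to reduce the error between the network output (28 AU intensities) and the
# labeled intensities. Architecture, loss, and optimizer are assumed choices.
import torch
import torch.nn as nn

NUM_AUS = 28

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(8),
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, NUM_AUS),        # one intensity per AU
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(images, au_labels):
    # images: (batch, 3, H, W) face image tensors (explanatory variable)
    # au_labels: (batch, NUM_AUS) correct AU intensities (objective variable)
    optimizer.zero_grad()
    loss = loss_fn(model(images), au_labels)  # error between output and labels
    loss.backward()                           # gradients of the error
    optimizer.step()                          # update the network parameters
    return loss.item()
```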
Note that the generation of the first machine learning model 23 is merely an example, and another method can be used. Furthermore, as the first machine learning model 23, a model disclosed in Japanese Laid-open Patent Publication No. 2021-111114 may be used. Furthermore, a direction of the face can be trained by a similar method.
The second training unit 42 is a processing unit that generates the second machine learning model 24, by performing training using the training data. Specifically, the second training unit 42 generates the second machine learning model 24, through supervised training using training data to which the correct answer information (label) is added.
For example, the second training unit 42 acquires "presence or absence of mild cognitive impairment" as a diagnosis result of the doctor for the patient. Furthermore, the second training unit 42 acquires the score that is the result of the patient executing the specific task, and acquires the occurrence intensity of each AU and the direction of the face that are obtained by inputting, into the first machine learning model 23, the video data in which the face of the patient is imaged while the patient is executing the specific task.
Then, the second training unit 42 generates training data including “presence or absence of mild cognitive impairment” as “correct answer information” and “temporal change in occurrence intensity of each AU, temporal change in direction of face, and score of specific task” as “feature amounts”. Then, the second training unit 42 inputs the feature amount of the training data into the second machine learning model 24 and updates the parameter of the second machine learning model 24, so as to reduce an error between an output result of the second machine learning model 24 and the correct answer information.
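For illustration, a hedged sketch of this second-model training is given below, assuming that the per-frame AU intensities, the face directions, and the score have already been obtained; the flat feature layout, the fixed number of frames per video, and the random-forest classifier are assumptions of this sketch, not the embodiment's implementation.

```python
# Illustrative sketch: train the second model on feature amounts built from the
# temporal change in AU intensities, the temporal change in the face direction,
# and the task score, with the doctor's diagnosis (MCI present/absent) as label.
# Feature layout and classifier choice are assumptions for this example.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_feature(au_series, face_dir_series, score):
    # au_series: (n_frames, 28) AU intensities in time order
    # face_dir_series: (n_frames,) numeric codes of the face direction
    # score: scalar execution result of the specific task
    # (assumes every video yields the same number of frames)
    return np.concatenate(
        [np.ravel(au_series), np.asarray(face_dir_series, dtype=float), [score]]
    )

second_model = RandomForestClassifier(n_estimators=200)

def train_second_model(feature_list, mci_labels):
    # mci_labels: 1 = "mild cognitive impairment: present", 0 = "absent"
    second_model.fit(np.stack(feature_list), np.asarray(mci_labels))
    return second_model
```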
Here, the specific task will be described.
For example, the specific task illustrated in
The specific task illustrated in
The specific task illustrated in
Next, the generation of the training data will be described in detail.
For example, the second training unit 42 inputs the image data of a frame 1 into the trained first machine learning model 23 and acquires "AU 1:2, AU 2:5, . . . " and "direction of face: A". Similarly, the second training unit 42 inputs the image data of a frame 2 into the trained first machine learning model 23 and acquires "AU 1:2, AU 2:6, . . . " and "direction of face: A". In this way, the second training unit 42 specifies the temporal change in each AU of the patient and the temporal change in the direction of the face of the patient, from the video data.
Furthermore, the second training unit 42 acquires a score "XX" output after the specific task ends. In addition, the second training unit 42 acquires, from the doctor, an electronic chart, or the like, the doctor's diagnosis result "mild cognitive impairment: present" for the patient who has executed the specific task.
Then, the second training unit 42 generates training data using the "occurrence intensity of each AU" and "direction of face" acquired for each frame and the "score (XX)" as the explanatory variables and "mild cognitive impairment: present" as the objective variable, and generates the second machine learning model 24. That is, the second machine learning model 24 learns the relationship between "the change pattern of the temporal change in the occurrence intensity of each AU, the change pattern of the temporal change in the direction of the face, and the score" and "whether or not the mild cognitive impairment occurs".
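The frame-by-frame step that turns a video into these temporal feature amounts could be sketched as follows; estimate_aus_and_direction is a hypothetical wrapper around the trained first machine learning model 23 and is assumed for illustration only.

```python
# Illustrative sketch: run the first model on every frame to obtain the time
# series of AU intensities and face directions used as feature amounts.
# `estimate_aus_and_direction` is a hypothetical per-frame wrapper.
import cv2
import numpy as np

def extract_temporal_features(video_path, estimate_aus_and_direction):
    au_series, direction_series = [], []
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()            # frames arrive in time order
        if not ok:
            break
        au_intensities, face_direction = estimate_aus_and_direction(frame)
        au_series.append(au_intensities)      # e.g. a length-28 intensity vector
        direction_series.append(face_direction)
    capture.release()
    return np.asarray(au_series), np.asarray(direction_series)
```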
Returning to
Here, symptom detection will be described with reference to
The task execution unit 51 is a processing unit that executes the specific task on the patient and acquires the score. For example, the task execution unit 51 displays any one of the tasks illustrated in
The video acquisition unit 52 is a processing unit that acquires the video data including the face of the patient who is executing the specific task. For example, the video acquisition unit 52 starts imaging by the imaging unit 13 when the specific task is started, ends the imaging by the imaging unit 13 when the specific task ends, and acquires video data while the specific task is executed, from the imaging unit 13. Then, the video acquisition unit 52 stores the acquired video data in the video data DB 22 and outputs the video data to the AU detection unit 53.
The AU detection unit 53 is a processing unit that detects an occurrence intensity for each AU included in the face of the patient, by inputting the video data acquired by the video acquisition unit 52 into the first machine learning model 23. For example, the AU detection unit 53 extracts each frame from the video data, inputs each frame into the first machine learning model 23, and detects the AU occurrence intensity and the direction of the face of the patient for each frame. Then, the AU detection unit 53 outputs the detected AU occurrence intensity and direction of the face of the patient for each frame, to the symptom detection unit 54. Note that the direction of the face can be specified from the AU occurrence intensity.
The symptom detection unit 54 is a processing unit that detects whether or not a symptom related to the major neurocognitive disorder of the patient occurs, using the temporal change in the occurrence intensity of each AU, the temporal change in the direction of the face of the patient, and the score of the specific task as the feature amounts. For example, the symptom detection unit 54 inputs, into the second machine learning model 24 as the feature amounts, the "score" acquired by the task execution unit 51, the "temporal change in occurrence intensity of each AU" in which the respective AU occurrence intensities detected for the respective frames by the AU detection unit 53 are connected in time order, and the "temporal change in direction of face" in which the detected directions of the face are similarly connected in time order. Then, the symptom detection unit 54 acquires an output result of the second machine learning model 24 and acquires, as a detection result, the higher of the probability value (reliability) that the symptom occurs and the probability value that the symptom does not occur, both of which are included in the output result. Thereafter, the symptom detection unit 54 displays and outputs the detection result on the display unit 12 and stores the detection result in the storage unit 20.
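An end-to-end sketch of this detection step is shown below for illustration; it reuses the hypothetical feature layout from the training sketch above and assumes a scikit-learn-style predict_proba interface, which is not necessarily that of the second machine learning model 24.

```python
# Illustrative sketch: build the feature amounts and take the class with the
# higher probability from the second model as the detection result.
# Assumes label order (absent, present) in predict_proba's output.
import numpy as np

def detect_symptom(second_model, au_series, face_dir_series, score):
    feature = np.concatenate(
        [np.ravel(au_series), np.asarray(face_dir_series, dtype=float), [score]]
    ).reshape(1, -1)
    probabilities = second_model.predict_proba(feature)[0]
    labels = ("mild cognitive impairment: absent",
              "mild cognitive impairment: present")
    best = int(np.argmax(probabilities))
    return {"result": labels[best], "reliability": float(probabilities[best])}
```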
Here, details of the detection of the mild cognitive impairment will be described.
For example, the operation processing unit 50 inputs the image data of a frame 1 into the trained first machine learning model 23 and acquires "AU 1:2, AU 2:5, . . . " and "direction of face: A". Similarly, the operation processing unit 50 inputs the image data of a frame 2 into the trained first machine learning model 23 and acquires "AU 1:2, AU 2:5, . . . " and "direction of face: A". In this way, the operation processing unit 50 specifies the temporal change in each AU of the patient and the temporal change in the direction of the face of the patient, from the video data.
Thereafter, the operation processing unit 50 acquires a score "YY" of the specific task, inputs "the temporal change in each AU of the patient (AU 1:2, AU 2:5, . . . , AU 1:2, AU 2:5, . . . ), the temporal change in the direction of the face of the patient (direction of face: A, direction of face: A, . . . ), and the score (YY)" into the second machine learning model 24 as the feature amounts, and detects whether or not the mild cognitive impairment occurs.
Subsequently, when the specific task is started (S103: Yes), the preprocessing unit 40 acquires the video data (S104). Then, the preprocessing unit 40 inputs each frame of the video data into the first machine learning model 23 and acquires the occurrence intensity of each AU and the direction of the face, for each frame (S105).
Thereafter, when the specific task ends (S106: Yes), the preprocessing unit 40 acquires the score (S107). Furthermore, the preprocessing unit 40 acquires the diagnosis result of the patient by the doctor (S108).
Then, the preprocessing unit 40 generates the training data including the temporal change in the occurrence intensity of each AU, the temporal change in the direction of the face, and the score (S109) and generates the second machine learning model 24 using the training data (S110).
Then, when the specific task ends (S204: Yes), the operation processing unit 50 acquires the score and ends the acquisition of the video data (S205). The operation processing unit 50 inputs each frame of the video data into the first machine learning model 23 and acquires the occurrence intensity of each AU and the direction of the face, for each frame (S206).
Thereafter, the operation processing unit 50 specifies the temporal change in each AU and the temporal change in the direction of the face, based on the occurrence intensity of each AU and the direction of the face for each frame and generates “temporal change in each AU, temporal change in direction of face, and score” as the feature amounts (S207).
Then, the operation processing unit 50 inputs the feature amounts into the second machine learning model 24, acquires the detection result from the second machine learning model 24 (S208), and outputs the detection result to the display unit 12 or the like (S209).
As described above, the symptom detection device 10 according to the first embodiment can detect the presence or absence of a symptom related to the major neurocognitive disorder, the mild cognitive impairment, or the like, without the specialized knowledge of a doctor. Furthermore, the symptom detection device 10 can capture a fine change in the facial expression with little individual difference by using the AUs and can find the symptom related to the major neurocognitive disorder, the mild cognitive impairment, or the like early.
While the embodiment has been described above, the embodiment may be implemented in a variety of different modes in addition to the above-described embodiment.
In the above first embodiment, an example has been described in which the temporal change in each AU, the temporal change in the direction of the face, and the score are used as the feature amounts (explanatory variable), as the training data of the second machine learning model 24. However, the present embodiment is not limited to this.
Furthermore, in the above embodiment, an example has been described in which two values indicating the presence and absence of the symptom of the mild cognitive impairment are used as the objective variable. However, the present embodiment is not limited to this. For example, it is possible to use, as the objective variable, two values indicating the presence and absence of the symptom of the major neurocognitive disorder, and it is also possible to use four values indicating the presence and absence of the symptom of the major neurocognitive disorder and the presence and absence of the symptom of the mild cognitive impairment.
In this way, since the symptom detection device 10 can determine the feature amounts to be used for training or detection according to the required accuracy and cost, it is possible to provide a simple symptom detection service and, furthermore, a detailed service for supporting the diagnosis of a doctor.
In the above embodiment, an example has been described in which the presence or absence of the symptom of the mild cognitive impairment is detected using the second machine learning model 24. However, the present embodiment is not limited to this. For example, it is possible to detect the presence or the absence of the symptom of the mild cognitive impairment, using a detection rule in which a pattern of the temporal change in each AU is associated with the presence or the absence of the symptom of the mild cognitive impairment.
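As an illustration only, such a detection rule could be represented as in the sketch below; the selected AUs, the variation thresholds, and the required number of matches are hypothetical values, not rules disclosed by the embodiment.

```python
# Illustrative sketch: rule-based detection that associates a pattern of the
# temporal change in certain AU intensities with presence/absence of the symptom.
# The AUs, thresholds, and match count below are hypothetical examples.
import numpy as np

# Each rule: (zero-based AU index, minimum intensity range over the task)
EXAMPLE_RULES = [
    (3, 2.0),    # hypothetical: AU 4 varies by 2 or more intensity steps
    (11, 2.0),   # hypothetical: AU 12 varies by 2 or more intensity steps
]

def detect_by_rule(au_series, rules=EXAMPLE_RULES, required_matches=2):
    au_series = np.asarray(au_series)        # (n_frames, 28) in time order
    variation = au_series.max(axis=0) - au_series.min(axis=0)
    matched = sum(1 for idx, min_range in rules if variation[idx] >= min_range)
    return "present" if matched >= required_matches else "absent"
```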
Furthermore, the occurrence intensity of each AU can also be detected by analyzing the video data with processing other than that using the first machine learning model 23. For example, it is possible to detect a change in each AU over the entire video data by setting each AU for a face region of each frame in the video data.
The symptom detection processing described in the first embodiment can be provided to each individual as an application.
In such a situation, a user purchases the application 71 in any place such as at home, downloads the application 71 from the application server 70, and installs the application 71 on a smartphone 60 of the user or the like. Then, the user executes processing similar to that of the operation processing unit 50 described in the first embodiment, using the smartphone 60 of the user, and acquires a detection result of the symptom.
As a result, when the user visits a hospital for an examination with the symptom detection result obtained by the application, the hospital side can perform the examination in a state where the basic detection results have already been acquired. Therefore, this is useful for early determination of a disease name or a symptom and for an early start of treatment.
The numerical value examples, the training data, the explanatory variables, the objective variables, the number of devices, and the like used in the above embodiments are merely examples and can be arbitrarily changed. In addition, the process flow described in each flowchart may be appropriately modified in a range without inconsistency.
Pieces of information including the processing procedure, control procedure, specific names, various types of data and parameters described above or illustrated in the drawings may be altered in any way unless otherwise noted.
In addition, each component of each device illustrated in the drawings is functionally conceptual and does not necessarily have to be physically configured as illustrated in the drawings. In other words, specific forms of distribution and integration of individual devices are not limited to the forms illustrated in the drawings. That is, all or a part thereof may be configured by being functionally or physically distributed or integrated in any units depending on various loads, use situations, or the like. For example, the preprocessing unit 40 and the operation processing unit 50 can be realized by separate devices.
Moreover, all or any part of each processing function performed in each device can be implemented by a central processing unit (CPU) and a program analyzed and executed by the CPU or can be implemented as hardware by wired logic.
The communication device 10a is a network interface card or the like, and communicates with another device. The HDD 10b stores a program for operating the functions illustrated in
The processor 10d reads, from the HDD 10b or the like, a program that executes processing similar to that of each processing unit illustrated in
In this way, the symptom detection device 10 operates as an information processing device that executes the symptom detection method by reading and executing the program. Furthermore, the symptom detection device 10 can also implement functions similar to those of the embodiments described above by reading the program from a recording medium with a medium reading device and executing the read program. Note that the program referred to in the other embodiments is not limited to being executed by the symptom detection device 10. For example, the embodiments described above may be similarly applied to a case where another computer or server executes the program or a case where such a computer and a server cooperatively execute the program.
This program may be distributed via a network such as the Internet. In addition, this program may be recorded in a computer-readable recording medium such as a hard disk, a flexible disk (FD), a compact disc read only memory (CD-ROM), a magneto-optical disk (MO), or a digital versatile disc (DVD), and may be executed by being read from the recording medium by a computer.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application PCT/JP2022/029199 filed on Jul. 28, 2022 and designated the U.S., the entire contents of which are incorporated herein by reference.
|  | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | PCT/JP2022/029199 | Jul 2022 | WO |
| Child | 19035509 |  | US |