This application claims priority under 35 U.S.C. § 119(a) to Japanese Patent Application No. 2021-086546 filed on 21 May 2021. The above application is hereby expressly incorporated by reference, in its entirety, into the present application.
The present invention relates to a medical image processing system and a method for operating the same.
In a case in which an endoscope is inserted into a subject for observation, a moving image may be captured in order to record a movement in the subject or to prevent a still image from being missed. An endoscope operator presses a recording button of a moving image recording device to start imaging before starting observation or inserting an endoscope tip portion into the subject and continues the imaging until the end of the observation or until the endoscope tip portion is removed from the subject. Therefore, the recording time of the acquired moving image file increases. A swallowing endoscope is used to acquire a series of swallowing aspects as a moving image. However, the portion used for diagnosis is often less than 5 seconds in one swallowing movement, and a plurality of swallowing reactions are observed during a single examination. Since the recording time of a moving image capturing one swallowing movement is several minutes or more, time is lost in a case in which the target swallowing is checked later. Therefore, in a case in which swallowing is diagnosed, it is necessary to selectively play back only a swallowing portion of the acquired moving image file.
However, in the diagnosis of swallowing, it is inefficient to search for a target portion by designating the playback time or fast-forwarding the moving image, and the management of the moving image file tends to be complicated. Since the part to be imaged is almost the same, it is difficult to easily check the results of each moving image. In addition, in a case in which the playback time is long, it takes time to review the moving image.
WO2018/043585A discloses a technique which performs a freezing operation at any time to create a chapter image in a case in which an image is acquired with an endoscope and plays back the chapter image from a chapter image acquisition point after the end of observation or compares the chapter image with images captured in the past. JP2017-510368A (corresponding to US2017/027495A1) discloses a technique which evaluates the characteristics of a swallowing process of a subject in a case in which the subject swallows food by associating a vibration sensor with an imaging technique.
In WO2018/043585A, since the chapter image is acquired by the operation of the user, for example, an image may be missed or a playback position may deviate. In addition, WO2018/043585A does not disclose the observation of swallowing or the pharynx. In JP2017-510368A, the movement of swallowing is sensed and the swallowing characteristics are evaluated using the imaging technique. However, this is a method that captures an image after sensing the vibration of swallowing, and JP2017-510368A does not disclose the observation of swallowing only with images. Based on the above points, it is desirable to minimize the observation time of swallowing by a moving image and to efficiently observe the swallowing by accurately detecting the swallowing and extracting the images obtained by imaging the swallowing without a time lag.
An object of the invention is to provide a medical image processing system that can automatically extract an index moving image from a moving image obtained by imaging a swallowing examination and automatically play back the index moving image after the swallowing examination ends, and a method for operating the same.
According to an aspect of the invention, there is provided a medical image processing system comprising a processor. The processor receives a video signal on which a swallowing examination has been recorded by an endoscope, analyzes the video signal to determine whether or not a swallowing timing is present, sets a frame image at the swallowing timing as a swallowing frame image tagged with swallowing timing detection, and extracts an index moving image including the swallowing frame image from the video signal.
Preferably, the index moving image includes a swallowing frame image group including the swallowing frame image and a frame image group for a predetermined period which is continuous with the swallowing frame image group.
Preferably, the frame image group for the predetermined period is non-swallowing frame images which are arranged before a start of the swallowing frame image group and after an end of the swallowing frame image group and are not tagged with the swallowing timing detection.
Preferably, the video signal includes a frame image to be analyzed, and the processor determines whether or not the swallowing timing is present using any one of calculation of an amount of blur of the frame image, calculation of a key point based on the frame image, or a difference between pixel values of the frame images.
Preferably, the processor analyzes the index moving image to specify a type of the swallowing examination.
Preferably, the processor gives the index moving image an index number used to search for a moving image.
Preferably, the processor automatically plays back the index moving image without any user operation in a case in which the index moving image is displayed on a screen.
Preferably, the processor displays a plurality of the index moving images in a list on a display and automatically plays back the plurality of index moving images simultaneously or continuously.
Preferably, the processor displays at least one of a type of the swallowing examination or an index number in a case in which the index moving image is displayed on a screen.
Preferably, the processor combines a plurality of the index moving images to create a composite index moving image in which the index moving images are capable of being continuously played back.
Preferably, the processor determines whether or not the swallowing timing is present using voice recognition at the time of swallowing.
According to another aspect of the invention, there is provided a method for operating a medical image processing system including a processor. The method comprises: a step of causing the processor to receive a video signal on which a swallowing examination has been recorded by an endoscope; a step of causing the processor to analyze the video signal to determine whether or not a swallowing timing is present and to set a frame image at the swallowing timing as a swallowing frame image tagged with swallowing timing detection; and a step of causing the processor to extract an index moving image including the swallowing frame image from the video signal.
It is possible to automatically extract an index moving image from a moving image obtained by imaging a swallowing examination and to automatically play back the index moving image after the swallowing examination ends. Therefore, the user can efficiently observe swallowing.
As illustrated in
The endoscope 13a is a swallowing endoscope that is inserted from the nasal cavity of a patient and illuminates the vicinity of the pharynx with illumination light to observe and image swallowing. Since swallowing is a movement, a moving image is acquired in the swallowing examination. In addition, unless otherwise specified in the imaging of swallowing, white light is used as the illumination light, and a video signal of 60 frame images per second (60 fps (frames per second)) is acquired.
As illustrated in
In the swallowing examination, an aspect in which the patient puts the food F into the mouth, swallows it in the pharynx, and transports it to the stomach through the esophagus is imaged. Even in a case in which the patient is not able to swallow the food F, the aspect is imaged. In the swallowing examination, it is preferable to continuously check a plurality of swallowing movements in succession instead of checking one swallowing movement at a time. For example, during the swallowing examination, the patient swallows the food F in the order of a colored aqueous solution, milk, his or her saliva, and pudding.
As illustrated in
The image receiving unit 21 receives a moving image file 41, on which the swallowing examination by the endoscope 13a has been recorded, and transmits the moving image file 41 to the index moving image creation unit 30. The index moving image creation unit 30 extracts an index moving image 42 from the moving image file 41 and transmits the index moving image 42 to the display control unit 22. The display control unit 22 performs control to display the index moving image 42 on the display 14. The input receiving unit 23 is connected to the user interface 15. The storage memory 24 can independently implement a storage function and is connected to the database 12 such that it can transmit and receive data. In addition, the moving image file 41 is received after the swallowing examination ends. However, a video signal may be processed in real time during the swallowing examination before the moving image file 41 is created.
As illustrated in
The moving image file 41 transmitted to the index moving image creation unit 30 is temporarily stored in the temporary storage area 31. The temporarily stored moving image file 41 is transmitted to the swallowing timing detection unit 32.
The swallowing timing detection unit 32 performs a swallowing detection process that analyzes the moving image file 41 with a machine learning tool using, for example, deep learning to determine whether or not the swallowing timing, which is a movement in a case in which the food F passes through the pharynx, is present. A portion corresponding to the swallowing timing is the swallowing frame image group 43 and consists of continuous frame images including the food F which are included in the moving image file 41. The moving image file 41 subjected to the swallowing detection process is transmitted to the index moving image extraction unit 33. Further, it is preferable that the moving image file 41, in which the swallowing timing has been detected, is stored in the storage memory 24 and is deleted from the temporary storage area 31.
The index moving image extraction unit 33 extracts an index moving image including a frame tagged with the “swallowing timing detection” from the moving image file 41. Specifically, the index moving image extraction unit 33 extracts, as the index moving image 42, the swallowing frame image group 43 and frame image groups for a predetermined period which are continuous with the start and end of the swallowing frame image group 43 in the moving image file 41. For the extraction of the swallowing frame image group 43, the frame images tagged with the “swallowing timing detection” are extracted from the moving image file 41. Further, the predetermined period is a time of about 3 seconds or 5 seconds set in advance by the user and is preferably a period required to capture, for example, the movement of the pharynx before and after the passage of the food F.
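The extraction described above can be sketched, for illustration only, as the following Python outline. The function name and the per-frame tag representation are hypothetical and are not part of the disclosed system; the sketch only shows how a tagged run of swallowing frames plus a user-set predetermined period (for example, 3 seconds at 60 fps) could define the extraction range.

```python
def extract_index_ranges(tags, fps=60, pad_seconds=3):
    """Given per-frame swallowing tags (True = tagged with "swallowing
    timing detection"), return (start, end) frame ranges for index
    moving images: each tagged run plus pad_seconds of surrounding
    non-swallowing frames, clipped to the bounds of the video."""
    pad = pad_seconds * fps
    ranges = []
    i, n = 0, len(tags)
    while i < n:
        if tags[i]:
            start = i
            while i < n and tags[i]:
                i += 1
            end = i - 1  # last tagged frame of this run
            ranges.append((max(0, start - pad), min(n - 1, end + pad)))
        else:
            i += 1
    return ranges
```

Under these assumptions, a 60-frame swallowing run inside a 600-frame file yields a single range padded by 180 frames on each side.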
As illustrated in
The swallowing classification unit 34 analyzes the index moving image 42 and gives the index moving image 42 the types and results of the swallowing examinations and, for example, index numbers. The classification result is the classification of the type of food F swallowed and includes, for example, the swallowing of saliva, the swallowing of milk, the swallowing of a colored aqueous solution, the swallowing of pudding, and whether the swallowing thereof is normal swallowing or aspiration. Score evaluation that gives a point according to the degree of aspiration may be used. These are automatically classified on the basis of data learned by machine learning. It is preferable to give information on the automatically classified type of swallowing examination food to the index moving image 42. The index numbers are numbers that are used by the user to search for the index moving images 42 stored in, for example, the storage memory 24 and are preferably alphanumeric character strings that do not overlap each other, have a small number of digits, and indicate, for example, the order in which the swallowing timing is detected and the type of swallowing examination food. In this case, the types of swallowing are saliva (sa), milk (mi), colored water (cw), pudding (pu), and unknown (un). In addition, the swallowing is classified into normal swallowing (S) or aspiration (A). The index moving image 42 after the classification is transmitted to the display control unit 22.
As illustrated in
For example, the display control unit 22 controls the switching of the screen displayed on the display 14. In a case in which the swallowing examination is performed to image the pharynx with the endoscope 13a, a real-time video observed by the user is displayed as an observation screen. In a case in which the moving image or the like acquired after the end of the imaging is displayed, the moving image is displayed as a playback screen.
The index moving image creation unit 30 automatically performs a process from the acquisition of the moving image file 41 to the transmission of the index moving image 42 to the display control unit 22. In a case in which the imaging by the endoscope 13a ends, the display of the display 14 is switched from the observation screen to the playback screen. The index moving image 42 extracted from the acquired moving image file 41 is displayed on the playback screen.
As illustrated in
In the continuous playback of each index moving image 42, it is preferable to display the moving image that is being played back so as to be highlighted. For example, as illustrated in the index moving image 42b, the frame of the moving image is thickened to make it easier to see. The index moving image 42 is automatically played back at the time of automatic display such that the content displayed in the moving image information display field 50 can be checked and edited by the operation of the user. Information, such as the type of swallowing and the index number given by the swallowing classification unit 34, the name and age of the patient, the name of the photographer, the title of the moving image, and findings, is displayed in the moving image information display field 50. Further, the index number may be applied as the moving image title of the index moving image 42.
On the playback screen, the play button 51 can be pressed to repeatedly play back the index moving image 42 whose automatic playback has been ended, the fast rewind button 52 can be used to go back to a missed scene, the fast forward button 53 can be used to increase a moving image playback speed, and the pause button 54 can be used to stop playback at any time. The playback state of the moving image is represented by the position of the slider 56 on the seek bar 55, and the position of the slider 56 can be moved to freely change a playback point by the operation of the user such as dragging. In a case in which the repeat play button 57 is selected, the index moving image 42 whose playback has been ended is repeatedly played back. It is preferable that the seek bar 55 at the time of continuous playback displays the division of the playback time for each index moving image 42.
As illustrated in
The user can check the information of the index moving image 42 displayed in the moving image information display field 50 and perform editing, such as addition and correction, on the information. For example, the addition of characteristics, such as findings obtained by playing back the index moving image 42 and the gender and age of the patient, and the editing of the automatically classified type of swallowing can be performed from the user interface 15 through the input receiving unit 23.
The user can edit the extraction range of the index moving image 42 in a case in which the information in the moving image information display field 50 is edited. For example, in a case in which the result of the review of the playback screen shows that the swallowing timing is erroneously detected or the extraction is insufficient from the index moving image 42, the index moving image 42 may be reacquired by re-extraction including manual extraction or re-examination with reference to the moving image file 41 stored in the temporary storage area 31.
The index moving image 42 in which various kinds of moving image information have been checked and edited is stored in the storage memory 24 or the database 12. It is preferable to delete the moving image file 41 which is an extraction source stored in the temporary storage area 31.
As illustrated in
As illustrated in
The swallowing detection process will be described. In deep learning, a tool learns the characteristics of an image, which is an object to be detected, in advance. Then, the tool calculates the probability that the analyzed frame image will be the object to be detected, and a frame image having a probability equal to or greater than a threshold value is determined to be the object to be detected. The object to be detected is the pharynx that captures the food F, and the movement of the food F is tracked to detect the swallowing frame image 44. The swallowing frame image 44 is tagged. Further, in deep learning, it is necessary to learn information corresponding to the type of food F used for the swallowing examination.
The swallowing timing is the movement of the pharynx in a case in which the patient swallows the food F. However, deep learning for recognizing the food F is not needed to detect the swallowing timing, and the swallowing detection process may be performed according to, for example, the characteristics of the image showing the swallowing state or a change between the front and rear frame images. Specifically, the swallowing detection process may be performed by a swallowing detection algorithm using, for example, "a difference between the pixel values of the front and rear frame images", "the amount of blur of the frame image", and "the number of key points of the image". In addition, in any swallowing detection process, the frame image determined to show the swallowing state is tagged with "swallowing timing detection".
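For illustration only, the common tagging step shared by all of the detection algorithms can be sketched as follows. The function name and the scoring callback are hypothetical; the sketch only shows that any per-frame scorer (a deep learning model or one of the classical measures described below) can be thresholded to tag frames with "swallowing timing detection".

```python
def tag_swallowing_frames(frames, detect, threshold=0.5):
    """Return a per-frame tag list: True where the detection score of
    the frame is equal to or greater than the threshold value, i.e.
    the frame is determined to show the swallowing state."""
    return [detect(frame) >= threshold for frame in frames]
```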
As illustrated in
In the detection algorithm using "the difference between the pixel values of the front and rear frame images", since the amount of movement of the object between the frame images is large in the images showing the swallowing state, the simple difference between the pixel values of the front and rear frame images is calculated. The swallowing timing detection unit 32 calculates the difference between the pixel values of two frame images that are continuous in time series. It is preferable to use AbsDiff installed in OpenCV (registered trademark) used for image analysis as a function for calculating the difference (simple difference value).
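For illustration only, the simple difference value can be sketched in pure Python as follows. The function is a hypothetical stand-in for the per-pixel computation of OpenCV's AbsDiff, averaged into a single score per frame pair; a large score suggests a large inter-frame movement and hence a candidate swallowing frame.

```python
def mean_abs_diff(prev_frame, next_frame):
    """Mean absolute difference between two grayscale frames given as
    2-D lists of pixel values (the simple difference value)."""
    total = count = 0
    for row_a, row_b in zip(prev_frame, next_frame):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count
```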
As illustrated in
A specific example will be described in detail. The upper part of
The lower part of
In the detection algorithm using “the amount of blur of the frame image”, since the amount of blur of the object between the frame images is large in the images showing the swallowing state, the amount of edge indicating the amount of blur between the frame images is calculated. The swallowing timing detection unit 32 calculates the amount of edge related to the two frame images that are continuous in time series. It is preferable to use Variance Of Laplacian installed in OpenCV (registered trademark) used for image analysis as a function for calculating the amount of edge.
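For illustration only, the amount-of-edge measure can be sketched in pure Python as follows. The function is a hypothetical stand-in for the variance-of-Laplacian idea named above, using a 4-neighbour Laplacian over interior pixels; a low value means few edges, i.e. a blurred frame and a candidate swallowing frame.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian of a grayscale frame
    given as a 2-D list of pixel values; a proxy for the amount of
    edge (low variance = blurred frame)."""
    h, w = len(gray), len(gray[0])
    values = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1]
                   + gray[y][x + 1] - 4 * gray[y][x])
            values.append(lap)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)
```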
As illustrated in
A specific example will be explained in detail. The upper part of
The lower part of
In the detection algorithm using the “number of key points”, since the images showing the swallowing state have a large amount of blur of the object between the frame images, the edges of the frame images are unclear and the number of key points is reduced. The key point means a feature point which is a portion having a high probability of being a corner with a large amount of edge among the edges obtained by extracting lines representing the frame image. The swallowing timing detection unit 32 calculates the number of key points related to the frame image. It is preferable to use Count Key Points installed in OpenCV (registered trademark) used for image analysis as a function for calculating the number of key points.
As illustrated in
In addition, Accelerated KAZE (AKAZE) installed in OpenCV (registered trademark) used for image analysis is used to extract the feature points. In the extraction of the feature points in this embodiment, it is preferable to recognize a portion (a portion recognized as a “corner”) having a large amount of edge in the image.
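For illustration only, counting corner-like feature points can be sketched in pure Python as follows. The function is a deliberately simplified, hypothetical stand-in for AKAZE keypoint extraction: it merely counts interior pixels whose horizontal and vertical gradients are both large, i.e. pixels likely to be recognized as a "corner". Blurred swallowing frames would yield fewer such keypoints.

```python
def count_corner_keypoints(gray, threshold=8):
    """Count interior pixels of a grayscale frame (2-D list) whose
    horizontal and vertical gradient magnitudes both reach the
    threshold, as a toy proxy for the number of key points."""
    h, w = len(gray), len(gray[0])
    count = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = abs(gray[y][x + 1] - gray[y][x - 1])
            gy = abs(gray[y + 1][x] - gray[y - 1][x])
            if gx >= threshold and gy >= threshold:
                count += 1
    return count
```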
A specific example will be explained in detail. The upper part of
The lower part of
In the above-described embodiment, image analysis is performed on the acquired moving image file 41 to detect the swallowing timing. However, in this embodiment, the determination of whether or not the swallowing timing is present and the classification are performed using voice recognition at the time of swallowing in addition to the image analysis. The extraction of the index moving image 42 according to this embodiment will be described below. In addition, the description of the same content as that in the above-described embodiment will not be repeated.
As the swallowing detection algorithm, voice is used to determine swallowing in the oral stage. The user interface 15 connected to the medical image processing device 11 includes a microphone (not illustrated) that acquires voice, and a voice waveform acquired from the microphone is input to the medical image processing device 11 from the input receiving unit 23 whenever the voice waveform is acquired. The acquisition of the voice is performed in operative association with the imaging of the pharynx, and the voice is transmitted to the index moving image creation unit 30 in a form in which it is attached to the moving image file 41. The voice waveform is associated as a voice signal with the moving image file 41, is stored in the temporary storage area 31 of the index moving image creation unit 30, and is then transmitted to the swallowing timing detection unit 32.
As illustrated in
The examination moving image analysis unit 32a performs a process of performing image analysis on the moving image file 41 to detect swallowing, whose content is the same as that performed by the swallowing timing detection unit 32 in the first embodiment. Therefore, the moving image file 41 determined to have the food F recognized therein and to have the swallowing timing is transmitted to the patient voice determination unit 32b. In addition, it is preferable that the moving image file 41 in which the swallowing timing has been detected is stored in the storage memory 24. In this case, the moving image file 41 is deleted from the temporary storage area 31.
The patient voice determination unit 32b analyzes the voice signal that is attached to the moving image file 41 determined to have the swallowing timing to determine whether the voice is uttered from the patient or a person other than the patient. In a case in which the voice signal is determined to indicate the voice uttered from a person other than the patient, the voice signal is recorded together with the examination time. In a case in which the voice determined to be uttered from a person other than the patient is a specific voice (for example, a voice “examination start”) uttered by a doctor or the like, the specific voice and the frame image of the moving image file 41 at the time when the specific voice is uttered may be associated with each other. In a case in which it is determined that the voice signal has the voice uttered from the patient at the swallowing timing, the voice signal is transmitted to the swallowing sound determination unit 32c. Further, the frame image of the moving image file 41 operatively associated with the determined voice may be tagged with a “patient voice” or a “non-patient voice”.
The swallowing sound determination unit 32c determines whether the voice signal is a swallowing-related sound or a non-swallowing-related sound. Examples of the swallowing-related sound include a swallowing sound and epiglottis opening and closing sounds associated with swallowing. Examples of the non-swallowing-related sound include a coughing sound, a choking sound, a breathing sound, and a vocalized sound. In a case in which the voice signal is the swallowing-related sound, the frame image of the moving image file 41 operatively associated with the swallowing-related sound is tagged with the “swallowing-related sound”. Similarly, the frame image of the moving image file 41 operatively associated with the non-swallowing-related sound is tagged with the “non-swallowing-related sound”. The moving image file 41 is transmitted to the index moving image extraction unit 33 regardless of whether the swallowing-related sound is present in the voice signal. In addition, it is preferable to calculate the probability of the swallowing state to determine the swallowing-related sound.
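For illustration only, the three-stage decision performed by the patient voice determination unit 32b and the swallowing sound determination unit 32c can be sketched as follows. The function name, the string labels, and the set of swallowing-related sound kinds are hypothetical simplifications; the sketch only mirrors the described flow of splitting non-patient voices from patient voices and then tagging the latter as swallowing-related or non-swallowing-related.

```python
# Example sound kinds from the description: swallowing sounds and
# epiglottis opening and closing sounds are swallowing-related.
SWALLOW_RELATED = {"swallowing sound",
                   "epiglottis opening sound",
                   "epiglottis closing sound"}

def classify_voice_event(speaker, sound_kind):
    """Classify a voice event: record non-patient voices separately,
    and tag patient voices as swallowing-related or
    non-swallowing-related sounds."""
    if speaker != "patient":
        return "non-patient voice"
    if sound_kind in SWALLOW_RELATED:
        return "swallowing-related sound"
    return "non-swallowing-related sound"
```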
In a case in which it is not possible to determine whether a swallowing reaction or a reaction other than swallowing occurs using image analysis alone, or in a case in which a reaction other than swallowing, such as the closing of the glottis or the opening and closing of the epiglottis due to coughing, occurs with a large amount of movement of the glottis or the epiglottis, which reduces the accuracy of the image analysis, the swallowing timing detection unit 32 having the above-mentioned configuration can exclude these reactions other than swallowing using the voice signal, which makes it possible to improve the accuracy of determining the swallowing state or the non-swallowing state. Further, it is preferable that the index moving image extraction unit 33 extracts the moving image file 41 which has been determined to have the swallowing timing in the image analysis but has been tagged with only the "non-swallowing-related sound" by the swallowing sound determination unit 32c, and that the swallowing classification unit 34 then classifies whether or not the type of swallowing is aspiration.
The index moving image extraction unit 33 sets the frame images of the moving image file 41 tagged with the “swallowing-related sound” as the swallowing frame image group 43 at the swallowing timing. The index moving image extraction unit 33 extracts, as the index moving image 42, the swallowing frame image group 43 and the frame images for predetermined seconds which are continuous before and after the swallowing frame image group 43. In addition, for the moving image file 41 that has not been tagged with the “swallowing-related sound”, the index moving image 42 is extracted only by the same image analysis as that in the first embodiment.
The swallowing classification unit 34 performs classification based on voice analysis on the index moving image 42 in addition to the classification based on the image analysis. At the time when the swallowing-related sound and the non-swallowing-related sound are generated, the swallowing classification unit 34 classifies the type of swallowing into normal swallowing or abnormal swallowing (swallowing disorder) and gives the classification result to the index moving image 42. Specifically, the classification into the normal swallowing or the abnormal swallowing related to the swallowing-related sound and the non-swallowing-related sound is determined after the following are analyzed: the number of swallowing-related sounds and non-swallowing-related sounds; the nature and length of the swallowing sound; breathing sounds before and after swallowing; choking and coughing after swallowing; the interval at which the swallowing-related sounds are uttered in a case in which the swallowing-related sound is uttered a plurality of times; and whether or not the epiglottis opening and closing sounds associated with swallowing are related to a swallowing disorder. The classification result by the voice analysis can be combined with the classification result by the image analysis to obtain a more specific classification result or a classification result with high accuracy.
After the classification is performed by the swallowing classification unit 34, the index moving image 42 is displayed and then automatically played back on the display 14 through the display control unit 22. It is preferable that the index moving image 42 which is automatically played back is also played back in operative association with the swallowing-related sound. Further, it is preferable that, for example, information of whether or not swallowing is normal is automatically displayed in the information described in the moving image information display field 50.
In each of the above-described embodiments, the medical image processing device 11 acquires the captured moving image file 41 from the endoscope system 13 and extracts the index moving image 42. However, in this embodiment, in addition to the extraction according to each of the above-described embodiments, the index moving image 42 is extracted from the moving image file 41 stored in the database 12. The review of the swallowing examination according to this embodiment will be described below. In addition, the description of the same content as that in the above-described embodiment will not be repeated.
Some swallowing examinations are performed a plurality of times at intervals in order to track a change in the condition of a disease. Therefore, it is desirable to compare the acquired results of the swallowing examination with the results of the swallowing examination performed in the past. The medical image processing device 11 receives the moving image file 41 obtained by capturing the swallowing examination in the past from the database 12 with the image receiving unit 21 and extracts the index moving image with the index moving image creation unit 30.
As illustrated in
In a case in which a specific moving image file 41 is acquired from the database 12, for example, it is preferable that the specific moving image file 41 is acquired by a search from a search screen using the type name of swallowing, the name of the patient, the imaging date, and the like in order to check whether or not the swallowing in the acquired index moving image 42 is normal.
The moving image acquired from the database 12 may be the moving image file 41 to be subjected to the extraction process, or the extracted past index moving image 47 may be directly acquired from the database 12 and then displayed on the display 14. Further, the index moving image 42 and the past index moving image 47 may be combined to obtain the composite index moving image 46 illustrated in
In each of the above-described embodiments, the hardware structures of the processing units executing various processes, such as the central control unit 20, are the following various processors. The various processors include, for example, a central processing unit (CPU) which is a general-purpose processor that executes software (programs) to function as various processing units, a programmable logic device (PLD), such as a field programmable gate array (FPGA), that is a processor whose circuit configuration can be changed after manufacture, and a dedicated electric circuit that is a processor having a dedicated circuit configuration designed to perform various processes.
One processing unit may be configured by one of the various processors or a combination of two or more processors of the same type or different types (for example, a combination of a plurality of FPGAs and a combination of a CPU and an FPGA). Further, a plurality of processing units may be configured by one processor. A first example of the configuration in which a plurality of processing units are configured by one processor is an aspect in which one processor is configured by a combination of one or more CPUs and software and functions as a plurality of processing units. A representative example of this aspect is a client computer or a server computer. A second example of the configuration is an aspect in which a processor that implements the functions of the entire system including a plurality of processing units using one integrated circuit (IC) chip is used. A representative example of this aspect is a system-on-chip (SoC). As described above, various processing units are configured using one or more of the various processors as a hardware structure.
In addition, specifically, the hardware structure of the various processors is an electric circuit (circuitry) obtained by combining circuit elements such as semiconductor elements. Further, the hardware structure of the storage unit is a storage device such as a hard disc drive (HDD) or a solid state drive (SSD).
10: medical image processing system
11: medical image processing device
12: database
13: endoscope system
13a: endoscope
13b: endoscope tip portion
14: display
15: user interface
20: central control unit
21: image receiving unit
22: display control unit
23: input receiving unit
24: storage memory
30: index moving image creation unit
31: temporary storage area
32: swallowing timing detection unit
32a: examination moving image analysis unit
32b: patient voice determination unit
32c: swallowing sound determination unit
33: index moving image extraction unit
34: swallowing classification unit
41: moving image file
42: index moving image
42a: index moving image
42b: index moving image
42c: index moving image
42d: index moving image
42e: index moving image
43: swallowing frame image group
43a: swallowing frame image group
43b: swallowing frame image group
44: swallowing frame image
45: non-swallowing frame image
46: composite index moving image
47: past index moving image
50: moving image information display field
51: play button
52: fast rewind button
53: fast forward button
54: pause button
55: seek bar
56: slider
57: repeat play button
100: example of frame image in
101a: front frame image in upper part of
101b: rear frame image in upper part of
101c: example of difference in upper part of
101d: front frame image in lower part of
101e: rear frame image in lower part of
101f: example of difference in lower part of
101g: image processing target region in
102a: frame image in upper part of
102b: example of amount of edge in upper part of
102c: frame image in lower part of
102d: example of amount of edge in lower part of
102g: image processing target region in
103a: frame image in upper part of
103b: frame image in lower part of
103c: feature point
103g: image processing target region in
Eg: epiglottis
F: food
R: position
Rg: rima glottidis
Ps: pyriform sinus
Number | Date | Country | Kind |
---|---|---|---
2021-086546 | May 2021 | JP | national |