The present invention relates to an endoscope system, a medical information processing apparatus, a medical information processing method, a medical information processing program, and a recording medium that perform audio input and audio recognition.
In the technical field of examination and diagnosis support using medical images, it is known to recognize audio input by a user and to perform processing based on the recognition result. For example, JP1996-052105A (JP-H08-052105A) discloses operating an endoscope by audio input. In addition, JP2004-102509A discloses that audio input for report creation can be performed.
When audio input is performed during an examination using medical images, if all words can be recognized regardless of the scene, there is a risk that mutual erroneous recognition between words increases and operability is reduced. However, the techniques in the related art, such as JP1996-052105A (JP-H08-052105A) and JP2004-102509A described above, have not sufficiently taken these problems into consideration.
The present invention has been made in view of such circumstances, and an object thereof is to provide an endoscope system, a medical information processing apparatus, a medical information processing method, a medical information processing program, and a recording medium capable of improving the accuracy of the audio recognition regarding the medical images.
In order to achieve the object described above, an endoscope system according to a first aspect of the present invention is an endoscope system including an audio input device; an image sensor that images a subject; and a processor, in which the processor acquires a plurality of medical images obtained by the image sensor imaging the subject in chronological order, accepts an input of an audio input trigger during capturing of the plurality of medical images, sets, in a case where the audio input trigger is input, an audio recognition dictionary according to the audio input trigger, and performs audio recognition on audio input to the audio input device after the setting, using the set audio recognition dictionary. In the first aspect, since the audio recognition dictionary according to the audio input trigger is set and the audio recognition is performed using the set audio recognition dictionary, it is possible to improve the accuracy of the audio recognition regarding the medical image using the audio recognition dictionary tailored to a scene of the audio recognition.
In the endoscope system according to a second aspect, in the first aspect, in the audio recognition, the processor recognizes only registered words that are registered in the set audio recognition dictionary, and causes an output device to output a result of the audio recognition for the registered words. According to the second aspect, only the registered words registered in the set audio recognition dictionary are audio-recognized, so that recognition accuracy can be improved.
In the endoscope system according to a third aspect, in the first aspect, in the audio recognition, the processor recognizes registered words that are registered in the set audio recognition dictionary and specific words, and causes an output device to output a result of the audio recognition for the registered words among the recognized words. Note that an example of the “specific word” is a wake word for the audio input device, but the “specific word” is not limited thereto.
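For illustration only, and not as part of the claimed configuration, the dictionary switching of the first aspect and the registered-word filtering of the second and third aspects can be sketched as follows. All identifiers, trigger labels, and word lists here are hypothetical:

```python
# Hypothetical sketch: set an audio recognition dictionary according to the
# audio input trigger, then output only registered words among recognized
# words (plus specific words such as a wake word, per the third aspect).
DICTIONARIES = {
    "lesion_detected": {"polyp", "adenoma", "biopsy"},        # findings terms
    "treatment_tool_detected": {"snare", "forceps", "clip"},  # treatment terms
}
SPECIFIC_WORDS = {"hey scope"}  # e.g. a wake word for the audio input device

def set_dictionary(trigger):
    """Set the dictionary according to the audio input trigger (first aspect)."""
    return DICTIONARIES.get(trigger, set())

def recognize(words, dictionary):
    """Keep registered words and specific words; drop everything else."""
    return [w for w in words if w in dictionary or w in SPECIFIC_WORDS]

d = set_dictionary("lesion_detected")
print(recognize(["polyp", "snare", "hey scope"], d))  # ['polyp', 'hey scope']
```

Restricting recognition to the small set of registered words is what reduces mutual erroneous recognition between words in a given scene.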
In the endoscope system according to a fourth aspect, in any one of the first to third aspects, the processor determines whether or not a specific subject is included in the plurality of medical images, using image recognition, and accepts a determination result indicating that the specific subject is included, as the audio input trigger.
In the endoscope system according to a fifth aspect, in any one of the first to fourth aspects, the processor determines whether or not a specific subject is included in the plurality of medical images, using image recognition, discriminates the specific subject in a case where it is determined that the specific subject is included, and accepts an output of a discrimination result for the specific subject, as the audio input trigger.
In the endoscope system according to a sixth aspect, in the fourth or fifth aspect, the processor determines whether or not a plurality of types of the specific subjects are included in the plurality of medical images, using a plurality of times of image recognition, respectively corresponding to the plurality of types of specific subjects, and sets the audio recognition dictionary corresponding to the type of the specific subject that is determined to be included in the plurality of medical images by any of the plurality of times of image recognition, among the plurality of types of specific subjects.
In the endoscope system according to a seventh aspect, in the sixth aspect, the processor determines whether or not a plurality of specific subjects are included in the plurality of medical images, using the image recognition, and sets the audio recognition dictionary corresponding to the specific subject determined to be included in the plurality of medical images, among the plurality of specific subjects.
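The per-type selection of the sixth and seventh aspects can be sketched, again purely as a hypothetical illustration (the recognizer functions and dictionary names below are invented for this sketch):

```python
# Hypothetical sketch: run one image recognizer per type of specific subject
# and set the dictionary corresponding to whichever type is determined to be
# included in the plurality of medical images.
def detect_lesion(image):   # stand-in for a trained lesion recognizer
    return "lesion" in image

def detect_tool(image):     # stand-in for a trained treatment-tool recognizer
    return "tool" in image

RECOGNIZERS = {"lesion": detect_lesion, "treatment_tool": detect_tool}

def dictionary_for(images):
    """Return the dictionary for the first subject type found in any image."""
    for subject_type, recognizer in RECOGNIZERS.items():
        if any(recognizer(img) for img in images):
            return subject_type + "_dictionary"
    return None  # no specific subject: no dictionary is set

print(dictionary_for(["frame with tool", "plain frame"]))  # treatment_tool_dictionary
```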
In the endoscope system according to an eighth aspect, in any one of the fourth to seventh aspects, the processor performs the image recognition using an image recognizer configured by machine learning.
In the endoscope system according to a ninth aspect, in any one of the fourth to eighth aspects, the processor records the medical image decided to include the specific subject, among the plurality of medical images, a determination result using the image recognition for the specific subject, and a result of the audio recognition in a recording device in association with each other.
In the endoscope system according to a tenth aspect, in any one of the fourth to ninth aspects, the processor decides at least one of a lesion, a lesion candidate region, a landmark, a treated region, a treatment tool, or a hemostat, as the specific subject.
In the endoscope system according to an eleventh aspect, in any one of the fourth to tenth aspects, the processor executes the audio recognition using the set audio recognition dictionary during a period in which a predetermined condition is satisfied after the setting.
In the endoscope system according to a twelfth aspect, in the eleventh aspect, the processor sets the period for each image recognizer that performs the image recognition.
In the endoscope system according to a thirteenth aspect, in the eleventh or twelfth aspect, the processor sets the period depending on a type of the audio input trigger.
In the endoscope system according to a fourteenth aspect, in any one of the eleventh to thirteenth aspects, the processor displays a remaining time of the period on a screen of a display device.
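The time-limited recognition of the eleventh to fourteenth aspects (a per-trigger period, with a remaining time that can be displayed) can be sketched as follows; the period values and class name are hypothetical:

```python
import time

# Hypothetical sketch: keep the set dictionary active only for a period that
# depends on the type of the audio input trigger, and expose the remaining
# time so it can be shown on a display device.
PERIODS = {"lesion_detected": 10.0, "wake_word": 5.0}  # seconds, per trigger type

class RecognitionWindow:
    def __init__(self, trigger, now=None):
        self.deadline = (now if now is not None else time.monotonic()) + PERIODS[trigger]

    def remaining(self, now=None):
        """Remaining time of the period, for on-screen display."""
        return max(0.0, self.deadline - (now if now is not None else time.monotonic()))

    def active(self, now=None):
        """Audio recognition runs only while the period is still open."""
        return self.remaining(now) > 0.0

w = RecognitionWindow("wake_word", now=0.0)
print(w.remaining(now=2.0))  # 3.0
print(w.active(now=6.0))     # False
```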
In the endoscope system according to a fifteenth aspect, in any one of the first to fourteenth aspects, the processor performs the audio recognition for site information, findings information, treatment information, and hemostasis information.
In the endoscope system according to a sixteenth aspect, in any one of the first to fifteenth aspects, in a case where any one of an imaging start instruction of the plurality of medical images, an output of a result of image recognition for the plurality of medical images, an operation of switching to a discrimination mode, an operation to an operation device connected to the endoscope system, or an input of a wake word for the audio input device is performed, the processor decides that the audio input trigger is input.
In the endoscope system according to a seventeenth aspect, in any one of the first to sixteenth aspects, the processor displays a result of the audio recognition on a display device.
In order to achieve the object described above, a medical information processing apparatus according to an eighteenth aspect of the present invention is a medical information processing apparatus including a processor, in which the processor acquires a plurality of medical images obtained by an image sensor imaging a subject in chronological order, accepts an input of an audio input trigger during an input of the plurality of medical images, sets, in a case where the audio input trigger is input, an audio recognition dictionary according to the audio input trigger, and performs audio recognition on audio input to an audio input device after the setting, using the set audio recognition dictionary. According to the eighteenth aspect, it is possible to improve the accuracy of the audio recognition regarding the medical image as in the first aspect.
In order to achieve the object described above, a medical information processing method according to a nineteenth aspect of the present invention is a medical information processing method executed by an endoscope system including an audio input device, an image sensor that images a subject, and a processor, the medical information processing method including, via the processor, acquiring a plurality of medical images obtained by the image sensor imaging the subject in chronological order; accepting an input of an audio input trigger during capturing of the plurality of medical images; setting, in a case where the audio input trigger is input, an audio recognition dictionary according to the audio input trigger; and performing audio recognition on audio input to the audio input device after the setting, using the set audio recognition dictionary. According to the nineteenth aspect, it is possible to improve the recognition accuracy of the audio input regarding the medical image as in the first and eighteenth aspects. The nineteenth aspect may have the same configuration as the second to seventeenth aspects.
In order to achieve the object described above, a medical information processing program according to a twentieth aspect of the present invention is a medical information processing program causing an endoscope system including an audio input device, an image sensor that images a subject, and a processor, to execute a medical information processing method, the medical information processing program causing, in the medical information processing method, the processor to acquire a plurality of medical images obtained by the image sensor imaging the subject in chronological order, accept an input of an audio input trigger during capturing of the plurality of medical images, set, in a case where the audio input trigger is input, an audio recognition dictionary according to the audio input trigger, and perform audio recognition on audio input to the audio input device after the setting, using the set audio recognition dictionary. According to the twentieth aspect, it is possible to improve the accuracy of the audio recognition regarding the medical image as in the first, eighteenth, and nineteenth aspects.
The medical information processing method executed by the medical information processing program according to the twentieth aspect may have the same configuration as the second to seventeenth aspects.
In order to achieve the object described above, a recording medium according to a twenty-first aspect of the present invention is a non-transitory and tangible recording medium in which a computer readable code of the medical information processing program according to the twentieth aspect is recorded. In the twenty-first aspect, examples of the “non-transitory and tangible recording medium” include various magneto-optical recording devices and semiconductor memories. The “non-transitory and tangible recording medium” does not include a non-tangible recording medium such as a carrier wave signal itself and a propagation signal itself.
Note that, in the twenty-first aspect, the medical information processing program of which the code is recorded in the recording medium may be one that causes the endoscope system or the medical information processing apparatus to execute a medical information processing program that performs the same processing as in the second to seventeenth aspects.
With the endoscope system, the medical information processing apparatus, the medical information processing method, the medical information processing program, and the recording medium according to the present invention, it is possible to improve accuracy of audio recognition regarding medical images.
Embodiments of an endoscope system, a medical information processing apparatus, a medical information processing method, a medical information processing program, and a recording medium according to the present invention will be described. In the description, reference is made to the accompanying drawings as necessary. Note that, in the accompanying drawings, some constituents may be omitted for convenience of description.
Here, a case where the present invention is applied to an endoscopic image diagnosis support system will be described as an example. The endoscopic image diagnosis support system is a system that supports detection and discrimination of a lesion or the like in an endoscopy. In the following, an example of application to an endoscopic image diagnosis support system that supports detection and discrimination of a lesion and the like in a lower digestive tract endoscopy (large intestine examination) will be described.
As illustrated in
The endoscope system 10 of the present embodiment is configured as a system capable of an observation using special light (special light observation) in addition to an observation using white light (white light observation). In the special light observation, a narrow-band light observation is included. In the narrow-band light observation, a blue laser imaging observation (BLI observation), a narrow band imaging observation (NBI observation; NBI is a registered trademark), a linked color imaging observation (LCI observation), and the like are included. Note that the special light observation itself is a well-known technique, so detailed description thereof will be omitted.
As illustrated in
The endoscope 20 of the present embodiment is an endoscope for a lower digestive organ. As illustrated in
The insertion part 21 is a part to be inserted into a hollow organ (large intestine in the present embodiment). The insertion part 21 includes a distal end portion 21A, a bendable portion 21B, and a soft portion 21C in order from a distal end side.
As illustrated in the figure, on the distal end surface of the distal end portion 21A, an observation window 21a, illumination windows 21b, an air/water supply nozzle 21c, a forceps outlet 21d, and the like are provided. The observation window 21a is a window for observation. The inside of the hollow organ is imaged through the observation window 21a. Imaging is performed via an optical system such as a lens and an image sensor (not illustrated) built in the distal end portion 21A (portion of the observation window 21a). As the image sensor, for example, a complementary metal-oxide-semiconductor image sensor (CMOS image sensor), a charge-coupled device image sensor (CCD image sensor), or the like is used. The illumination windows 21b are windows for illumination. The inside of the hollow organ is irradiated with illumination light via the illumination windows 21b. The air/water supply nozzle 21c is a nozzle for cleaning. A cleaning liquid and a drying gas are sprayed from the air/water supply nozzle 21c toward the observation window 21a. The forceps outlet 21d is an outlet for a treatment tool such as forceps. The forceps outlet 21d also functions as a suction port for sucking body fluids and the like.
The bendable portion 21B is a portion that is bent according to an operation of an angle knob 22A of the operation part 22. The bendable portion 21B is bent in four directions of up, down, left, and right.
The soft portion 21C is an elongated portion provided between the bendable portion 21B and the operation part 22. The soft portion 21C has flexibility.
The operation part 22 is a part that is held by an operator to perform various operations. The operation part 22 includes various operation members. As an example, the operation part 22 includes the angle knob 22A for a bending operation of the bendable portion 21B, an air/water supply button 22B for performing an air/water supply operation, and a suction button 22C for performing a suction operation. In addition, the operation part 22 includes an operation member (shutter button) for imaging a static image, an operation member for switching an observation mode, an operation member for switching on and off of various support functions, and the like. In addition, the operation part 22 includes a forceps insertion port 22D for inserting a treatment tool such as forceps. The treatment tool inserted from the forceps insertion port 22D is drawn out from the forceps outlet 21d (refer to
The connection part 23 is a part for connecting the endoscope 20 to the light source device 30, the endoscopic image generation device 40, and the like. The connection part 23 includes a cord 23A extending from the operation part 22, and a light guide connector 23B and a video connector 23C that are provided on a distal end of the cord 23A. The light guide connector 23B is a connector for connecting to the light source device 30. The video connector 23C is a connector for connecting to the endoscopic image generation device 40.
The light source device 30 generates illumination light. As described above, the endoscope system 10 of the present embodiment is configured as a system capable of the special light observation in addition to the normal white light observation. Therefore, the light source device 30 is configured to be capable of generating light (for example, narrow-band light) corresponding to the special light observation in addition to the normal white light. Note that, as described above, the special light observation itself is a well-known technique, so the description for the light generation will be omitted.
The endoscopic image generation device 40 (processor) comprehensively controls the entire operation of the endoscope system 10 together with the endoscopic image processing device 60 (processor). The endoscopic image generation device 40 includes, as a hardware configuration, a processor, a main storage unit (memory), an auxiliary storage unit (memory), a communication unit, and the like. That is, the endoscopic image generation device 40 has a so-called computer configuration as the hardware configuration. The processor is configured by, for example, a central processing unit (CPU), a graphics processing unit (GPU), a field-programmable gate array (FPGA), a programmable logic device (PLD), and the like. For example, the main storage unit is configured by a random-access memory (RAM) and the like. The auxiliary storage unit is configured by, for example, a non-transitory and tangible recording medium such as a flash memory, and can record computer-readable codes of a medical information processing program according to the embodiment of the present invention or of a part thereof, and other data. In addition, the auxiliary storage unit may include various magneto-optical recording devices, semiconductor memories, and the like in addition to or instead of the flash memory.
As illustrated in the figure, the endoscopic image generation device 40 has functions of an endoscope control unit 41, a light source control unit 42, an image generation unit 43, an input control unit 44, an output control unit 45, and the like. Various programs executed by the processor (which may include the medical information processing program according to the embodiment of the present invention or a part thereof) and various kinds of data necessary for control or the like are stored in the auxiliary storage unit described above, and each function of the endoscopic image generation device 40 is realized by the processor executing these programs. The processor of the endoscopic image generation device 40 is an example of the processor in the endoscope system and in the medical information processing apparatus according to the embodiment of the present invention.
The endoscope control unit 41 controls the endoscope 20. The control for the endoscope 20 includes image sensor drive control, air/water supply control, suction control, and the like.
The light source control unit 42 controls the light source device 30. The control for the light source device 30 includes light emission control for a light source, and the like.
The image generation unit 43 generates captured images (endoscopic images) on the basis of signals output from the image sensor of the endoscope 20. The image generation unit 43 can generate a static image and/or a video (a plurality of medical images obtained by an image sensor 25 imaging a subject in chronological order) as the captured image. The image generation unit 43 may perform various kinds of image processing on the generated images.
The input control unit 44 accepts an input of an operation and an input of various kinds of information via the input device 50.
The output control unit 45 controls an output of information to the endoscopic image processing device 60. The information to be output to the endoscopic image processing device 60 includes various kinds of operation information input from the input device 50, and the like in addition to the endoscopic image obtained by imaging.
The input device 50 constitutes a user interface in the endoscope system 10 together with the display device 70. The input device 50 includes a microphone 51 (audio input device) and a foot switch 52 (operation device). The microphone 51 is an input device for performing audio recognition, which will be described later. The foot switch 52 is an operation device that is placed at an operator's feet and operated with the foot, and outputs an operation signal (for example, a signal indicating an audio input trigger or a signal to select a candidate for audio recognition) by stepping on a pedal. Note that, in this embodiment, the microphone 51 and the foot switch 52 are controlled by the input control unit 44 of the endoscopic image generation device 40, but the present invention is not limited to this embodiment, and the microphone 51 and the foot switch 52 may also be controlled via the endoscopic image processing device 60, the display device 70, and the like. In addition, in the operation part 22 of the endoscope 20, an operation device (button, switch, and the like) having the same function as the foot switch 52 may be provided.
In addition, the input device 50 can include a known input device such as a keyboard, a mouse, a touch panel, and a gaze input device as the operation device.
The endoscopic image processing device 60 includes, as a hardware configuration, a processor, a main storage unit, an auxiliary storage unit, a communication unit, and the like. That is, the endoscopic image processing device 60 has a so-called computer configuration as the hardware configuration. The processor is configured by, for example, a CPU, a graphics processing unit (GPU), a field-programmable gate array (FPGA), a programmable logic device (PLD), and the like. The processor of the endoscopic image processing device 60 is an example of the processor in the endoscope system and in the medical information processing apparatus according to the embodiment of the present invention. The processor of the endoscopic image generation device 40 and the processor of the endoscopic image processing device 60 may share the functions of the processor in the endoscope system and in the medical information processing apparatus according to the embodiment of the present invention. For example, a form can be adopted in which the endoscopic image generation device 40 mainly has a function of an “endoscope processor” that generates endoscopic images, and in which the endoscopic image processing device 60 mainly has a function of a “computer-aided diagnosis (CAD) box” that performs image processing on the endoscopic images. However, in the present invention, a form different from such sharing of functions may be adopted.
For example, the main storage unit is configured by a memory such as a RAM. The auxiliary storage unit is configured by, for example, a non-transitory and tangible recording medium (memory) such as a flash memory, and stores computer-readable codes of various programs (which may include the medical information processing program according to the embodiment of the present invention or a part thereof) executed by the processor, and various kinds of data necessary for control or the like. In addition, the auxiliary storage unit may include various magneto-optical recording devices, semiconductor memories, and the like in addition to or instead of the flash memory. For example, the communication unit is configured by a communication interface connectable to a network. The endoscopic image processing device 60 is communicably connected to the endoscope information management system 100 via the communication unit.
As illustrated in the figure, the endoscopic image processing device 60 mainly has functions of an endoscopic image acquisition unit 61, an input information acquisition unit 62, an image recognition processing unit 63, an audio input trigger acceptance unit 64, a display control unit 65, an examination information output control unit 66, and the like. These functions are realized by the processor executing the program (which may include the medical information processing program according to the embodiment of the present invention or a part thereof) stored in the auxiliary storage unit or the like.
The endoscopic image acquisition unit 61 acquires an endoscopic image from the endoscopic image generation device 40. Acquisition of images can be performed in real time. That is, a plurality of medical images obtained by the image sensor 25 (image sensor) imaging the subject in chronological order can be sequentially acquired (sequentially input) in real time.
The input information acquisition unit 62 (processor) acquires information input via the input device 50 and the endoscope 20. The input information acquisition unit 62 includes an information acquisition unit 62A that mainly acquires input information other than the audio information, an audio recognition unit 62B that acquires the audio information and that recognizes audio input via the microphone 51, and an audio recognition dictionary 62C used for audio recognition. The audio recognition dictionary 62C may include a plurality of dictionaries with different contents (for example, dictionaries regarding site information, findings information, treatment information, and hemostasis information).
Information input to the input information acquisition unit 62 via the input device 50 includes information (for example, audio information, an audio input trigger, and information on a candidate selection operation) input via the microphone 51, the foot switch 52, or a keyboard or mouse (not illustrated). In addition, the information input via the endoscope 20 includes information on an imaging start instruction for an endoscopic image (video), an imaging instruction for a static image, and the like. As described later, in the present embodiment, a user can input the audio input trigger, perform the selection operation of the audio recognition candidate, and the like via the microphone 51 and/or the foot switch 52. The input information acquisition unit 62 acquires operation information of the foot switch 52 via the endoscopic image generation device 40.
The image recognition processing unit 63 (processor) performs image recognition on the endoscopic image acquired by the endoscopic image acquisition unit 61. The image recognition processing unit 63 can perform image recognition in real time.
The lesion part detection unit 63A detects a lesion part (lesion; an example of a “specific subject”) such as a polyp from the endoscopic image. The processing of detecting the lesion part includes processing of detecting a part with a possibility of a lesion (benign tumor, dysplasia, or the like; lesion candidate region), processing of recognizing a region after the lesion is treated (treated region) and a part with features that may be directly or indirectly associated with a lesion (erythema or the like), and the like in addition to processing of detecting a part that is definitely a lesion part.
In a case where the lesion part detection unit 63A determines that “the lesion part (specific subject) is included in the endoscopic image”, the discrimination unit 63B performs discrimination processing on the lesion part detected by the lesion part detection unit 63A. In the present embodiment, the discrimination unit 63B performs neoplastic or non-neoplastic (hyperplastic) discrimination processing on the lesion part such as a polyp detected by the lesion part detection unit 63A. Note that the discrimination unit 63B can be configured to output a discrimination result in a case where predetermined criteria are satisfied. As the “predetermined criteria”, for example, a “case where a reliability degree (depending on conditions such as exposure, degree of focus, and blurring of an endoscopic image) of the discrimination result or a statistical value thereof (maximum, minimum, average, or the like within a predetermined period) is equal to or greater than a threshold value” can be adopted, but other criteria may be used.
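The "predetermined criteria" gating described for the discrimination unit 63B can be sketched as follows; the threshold value, statistic, and function name are purely illustrative assumptions:

```python
# Hypothetical sketch: output the discrimination result only when the
# reliability degree, or a statistical value of it over a predetermined
# period (here, the mean over recent frames), meets a threshold value.
THRESHOLD = 0.8  # illustrative value only

def gated_output(result, reliabilities):
    """Return the discrimination result if the mean reliability over the
    period is at or above the threshold; otherwise suppress the output."""
    mean = sum(reliabilities) / len(reliabilities)
    return result if mean >= THRESHOLD else None

print(gated_output("neoplastic", [0.9, 0.85, 0.8]))  # neoplastic
print(gated_output("neoplastic", [0.6, 0.7, 0.65]))  # None
```

Gating the output this way avoids treating a discrimination result from a poorly exposed, defocused, or blurred frame as an audio input trigger.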
The specific region detection unit 63C performs processing of detecting a specific region (landmark) in the hollow organ from the endoscopic image. For example, processing of detecting an ileocecum of the large intestine or the like is performed. The large intestine is an example of a hollow organ, and the ileocecum is an example of a specific region. For example, the specific region detection unit 63C may detect a hepatic flexure (right colon), a splenic flexure (left colon), a rectosigmoid, and the like. In addition, the specific region detection unit 63C may detect a plurality of specific regions.
The treatment tool detection unit 63D performs processing of detecting a treatment tool appearing in the image from the endoscopic image, and discriminating the type of the treatment tool. The treatment tool detection unit 63D can be configured to detect a plurality of types of treatment tools such as biopsy forceps and snares. Similarly, the hemostat detection unit 63E performs processing of detecting a hemostat such as a hemostatic clip and discriminating a type of the hemostat. The treatment tool detection unit 63D and the hemostat detection unit 63E may be configured by one image recognizer.
The measurement unit 63F performs measurements (measurements of shape, dimension, and the like) of a lesion, a lesion candidate region, a specific region, a treated region, and the like.
Each unit (the lesion part detection unit 63A, the discrimination unit 63B, the specific region detection unit 63C, the treatment tool detection unit 63D, the hemostat detection unit 63E, the measurement unit 63F, and the like) of the image recognition processing unit 63 can be configured using image recognizers (trained models) configured by machine learning. Specifically, each unit described above can be configured by image recognizers (trained models) trained using a machine learning algorithm such as a neural network (NN), a convolutional neural network (CNN), AdaBoost, and random forest. In addition, as described above regarding the discrimination unit 63B, each of these units can perform an output based on the reliability degree of a final output (discrimination results, type of treatment tool, and the like) by setting a network layer configuration as necessary. In addition, each unit described above may perform image recognition for all frames of the endoscopic image, or may perform image recognition for some frames intermittently.
As described below, the output of the recognition result of the endoscopic image from each of these units or the output of the recognition result satisfying the predetermined criteria (threshold value or the like of the reliability degree) may be used as the audio input trigger, and a period in which such an output is performed may be used as a period in which audio recognition is executed.
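As a minimal sketch of the gating described above, the audio recognition period can be tied to the recognizer output satisfying a reliability criterion. The field names and the threshold value below are illustrative assumptions; the patent leaves the concrete criteria unspecified.

```python
# Illustrative sketch: execute audio recognition only while an image
# recognizer output satisfies the predetermined criteria. The output
# structure and threshold are assumptions, not the patent's implementation.

RELIABILITY_THRESHOLD = 0.8  # assumed value for the reliability criterion

def audio_recognition_active(recognizer_outputs):
    """True while any recognizer output meets the reliability criterion."""
    return any(out["reliability"] >= RELIABILITY_THRESHOLD
               for out in recognizer_outputs)

# Example: a high-reliability lesion detection opens the audio window.
frame_outputs = [{"label": "lesion", "reliability": 0.92}]
active = audio_recognition_active(frame_outputs)
```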
In addition, instead of configuring some or all of respective units constituting the image recognition processing unit 63 using the image recognizer (trained model), it is possible to adopt a configuration of calculating a feature amount from the endoscopic image and performing detection or the like using the calculated feature amount.
The audio input trigger acceptance unit 64 (processor) accepts an input of an audio input trigger while capturing (inputting) an endoscopic image, and sets the audio recognition dictionary 62C according to the input audio input trigger. The audio input trigger in the present embodiment is, for example, a determination result (detection result) indicating that a specific subject is included in the endoscopic image, and in this case, an output of the lesion part detection unit 63A can be used as the determination result. In addition, another example of the audio input trigger is an output of a discrimination result for the specific subject, and in this case, an output of the discrimination unit 63B can be used as the discrimination result. As still other examples of the audio input trigger, an imaging start instruction of a plurality of medical images, an input of a wake word for the microphone 51 (audio input device), an operation of the foot switch 52, an operation for another operation device (for example, colonofiberscope position determination device) connected to the endoscope system, and the like can be used. The settings of the audio recognition dictionary and the audio recognition according to these audio input triggers will be described in detail later.
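The selection of an audio recognition dictionary according to the input audio input trigger can be sketched as a simple mapping. The trigger names and registered word lists below are illustrative assumptions; the patent does not specify concrete dictionary contents.

```python
# Hypothetical sketch of the audio input trigger acceptance unit 64:
# each trigger type selects a dictionary (a list of registered words).
# All names and word lists are assumptions for illustration only.

TRIGGER_TO_DICTIONARY = {
    "lesion_detected":  ["findings input", "adenoma", "polyp"],
    "discrimination":   ["neoplastic", "non-neoplastic"],
    "treatment_tool":   ["biopsy forceps", "snare"],
    "wake_word_report": ["findings input", "treatment input"],
    "foot_switch":      ["capture", "record"],
}

def set_audio_recognition_dictionary(trigger):
    """Return the registered word list for the trigger, or None if unknown."""
    return TRIGGER_TO_DICTIONARY.get(trigger)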
The display control unit 65 (processor) controls display of the display device 70. In the following, main display control performed by the display control unit 65 will be described.
The display control unit 65 displays the image (endoscopic image) captured by the endoscope 20 on the display device 70 in real time during the examination (imaging).
In addition, the display control unit 65 can display, on the screen 70A, an icon 300 indicating a state of the audio recognition, an icon 320 indicating a site being imaged, and a display region 340 where a site of an imaging target (ascending colon, transverse colon, descending colon, or the like) and a result of the audio recognition are displayed in text in real time (without time delay). The display control unit 65 can acquire information on a site via image recognition from the endoscopic image, a user's input via the operation device, an external device (for example, endoscope position detecting unit) connected to the endoscope system 10, or the like.
In addition, as described below, the display control unit 65 can display (output) a result of the audio recognition on the display device 70 (output device, display device).
The examination information output control unit 66 outputs examination information to the recording device 75 and/or the endoscope information management system 100. For example, the examination information includes an endoscopic image captured during an examination, a determination result for a specific subject, a result of audio recognition, information on a site input during an examination, information on a treatment name input during an examination, information on a treatment tool detected during an examination, and the like. For example, the examination information is output for each lesion or each time a specimen is collected. In this case, respective pieces of information are output in association with each other. For example, the endoscopic image in which the lesion part or the like is imaged is output in association with the information on the site being selected. In addition, in a case where a treatment is performed, the information on the selected treatment name and the information on the detected treatment tool are output in association with the endoscopic image and the information on the site. In addition, the endoscopic image captured separately from the lesion part or the like is always output to the recording device 75 and/or the endoscope information management system 100. The endoscopic image is output with the information of imaging date and time added.
The recording device 75 (recording device) includes various magneto-optical recording devices or semiconductor memories, and control devices thereof, and can record endoscopic images (videos, static images), results of image recognition, results of audio recognition, examination information, report creation support information, and the like. These pieces of information may be recorded in a secondary storage unit of the endoscopic image generation device 40 or of the endoscopic image processing device 60, or in a recording device of the endoscope information management system 100.
Audio recognition in the endoscope system 10 configured as described above will be described below.
Note that the start of the audio recognition may be delayed while the audio recognition dictionary is being set, but it is preferable that the audio recognition is started immediately (with zero delay time) after the audio recognition dictionary is set.
The wake words (wakeup words) described above can be divided into two types: a “wake word regarding a report input” and a “wake word regarding imaging mode control”. The “wake word regarding a report input” is, for example, “findings input” or “treatment input”. After such a wake word is recognized, the audio recognition dictionary for “findings” or “treatment” is set, and in a case where a word in the dictionary is recognized, the result of the audio recognition is output. The result of the audio recognition can be associated with the image or used in a report. The association with the image and the use in the report are forms of an “output” of the result of the audio recognition, and the display device 70, the recording device 75, the storage unit of the medical information processing apparatus 80, the recording device of the endoscope information management system 100, or the like is a form of an “output device”.
The other “wake word regarding imaging mode control” is, for example, “imaging setting” and “setting”. After such a wake word is recognized, it is possible to set a dictionary used to turn on/off or switch a light source with audio (for example, by audio recognition of words such as “white”, “LCI”, and “BLI”), or to turn on/off (for example, by audio recognition of words such as “detection on” and “detection off”) the lesion detection using an endoscope AI (recognizer using artificial intelligence). Note that the “output” and the “output device” are the same as described above for the “wake word regarding a report input”.
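The control dictionary set after the “imaging setting” wake word can be sketched as a mapping from recognized words to imaging-mode actions. The action encoding and function names below are assumptions for illustration.

```python
# Hedged sketch: words in the imaging-mode control dictionary map to
# light source switching or lesion-detection on/off actions, per the
# examples above. The state representation is an illustrative assumption.

CONTROL_DICTIONARY = {
    "white":         ("light_source", "white"),
    "LCI":           ("light_source", "LCI"),
    "BLI":           ("light_source", "BLI"),
    "detection on":  ("lesion_detection", True),
    "detection off": ("lesion_detection", False),
}

def apply_control_word(word, state):
    """Apply a recognized control word to the imaging state, if registered."""
    if word in CONTROL_DICTIONARY:
        key, value = CONTROL_DICTIONARY[word]
        state[key] = value
    return state

# Example: recognizing "BLI" switches the light source setting.
state = apply_control_word("BLI", {"light_source": "white",
                                   "lesion_detection": False})
```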
In the endoscope system 10, image recognition corresponding to a plurality of types of “specific subjects” (specifically, the lesion, the treatment tool, the hemostat, and the like described above) as the determination (recognition) target can be performed by each unit of the image recognition processing unit 63 (as a whole, a plurality of times of image recognition). The audio recognition unit 62B can set the audio recognition dictionary corresponding to the type of the “specific subject” determined to be “included in the endoscopic image” by any of these image recognitions.
In addition, in the endoscope system 10, whether or not a plurality of “specific subjects” are included in the endoscopic image is determined by each unit, and the audio recognition unit 62B can set the audio recognition dictionary corresponding to the specific subject determined to be “included in the endoscopic image” among the plurality of “specific subjects”. As a case where a plurality of “specific subjects” are included in the endoscopic image, for example, a case where a plurality of lesion parts are included, a case where a plurality of treatment tools are included, a case where a plurality of hemostats are included, and the like are considered.
Note that, for some image recognition among a plurality of times of image recognition by the respective units, the audio recognition dictionary may be set according to the type of the “specific subject”.
The audio recognition unit 62B performs audio recognition on the audio input to the microphone 51 (audio input device) after the audio recognition dictionary is set, using the set audio recognition dictionary (illustration is omitted in
In the present embodiment, the audio recognition unit 62B can perform audio recognition for site information, findings information, treatment information, and hemostasis information. Note that, in a case where there are a plurality of lesions or the like, a series of processing (acceptance of audio input triggers, setting audio recognition dictionaries, and audio recognition in a cycle from imaging start to hemostasis) can be repeated for each lesion or the like.
In the endoscope system 10, in the audio recognition, the audio recognition unit 62B and the display control unit 65 (processor) can recognize only the registered words of the set audio recognition dictionary, and display (output) the result of the audio recognition for the registered word on the display device 70 (output device, display device) (adaptive audio recognition). According to this form, since only the registered words of the set audio recognition dictionary are audio-recognized, the recognition accuracy can be improved. Note that, in such adaptive audio recognition, the registered words of the audio recognition dictionary may be set so that the wake word is not recognized, or may be set to include the wake word.
In addition, in the endoscope system 10, in the audio recognition, the audio recognition unit 62B and the display control unit 65 (processor) can recognize the registered words that are registered in the set audio recognition dictionary and specific words, and display (output) the result of the audio recognition for the registered word among the recognized words on the display device 70 (output device, display device) (non-adaptive audio recognition). Note that an example of the “specific word” is a wake word for the audio input device, but the “specific word” is not limited thereto.
Note that, in the endoscope system 10, which of the above forms (adaptive audio recognition, non-adaptive audio recognition) is used to perform audio recognition and to display the result can be set on the basis of an instruction input from the user via the input device 50, the operation part 22, or the like.
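The difference between the two forms above can be sketched as a small filter: in the adaptive form only registered words are recognized at all, while in the non-adaptive form registered and specific words are recognized but only registered words are displayed. The word sets below are illustrative assumptions.

```python
# Minimal sketch of adaptive vs. non-adaptive audio recognition as
# described above. Word lists are assumptions for illustration.

def recognize(word, registered, specific, adaptive=True):
    """Return the word to display, or None if nothing should be shown."""
    if adaptive:
        # Adaptive: only registered words are recognized at all.
        return word if word in registered else None
    # Non-adaptive: registered and specific words are recognized,
    # but only the registered words are displayed.
    if word in registered:
        return word
    if word in specific:
        return None  # recognized (e.g. as a wake word) but not displayed
    return None
```

Which form is used could then be switched by the user setting mentioned above, passed in as the `adaptive` flag.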
Note that, in the endoscope system 10, it is preferable that the display control unit 65 (processor) notifies the user that the audio recognition dictionary is set (the fact that the audio recognition dictionary is set and which dictionary is set) and that the audio recognition is possible. As illustrated in
Specifically,
Through such notification, the user can easily ascertain that a specific image recognizer is operating and that a period in which audio recognition is possible is reached. Note that the display control unit 65 may display and switch the icon according to not only the operation situation of each unit of the image recognition processing unit 63 but also an operation situation and an input situation of the microphone 51 and/or the foot switch 52.
The audio recognition unit 62B (processor) can execute the audio recognition using the set audio recognition dictionary during a specific period after the setting (period in which predetermined conditions are satisfied). “Predetermined conditions” may be the output of the recognition results from the image recognizer, may be conditions regarding the output contents, or may specify an execution time itself of the audio recognition (three seconds, five seconds, or the like). In a case of specifying the execution time, it is possible to specify an elapsed time from the dictionary setting, or an elapsed time after the user is notified that the audio input is possible.
In this manner, by executing the audio recognition during a specific period, it is possible to reduce the risk of unnecessary recognition or erroneous recognition, and to perform the examination smoothly.
Note that the audio recognition unit 62B may set the period of the audio recognition for each image recognizer, or may set the period of the audio recognition depending on the type of the audio input trigger. In addition, the audio recognition unit 62B may set “predetermined conditions” and “execution time of the audio recognition” on the basis of an instruction input from the user via the input device 50, the operation part 22, or the like.
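Limiting the audio recognition to an elapsed time from the dictionary setting, with a per-trigger period and a remaining-time readout, can be sketched as follows. The per-trigger durations are assumed values; the patent mentions times such as three or five seconds only as examples.

```python
# Illustrative sketch: audio recognition runs for a specific period after
# the dictionary is set, with the period depending on the trigger type.
# Duration values and trigger names are assumptions for illustration.

PERIOD_BY_TRIGGER = {"lesion_detected": 5.0, "foot_switch": 3.0}  # seconds
DEFAULT_PERIOD = 3.0

def recognition_active(trigger, dictionary_set_time, now):
    """True while the elapsed time since the dictionary was set is in the period."""
    period = PERIOD_BY_TRIGGER.get(trigger, DEFAULT_PERIOD)
    return (now - dictionary_set_time) <= period

def remaining_time(trigger, dictionary_set_time, now):
    """Remaining seconds of the audio recognition period (for on-screen display)."""
    period = PERIOD_BY_TRIGGER.get(trigger, DEFAULT_PERIOD)
    return max(0.0, period - (now - dictionary_set_time))
```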
In addition, the (b) part of
In a case where the audio recognition based on the manual operation is prioritized in this manner, the period of the audio recognition based on the image recognition may be continuous with the period of the audio recognition associated with the manual operation. For example, in the example illustrated in the (b) part of
The audio recognition unit 62B and the display control unit 65 may perform the audio recognition during a predetermined period after the audio recognition dictionary is set, and may display the remaining time of the audio recognition period on the screen of the display device 70.
Note that the audio recognition unit 62B and the display control unit 65 may set different periods depending on the audio input trigger and the audio recognition dictionary, as the period in which the audio recognition is performed. In addition, the period may be set according to the user's operation via the input device 50.
Note that the audio recognition unit 62B and the display control unit 65 may output the remaining time using numbers or audio. Note that the remaining time is zero in a case where the screen display of the microphone icon 300 (refer to
The audio recognition unit 62B and the display control unit 65 may display the candidates for the audio recognition on the screen, and may allow the user to select the candidate. In addition, the audio recognition result may be displayed on the screen of the display device 70.
In the present invention, a display mode of the audio recognition result is not limited to the mode illustrated in the example of
The audio recognition unit 62B and the display control unit 65 may set a display position of the selection result or the confirmation result of the audio recognition according to the audio recognition result, the type of the recognized subject, or the like. For example, the audio recognition unit 62B and the display control unit 65 can display the audio recognition result for “findings” near the region of interest of the video (for example, the region of interest ROI of
In the audio recognition described above, the audio recognition unit 62B may switch the audio recognition dictionary 62C according to the quality of the image recognition executed by the image recognition processing unit 63 (refer to
In a case where the lesion candidate (specific subject) is included in the endoscopic image, the period in which the discrimination unit 63B outputs the discrimination result is the audio recognition period (same as in the (a) part of
In this case, as illustrated in the (b) part of
From time point t3 to time point t4 (discrimination mode: the discrimination unit 63B outputs the result), the audio recognition unit 62B performs the audio recognition using the audio recognition dictionary “findings set” as usual.
In addition, from time point t4 to time point t9, since the mode is the detection mode, the audio recognition unit 62B does not normally perform the audio recognition, and from time point t5 to time point t8, since the treatment tool is detected, the audio recognition unit 62B sets a “treatment set” as the audio recognition dictionary 62C, and performs the audio recognition. However, from time point t6 to time point t7, it is assumed that the observation quality is poor. The audio recognition unit 62B can accept a command for an image quality improvement operation during this period (time point t6 to time point t7) similar to time point t1 to time point t2.
In this manner, in the endoscope system 10, it is possible to flexibly set the audio recognition dictionary according to the observation quality and to perform appropriate audio recognition.
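The dictionary switching over the timeline above can be summarized as a small selection rule: poor observation quality overrides the usual choice with a dictionary of image quality improvement commands, the discrimination mode uses a “findings set”, and a detected treatment tool in the detection mode uses a “treatment set”. The dictionary names and the boolean inputs below are illustrative assumptions.

```python
# Hedged sketch of dictionary selection according to mode, treatment tool
# detection, and observation quality, per the timeline described above.
# Dictionary names and inputs are assumptions for illustration.

def select_dictionary(mode, treatment_tool_detected, observation_quality_ok):
    """Return the audio recognition dictionary to set, or None for no recognition."""
    if not observation_quality_ok:
        # Poor quality: accept image quality improvement commands instead.
        return "image quality commands"
    if mode == "discrimination":
        return "findings set"
    if mode == "detection" and treatment_tool_detected:
        return "treatment set"
    return None  # outside these periods, audio recognition is not performed
```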
In a case where the audio recognition is performed, the examination information output control unit 66 (processor) can associate the endoscopic images (medical images in chronological order) with the results of the audio recognition, and record the endoscopic images and the results in the recording device such as the recording device 75, the storage unit of the medical information processing apparatus 80, and the endoscope information management system 100. The examination information output control unit 66 may associate the endoscopic image in which a specific subject is shown and the determination result (that the specific subject is shown in the image) of the image recognition, and record the endoscopic image and the determination result. The examination information output control unit 66 may perform recording according to the user's operation on the operation device, or may perform recording automatically without depending on the user's operation. With such recording, in the endoscope system 10, it is possible to support the user in creating an examination report.
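The association of a medical image with the audio recognition result and the image recognition determination before recording can be sketched as building one record per frame. The record fields and example values below are assumptions for illustration.

```python
# Minimal sketch, under assumed record fields, of associating an endoscopic
# image with the audio recognition result and the image recognition
# determination before output to a recording device.

def build_examination_record(frame_id, timestamp, audio_result, determination):
    """Associate one chronological medical image with its recognition results."""
    return {
        "frame_id": frame_id,                  # identifies the medical image
        "timestamp": timestamp,                # imaging date and time
        "audio_recognition": audio_result,     # e.g. a findings word
        "image_determination": determination,  # e.g. "specific subject shown"
    }

# Example with illustrative values only.
record = build_examination_record(120, "2021-09-08T10:15:00",
                                  "adenoma", "specific subject shown")
```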
In the embodiments described above, a case has been described in which the present invention is applied to the endoscope system for a lower digestive tract, but the present invention can also be applied to an endoscope for an upper digestive tract.
The embodiments of the present invention have been described above, but the invention is not limited to the above-described aspects and can have various modifications without departing from the scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2021-146308 | Sep 2021 | JP | national |
The present application is a Continuation of PCT International Application No. PCT/JP2022/033260 filed on Sep. 5, 2022, claiming priority under 35 U.S.C. § 119(a) to Japanese Patent Application No. 2021-146308 filed on Sep. 8, 2021. Each of the above applications is hereby expressly incorporated by reference, in its entirety, into the present application.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/JP2022/033260 | Sep 2022 | WO |
Child | 18582650 | US |