The technology disclosed in the present specification (hereinafter, “the present disclosure”) relates to a medical support system, a medical support method, and a computer program that support medical practice by a doctor.
With the evolution of deep learning, high-precision inference exceeding human performance has been realized. Deep learning is also highly promising in the medical field. For example, there has been proposed a medical information processing device including: an acquisition unit that acquires medical information; a learning unit that performs learning for a function in the medical information processing device using the medical information; an evaluation data holding unit that holds evaluation data for evaluating a learning result of the learning unit, the evaluation data having a correct answer known from execution of the function; an evaluation unit that evaluates the learning result acquired by the learning on the basis of the evaluation data; and a reception unit that receives an instruction for applying the learning result of the learning unit to the function, in which the user determines whether or not the learning result is applied on the basis of a verification result of the validity of the learning (see Patent Document 1).
An object of the present disclosure is to provide a medical support system, a medical support method, and a computer program that control an operation of an arm that supports a medical instrument such as an endoscope, for example, on the basis of an estimation result by deep learning.
A first aspect of the present disclosure is a medical support system including:
The control unit further includes a calculation unit that calculates reliability regarding an estimation result of the machine learning model, and outputs the reliability to the information presentation unit. The calculation unit calculates the reliability using Bayesian deep learning.
The machine learning model estimates a target command for an arm supporting a medical instrument. Then, the control unit outputs determination basis information regarding the target command estimated by the machine learning model to an information presentation unit. The control unit outputs information regarding a gaze region observed at the time of estimating the target command and/or a recognized target portion. The control unit outputs a heat map image indicating a gaze region observed at the time of estimating the target command and/or a recognized target portion. The control unit outputs the heat map image generated on the basis of the Grad-Cam algorithm.
Furthermore, a second aspect of the present disclosure is a medical support method in a medical support system, the medical support method including:
Furthermore, a third aspect of the present disclosure is a computer program described in a computer readable format to execute processing of medical support in a medical support system on a computer, the computer program causing the computer to function as:
The computer program according to the third aspect of the present disclosure defines a computer program described in a computer readable format so as to implement predetermined processing on a computer. In other words, by installing the computer program according to the third aspect of the present disclosure in a computer, a cooperative action is exerted on the computer, and it is possible to obtain a similar action and effect to those of the medical support system according to the first aspect of the present disclosure.
According to the present disclosure, for example, it is possible to provide a medical support system, a medical support method, and a computer program that present determination basis information and reliability of a machine learning model that estimates an operation of an arm supporting a medical instrument such as an endoscope, and that control the operation of the arm.
Note that the effects described in the present specification are merely examples, and the effects brought by the present disclosure are not limited thereto. Furthermore, the present disclosure may further provide additional effects in addition to the effects described above.
Still other objects, features, and advantages of the present disclosure will become apparent from a more detailed description based on embodiments to be described later and the accompanying drawings.
Hereinafter, the technology according to the present disclosure will be described in the following order with reference to the drawings.
A. Overview
B. Configuration of Endoscopic Surgery System
C. Control System of Medical Robot Device
D. Control System Using Neural Network Model
E. Control System for Presenting Determination Basis
F. Reflection of Determination of Doctor
G. Relearning of Learner
H. Autonomous Learning of Learner
A. Overview
For example, in endoscopic surgery, adjustment of the position, angle of view, and the like of an endoscope influences surgical performance, but the control technique varies from operator (scopist) to operator. By introducing robot operation control based on inference by deep learning into a medical robot device that supports an endoscope, it is expected to reduce costs such as the labor cost of an operator and to improve the accuracy and safety of the endoscope control technique. Meanwhile, when deep learning is used, it is necessary to clarify the validity of the determination basis. This is because, by clarifying the basis of the determination of the medical robot device, it is possible to confirm the validity of the determination and that there is no difference from the determination of the doctor.
In recent years, research on explainable artificial intelligence has become active, and algorithms that visualize the basis of a determination and algorithms that reveal data uncertainty have been proposed. According to the present disclosure, when a medical robot device is controlled on the basis of inference by deep learning, not only the result of the inference by the deep learning but also the basis of the determination is presented. Furthermore, according to the present disclosure, the uncertainty of the control of the medical robot device based on inference by deep learning is also presented. This makes it possible to clarify, for example, whether a determination by deep learning cannot be made due to noise.
Regarding the former, namely the clarification of the determination basis, an image region related to the determination basis can be displayed as a heat map on a captured image of the endoscope by Grad-Cam, which is one of the explainable AI (XAI) algorithms, for example. The deep-learned neural network model calculates target command-related information for the medical robot device using, as input data, the input image to the medical robot device (or the captured image of the endoscope), motion information of the medical robot device (including the self-position of the camera), operation information, and sensor information. Grad-Cam can explicitly indicate which portion of the input image the neural network model has focused on to output the target command-related information, for example, by displaying a heat map of the image region serving as the basis. Here, the determination basis includes, for example, analysis information necessary for improving the data set or the machine learning model used for learning and for debugging performance. Furthermore, the determination basis may be a prediction, a score, or the like indicating how much a certain factor influences the final result. The determination basis is only required to be, for example, information for analyzing the cause of a certain result.
The latter, namely the uncertainty in deep learning, can be mainly divided into uncertainty due to noise and uncertainty due to lack of data. For example, Bayesian deep learning can evaluate the uncertainty of the estimation result of the neural network, that is, how correctly the medical robot device is likely to move and operate on the basis of the target command-related information output by the neural network model.
Therefore, by presenting the determination basis of the medical robot device and its uncertainty or reliability, the doctor can perform the endoscopic surgery while confirming whether the medical robot device is controlled without a difference from his/her own determination.
B. Configuration of Endoscopic Surgery System
The medical robot device 120 is basically a multilink structure in which a plurality of links is connected by joint shafts. The medical robot device 120 supports the endoscope 110 at the distal end portion. The medical robot device 120 has a degree-of-freedom configuration capable of controlling the posture of the endoscope 110 with, for example, four or more degrees of freedom, securing a sufficient operation range of the endoscope during surgery, coping with various manual works, and avoiding interference with the surgeon 101.
Examples of the device mounted on the cart 140 include a camera control unit (CCU) 141, a light source device 142, a robot arm control device 143, an input device 144, a treatment tool control device 145, a pneumoperitoneum device 146, a recorder 147, a printer 148, a display device 149, and the like. However, the types of devices mounted on the cart 140 can be appropriately changed according to the types of medical instruments used for endoscopic surgery.
In endoscopic surgery, instead of cutting and opening the abdominal wall of the patient 103, a lens barrel of the endoscope 110 and other medical instruments 131, 132, . . . are inserted into a body cavity of the patient 103 via a plurality of trocars 151, 152, . . . punctured into the abdominal wall. The medical instruments 131, 132, . . . are, for example, forceps, pneumoperitoneum tubes, energy treatment tools, tweezers, retractors, and the like, but are illustrated in a simplified manner in
An image of the surgical site in the body cavity of the patient 103 captured by the endoscope 110 is displayed on the display device 149. The surgeon 101 performs treatment such as resection of the surgical site using the medical instruments 131, 132, . . . while viewing the image of the surgical site displayed on the display device 149 in real time. Furthermore, some of the medical instruments 131, 132, . . . (for example, forceps, a pneumoperitoneum tube, and an energy treatment tool) may be supported by an assistant (not illustrated) instead of the surgeon.
The endoscope 110 includes a lens barrel 111 inserted into the body cavity of the patient 103 at the distal end, and a camera head 112 connected to the proximal end of the lens barrel 111. The endoscope 110 is assumed to be a rigid scope having a rigid lens barrel 111, but may be a flexible scope having a flexible lens barrel. An optical system and an imaging element (both not illustrated) are disposed in the camera head 112. Reflected light (observation light) from an observation target such as a surgical site is imaged on the imaging element by the optical system. The imaging element photoelectrically converts the observation light, generates an image signal corresponding to the observation image, and transmits the image signal to the CCU 141. Note that the camera head 112 has a function of driving the optical system to adjust the magnification and the focal length. Furthermore, a plurality of imaging elements may be disposed in the camera head 112 for stereoscopic viewing (3D display). In this case, a plurality of relay optical systems for guiding the observation light to the plurality of imaging elements is disposed inside the lens barrel 111.
The CCU 141 includes a central processing unit (CPU), a graphics processing unit (GPU), and the like, performs control of the camera head 112 of the endoscope 110, processing of a captured image of the abdominal cavity captured by the endoscope 110, and signal processing on a pixel signal acquired by controlling the camera head 112, and controls screen display of the captured image of the endoscope 110 by the display device 149. Specifically, the CCU 141 performs development processing such as demosaicing on the image signal received from the camera head 112 and various image processing for displaying an image, and outputs the image signal to the display device 149. In the present embodiment, the CCU 141 includes an image recognizer including a neural network model learned by deep learning, recognizes an object, an environment, and the like in the field of view of the endoscope 110 from a captured image subjected to image processing, and outputs recognition information. Furthermore, the CCU 141 transmits a control signal related to adjustment of the magnification and the focal length and an imaging condition to the camera head 112.
The display device 149 displays a captured image of the endoscope 110 on which image processing has been performed by the CCU 141. It is preferable to use the display device 149 having an appropriate resolution or screen size according to the application. For example, in a case where the endoscope 110 supports high-resolution imaging such as 4K (the number of horizontal pixels 3840×the number of vertical pixels 2160) or 8K (the number of horizontal pixels 7680×the number of vertical pixels 4320), or supports 3D display, it is preferable to use a display device capable of high-resolution or 3D display as the display device 149. By using, for example, a display device having a screen size of 55 inches or more as the display device 149 compatible with high resolution of 4K or 8K, it is possible to give a further immersive feeling to an observer such as the surgeon 101.
The light source device 142 includes, for example, a light source such as a light emitting diode (LED) or a laser, and supplies illumination light to the endoscope 110 when imaging a surgical site.
The robot arm control device 143 includes, for example, a processor such as a CPU, a local memory thereof, and the like, and controls operations of the camera head 112 and the medical robot device 120. The robot arm control device 143 controls driving of the robot arm of the medical robot device 120 according to a predetermined control method such as position control and force control, for example. The medical robot device 120 has a multilink structure in which a plurality of links is connected by joint shafts and the endoscope 110 is mounted on the distal end portion, and at least some of the joint shafts are active shafts driven by actuators. The robot arm control device 143 supplies a drive signal to each joint driving actuator.
In the present embodiment, the robot arm control device 143 includes a motion predictor including a neural network model learned by deep learning, and predicts and outputs a target command value for the camera head 112 and the medical robot device 120 to be controlled on the basis of the recognition information recognized by the image recognizer (described above). The target command value is a value indicating a control amount with respect to a control target, and specifically includes information regarding camerawork of the endoscope 110 (camera target position, posture, speed, acceleration, gaze point, and line-of-sight vector (target object position, distance, vector posture)), a predicted image captured by the endoscope 110 (including an electronic cut-out position of the captured image), and a predicted operation of the robot arm of the medical robot device 120 (target position, posture, speed, acceleration, operation force, and the like, of the instruments supported by the robot arm). Note that, instead of individually arranging the neural network model of the image recognizer in the CCU 141 and the neural network model of the motion predictor in the robot arm control device 143, image recognition and motion prediction may be integrated and configured as an End to End (E2E) neural network model. Furthermore, in the present disclosure, a determination basis, uncertainty, or reliability by the neural network model is presented, but details will be described later.
The input device 144 is an input interface for the endoscopic surgery system 100. A user (for example, a surgeon, a doctor, an assistant, and the like) can input various types of information and instructions to the endoscope system 100 via the input device 144. For example, the user inputs various types of information regarding surgery, such as physical information of a patient and information related to surgery, via the input device 144. Furthermore, the user (for example, a surgeon, a doctor, an assistant, and the like) inputs an instruction to drive the medical robot device 120, setting of imaging conditions (type of irradiation light, magnification, focal length, and the like) by the endoscope 110, a drive instruction of the energy treatment tool, and the like via the input device 144. In the present disclosure, the determination basis, uncertainty, or reliability of the neural network model for estimating the operation of the robot arm of the medical robot device 120 is presented, but the user can input an instruction to the medical robot device 120 via the input device 144 according to the presented content.
The type of the input device 144 is not limited. The input device 144 may be, for example, a mouse, a keyboard, a touch panel, a switch, a lever (none of which are illustrated), a foot switch 144a, or the like. The touch panel is superimposed on a screen of the display device 149, for example, and the user can perform an input operation on a captured image of the endoscope 110 displayed on the screen, for example. Furthermore, as the input device 144, a head mounted display or various types of wearable devices may be used to input information according to the user's line-of-sight or gesture. Furthermore, the input device 144 may include a master device of the medical robot device 120. Furthermore, the input device 144 may include a microphone that collects the voice of the user, and may input a voice command from the user. By using a device capable of inputting information in a non-contact manner for the input device 144, a user in a clean area in an operating room can operate a device arranged in an unclean area in a non-contact manner, and the user can input information without releasing his/her hand from the medical instruments 131, 132, . . . .
The treatment tool control device 145 controls driving of an energy treatment tool for cauterization and incision of tissue, sealing of a blood vessel, and the like. For the purpose of securing the field of view of the endoscope 110 and securing a working space for the surgeon, the pneumoperitoneum device 146 sends gas into the body cavity of the patient 103 through the pneumoperitoneum tube to inflate the body cavity. The recorder 147 includes, for example, a large-capacity recording device such as a solid state drive (SSD), a hard disc drive (HDD), and the like, and is used for recording various types of information regarding surgery. The printer 148 is a device that prints data such as characters, images, and figures on paper, and is used to print information regarding surgery. The treatment tool control device 145 and the pneumoperitoneum device 146 operate, for example, on the basis of an instruction from the surgeon 101 or an assistant via the input device 144, but may operate on the basis of a control signal from the robot arm control device 143.
C. Control System of Medical Robot Device
The CCU 141 performs image processing on the image signal transmitted from the camera head 112. The image processing includes, for example, signal processing such as development processing, high image quality processing (band coordination processing, super-resolution processing, noise reduction (NR) processing, camera shake correction processing, and the like), and enlargement processing (electronic zoom processing). Furthermore, an image processing unit 212 performs detection processing on the image signal for performing auto exposure (AE), auto focus (AF), and auto white balance (AWB).
The CCU 141 includes, for example, a processor such as a CPU or a GPU, a local memory thereof, and the like, and executes the above-described image processing and detection processing by the processor executing a predetermined program loaded in the local memory. Furthermore, in a case where the image processing unit 212 includes a plurality of GPUs, information regarding an image signal may be appropriately divided, and image processing may be performed in parallel by the plurality of GPUs.
Furthermore, the CCU 141 receives a captured image of a surgical site by the endoscope 110 from the camera head 112, receives motion information of the robot arm and sensor information of the robot arm from the medical robot device 120, recognizes an image of a medical instrument included in the captured image of the endoscope 110 or an environment in a field of view of the endoscope 110, and outputs instrument recognition information and environment recognition information. The instrument recognition information is a type of the medical instrument (for example, forceps, pneumoperitoneum tube, energy treatment tool, tweezers, retractor, and the like) recognized in the field of view of the endoscope 110, a position and a posture of each instrument, an operation state (for example, in the case of forceps, an open/closed state is obtained, and in the case of an energy treatment tool, an energy output state is obtained), and the like. Furthermore, the environment recognition information is information indicating the environment of the operative field, such as depth information, environment map information, arrangement information of organs and instruments in a space, a material of each object (such as an organ or a metal), and the like. Note that the CCU 141 does not necessarily need to output two types of recognition information of the instrument recognition information and the environment recognition information as the image recognition result, and may output three or more types of recognition information separately, or may output all the recognition results collectively as one piece of recognition information.
The robot arm control device 143 supplies the target control command-related information to the CCU 141 and the medical robot device 120. Note that, in the present specification, for example, a plurality of types of target command values such as a camera target position, a posture, a speed, an acceleration, a gaze point, a line-of-sight vector (target object position, distance, vector posture) of the endoscope, an electronic cut-out position of a captured image, a distance, and a joint angle and a joint angular velocity of each joint of a robot arm supporting the endoscope is collectively referred to as target command-related information. The robot arm control device 143 calculates target command-related information including target command values such as a joint angle and a joint angular velocity of each joint of the robot arm on the basis of the instrument recognition information and the environment recognition information obtained by the CCU 141 performing image recognition on the captured image of the endoscope 110, and outputs a control signal to the medical robot device 120. Furthermore, the robot arm control device 143 calculates target command-related information including target command values such as magnification and focus of the captured image on the basis of the instrument recognition information and the environment recognition information, generates a control signal for controlling the drive of the camera head 112, and outputs the control signal to the CCU 141. In a case where a user (for example, a surgeon, a doctor, an assistant, and the like) inputs an imaging condition via the input device 144, the robot arm control device 143 generates a control signal to the medical robot device 120 or the camera head 112 on the basis of the user input. Furthermore, in a case where the endoscope 110 is equipped with an AE function, an AF function, and an AWB function, the robot arm control device 143 calculates the optimum exposure value, focal length, and white balance on the basis of the result of the detection processing by the CCU 141, and outputs control signals for AE, AF, and AWB for the camera head 112 to the CCU 141.
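As a purely illustrative data layout for the information exchanged here, the following Python sketch shows how the instrument recognition information, the environment recognition information, and the target command-related information might be represented; all class and field names are hypothetical assumptions introduced for explanation and are not defined by the present disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional
import numpy as np


@dataclass
class InstrumentRecognition:
    """Hypothetical container for instrument recognition information."""
    instrument_type: str           # e.g. "forceps", "energy_treatment_tool"
    position: np.ndarray           # 3D position in the camera frame
    orientation: np.ndarray        # posture as a quaternion (x, y, z, w)
    operation_state: str           # e.g. "open", "closed", "energy_on"


@dataclass
class EnvironmentRecognition:
    """Hypothetical container for environment recognition information."""
    depth_map: np.ndarray                  # per-pixel depth of the operative field
    environment_map: Optional[np.ndarray]  # e.g. a SLAM-based map, if available
    organ_labels: List[str]                # organs recognized in the field of view
    materials: List[str]                   # material of each object (organ, metal, ...)


@dataclass
class TargetCommand:
    """Hypothetical container for the target command-related information."""
    camera_position: np.ndarray     # target position of the endoscope camera
    camera_orientation: np.ndarray  # target posture (quaternion)
    gaze_point: np.ndarray          # target gaze point in the operative field
    crop_center: np.ndarray         # electronic cut-out position in the captured image
    joint_angles: np.ndarray        # target joint angles of the robot arm
    joint_velocities: np.ndarray    # target joint angular velocities
```

In such a layout, each control cycle the motion predictor would map the two recognition structures (together with the motion and sensor information of the robot arm) to one TargetCommand instance.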
The medical robot device 120 operates the robot arm on the basis of a control signal from the robot arm control device 143, and outputs motion information of the robot arm and sensor information detected by a sensor mounted on the medical robot device 120 to the robot arm control device 143. Furthermore, the camera head 112 receives a control signal from the robot arm control device 143 via the CCU 141, and outputs a captured image of the surgical site captured by the endoscope 110 to the CCU 141.
The CCU 141 causes the display device 149 to display the captured image of the endoscope 110. Furthermore, the control unit 213 may generate surgery support information based on the image recognition result as described above, and display the surgery support information in a superimposed manner when displaying the image of the surgical site captured by the endoscope 110 on the display device 149. The surgeon 101 can proceed with the surgery more safely and reliably on the basis of the surgery support information presented together with the image of the surgical site. According to the present disclosure, a determination basis at the time of automatically operating the medical robot device 120 and information regarding uncertainty or reliability of data used for the automatic operation are presented as surgery support information, and this point will be described later in detail.
The lens unit 301 is an optical system disposed at a connection portion with the lens barrel 111. Observation light taken in from the distal end of the lens barrel 111 is guided to the camera head 112 and enters the lens unit 301. The lens unit 301 is configured by combining a plurality of optical lenses including a zoom lens and a focus lens. Optical characteristics of the lens unit 301 are adjusted such that incident light forms an image on a light receiving surface of an imaging element of the imaging unit 302. Furthermore, the zoom lens and the focus lens are movable in position on the optical axis in order to adjust the magnification and focus of the captured image.
The imaging unit 302 includes a light receiving element and is arranged at a subsequent stage of the lens unit 301. The light receiving element may be, for example, an imaging element such as a complementary metal oxide semiconductor (CMOS), a time of flight (ToF) sensor, or the like. The imaging unit 302 may be disposed immediately after the objective lens inside the lens barrel 111 instead of inside the camera head 112. The imaging unit 302 photoelectrically converts the observation light formed on the light receiving surface of the light receiving element by the lens unit 301, generates a pixel signal corresponding to the observation image, and outputs the pixel signal to the communication unit 303.
The light receiving element may be, for example, an imaging element having the number of pixels corresponding to a resolution of 4K (the number of horizontal pixels 3840×the number of vertical pixels 2160), 8K (the number of horizontal pixels 7680×the number of vertical pixels 4320), or a square 4K (the number of horizontal pixels 3840 or more×the number of vertical pixels 3840 or more). When a high-resolution image of the surgical site is obtained using such an imaging element capable of coping with imaging of a high-resolution image, the surgeon 101 can grasp the state of the surgical site with a high-definition image on the screen of the display device 149, and the surgery can proceed more smoothly. Furthermore, the imaging unit 302 may include a pair of imaging elements so as to support 3D display. By performing the 3D display, the surgeon 101 can more accurately grasp the depth of the living tissue in the surgical site, and can more smoothly progress the surgery.
The drive unit 303 includes an actuator, and moves the zoom lens and the focus lens of the lens unit 301 by a predetermined distance in the optical axis direction under the control of the camera head control unit 305 to adjust the magnification and focus of the image captured by the imaging unit 302.
The communication unit 304 includes a communication device that transmits and receives various types of information to and from the CCU 141 and the robot arm control device 143. The communication unit 304 is used to transmit the image signal obtained from the imaging unit 302 to the CCU 141 via a transmission cable 311. In order for the surgeon 101 to perform surgery more safely and reliably while observing the captured image of the surgical site, it is necessary to display a moving image of the surgical site in real time. Therefore, in order to transmit the image signal obtained from the imaging unit 302 with low latency, the communication unit 304 preferably performs optical communication. In the case of performing optical communication, the communication unit 304 includes an electro-optical conversion module, converts an electric signal into an optical signal, and transmits the optical signal to the CCU 141 via a transmission cable (optical fiber) 311.
Furthermore, the communication unit 304 receives a control signal for controlling driving of the camera head 112 from the robot arm control device 143 side via the transmission cable 312, and supplies the control signal to the camera head control unit 305. The control signal includes information regarding the frame rate of the captured image, information regarding the exposure at the time of imaging, and information regarding the magnification and focus of the captured image. The endoscope 110 may be equipped with AE, AF, and AWB functions. In this case, imaging conditions such as a frame rate, an exposure value, a magnification, and a focus may also be automatically set by the robot arm control device 143 via the CCU 141.
The camera head control unit 305 controls driving of the camera head 112 on the basis of a control signal received from the CCU 141 via the communication unit 304. For example, the camera head control unit 305 controls driving of the imaging element of the imaging unit 302 on the basis of a control signal that specifies a frame rate or an exposure value. Furthermore, the camera head control unit 305 adjusts the positions of the zoom lens and the focus lens of the lens unit 301 in the optical axis direction via the drive unit 303 on the basis of a control signal for designating the magnification and the focus of the captured image. Furthermore, the camera head control unit 305 may have a function of storing information for identifying the lens barrel 111 and the camera head 112.
The transmission cable 311 connecting the camera head 112 and the CCU 141 may be an electric signal cable compatible with electric signal communication, an optical fiber compatible with optical communication, or a composite cable thereof. Alternatively, the camera head 112 and the CCU 141 may be wirelessly connected instead of using a wired cable. In a case where the camera head 112 and the CCU 141 are connected by wireless communication, it is not necessary to lay the transmission cable 311 in the operating room, and the movement of the medical staff such as the surgeon 101 and an assistant is not hindered by the transmission cable 311.
Note that by configuring a part of the camera head 112, such as the lens unit 301 and the imaging unit 302, as a sealed structure having high airtightness and waterproofness, resistance to autoclave sterilization can be imparted.
The medical robot device 120 is, for example, a robot arm including a multilink structure having six or more degrees of freedom. The robot arm has a structure in which the endoscope 110 is supported at a distal end portion. For example, the distal end portion of the robot arm may have a structure in which orthogonal rotation axes with three degrees of freedom for determining the posture of the endoscope 110 are intensively arranged. In
The active joint unit 410 includes an actuator 411 such as a rotary motor that drives a joint, a torque sensor 412 that detects torque acting on the joint, and an encoder 413 that measures a rotation angle of a joint. Furthermore, the passive joint unit 420 includes an encoder 421 that measures a joint angle. The sensor unit 430 includes various sensors disposed outside the joint unit, such as an inertial measurement unit (IMU) and a contact sensor that detects a contact force acting on a medical instrument attached to the distal end of the robot arm.
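The motion information and sensor information provided by these joint units and the sensor unit 430 could, for example, be bundled as in the following Python sketch; the field names and units are hypothetical assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ActiveJointState:
    """Readings of one active joint unit 410 (sketch)."""
    angle_rad: float    # rotation angle measured by the encoder 413
    torque_nm: float    # joint torque measured by the torque sensor 412


@dataclass
class PassiveJointState:
    """Readings of one passive joint unit 420 (sketch)."""
    angle_rad: float    # joint angle measured by the encoder 421


@dataclass
class ArmSensorData:
    """Sensor information reported to the robot arm control device 143 (sketch)."""
    active_joints: List[ActiveJointState]
    passive_joints: List[PassiveJointState]
    imu_acceleration: Tuple[float, float, float]  # from the IMU of the sensor unit 430
    tool_contact_force: float                     # contact force on the supported instrument
```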
The robot arm control device 143 generates a target operation of the medical robot device 120 on the basis of the recognition information output from the CCU 141 and a user's instruction input via the input device 144, and controls driving of the medical robot device 120 according to a predetermined control method such as position control and force control. Specifically, the robot arm control device 143 calculates a control amount of the actuator 411 of the active joint unit 410 according to a predetermined control method and supplies a drive signal. The robot arm control device 143 includes, for example, a processor such as a CPU, a local memory thereof, and the like, and executes a predetermined program loaded in the local memory by the processor.
The connection between the robot arm control device 143 and the medical robot device 120 may be an electric signal cable compatible with electric signal communication, an optical fiber compatible with optical communication, or a composite cable thereof. Furthermore, this cable may be included in the above-described transmission cable 311.
Furthermore, the robot arm control device 143 and the medical robot device 120 may be connected wirelessly instead of a wired cable.
D. Control System Using Neural Network Model
In
Here, the motion information of the robot arm input to the image recognizer 501 includes information on the position, speed, and acceleration of the medical instrument such as the endoscope 110 supported at the distal end by the robot arm, and the posture of each joint of the robot arm (joint angle measured by the encoder installed on the rotation axis of the joint). Furthermore, the sensor information of the robot arm input to the image recognizer includes information such as acceleration measured by the IMU mounted on the medical robot device 120, torque information acting on each joint, and information such as an external force acting on the medical instrument supported at the distal end of the robot arm.
Then, the image recognizer 501 performs image recognition of the medical instrument included in the captured image of the endoscope 110 and the environment in the field of view of the endoscope 110, and outputs the instrument recognition information and the environment recognition information. The image recognizer 501 recognizes the type of the medical instrument (for example, forceps, pneumoperitoneum tube, energy treatment tool, tweezers, retractor, and the like), the position and posture of each instrument, and the operation state (for example, in the case of forceps, an open/closed state is obtained, and in the case of an energy treatment tool, an energy output state is obtained) recognized in the field of view of the endoscope 110 as the instrument recognition information. Furthermore, the image recognizer 501 recognizes, as the environment recognition information, depth information of an organ or a medical instrument included in the captured image in the field of view of the endoscope 110 (including a shape of the organ or the instrument), an environment map in a surgical site (for example, environment map creation using simultaneous localization and mapping (SLAM) technology), a type of the organ, a type of the medical instrument, a material of each object included in the captured image, and the like. Furthermore, the image recognizer 501 recognizes, for example, each object such as an organ or a medical instrument included in the image of the surgical site and a material thereof, depth information of each object, and an environment map as the environment recognition information. Note that the image recognizer 501 does not necessarily need to output two types of recognition information of the instrument recognition information and the environment recognition information as the image recognition result, and may output three or more types of recognition information separately, or may output all the recognition results collectively as one piece of recognition information.
Furthermore, in
Furthermore, the image recognizer 501 and the motion predictor 502 may not be configured by individual neural network models as illustrated in
As illustrated in
E. Control System for Presenting Determination Basis
As described in section D above, when deep learning is used for operation control of the medical robot device 120, it is necessary to clarify the validity of the determination basis. In general, deep learning is compared to a black box because it is difficult to understand the basis of its determination. Therefore, in the present disclosure, the basis of the determination in the operation prediction of the medical robot device 120 using deep learning is made clear. As a result, the doctor can smoothly perform the surgery while confirming the determination basis of the operation prediction and that the operation of the medical robot device 120 is not different from his/her own determination.
E-1. Control System (1) for Presenting Determination Basis
The attention information presentation unit 701 presents, in the image input to the control system 700, the information to which attention is paid when the motion predictor 502 determines the target command-related information. Specifically, the image input to the control system 700 is an operative field image captured by the endoscope. The attention information presentation unit 701 uses an algorithm such as gradient-weighted class activation mapping (Grad-Cam), which visualizes a determination basis in an image classification problem, or local interpretable model-agnostic explanations (LIME), which interprets a machine learning model, to present the information to which attention is paid when the image recognizer 501 estimates the instrument recognition information or the environment recognition information.
Grad-Cam is known as a technology for visualizing information that is a basis of determination of a deep-learned neural network model and displaying the information on a heat map. The attention information presentation unit 701 to which Grad-Cam is applied displays a heat map indicating which part of the image, which target object in the image, or the like is focused on to output or estimate the target command-related information from the input image, the motion information of the robot, and the sensor information of the robot.
The operation principle of Grad-Cam is to visualize the information serving as a determination basis by displaying the places where the gradient with respect to the final layer of a target convolutional neural network (CNN) (in this case, the motion predictor 502) is large. The flow of processing of Grad-Cam includes reading of the input data, reading of the model, prediction of the class of the input image, loss calculation for the predicted class, back propagation to the final convolution layer, and weight calculation for each channel of the final convolution layer (calculation of the gradient for each channel by global average pooling, that is, averaging). Here, using the gradient of the score y^c of the class c with respect to the feature map activations A^k, the weight α^c_k representing the importance of the neurons is given as in the following formula (1), where Z is the number of elements of the feature map.

[Mathematical Formula 1]

α^c_k = (1/Z) Σ_i Σ_j ∂y^c/∂A^k_ij   (1)
The forward-propagation output of the final convolution layer is multiplied by the weight of each channel and summed, and the Grad-Cam map is calculated as in the following formula (2) through the activation function ReLU.
[Mathematical Formula 2]

L^c_Grad-Cam = ReLU(Σ_k α^c_k A^k)   (2)
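As a concrete illustration of formulas (1) and (2), the following Python sketch computes a Grad-Cam heat map with PyTorch by hooking the final convolution layer of a CNN; the model, target layer, and class index are placeholders, and the code is a generic implementation of the published Grad-Cam procedure rather than the exact processing of the attention information presentation unit 701.

```python
import torch
import torch.nn.functional as F


def grad_cam(model, image, target_layer, class_idx=None):
    """Compute a Grad-Cam heat map (formulas (1) and (2)) for one input image.

    model        : a CNN in evaluation mode
    image        : tensor of shape (1, C, H, W)
    target_layer : the final convolution layer to inspect
    class_idx    : class c whose score y^c is explained (default: argmax)
    """
    activations, gradients = [], []

    # Hooks capture the forward activations A^k and their gradients dy^c/dA^k.
    fwd = target_layer.register_forward_hook(
        lambda m, i, o: activations.append(o))
    bwd = target_layer.register_full_backward_hook(
        lambda m, gi, go: gradients.append(go[0]))

    scores = model(image)                      # forward pass
    if class_idx is None:
        class_idx = scores.argmax(dim=1).item()
    model.zero_grad()
    scores[0, class_idx].backward()            # back-propagate the score y^c only

    fwd.remove()
    bwd.remove()

    A = activations[0]                         # (1, K, u, v) feature maps
    dA = gradients[0]                          # (1, K, u, v) gradients
    alpha = dA.mean(dim=(2, 3), keepdim=True)  # formula (1): global average pooling
    cam = F.relu((alpha * A).sum(dim=1))       # formula (2): weighted sum + ReLU
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                        mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam.squeeze().detach()              # heat map in [0, 1], same size as input
```

The resulting map, normalized to [0, 1], can then be color-mapped and superimposed on the operative field image as the heat map display described above.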
In the wide angle image illustrated in
The presentation image of the attention information as illustrated in
Therefore, on the basis of the wide angle image illustrated in
Note that the attention information presentation unit 701 may present the information to which attention is paid when the control system 700 determines the target command-related information by using a technology other than Grad-Cam, such as LIME, for example. Furthermore, the attention information presentation unit 701 may present the attention information in a display format other than the heat map, or may present the attention information in a combination of output formats such as heat map display and voice guidance.
Next, the reliability presentation unit 702 will be described. The reliability presentation unit 702 presents information for explaining the reliability, such as how correctly the medical robot device 120 is likely to move, on the basis of the target command-related information output by the motion predictor 502. Specifically, the reliability presentation unit 702 presents, as data for explaining the uncertainty or reliability of the target command-related information, a numerical value indicating, for example, lack of data and the degree of influence of data when the motion predictor 502 outputs the target command-related information to the medical robot device 120, an unknown environment/condition, a probability indicating the accuracy of the prediction result, or a statistical score indicating certainty (accuracy) such as a variance indicating the dispersion of solutions. The reliability presentation unit 702 estimates the uncertainty or the reliability of the target command-related information using, for example, a Bayesian deep neural network (DNN), and details of this point will be described later.
Furthermore,
Furthermore,
Note that the attention information presentation unit 701 may estimate the information to which attention is paid when the motion predictor 502 determines the target command-related information by using a learned neural network model. Furthermore, the reliability presentation unit 702 may estimate the uncertainty or the reliability of the target command-related information output by the motion predictor 502 by using a learned neural network model. The neural network used by the attention information presentation unit 701 and the reliability presentation unit 702 may be a neural network independent of the image recognizer 501 and the motion predictor 502, or may be a neural network incorporating at least some neurons of the image recognizer 501 and the motion predictor 502.
Furthermore,
E-2. Bayesian DNN
Next, the Bayesian DNN used by the reliability presentation unit 702 to estimate the uncertainty or the reliability of the target command-related information will be described. The uncertainty of a determination result in a deep-learned neural network model can be divided into two types: aleatoric uncertainty and epistemic uncertainty.
The former, aleatoric uncertainty, is caused by noise due to the observation environment or the like, and is not caused by lack of data. For example, a hidden and invisible portion of an image (occlusion) corresponds to aleatoric uncertainty. Since the mouth of a masked person's face is hidden by the mask, it cannot be observed as data. In an operative field image, a part of an organ hidden by a surgical tool cannot be observed as data. Meanwhile, the latter, epistemic uncertainty, represents uncertainty due to lack of data. If sufficient data are available, epistemic uncertainty can be reduced.
Although it has been difficult to reveal epistemic uncertainty in the image field, the proposal of Bayesian deep learning has made it possible to evaluate such uncertainty (see, for example, Non Patent Document 1). Bayesian deep learning is configured by combining Bayesian inference and deep learning. By using Bayesian inference, how the estimation result varies can be understood, and thus uncertainty can be evaluated.
Bayesian deep learning is a method of estimating uncertainty from the variance of results obtained at inference time by using the dropout employed in the learning of deep learning. Dropout is a technique used to reduce overfitting by randomly deactivating neurons in each layer. The loss function in Bayesian deep learning is given by the following formula (3) according to the role of the dropout.
For a detailed mathematical theory of the above formula, refer to, for example, Non Patent Document 2. In conclusion, using dropout in deep learning amounts to performing Bayesian learning. The value obtained by learning is not deterministic, and the posterior distribution of the weights can be approximated by combining it with the dropout. The variance of the posterior distribution can be estimated from the dispersion of the plurality of outputs generated by the plurality of dropout patterns. Bayesian deep learning performs sampling from the weight distribution by using the dropout not only at the time of learning but also at the time of inference (Monte Carlo dropout). The uncertainty of the inference result can be obtained by repeating the inference many times for the same input. A network learned using the dropout has a structure in which some neurons are missing. Therefore, when an input image is inferred, an output is obtained that passes through the neurons remaining after the dropout and is characterized by their weights. Moreover, even when the same image is input, it takes a different path through the network each time, so the weighted output differs from sample to sample. That is, the network with the dropout produces a distribution of different outputs at inference time for the same input image. A large variance of the outputs means that the model has large uncertainty. The average of the distribution over the multiple inferences gives the final prediction value, and the variance gives the uncertainty of the prediction value. Bayesian deep learning thus expresses uncertainty by the variance of the outputs at the time of inference.
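The following Python sketch illustrates the Monte Carlo dropout procedure described above: dropout is kept active at inference time, the same input is passed through the network repeatedly, and the mean of the outputs is taken as the prediction while their variance is taken as the uncertainty. The network architecture, input dimensions, and number of samples are arbitrary placeholders.

```python
import torch
import torch.nn as nn


class DropoutRegressor(nn.Module):
    """A small regression network trained with dropout (placeholder architecture)."""
    def __init__(self, in_dim, out_dim, hidden=128, p=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, out_dim))

    def forward(self, x):
        return self.net(x)


def mc_dropout_predict(model, x, n_samples=50):
    """Monte Carlo dropout: keep dropout active at inference and sample repeatedly."""
    model.train()  # train() keeps the Dropout layers active at inference time
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    mean = samples.mean(dim=0)   # final prediction value
    var = samples.var(dim=0)     # uncertainty of the prediction value
    return mean, var


# Usage sketch: x stands for a feature vector derived from the recognition
# information; mean is the predicted target command value and var its uncertainty.
model = DropoutRegressor(in_dim=32, out_dim=6)
x = torch.randn(1, 32)
mean, var = mc_dropout_predict(model, x)
```

In practice only the dropout layers would be switched to training mode; the simple model.train() call is sufficient here because the placeholder network contains no batch normalization layers.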
The input data to the learning in the control system 700 illustrated in
E-4. Operation Procedure
First, a captured image of the endoscope 110, motion information of the robot, and sensor information of the robot are input to the control system 700 (step S1301).
The image recognizer 501 performs image recognition on the basis of the data input in step S1301 using the learned neural network model, and outputs the instrument recognition information and the environment recognition information (step S1302).
Then, the attention information presentation unit 701 visualizes the basis of determination when the neural network model used in the image recognizer 501 estimates the instrument recognition information and the environment recognition information by the Grad-Cam algorithm, and performs heat map display (step S1303).
Next, the motion predictor 502 predicts and outputs information related to the target command for the medical robot device 120 on the basis of the recognition information of the instrument recognition information and the environment recognition information output from the image recognizer 501 using the learned neural network model (step S1304).
The reliability presentation unit 702 presents information for explaining how correctly the medical robot device 120 is likely to move on the basis of the target command-related information output by the neural network model used in the motion predictor 502 or the like by the Bayesian DNN (step S1305). In step S1305, the reliability presentation unit 702 presents a numerical value indicating the lack of data, the unknown environment/condition, the variance and accuracy of the prediction result, and the like when the motion predictor 502 outputs the target command-related information to the medical robot device 120, as data for explaining the uncertainty or the reliability of the target command-related information.
Then, the operation of the arm of the medical robot device 120 is controlled on the basis of the target command-related information output by the motion predictor 502 in step S1304 (step S1306). In step S1306, the arm of the medical robot device 120 is driven by a control signal based on the target command-related information output by the motion predictor 502. However, in a case where the operator such as a doctor instructs correction of the operation of the arm on the basis of the information for explaining the uncertainty or the reliability presented in step S1305, the arm of the medical robot device 120 is operated on the basis of the instruction.
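Assuming that each unit of the control system 700 exposes a simple callable interface, the processing of steps S1301 to S1306 could be organized as in the following Python sketch; every parameter name is a hypothetical stand-in for the corresponding unit and is not defined by the present disclosure.

```python
def control_cycle(endoscope_image, robot_motion, robot_sensors,
                  recognizer, attention_presenter, predictor,
                  reliability_presenter, arm_driver, doctor_correction=None):
    """One pass through steps S1301-S1306 (hypothetical interfaces injected as callables)."""
    # S1302: image recognition corresponding to the image recognizer 501
    instrument_info, environment_info = recognizer(
        endoscope_image, robot_motion, robot_sensors)

    # S1303: heat-map display of the determination basis (e.g. Grad-Cam)
    heat_map = attention_presenter(endoscope_image)

    # S1304: prediction of the target command-related information (motion predictor 502)
    target_command = predictor(instrument_info, environment_info)

    # S1305: presentation of uncertainty/reliability (e.g. Bayesian DNN)
    reliability = reliability_presenter(target_command)

    # S1306: drive the arm; a correction from the doctor, if any, takes precedence
    if doctor_correction is not None:
        target_command = doctor_correction(target_command)
    arm_driver(target_command)
    return heat_map, reliability
```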
E-5. Control System (2) for Presenting Determination Basis
The attention information presentation unit 1401 presents the information that the E2E predictor 601 pays attention to when determining the target command-related information. Specifically, the image input to the control system 1400 is an operative field image captured by the endoscope. Using an algorithm such as Grad-Cam that visualizes the determination basis in an image classification problem, the attention information presentation unit 1401 presents the information to which attention is paid when the E2E predictor 601 estimates the target command-related information. The Grad-Cam is as described above.
The attention information presentation unit 1401 presents the attention information in the form illustrated in
Furthermore, the reliability presentation unit 1402 presents information for explaining how correctly the medical robot device 120 is likely to move on the basis of the target command-related information output by the E2E predictor 601. Specifically, the reliability presentation unit 1402 estimates a numerical value indicating lack of data, an unknown environment/condition, the variance and accuracy of the prediction result, and the like when the E2E predictor 601 outputs the target command-related information to the medical robot device 120, using, for example, a Bayesian DNN. The Bayesian DNN is as described above.
The reliability presentation unit 1402 presents the explanation of the uncertainty or the reliability of the target command-related information to the medical robot device 120 in a form as illustrated in any of
E-6. Operation Procedure
First, a captured image of the endoscope 110, motion information of the robot, and sensor information of the robot are input to the control system 1400 (step S1501).
The E2E predictor 601 predicts and outputs information related to the target command for the medical robot device 120 on the basis of the data input in step S1501 using the learned neural network model (step S1502).
The attention information presentation unit 1401 visualizes the basis of the determination when the neural network model used in the E2E predictor 601 estimates the target command-related information by the Grad-Cam algorithm, and performs heat map display (step S1503).
Furthermore, the reliability presentation unit 1402 presents information for explaining how correctly the medical robot device 120 is likely to move on the basis of the target command-related information output by the neural network model used in the E2E predictor 601, using the Bayesian DNN (step S1504). In step S1504, the reliability presentation unit 1402 presents a numerical value indicating the lack of data, the unknown environment/condition, the variance and accuracy of the prediction result, and the like when the E2E predictor 601 outputs the target command-related information to the medical robot device 120, as data for explaining the uncertainty or the reliability of the target command-related information.
Then, the operation of the arm of the medical robot device 120 is controlled on the basis of the target command-related information output by the E2E predictor 601 in step S1502 (step S1505). In step S1505, the arm of the medical robot device 120 is driven by a control signal based on the target command-related information output by the E2E predictor 601. However, in a case where the operator such as a doctor instructs correction of the operation of the arm on the basis of the information for explaining the uncertainty or the reliability presented in step S1504, the arm of the medical robot device 120 is operated on the basis of the instruction.
E-7. Presentation Example of Determination Basis
The doctor can smoothly perform the surgery while confirming that the operation of the medical robot device 120 is not different from his/her own determination by viewing the image presenting the determination basis of the target command-related information of the medical robot device 120 output by the control system 700 (or 1400) together with the operative field image captured by the endoscope 110.
The form in which the determination basis is presented is arbitrary. However, it is preferable that the doctor can simultaneously confirm the operative field image and the determination basis during the surgery. For example, a region for displaying a presentation image of a determination basis may be provided in a format such as PinP in a screen for displaying an operative field image by the endoscope 110 as a main video on the display device 149, and the operative field image by the endoscope 110 and the presentation image of the determination basis may be simultaneously displayed. Furthermore, the operative field image by the endoscope 110 and the presentation image of the determination basis may be alternately displayed by using one screen. Alternatively, in addition to the main display that displays an image by the endoscope 110, a sub display that displays the presentation image of the attention information may be added.
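As one possible way to realize the PinP-style presentation described above, the following Python sketch colorizes a Grad-Cam heat map, blends it with the operative field image, and embeds the result as a sub-window in the main display image using OpenCV and NumPy; the window size and position are arbitrary example values.

```python
import cv2
import numpy as np


def compose_display(operative_field, heat_map, pinp_scale=0.3, margin=16):
    """Overlay a heat map and place it as a PinP window in the main image (sketch).

    operative_field : HxWx3 uint8 BGR image from the endoscope
    heat_map        : HxW float array in [0, 1] (e.g. the Grad-Cam output)
    """
    # Colorize the heat map and blend it with the operative field image.
    colored = cv2.applyColorMap((heat_map * 255).astype(np.uint8), cv2.COLORMAP_JET)
    overlay = cv2.addWeighted(operative_field, 0.6, colored, 0.4, 0.0)

    # Shrink the overlay and place it in the lower-right corner of the main image.
    h, w = operative_field.shape[:2]
    sub = cv2.resize(overlay, (int(w * pinp_scale), int(h * pinp_scale)))
    sh, sw = sub.shape[:2]
    display = operative_field.copy()
    display[h - sh - margin:h - margin, w - sw - margin:w - margin] = sub
    return display
```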
On the main surgery video display unit 1601, an operative field image electronically cut out from an image captured by the endoscope 110 at the current position is displayed. Meanwhile, the information presentation unit 1602 displays a heat map image generated by the attention information presentation unit 701 (or 1401) and an image that presents uncertainty (lack of data, unknown environment/conditions, variance and accuracy of prediction results, and the like) or reliability in motion prediction generated by the reliability presentation unit 702 (or 1402).
The attention information presentation unit 701 may generate a plurality of types of heat map images indicating the attention information. Furthermore, the reliability presentation unit 702 may generate a plurality of types of presentation images indicating lack of data, unknown environments/conditions, variance and accuracy of prediction results in motion prediction, and the like. Then, a plurality of types of heat map images and a presentation image of uncertainty or reliability may be simultaneously presented to the information presentation unit 1602. In the example illustrated in
In this way, by presenting the determination basis by the neural network model in a plurality of forms using the information presentation unit 1602, the doctor can smoothly perform the surgery while accurately confirming in a short time whether or not the operation of the medical robot device 120 is different from his/her determination.
F. Reflection of Determination of Doctor
In
The doctor looks at the operative field image displayed on the main surgery video display monitor 1901 and the determination basis displayed on the information presentation monitor 1902 (step S2001), and checks whether or not the operation of the medical robot device 120 predicted by the control system 700 (or 1400) using the neural network model is different from his/her own determination (step S2002).
Here, in a case where the doctor can confirm that the determination basis presented on the information presentation monitor 1902 is not different from his/her own determination (Yes in step S2002), the operation of the medical robot device 120 based on the target command-related information output by the motion predictor 502 (or the E2E predictor 601) meets the intention of the doctor. Therefore, without receiving a correction instruction from the doctor, the operation of the arm of the medical robot device 120 is controlled as it is on the basis of the target command-related information output from the motion predictor 502 (or the E2E predictor 601) (step S2004).
Meanwhile, in a case where the doctor confirms that the determination basis presented on the information presentation monitor 1902 is different from his/her own determination (No in step S2002), the doctor corrects the determination basis displayed on the information presentation monitor 1902 using the input device 144 (step S2003). For example, for the heat map image displayed on the information presentation monitor 1902 as the determination basis, the doctor manually instructs (for example, via a UI) a change in the position of the heat map of the recognized instrument, environment, or gaze point.
Note that the doctor can instruct correction of the determination basis by, for example, a touch operation on the screen of the information presentation monitor 1902 or voice using the input device 144. Furthermore, the doctor may directly correct the operation of the arm of the medical robot device 120 using the master device of the medical robot device 120.
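As a purely illustrative example of how a touch operation could be turned into a corrected gaze point, the following sketch maps a touch coordinate on the information presentation monitor to a cell of the displayed heat map; the function name and the coordinate convention are assumptions.

```python
from typing import Tuple

def touch_to_heatmap_cell(touch_xy: Tuple[int, int],
                          monitor_size: Tuple[int, int],
                          heatmap_shape: Tuple[int, int]) -> Tuple[int, int]:
    """Map a touch position (x, y) on the information presentation monitor
    to the corresponding (row, col) cell of the displayed heat map, which
    can then be used as the doctor-corrected gaze point."""
    tx, ty = touch_xy
    mon_w, mon_h = monitor_size
    rows, cols = heatmap_shape
    row = min(rows - 1, int(ty / mon_h * rows))
    col = min(cols - 1, int(tx / mon_w * cols))
    return row, col
```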
When the doctor issues a correction instruction of the determination basis, the motion predictor 502 (or the E2E predictor 601) corrects and outputs the target command-related information on the basis of the determination basis corrected and instructed by the doctor, and controls the operation of the arm of the medical robot device 120 (step S2004).
Note that, in a case where a correction instruction of the determination basis is issued by the doctor, the control system 700 (or 1400) may perform reinforcement learning of the learner (image recognizer 501 and motion predictor 502, or E2E predictor 601) according to the correction instruction from the doctor. Alternatively, the control system 700 (or 1400) may correct the target command on a rule basis in response to the correction instruction from the doctor.
When the arm of the medical robot device 120 operates, the line-of-sight direction and the field of view of the endoscope 110 supported at its distal end change, and the position at which the operative field image is electronically cut out from the captured image of the endoscope 110 moves. Then, the operative field image after the movement of the angle of view is displayed on the main surgery video display monitor 1901 (step S2005).
The doctor observes the new operative field image displayed on the main surgery video display monitor 1901. Furthermore, the control system 700 (or 1400) repeatedly executes the motion prediction of the arm of the medical robot device 120 and the presentation of the determination basis of the prediction to the information presentation monitor 1902 on the basis of the captured image of the endoscope 110 after the movement and the motion information and the sensor information output from the medical robot device 120.
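The flow of steps S2001 to S2005 can be summarized by the following sketch of a single control iteration; the objects passed in (predictor, presenter, input device, robot arm, endoscope) and their method names are hypothetical placeholders, not the API of the control system 700 (or 1400).

```python
def run_iteration(predictor, presenter, input_device, robot_arm, endoscope):
    """One hypothetical pass through steps S2001-S2005."""
    frame = endoscope.capture()
    # Predict the target command and obtain the determination basis.
    command, basis = predictor.predict(frame,
                                       robot_arm.motion_info(),
                                       robot_arm.sensor_info())
    presenter.show(frame, basis)                  # S2001: present image and basis
    correction = input_device.poll_correction()   # S2002: doctor's check
    if correction is not None:                    # S2003: doctor corrects the basis
        command = predictor.correct(command, correction)
    robot_arm.execute(command)                    # S2004: control the arm
    presenter.show_main(endoscope.capture())      # S2005: display the moved field of view
```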
According to the operation procedure illustrated in
G. Relearning of Learner
In section F described above, it has been described that, in a case where the determination basis of the learner (image recognizer 501 and motion predictor 502, or E2E predictor 601) used by the control system 700 (or 1400) differs from the doctor's determination, reinforcement learning of the learner or rule-based correction of the target command is performed.
In this section, processing of performing relearning so that the determination basis of the learner does not differ from the determination of the doctor (or so that the difference between the two is reduced or eliminated) will be described.
As illustrated in
The operation data here includes a combination of input data to the learner, output data, and doctor's determination. Specifically, the input data to the learner includes a captured image of the endoscope 110, motion information of the robot, and sensor information of the robot. Furthermore, the output data from the learner is target command-related information of the medical robot device 120 predicted by the learner. Furthermore, the doctor's determination includes information regarding a doctor's instruction (presence or absence of correction instruction and contents of correction of determination basis) for presentation of the determination basis of the learner.
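One possible way to hold such an operation data record is sketched below as a Python dataclass; the field names and types are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional
import numpy as np

@dataclass
class OperationRecord:
    """One accumulated sample: learner input, learner output, and the
    doctor's determination on the presented determination basis."""
    endoscope_image: np.ndarray            # captured image of the endoscope 110
    motion_info: dict                      # motion information of the robot
    sensor_info: dict                      # sensor information of the robot
    target_command: Any                    # target command-related information predicted by the learner
    correction_instructed: bool            # presence or absence of a correction instruction
    correction_contents: Optional[dict] = None  # contents of the correction of the determination basis
```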
Here, when a trigger for relearning of the learner occurs (Yes in step S2202), the learner is updated by relearning using the accumulated operation data (step S2203).
Note that the trigger for relearning may be an arbitrary event. For example, relearning may be triggered when the accumulated operation data reaches a certain amount or when an operator such as a doctor instructs relearning.
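For instance, the two example triggers above could be checked as in the following sketch, where the record-count threshold and the function name are assumptions.

```python
def should_relearn(num_records: int,
                   operator_requested: bool,
                   min_records: int = 1000) -> bool:
    """Return True when either example trigger holds: enough operation data
    has accumulated, or an operator such as a doctor has instructed relearning."""
    return operator_requested or num_records >= min_records
```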
Then, the control system 700 (or 1400) is operated by the learner updated by relearning (step S2204), and the operation of the endoscopic surgery system 100 is continued.
According to the operation procedure illustrated in
H. Autonomous Learning of Learner
In section F described above, it has been described that, in a case where the determination basis of the learner (image recognizer 501 and motion predictor 502, or E2E predictor 601) used by the control system 700 (or 1400) differs from the doctor's determination, reinforcement learning of the learner or rule-based correction of the target command is performed.
In this section, a process of performing autonomous learning so that the determination basis of the learner does not differ from the determination of the doctor will be described.
As illustrated in
The operation data here is the same as in section G: a combination of the input data to the learner (a captured image of the endoscope 110 and the motion information and sensor information of the robot), the output data from the learner (the target command-related information predicted for the medical robot device 120), and the doctor's determination (the presence or absence of a correction instruction and the contents of correction of the determination basis).
Then, whether the data of the learner is insufficient is verified by the Bayesian DNN (step S2402). In a case where a lack of data is recognized by the Bayesian DNN (Yes in step S2403), data is added from a database, which may be an external database, to compensate for the shortage (step S2404), and the learner is retrained and updated (step S2405). That is, in this operation procedure, the estimation result of the Bayesian DNN serves as the trigger for relearning.
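A common approximation of such a Bayesian DNN check is Monte Carlo dropout: running the predictor several times with dropout kept active and treating a large predictive variance as a sign of insufficient data. The sketch below assumes a PyTorch model containing dropout layers; the number of samples and the variance threshold are illustrative assumptions.

```python
import torch

def data_is_insufficient(model: torch.nn.Module,
                         x: torch.Tensor,
                         n_samples: int = 20,
                         variance_threshold: float = 0.05) -> bool:
    """Approximate the Bayesian uncertainty check with Monte Carlo dropout:
    keep dropout active at inference time, sample several predictions, and
    flag insufficient data when the predictive variance is large."""
    model.train()                      # keep dropout layers stochastic
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    model.eval()
    variance = preds.var(dim=0).mean().item()
    return variance > variance_threshold
```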
Then, the control system 700 (or 1400) is operated by the learner updated by relearning (step S2406), and the operation of the endoscopic surgery system 100 is continued.
According to the operation procedure illustrated in
The present disclosure has been described in detail above with reference to specific embodiments. However, it is obvious that those skilled in the art can make modifications and substitutions of the embodiments without departing from the gist of the present disclosure.
In the present specification, the embodiments in which the present disclosure is applied to a medical robot device that supports an endoscope have been mainly described, but the gist of the present disclosure is not limited thereto. The present disclosure can be similarly applied to a medical robot device that supports, at its distal end, a medical instrument other than an endoscope, for example, forceps, a pneumoperitoneum tube, an energy treatment tool, tweezers, or a retractor, and further to a robot device that performs information presentation, operation instruction, or the like without using a support tool, in order to present a determination basis, uncertainty, or reliability of an estimation result by deep learning.
In short, the present disclosure has been described in the form of exemplification, and the contents described in the present specification should not be interpreted in a limited manner. In order to determine the gist of the present disclosure, the claims should be taken into consideration.
Note that the present disclosure can have the following configurations.
(1) A medical support system including:
(2) The medical support system according to (1),
(3) The medical support system according to any one of (1) and (2),
(4) The medical support system according to (2),
(5) The medical support system according to (3),
(6) The medical support system according to any one of (1) to (5),
(7) The medical support system according to (5),
(8) The medical support system according to (7),
(9) The medical support system according to any one of (7) and (8),
(10) The medical support system according to any one of (2) and (4),
(11) The medical support system according to any one of (2) and (4),
(12) The medical support system according to (3),
(13) The medical support system according to any one of (1) to (12), further including
(14) The medical robot device according to (2),
(15) A medical support method in a medical support system, the medical support method including:
(16) A computer program described in a computer readable format to execute processing of medical support in a medical support system on a computer, the computer program causing the computer to function as:
Priority claim: Japanese Patent Application No. 2020-131216, filed July 2020 (JP, national).
International filing: PCT/JP2021/022041, filed June 10, 2021 (WO).