The present disclosure relates generally to medical imaging, and more specifically to extracting a subset of images from a series of images (e.g., surgical video feeds) for training machine-learning models and/or conducting various downstream analyses.
Medical systems, instruments, or tools are utilized pre-surgery, during surgery, or post-operatively for various purposes. Some of these medical tools may be used in what are generally termed endoscopic procedures or open-field procedures. For example, endoscopy in the medical field allows internal features of the body of a patient to be viewed without the use of traditional, fully invasive surgery. Endoscopic imaging systems incorporate endoscopes to enable a surgeon to view a surgical site, and endoscopic tools enable minimally invasive surgery at the site. Such tools may be, for example, shaver-type devices, which mechanically cut bone and hard tissue, or radio-frequency (RF) probes, which are used to remove tissue via ablation or to coagulate tissue to minimize bleeding at the surgical site.
In endoscopic surgery, the endoscope is placed in the body at the location at which a surgical procedure is to be performed. Other surgical instruments, such as the endoscopic tools mentioned above, are also placed in the body at the surgical site. A surgeon views the surgical site through the endoscope in order to manipulate the tools and perform the desired surgical procedure. Some endoscopes are used with a camera head that processes the images received through the endoscope. An endoscopic camera system typically includes a camera head connected to a camera control unit (CCU) by a cable. The CCU processes input image data received from the image sensor of the camera via the cable and then outputs the image data for display. The resolution and frame rates of endoscopic camera systems are ever increasing, and each component of the system must be designed accordingly.
Another type of medical imager that can include a camera head connected to a CCU by a cable is an open-field imager. Open-field imagers can be used to image open surgical fields, such as for visualizing blood flow in vessels and related tissue perfusion during plastic, microsurgical, reconstructive, and gastrointestinal procedures.
During a surgical operation, a large volume of image data (e.g., video data) may be collected. The image data can be useful for various downstream analyses and training machine-learning models. However, due to the large size and the duplicative nature of the image data, it may be inefficient to process and analyze the image data in its entirety. Accordingly, it would be desirable to extract only a subset of data from the original image data for further processing.
Conventional approaches to image extraction suffer from a number of deficiencies. For example, with a fixed-frame-rate approach, image frames are sampled at a predefined, constant temporal resolution. However, the fixed-frame-rate approach may miss relevant frames (e.g., maneuvering of surgical tools) occurring between samplings, while still producing duplicative images that may bias models and downstream analyses. As another example, machine-learning models have been implemented to extract a particular pattern or feature from video feeds. However, these machine-learning models are restricted to detecting predefined features and thus fail to capture features that are not predefined but may nevertheless be relevant for downstream analyses.
Disclosed herein are exemplary devices, apparatuses, systems, methods, and non-transitory storage media for medical image extraction. The systems, devices, and methods may be used to extract images from video data of a surgical operation, such as an endoscopic imaging procedure or an open field surgical imaging procedure. In some examples, the systems, devices, and methods may also be used to extract medical images from image data captured pre-operatively, post-operatively, and during diagnostic imaging sessions and procedures.
Examples of the present disclosure comprise automated de-duplication techniques with a variable frame rate for extracting images from a series of medical images (e.g., a surgical video feed). In the resulting extracted image set, duplicative images that may bias downstream analyses or models are eliminated or reduced, while distinct images that capture potentially relevant actions (e.g., events during a surgical operation) are retained. The extracted images can improve various downstream analyses and the quality of machine-learning models trained using such data. As discussed herein, examples of the present disclosure provide variable image frame extraction using probabilistic modeling, which retains more images while an event is occurring and minimizes similarity between image frames otherwise. The learning-based frame selection is superior to hard thresholding. The use of finite mixture models (“FMM”) provides a unique way to learn the underlying parametric distribution and thus helps to provide better variable-frame-rate selection. Neighboring frames may be included through a spatial Markov Random Field (“MRF”) constraint. Further, examples of the present disclosure can maintain a target frame rate (e.g., specified by a user) and reduce motion blur and noise in the extracted images. Thus, techniques of the present disclosure provide a generic way to extract relevant image frames by focusing on frame-to-frame differences rather than on a single feature in a single frame, ultimately providing effective selection of relevant frames while ensuring data variability.
An exemplary system can first obtain an image representation for each image of a series of images. The image representation captures the feature context of an image in a generic manner. In some embodiments, the image representation is a hash value of the image. The system can then determine how different consecutive images in the series are, for example, by calculating difference values, where each difference value is indicative of the difference between the hash values of two consecutive images in the series. The system then performs a smooth selection of images using probabilistic modeling of the hash difference values, selecting images based on the underlying distribution of difference values; this ensures variability in the selected images while minimizing the similarity between them. For example, the system generates a plurality of image clusters by clustering the difference values. To cluster the plurality of difference values, the system fits a finite mixture model using an expectation-maximization (“EM”) algorithm to learn the underlying parametric distribution via unsupervised-learning techniques. An MRF constraint may be used to model neighborhood dependency, enabling a smooth transition from one frame to the next rather than a hard cut-off in cluster occupancy. The MRF provides a form of temporal modeling in which neighboring predictions tend to remain similar, allowing a smooth gradation of cluster occupancy instead of sharp shifts. Finally, the system can select one or more image clusters from the plurality of image clusters (e.g., based on a target frame rate) and produce a subset of surgical images using the selected one or more image clusters.
In some examples, the subset of images obtained by examples of the present disclosure can be used to train a machine-learning model. The machine-learning model can be any machine-learning model that is configured to receive one or more surgical images and provide an output, such as a machine-learning model configured to receive a surgical image and detect objects and/or events in the surgical image. Rather than using all images of a video to train the model, only a subset of images needs to be provided to the machine-learning model. The subset of images may be equally or more effective at training the model because it includes the representative images in the video without the duplicative images that would create bias in the model. At the same time, the time, processing power, and computer memory required to train the model can be significantly reduced due to the smaller number of training images. In some examples, the deduplication process can be used for data reduction, and missing frames can be generated from the reduced data using generative models.
In some examples, the subset of images obtained by examples of the present disclosure can be processed by an algorithm to analyze the surgical operation. Rather than providing an entire video stream to the algorithm, only the subset of images can be provided to the algorithm. The subset of images does not compromise the quality of the analysis because it includes the representative images in the original video. At the same time, the required time, the processing power, and the computer memory to conduct the analysis can be significantly reduced due to the smaller number of images that need to be processed.
In some examples, an algorithm can be used to process the subset of images and automatically identify events depicted in the subset of images. The system can then store an association between a given event and the timestamp of the image(s) depicting the given event for later lookup. For example, a surgeon may want to review a particular event or phase of surgery (e.g., a critical view of safety in laparoscopic cholecystectomy). Based on the event, the system can identify the timestamp(s) associated with the event and retrieve the image(s) for a quick review, rather than requiring the surgeon to view the entire video to find the event.
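The event-to-timestamp association described above can be sketched as a simple lookup index. This is a minimal, hedged illustration only; the `EventIndex` class, event names, and timestamps are hypothetical and not part of the disclosure.

```python
# Minimal sketch of an event-to-timestamp index for quick review lookup.
# Event names and timestamps below are hypothetical examples.
from collections import defaultdict

class EventIndex:
    def __init__(self):
        # Maps an event name to the list of timestamps (seconds) at which
        # images depicting that event were detected.
        self._index = defaultdict(list)

    def record(self, event, timestamp):
        # Store the association between a detected event and a frame timestamp.
        self._index[event].append(timestamp)

    def lookup(self, event):
        # Return all timestamps for the event, in chronological order.
        return sorted(self._index[event])

index = EventIndex()
index.record("critical_view_of_safety", 1520.0)
index.record("clipping", 1710.5)
index.record("critical_view_of_safety", 1523.5)
print(index.lookup("critical_view_of_safety"))  # [1520.0, 1523.5]
```

A reviewer can then jump directly to the stored timestamps instead of scrubbing through the full video.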
In some examples, the subset of images obtained by examples of the present disclosure can be displayed on a display. If a medical practitioner would like to review a surgery, he or she can simply review the subset of images (e.g., as a shorter series of images or as a shortened video). Accordingly, the review time can be significantly reduced without compromising the thoroughness of the review.
While some examples of the present disclosure involve processing a series of images to obtain a subset of images, it should be appreciated that the examples of the present disclosure can also be applied to process a series of videos to obtain a subset of videos. In some examples, the techniques of the present disclosure can be performed in real time during a surgery. The extracted subset of images can be saved locally for display and/or uploaded through a network for downstream analyses (e.g., training machine-learning models).
According to some aspects, an exemplary method for obtaining a subset of surgical images from a series of video images of a surgery comprises: hashing image data for each image of the series of video images of the surgery to obtain a series of hash values; calculating a plurality of difference values for the series of hash values, each of the plurality of difference values indicative of a difference between two consecutive hash values in the series of hash values; generating a plurality of image clusters by clustering the plurality of difference values; selecting one or more image clusters from the plurality of image clusters; and producing the subset of surgical images from the series of video images using the selected one or more image clusters from the plurality of image clusters.
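The steps of the method above can be sketched end-to-end in miniature. This is a hedged illustration only: it uses an average hash and a simple one-dimensional 2-means clustering as a stand-in for the probabilistic clustering options described below, and the tiny synthetic frames are hypothetical.

```python
# End-to-end sketch: hash frames, diff consecutive hashes, cluster the
# differences, and keep frames from the "large change" cluster.
# A 1-D 2-means clustering stands in for the finite-mixture-model step.

def average_hash(frame):
    # Bit i is 1 when pixel i is above the frame's mean intensity.
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((1 << i) for i, p in enumerate(pixels) if p > mean)

def hamming(a, b):
    # Number of differing bits between two hash values.
    return bin(a ^ b).count("1")

def two_means(values, iters=20):
    # Minimal 1-D k-means (k=2); label 1 marks the larger-mean cluster.
    lo, hi = min(values), max(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        low = [v for v, l in zip(values, labels) if l == 0]
        high = [v for v, l in zip(values, labels) if l == 1]
        lo = sum(low) / len(low) if low else lo
        hi = sum(high) / len(high) if high else hi
    return labels

def extract_subset(frames):
    hashes = [average_hash(f) for f in frames]
    diffs = [hamming(hashes[i], hashes[i + 1]) for i in range(len(hashes) - 1)]
    labels = two_means(diffs)
    # Keep the first frame by default, plus each frame following a large change.
    return [0] + [i + 1 for i, l in enumerate(labels) if l == 1]

# Three near-identical frames, an abrupt scene change, then a repeat.
still = [[10, 10], [10, 200]]
changed = [[200, 10], [10, 10]]
frames = [still, still, still, changed, changed]
print(extract_subset(frames))  # [0, 3]
```

Only the first frame and the frame at the scene change survive; the duplicative frames are dropped.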
According to some aspects, the series of video images is captured by an endoscopic imaging system.
According to some aspects, the series of video images is captured by an open-field imaging system.
According to some aspects, the subset of surgical images includes an image depicting an event in the surgery.
According to some aspects, the subset of surgical images includes a single image depicting the event in the surgery.
According to some aspects, the event comprises: introduction of a surgical tool, removal of the surgical tool, movement of the surgical tool, identification of anatomical landmarks during surgery, critical view of safety in laparoscopic cholecystectomy, identification of critical structures during surgery, removal of organs, navigating through tissue structures as part of preparation, monitoring suture, checking for extravasation or leakage (blood, bile, or other fluids), cauterization, clipping, cutting, or any combination thereof.
According to some aspects, the method further comprises: training a machine-learning model based on the subset of surgical images from the series of video images.
According to some aspects, the machine-learning model is a generative model, the method further comprising: generating one or more images using the trained machine-learning model.
According to some aspects, the method further comprises: displaying the subset of surgical images from the series of video images.
According to some aspects, the method further comprises: detecting an event in an image in the subset of surgical images; and storing a timestamp associated with the image.
According to some aspects, each hash value of the series of hash values is an N-bit binary representation.
According to some aspects, hashing image data for each image of the series of video images of the surgery comprises: reducing the resolution of each image in the series of video images; and after reducing the resolution, applying a hash algorithm to the image to obtain a corresponding hash value.
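The reduce-then-hash step might be sketched as follows, assuming block-average downsampling and an average hash; the 2x2 target grid is an illustrative simplification of the 8x8 grid typical of perceptual hashes, and the pixel values are hypothetical.

```python
# Sketch of "reduce resolution, then hash": average-pool the image down
# to a small grid, then apply an average hash to the pooled grid.

def downsample(image, out_h, out_w):
    # Average-pool the image into an out_h x out_w grid.
    h, w = len(image), len(image[0])
    bh, bw = h // out_h, w // out_w
    return [[sum(image[r * bh + i][c * bw + j]
                 for i in range(bh) for j in range(bw)) / (bh * bw)
             for c in range(out_w)] for r in range(out_h)]

def average_hash(frame):
    # Bit i is 1 when pixel i is above the frame's mean intensity.
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((1 << i) for i, p in enumerate(pixels) if p > mean)

image = [[0, 0, 255, 255],
         [0, 0, 255, 255],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
small = downsample(image, 2, 2)   # [[0.0, 255.0], [0.0, 0.0]]
print(average_hash(small))        # only bit 1 is above the mean -> 2
```

Reducing the resolution first makes the hash robust to small pixel-level variations while preserving coarse structure.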
According to some aspects, the hash algorithm comprises: an average hash algorithm, a difference hash algorithm, a perceptual hash algorithm, a wavelet hash algorithm, a locality-sensitive hash algorithm, or any combination thereof.
According to some aspects, each difference value of the plurality of difference values is a Hamming distance.
According to some aspects, the Hamming distance between two hash values is computed by performing a bit-wise exclusive-OR (XOR) operation between the two hash values and counting the resulting set bits.
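Computing a Hamming distance between two binary hash values can be sketched as: XOR the two values so that differing bits become 1, then count the set bits.

```python
# Hamming distance between two N-bit hash values: XOR the values so that
# differing bit positions become 1, then count the ones.

def hamming_distance(a, b):
    return bin(a ^ b).count("1")

print(hamming_distance(0b1011, 0b0010))  # bits 0 and 3 differ -> 2
```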
According to some aspects, clustering the plurality of difference values comprises performing probabilistic clustering, K-means clustering, fuzzy C-means clustering, mean-shift clustering, hierarchical clustering, or any combination thereof.
According to some aspects, performing probabilistic clustering comprises performing unsupervised learning of finite mixture models (FMMs).
According to some aspects, performing probabilistic clustering comprises: (A) performing an expectation step to obtain an a posteriori probability for each cluster of a predefined number of clusters; (B) performing a maximization step to obtain one or more parameters for each cluster of the predefined number of clusters; and (C) repeating steps (A)-(B) until a convergence is reached.
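The expectation and maximization steps (A)-(B) above can be sketched for a two-component, one-dimensional Gaussian mixture over the difference values. The component count, Gaussian family, initialization, and fixed iteration count are illustrative assumptions; in practice, convergence would be tested, for example, via the change in log-likelihood.

```python
# Sketch of the EM loop for a two-component 1-D Gaussian mixture fitted
# to the hash-difference values. Initialisation at the data extremes and
# a fixed iteration count are simplifying assumptions.
import math

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_fit(values, iters=50):
    mus = [min(values), max(values)]
    vars_ = [1.0, 1.0]
    priors = [0.5, 0.5]
    for _ in range(iters):
        # (A) Expectation: a posteriori probability of each cluster per value.
        resp = []
        for x in values:
            weighted = [priors[k] * gaussian_pdf(x, mus[k], vars_[k]) for k in range(2)]
            total = sum(weighted)
            resp.append([w / total for w in weighted])
        # (B) Maximization: update priors, means, and variances per cluster.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            priors[k] = nk / len(values)
            mus[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            vars_[k] = max(sum(r[k] * (x - mus[k]) ** 2
                               for r, x in zip(resp, values)) / nk, 1e-6)
    return mus, vars_, priors

# Mostly small differences (static scene) with a burst of large ones (event).
diffs = [0, 1, 0, 1, 0, 14, 15, 1, 0, 16]
mus, vars_, priors = em_fit(diffs)
print(sorted(round(m, 1) for m in mus))  # [0.4, 15.0]
```

The learned component means separate the "static" and "event" regimes of the difference distribution, which is what enables a variable, rather than fixed, frame rate.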
According to some aspects, the one or more parameters comprises one or more distribution parameters.
According to some aspects, performing the maximization step further comprises calculating one or more prior probability values for each cluster of the predefined number of clusters.
According to some aspects, the one or more prior probability values include a spatial Markov Random Field (“MRF”) prior estimated from a posterior probability.
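One simple, hedged reading of the MRF prior above is that each frame's prior over clusters is derived from the posteriors of its temporal neighbors, so that neighboring predictions tend to remain similar. The neighborhood size (+/-1 frame) and plain averaging below are illustrative assumptions, not the disclosure's exact formulation.

```python
# Illustrative MRF-style prior: each frame's prior over clusters is the
# average of its neighboring frames' posteriors, smoothing cluster
# occupancy across time instead of allowing sharp shifts.

def mrf_prior(posteriors):
    n = len(posteriors)
    k = len(posteriors[0])
    priors = []
    for i in range(n):
        neighbors = [posteriors[j] for j in (i - 1, i + 1) if 0 <= j < n]
        priors.append([sum(p[c] for p in neighbors) / len(neighbors)
                       for c in range(k)])
    return priors

# An isolated outlier posterior (frame 2) gets pulled toward its neighbors.
posteriors = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]
print(mrf_prior(posteriors))
```

Here frame 2's prior becomes [0.9, 0.1] despite its outlier posterior, illustrating the smooth gradation of cluster occupancy.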
According to some aspects, selecting one or more image clusters from the plurality of image clusters comprises: assigning each difference value of the plurality of difference values to one of the plurality of image clusters based on the maximum a posteriori (MAP) rule; and ordering the plurality of image clusters.
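The MAP assignment and cluster-ordering step can be sketched as follows. Ordering clusters by descending mean difference value is an illustrative criterion (the disclosure does not fix a particular ordering), and the posteriors shown are hypothetical.

```python
# Sketch of MAP assignment and cluster ordering: each difference value
# goes to the cluster with the highest posterior probability, and the
# clusters are then ranked by mean difference value, descending.

def map_assign(posteriors):
    # posteriors[i][k] = P(cluster k | difference value i)
    return [max(range(len(p)), key=lambda k: p[k]) for p in posteriors]

def order_clusters(values, labels, n_clusters):
    means = []
    for k in range(n_clusters):
        members = [v for v, l in zip(values, labels) if l == k]
        means.append(sum(members) / len(members) if members else float("-inf"))
    return sorted(range(n_clusters), key=lambda k: means[k], reverse=True)

diffs = [0, 1, 14, 0, 15]
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.95, 0.05], [0.02, 0.98]]
labels = map_assign(posteriors)          # [0, 0, 1, 0, 1]
print(order_clusters(diffs, labels, 2))  # cluster 1 (mean 14.5) ranks first
```

Selecting clusters from the top of this ordering keeps the frames with the largest inter-frame changes.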
According to some aspects, the first image of the series of video images is included in the subset of surgical images by default.
According to some aspects, the method further comprises: receiving a minimum frame selection window; and including one or more images from an unselected image cluster to the subset of surgical images based on the minimum frame selection window.
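One hedged reading of the minimum frame selection window is sketched below: whenever no frame has been selected within the window, a frame from an otherwise unselected cluster is included, so that long static stretches are still sampled. The window length and frame indices are hypothetical.

```python
# Sketch of the minimum-frame-selection-window step: if no frame has
# been selected for `window` consecutive frames, include a frame from an
# otherwise-unselected cluster to keep static stretches sampled.

def enforce_window(selected, total_frames, window):
    sel = set(selected)
    out = []
    last = -1
    for i in range(total_frames):
        if i in sel:
            out.append(i)
            last = i
        elif i - last >= window:
            out.append(i)  # fill the gap with an otherwise-unselected frame
            last = i
    return out

# Frames 0 and 9 were selected by clustering; a window of 4 adds fillers.
print(enforce_window([0, 9], 10, 4))  # [0, 4, 8, 9]
```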
According to some aspects, the method further comprises: determining whether an image in the subset of surgical images comprises a motion artifact or noise.
According to some aspects, the method further comprises: in accordance with a determination that the image comprises a motion artifact or noise, removing the image from the subset of surgical images.
According to some aspects, the method further comprises: in accordance with a determination that the image comprises a motion artifact or noise, repairing the image.
According to some aspects, the method further comprises: in accordance with a determination that the image comprises a motion artifact or noise, including the image in the subset of surgical images.
According to some aspects, a system for obtaining a subset of surgical images from a series of video images of a surgery comprises: one or more processors; one or more memories; and one or more programs, wherein the one or more programs are stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs including instructions for: hashing image data for each image of the series of video images of the surgery to obtain a series of hash values; calculating a plurality of difference values for the series of hash values, each of the plurality of difference values indicative of a difference between two consecutive hash values in the series of hash values; generating a plurality of image clusters by clustering the plurality of difference values; selecting one or more image clusters from the plurality of image clusters; and producing the subset of surgical images from the series of video images using the selected one or more image clusters from the plurality of image clusters.
According to some aspects, the series of video images is captured by an endoscopic imaging system.
According to some aspects, the series of video images is captured by an open-field imaging system.
According to some aspects, the subset of surgical images includes an image depicting an event in the surgery.
According to some aspects, the subset of surgical images includes a single image depicting the event in the surgery.
According to some aspects, the event comprises: introduction of a surgical tool, removal of the surgical tool, movement of the surgical tool, identification of anatomical landmarks during surgery, critical view of safety in laparoscopic cholecystectomy, identification of critical structures during surgery, removal of organs, navigating through tissue structures as part of preparation, monitoring suture, checking for extravasation or leakage (blood, bile, or other fluids), cauterization, clipping, cutting, or any combination thereof.
According to some aspects, the one or more programs further include instructions for: training a machine-learning model based on the subset of surgical images from the series of video images.
According to some aspects, the machine-learning model is a generative model, and the one or more programs further include instructions for: generating one or more images using the trained machine-learning model.
According to some aspects, the one or more programs further include instructions for: displaying the subset of surgical images from the series of video images.
According to some aspects, the one or more programs further include instructions for: detecting an event in an image in the subset of surgical images; and storing a timestamp associated with the image.
According to some aspects, each hash value of the series of hash values is an N-bit binary representation.
According to some aspects, hashing image data for each image of the series of video images of the surgery comprises: reducing the resolution of each image in the series of video images; and after reducing the resolution, applying a hash algorithm to the image to obtain a corresponding hash value.
According to some aspects, the hash algorithm comprises: an average hash algorithm, a difference hash algorithm, a perceptual hash algorithm, a wavelet hash algorithm, a locality-sensitive hash algorithm, or any combination thereof.
According to some aspects, each difference value of the plurality of difference values is a Hamming distance.
According to some aspects, the Hamming distance between two hash values is computed by performing a bit-wise exclusive-OR (XOR) operation between the two hash values and counting the resulting set bits.
According to some aspects, clustering the plurality of difference values comprises performing probabilistic clustering, K-means clustering, fuzzy C-means clustering, mean-shift clustering, hierarchical clustering, or any combination thereof.
According to some aspects, performing probabilistic clustering comprises performing unsupervised learning of finite mixture models (FMMs).
According to some aspects, performing probabilistic clustering comprises: (A) performing an expectation step to obtain an a posteriori probability for each cluster of a predefined number of clusters; (B) performing a maximization step to obtain one or more parameters for each cluster of the predefined number of clusters; and (C) repeating steps (A)-(B) until a convergence is reached.
According to some aspects, the one or more parameters comprises one or more distribution parameters.
According to some aspects, performing the maximization step further comprises calculating one or more prior probability values for each cluster of the predefined number of clusters.
According to some aspects, the one or more prior probability values include a spatial Markov Random Field (“MRF”) prior estimated from a posterior probability.
According to some aspects, selecting one or more image clusters from the plurality of image clusters comprises: assigning each difference value of the plurality of difference values to one of the plurality of image clusters based on the maximum a posteriori (MAP) rule; and ordering the plurality of image clusters.
According to some aspects, the first image of the series of video images is included in the subset of surgical images by default.
According to some aspects, the one or more programs further include instructions for: receiving a minimum frame selection window; and including one or more images from an unselected image cluster to the subset of surgical images based on the minimum frame selection window.
According to some aspects, the one or more programs further include instructions for: determining whether an image in the subset of surgical images comprises a motion artifact or noise.
According to some aspects, the one or more programs further include instructions for: in accordance with a determination that the image comprises a motion artifact or noise, removing the image from the subset of surgical images.
According to some aspects, the one or more programs further include instructions for: in accordance with a determination that the image comprises a motion artifact or noise, repairing the image.
According to some aspects, the one or more programs further include instructions for: in accordance with a determination that the image comprises a motion artifact or noise, including the image in the subset of surgical images.
According to some aspects, a non-transitory computer-readable storage medium stores one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to perform any of the methods described herein.
According to an aspect, there is provided a computer program product comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to perform any of the techniques described herein. An exemplary non-transitory computer-readable storage medium stores one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to perform any of the techniques described herein.
It will be appreciated that any one or more of the above aspects, examples, features and options can be combined. It will be appreciated that any one of the options described in view of one of the aspects can be applied equally to any of the other aspects. It will also be clear that all aspects, features and options described in view of the methods apply equally to the devices, apparatuses, systems, non-transitory storage media and computer program products, and vice versa.
The invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Reference will now be made in detail to implementations and various aspects and variations of systems and methods described herein. Although several exemplary variations of the systems and methods are described herein, other variations of the systems and methods may include aspects of the systems and methods described herein combined in any suitable manner, with combinations of all or some of the aspects described. Examples will now be described more fully hereinafter with reference to the accompanying drawings; however, they may be embodied in different forms and should not be construed as limited to the examples set forth herein. Rather, these examples are provided so that this disclosure will be thorough and complete, and will fully convey exemplary implementations to those skilled in the art.
Disclosed herein are exemplary devices, apparatuses, systems, methods, and non-transitory storage media for medical image extraction. The systems, devices, and methods may be used to extract images from video data of a surgical operation, such as an endoscopic imaging procedure or an open field surgical imaging procedure. In some examples, the systems, devices, and methods may also be used to extract medical images from image data captured pre-operatively, post-operatively, and during diagnostic imaging sessions and procedures.
Examples of the present disclosure comprise automated de-duplication techniques with a variable frame rate for extracting images from a series of medical images (e.g., a surgical video feed). In the resulting extracted image set, duplicative images that may bias downstream analyses or models are eliminated or reduced, while distinct images that capture potentially relevant actions (e.g., events during a surgical operation) are retained. The extracted images can improve various downstream analyses and the quality of machine-learning models trained using such data. As discussed herein, examples of the present disclosure provide variable image frame extraction using probabilistic modeling, which retains more images while an event is occurring and minimizes similarity between image frames otherwise. The learning-based frame selection is superior to hard thresholding. The use of finite mixture models (“FMM”) provides a unique way to learn the underlying parametric distribution and thus helps to provide better variable-frame-rate selection. Neighboring frames may be included through a spatial Markov Random Field (“MRF”) constraint. Further, examples of the present disclosure can maintain a target frame rate (e.g., specified by a user) and reduce motion blur and noise in the extracted images. Thus, techniques of the present disclosure provide a generic way to extract relevant image frames by focusing on frame-to-frame differences rather than on a single feature in a single frame, ultimately providing effective selection of relevant frames while ensuring data variability.
An exemplary system can first obtain an image representation for each image of a series of images. The image representation captures the feature context of an image in a generic manner. In some embodiments, the image representation is a hash value of the image. The system can then determine how different consecutive images in the series are, for example, by calculating difference values, where each difference value is indicative of the difference between the hash values of two consecutive images in the series. The system then performs a smooth selection of images using probabilistic modeling of the hash difference values, selecting images based on the underlying distribution of difference values; this ensures variability in the selected images while minimizing the similarity between them. For example, the system generates a plurality of image clusters by clustering the difference values. To cluster the plurality of difference values, the system fits a finite mixture model using an expectation-maximization (“EM”) algorithm to learn the underlying parametric distribution via unsupervised-learning techniques. An MRF constraint may be used to model neighborhood dependency, enabling a smooth transition from one frame to the next rather than a hard cut-off in cluster occupancy. The MRF provides a form of temporal modeling in which neighboring predictions tend to remain similar, allowing a smooth gradation of cluster occupancy instead of sharp shifts. Finally, the system can select one or more image clusters from the plurality of image clusters (e.g., based on a target frame rate) and produce a subset of surgical images using the selected one or more image clusters.
The subset of images obtained by examples of the present disclosure can be used to train a machine-learning model. The machine-learning model can be any machine-learning model that is configured to receive one or more surgical images and provide an output, such as a machine-learning model configured to receive a surgical image and detect objects and/or events in the surgical image. Rather than using all images of a video to train the model, only a subset of images needs to be provided to the machine-learning model. The subset of images may be equally or more effective at training the model because it includes the representative images in the video without the duplicative images that would create bias in the model. At the same time, the time, processing power, and computer memory required to train the model can be significantly reduced due to the smaller number of training images. In some examples, the deduplication process can be used for data reduction, and missing frames can be generated from the reduced data using generative models.
Alternatively, or additionally, the subset of images obtained by examples of the present disclosure can be processed by an algorithm to analyze the surgical operation. Rather than providing an entire video stream to the algorithm, only the subset of images can be provided to the algorithm. The subset of images does not compromise the quality of the analysis because it includes the representative images in the original video. At the same time, the required time, the processing power, and the computer memory to conduct the analysis can be significantly reduced due to the smaller number of images that need to be processed.
An algorithm can be used to process the subset of images and automatically identify events depicted in the subset of images. The system can then store an association between a given event and the timestamp of the image(s) depicting the given event for later lookup. For example, a surgeon may want to review a particular event or phase of surgery (e.g., a critical view of safety in laparoscopic cholecystectomy). Based on the event, the system can identify the timestamp(s) associated with the event and retrieve the image(s) for a quick review, rather than requiring the surgeon to view the entire video to find the event.
The subset of images obtained by examples of the present disclosure can be displayed on a display. If a medical practitioner would like to review a surgery, he or she can simply review the subset of images (e.g., as a shorter series of images or as a shortened video). Accordingly, the review time can be significantly reduced without compromising the thoroughness of the review.
While some examples of the present disclosure involve processing a series of images to obtain a subset of images, it should be appreciated that the examples of the present disclosure can also be applied to process a series of videos to obtain a subset of videos. In some examples, the techniques of the present disclosure can be performed in real time during a surgery. The extracted subset of images can be saved locally for display and/or uploaded through a network for downstream analyses (e.g., training machine-learning models).
It is to be understood that the singular forms “a,” “an,” and “the” used in the following description are intended to include the plural forms as well, unless the context clearly indicates otherwise. It is also to be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It is further to be understood that the terms “includes,” “including,” “comprises,” and/or “comprising,” when used herein, specify the presence of stated features, integers, steps, operations, elements, components, and/or units but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, units, and/or groups thereof.
Certain aspects of the present disclosure include process steps and instructions described herein in the form of an algorithm. It should be noted that the process steps and instructions of the present disclosure could be embodied in software, firmware, or hardware and, when embodied in software, could be downloaded to reside on and be operated from different platforms used by a variety of operating systems. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that, throughout the description, discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” “displaying,” “generating” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission, or display devices.
The present disclosure in some examples also relates to a device for performing the operations herein. This device may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, computer readable storage medium, such as, but not limited to, any type of disk, including floppy disks, USB flash drives, external hard drives, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The methods, devices, and systems described herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein.
A control or switch arrangement 17 may be provided on the camera head 16 for allowing a user to manually control various functions of the system 10, which may include switching from one imaging mode to another, as discussed further below. Voice commands may be input into a microphone 25 mounted on a headset 27 worn by the practitioner and coupled to the voice-control unit 23. A hand-held control device 29, such as a tablet with a touch screen user interface or a PDA, may be coupled to the voice control unit 23 as a further control interface. In the illustrated example, a recorder 31 and a printer 33 are also coupled to the CCU 18. Additional devices, such as an image capture and archiving device, may be included in the system 10 and coupled to the CCU 18. Video image data acquired by the camera head 16 and processed by the CCU 18 is converted to images, which can be displayed on a monitor 20, recorded by recorder 31, and/or used to generate static images, hard copies of which can be produced by the printer 33.
The light source 14 can generate visible illumination light (such as any combination of red, green, and blue light) for generating visible (e.g., white light) images of the target object 1 and, in some examples, can also produce fluorescence excitation illumination light for exciting the fluorescent markers 2 in the target object for generating fluorescence images. Illumination light is transmitted to and through an optic lens system 22 which focuses light onto a light pipe 24. The light pipe 24 may create a homogeneous light, which is then transmitted to the fiber optic light guide 26. The light guide 26 may include multiple optic fibers and is connected to a light post 28, which is part of the endoscope 12. The endoscope 12 includes an illumination pathway 12′ and an optical channel pathway 12″.
The endoscope 12 may include a notch filter 131 that allows some or all (preferably, at least 80%) of fluorescence emission light (e.g., in a wavelength range of 830 nm to 870 nm) emitted by fluorescence markers 2 in the target object 1 to pass therethrough and that allows some or all (preferably, at least 80%) of visible light (e.g., in the wavelength range of 400 nm to 700 nm), such as visible illumination light reflected by the target object 1, to pass therethrough, but that blocks substantially all of the fluorescence excitation light (e.g., infrared light having a wavelength of 808 nm) that is used to excite fluorescence emission from the fluorescent marker 2 in the target object 1. The notch filter 131 may have an optical density of OD5 or higher. In some examples, the notch filter 131 can be located in the coupler 13.
One or more control components may be integrated into the same integrated circuit in which the sensor 304 is integrated or may be discrete components. The imager 302 may be incorporated into an imaging head, such as camera head 16 of system 10.
One or more control components 306, such as row circuitry and a timing circuit, may be electrically connected to an imaging controller 320, such as camera control unit 18 of system 10. The imaging controller 320 may include one or more processors 322 and memory 324. The imaging controller 320 receives imager row readouts and may control readout timings and other imager operations, including mechanical shutter operation. The imaging controller 320 may generate image frames, such as video frames from the row and/or column readouts from the imager 302. Generated frames may be provided to a display 350 for display to a user, such as a surgeon.
The system 300 in this example includes a light source 330 for illuminating a target scene. The light source 330 is controlled by the imaging controller 320. The imaging controller 320 may determine the type of illumination provided by the light source 330 (e.g., white light, fluorescence excitation light, or both), the intensity of the illumination provided by the light source 330, and/or the on/off times of illumination in synchronization with rolling shutter operation. The light source 330 may include a first light generator 332 for generating light in a first wavelength and a second light generator 334 for generating light in a second wavelength. In some examples, the first light generator 332 is a white light generator, which may be comprised of multiple discrete light generation components (e.g., multiple LEDs of different colors), and the second light generator 334 is a fluorescence excitation light generator, such as a laser diode.
The light source 330 includes a controller 336 for controlling light output of the light generators. The controller 336 may be configured to provide pulse width modulation of the light generators for modulating intensity of light provided by the light source 330, which can be used to manage over-exposure and under-exposure. In some examples, nominal current and/or voltage of each light generator remains constant and the light intensity is modulated by switching the light generators (e.g., LEDs) on and off according to a pulse width control signal. In some examples, a PWM control signal is provided by the imaging controller 320. This control signal can be a waveform that corresponds to the desired pulse width modulated operation of the light generators.
The imaging controller 320 may be configured to determine the illumination intensity required of the light source 330 and may generate a PWM signal that is communicated to the light source 330. In some examples, depending on the amount of light received at the sensor 304 and the integration times, the light source may be pulsed at different rates to alter the intensity of illumination light at the target scene. The imaging controller 320 may determine a required illumination light intensity for a subsequent frame based on an amount of light received at the sensor 304 in a current frame and/or one or more previous frames. In some examples, the imaging controller 320 is capable of controlling pixel intensities via PWM of the light source 330 (to increase/decrease the amount of light at the pixels), via operation of the mechanical shutter 312 (to increase/decrease the amount of light at the pixels), and/or via changes in gain (to increase/decrease sensitivity of the pixels to received light). In some examples, the imaging controller 320 primarily uses PWM of the illumination source for controlling pixel intensities while holding the shutter open (or at least not operating the shutter) and maintaining gain levels. The controller 320 may operate the shutter 312 and/or modify the gain in the event that the light intensity is at a maximum or minimum and further adjustment is needed.
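As one illustration of the feedback loop described above, a proportional duty-cycle update could be sketched as follows. The function name, parameters, and clamping bounds are illustrative assumptions, not the disclosed controller:

```python
def next_duty_cycle(duty, measured, target, min_duty=0.02, max_duty=1.0):
    """Proportionally rescale the PWM duty cycle so the next frame's mean
    pixel intensity approaches the target, clamped to a valid range.
    (Illustrative sketch; a real controller would also fall back to shutter
    or gain adjustments once the duty cycle hits a clamp limit.)"""
    if measured <= 0:
        return max_duty  # no light detected: drive illumination to maximum
    return min(max_duty, max(min_duty, duty * target / measured))
```

For example, if the current frame measured twice the target intensity at a 50% duty cycle, the sketch halves the duty cycle for the next frame.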
By performing process 400, the system eliminates replicative images from a series of video images, while retaining images that capture events during a surgical operation that may be relevant for downstream analyses. The series of video images processed by process 400 can be from a video captured during a surgical operation. In some examples, the series of video images is at least a segment of a video captured by an endoscopic imaging system. In some examples, the series of video images is at least a segment of a video captured by an open-field imaging system. As described in detail below, the process 400 can process the series of video images to obtain a subset of surgical images from the series of video images. In some examples, for a particular event in the surgery, the subset of surgical images obtained using process 400 includes a single image or a limited number of images depicting the event in the surgery. The event can comprise: introduction of a surgical tool, removal of the surgical tool, movement of the surgical tool, identification of anatomical landmarks during surgery, critical view of safety in laparoscopic cholecystectomy, identification of critical structures during surgery, removal of organs (e.g., gallbladder removal), navigating through tissue structures as part of preparation, monitoring suture, checking for extravasation or leakage (blood, bile, or other fluids), cauterization, clipping, cutting, or any combination thereof.
As shown in
Turning back to
A hash value is a representation or fingerprint of the corresponding image. In some examples, a hash value is an N-bit binary representation of an image. The advantage of obtaining hash values and analyzing the obtained hash values, rather than analyzing the images themselves, is that hashing creates a representation of an image that has low variance (e.g., low distance value for Hamming distance) even if the image is perturbed with noise, blur (e.g., motion blur), or other transforms such as shift, rotation, etc. Any suitable hashing algorithm, for example, in the spatial domain, in the frequency domain, or based on other transformations, can be used to obtain the hash value. In some examples, the hash algorithm comprises: an average hash algorithm, a difference hash algorithm, a perceptual hash algorithm, a wavelet hash algorithm, a locality-sensitive hash algorithm, or any combination thereof. In some examples, a suitable hashing algorithm can be selected based on a targeted invariance property.
Average hash (aHash) and difference hash (dHash) are calculated from the spatial domain. With aHash, for each pixel, the system outputs 1 if the pixel is greater than or equal to the average pixel value and 0 otherwise. With dHash, the system computes gradients and outputs 1 if a pixel is greater than the next pixel and 0 otherwise. Perceptual hash (pHash) and wavelet hash (wHash) are derived from the frequency domain. With pHash, the system computes the discrete cosine transform (DCT) of an image and then computes the aHash of the DCT. With wHash, the system computes the discrete wavelet transform (DWT) of an image and then computes the aHash of the DWT. Locality-sensitive hashing (lHash) is an alternative with a hybrid representation of both domains. With lHash, the system computes a quantized color histogram of an image as an RGB signature, and outputs 1 if the normalized histogram is above a predefined set of planes and 0 otherwise.
In some examples, before image hashing, the input image is rescaled to a lower resolution in a pre-processing step. For example, the system first reduces the resolution of each image in the series of video images and, after reducing the resolution, applies a hash algorithm to the image to obtain a corresponding hash value.
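The spatial-domain hashes described above can be sketched as follows, assuming the input has already been downscaled to a small grayscale grid per the pre-processing step; the function names are illustrative:

```python
def average_hash(pixels):
    """aHash: output 1 where a pixel is at or above the image mean, else 0."""
    flat = [p for row in pixels for p in row]
    avg = sum(flat) / len(flat)
    return [1 if p >= avg else 0 for p in flat]

def difference_hash(pixels):
    """dHash: output 1 where a pixel is greater than its right-hand
    neighbor (a simple horizontal gradient), else 0."""
    return [1 if left > right else 0
            for row in pixels
            for left, right in zip(row, row[1:])]
```

On a 2×2 checkerboard of black (0) and white (255) pixels, `average_hash` yields one bit per pixel and `difference_hash` yields one bit per horizontal neighbor pair.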
At block 404, the system calculates a plurality of difference values for the series of hash values, each of the plurality of difference values indicative of a difference between two consecutive hash values in the series of hash values. With reference to
In some examples, each difference value of the plurality of difference values is a Hamming distance. In some examples, the Hamming distance between two hash values is computed by performing a bit-wise XOR operation between the two hash values and counting the number of set bits in the result. However, it should be appreciated that other suitable algorithms can be used to calculate a value indicative of a difference or distance between two hash values.
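A minimal sketch of the Hamming-distance computation between two N-bit hash values, treating each hash as an integer:

```python
def hamming_distance(h1: int, h2: int) -> int:
    """XOR sets a 1 bit wherever the two hashes disagree; the number of
    set bits in the result is the Hamming distance."""
    return bin(h1 ^ h2).count("1")
```

Two identical hashes yield a distance of 0, so consecutive frames with near-identical content produce small difference values.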
At block 406, the system generates a plurality of image clusters by clustering the plurality of difference values. With reference to
In particular, probabilistic clustering comprises obtaining, for a particular difference value, probability values indicating the likelihood that the difference value belongs to each cluster. In some examples, performing probabilistic clustering comprises performing unsupervised learning of finite mixture models (FMMs). In some examples, probabilistic modeling of the distance distribution is performed through unsupervised learning of FMMs using an expectation maximization (EM) algorithm. An expectation-maximization algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step.
Performing the maximization step may further comprise calculating one or more prior probability values (e.g., a priori probability value(s)) for each cluster of the predefined number of clusters. The one or more prior probability values can include a spatial Markov Random Field (“MRF”) prior 716 estimated from the a posteriori probability. Thus, spatial relationship between consecutive distances is modeled using MRF, which incorporates a smoothing term while computing a priori value(s) in each EM step.
Upon EM convergence at 718, the system assigns each difference value of the plurality of difference values (e.g., each of the difference values in
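The EM procedure for a one-dimensional Gaussian mixture over the difference values can be sketched as follows. This is a simplified illustration that omits the MRF smoothing prior described above and uses a deterministic initialization; names and details are assumptions:

```python
import math

def em_gmm_1d(data, k=2, iters=50):
    """Cluster 1-D difference values with EM on a Gaussian mixture and
    return a hard cluster label for each value (simplified sketch)."""
    lo, hi = min(data), max(data)
    # Deterministic initialization: means spread evenly over the data range.
    mu = [lo + (hi - lo) * j / (k - 1) for j in range(k)] if k > 1 else [float(lo)]
    var = [1.0] * k          # unit initial variances
    pi = [1.0 / k] * k       # uniform mixing weights (a priori values)
    resp = []
    for _ in range(iters):
        # E step: posterior responsibility of each component for each value.
        resp = []
        for x in data:
            w = [pi[j] / math.sqrt(2 * math.pi * var[j])
                 * math.exp(-(x - mu[j]) ** 2 / (2 * var[j])) for j in range(k)]
            s = sum(w) or 1e-300
            resp.append([wj / s for wj in w])
        # M step: re-estimate mixing weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            pi[j] = nj / len(data)
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2
                             for r, x in zip(resp, data)) / nj, 1e-6)
    # Hard assignment: the component with the highest posterior wins.
    return [max(range(k), key=lambda j: r[j]) for r in resp]
```

On well-separated data (e.g., small distances from near-duplicate frames and large distances from scene changes), the two components converge to the two groups and the final assignment recovers them.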
At block 408, the system selects one or more image clusters from the plurality of image clusters. With reference to
At block 410, the system produces the subset of surgical images from the series of video images using the selected one or more image clusters from the plurality of image clusters. In some examples, the first image of the series of video images is always included in the subset of surgical images by default. For example, with reference to
In some examples, one or more images not from the selected clusters 610 may be included in the final subset of surgical images 612. In some examples, after the system adds the first image and the images from the selected clusters 610 into the subset of surgical images 612, the system then determines whether additional images should be included in the subset 612 based on a minimum frame selection window, which may be specified by a user. The goal of the minimum frame selection window is to ensure that no two consecutive images in the final subset 612 are separated by more than the minimum frame selection window in the original series of images 602. For example, if the minimum frame selection window is specified to be 10 seconds, the system can examine the original series of images (e.g., image series 602 in
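The window-enforcement step can be sketched as follows, working in frame indices (so a 10-second window at 30 fps corresponds to a maximum gap of 300 frames); the function name and gap-filling strategy are illustrative assumptions:

```python
def fill_min_window(selected, max_gap):
    """Insert frame indices so that no two consecutive selected frames are
    more than max_gap frames apart in the original series (sketch; here
    oversized gaps are subdivided at even max_gap strides)."""
    out = sorted(set(selected))
    filled = [out[0]]
    for idx in out[1:]:
        # Walk across any oversized gap before keeping the next selection.
        while idx - filled[-1] > max_gap:
            filled.append(filled[-1] + max_gap)
        filled.append(idx)
    return filled
```

Gaps already within the window are left untouched, so the original selections are always preserved in the output.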
An image having artifacts or noise may be inadvertently included in the subset of images because of its large variance from neighboring images. Thus, in some examples, the system ensures that the subset of surgical images 612 does not include any images comprising a motion artifact or noise. For example, the system can determine whether an image in the subset of surgical images comprises a motion artifact or noise. In accordance with a determination that the image comprises a motion artifact or noise, the system can remove the image from the subset of surgical images, repair the image, and/or replace the image with another image from the video (e.g., another image from the same cluster). In some examples, how the abnormal images are handled can be configured automatically or by a user. For example, in some scenarios, the system can be configured to keep the noisy images along with the normal ones, to improve the robustness of a downstream training algorithm. For example, instead of discarding or repairing a noisy image, the system may keep it alongside the image from the same cluster that is closest to it but is deemed to be of good quality.
In an exemplary implementation, an input series of images is a 5-minute video at 30 frames per second (fps), with a total of 9000 frames (5×60×30). The target frame rate is 1 fps, meaning that the desired number of images in the final subset should be around 300 (9000×1/30). By performing the process 400, clusters are ordered with increasing means, and the threshold is set so that the total number of images falling under the selected clusters based on the threshold is equal to or slightly more than the targeted output frame count (i.e., 300). If a minimum frame selection window is selected (e.g., 60 s), additional frames are selected so that at least one frame is selected within each such period (e.g., 60 s) of the input series of images.
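The threshold selection over the ordered clusters can be sketched as follows: cluster sizes are accumulated starting from the cluster with the highest mean (largest frame-to-frame changes) until the kept images meet or just exceed the target count. The function name and return convention are illustrative assumptions:

```python
def choose_clusters(cluster_sizes, target):
    """cluster_sizes is ordered by increasing cluster mean (small changes
    first). Accumulate from the highest-mean cluster down; return the index
    of the first kept cluster and the resulting image count, which is equal
    to or slightly more than the target (sketch)."""
    total = 0
    for i in range(len(cluster_sizes) - 1, -1, -1):
        total += cluster_sizes[i]
        if total >= target:
            return i, total
    return 0, total  # all clusters needed to reach (or approach) the target
```

In the 5-minute, 30 fps example above, the input holds 5×60×30 = 9000 frames and the 1 fps target budget is 9000/30 = 300 output images.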
The subset of images obtained by process 400 can be used to train a machine-learning model. The machine-learning model can be any machine-learning model that is configured to receive one or more surgical images and provide an output, such as a machine-learning model configured to receive a surgical image and detect objects and/or events in the surgical image. Rather than using all images of a video (e.g., 9000 images in the exemplary implementation above) to train the model, only a subset of images (e.g., 300 images in the exemplary implementation above) needs to be provided to the machine-learning model to train the model. The subset of images may be equally or more effective at training the model because it includes the representative images in the video, such as the examples depicted in
The subset of images obtained by process 400 can be processed by an algorithm to analyze the surgical operation. Rather than providing an entire video stream to the algorithm (e.g., 9000 images in the exemplary implementation above), only the subset of images (e.g., 300 images in the exemplary implementation above) can be provided to the algorithm. The subset of images does not compromise the quality of the analysis because it includes the representative images in the original video. At the same time, the required time, the processing power, and the computer memory to conduct the analysis can be significantly reduced due to the smaller number of images that need to be processed.
An algorithm can be used to process the subset of images and automatically identify events depicted in the subset of images. The system can then store an association between a given event and the timestamp of the image(s) depicting the given event for a later lookup. For example, a surgeon may want to review a particular event or phase of surgery (e.g., a critical view of safety in laparoscopic cholecystectomy). Based on the event, the system can identify the timestamp(s) associated with the event and retrieve the image(s) for a quick review rather than requiring the surgeon to view the entire video to find the event.
In some examples, the subset of images obtained by process 400 can be displayed on a display. If a medical practitioner would like to review a surgery, he or she can simply review the subset of images (e.g., as a shorter series of images or as a shortened video). Accordingly, the review time can be significantly reduced without compromising the thoroughness of the review.
In some examples, the system can use the size of the generated clusters as a proxy for scene prevalence in the given video. For example, when a subset of images has been selected, the system can attach a metadata value to each image indicating the relative portion of the video over which this particular scene persists. This can be a useful statistic for data distribution estimation (e.g., 70% of the procedural video is spent on scene assessment without any intervention).
While process 400 involves processing a series of images to obtain a subset of images, it should be appreciated that the process 400 can be applied to process a series of videos to obtain a subset of videos. In some examples, process 400 can be performed real time during a surgery. The extracted subset of images can be saved locally for display and/or uploaded through a network for downstream analyses (e.g., training machine-learning models).
The foregoing description, for the purpose of explanation, has been described with reference to specific examples or aspects. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. For the purpose of clarity and a concise description, features are described herein as part of the same or separate variations; however, it will be appreciated that the scope of the disclosure includes variations having combinations of all or some of the features described. Many modifications and variations are possible in view of the above teachings. The variations were chosen and described in order to best explain the principles of the techniques and their practical applications. Others skilled in the art are thereby enabled to best utilize the techniques and various variations with various modifications as are suited to the particular use contemplated.
Although the disclosure and examples have been fully described with reference to the accompanying figures, it is to be noted that various changes and modifications will become apparent to those skilled in the art. Such changes and modifications are to be understood as being included within the scope of the disclosure and examples as defined by the claims. Finally, the entire disclosure of the patents and publications referred to in this application are hereby incorporated herein by reference.
This application claims the benefit of U.S. Provisional Application No. 63/269,398, filed Mar. 15, 2022, the entire contents of which are hereby incorporated by reference herein.
Number | Date | Country
---|---|---
63269398 | Mar. 2022 | US