Systems and methods for multi-modal sensing of depth in vision systems for automated surgical robots

Description

BACKGROUND

Embodiments of the present disclosure generally relate to multi-modal sensing of three-dimensional position information of a surface of an object.

SUMMARY

According to embodiments of the present disclosure, systems for, methods for, and computer program products for determining a three-dimensional coordinate on an object are provided. In the method, an image is recorded. The image includes an object, a first plurality of markers disposed on the object, a second plurality of markers disposed on the object, and a third plurality of markers disposed on the object. A first depth is computed using the image and the first plurality of markers. A second depth is computed using the image and the second plurality of markers. A third depth is computed using the image and the third plurality of markers. A first weight is assigned to the first depth, a second weight is assigned to the second depth, and a third weight is assigned to the third depth. A weighted average depth is computed based on the first depth, second depth, third depth, first weight, second weight, and third weight.

In various embodiments, a system is provided for determining a three-dimensional coordinate on an object. The system includes an imaging device and a computing node including a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a processor of the computing node to cause the processor to perform a method where an image is recorded by the imaging device. The image includes an object, a first plurality of markers disposed on the object, a second plurality of markers disposed on the object, and a third plurality of markers disposed on the object. A first depth is computed using the image and the first plurality of markers. A second depth is computed using the image and the second plurality of markers. A third depth is computed using the image and the third plurality of markers. A first weight is assigned to the first depth, a second weight is assigned to the second depth, and a third weight is assigned to the third depth. A weighted average depth is computed based on the first depth, second depth, third depth, first weight, second weight, and third weight.

In various embodiments, a computer program product is provided for determining a three-dimensional coordinate on an object. The computer program product includes a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a processor to cause the processor to perform a method where an image is recorded. The image includes an object, a first plurality of markers disposed on the object, a second plurality of markers disposed on the object, and a third plurality of markers disposed on the object. A first depth is computed using the image and the first plurality of markers. A second depth is computed using the image and the second plurality of markers. A third depth is computed using the image and the third plurality of markers. A first weight is assigned to the first depth, a second weight is assigned to the second depth, and a third weight is assigned to the third depth. A weighted average depth is computed based on the first depth, second depth, third depth, first weight, second weight, and third weight.

In various embodiments, systems for, methods for, and computer program products for determining a three-dimensional coordinate on an object are provided. In the method, an image is recorded. The image includes an object, a first plurality of markers disposed on the object, and a second plurality of markers disposed on the object. A first depth is computed using the image and the first plurality of markers. A second depth is computed using the image and the second plurality of markers. A first weight is assigned to the first depth and a second weight is assigned to the second depth. A weighted average depth is computed based on the first depth, second depth, first weight, and second weight.

In various embodiments, an integrated surgical device is provided including an endoscope having a proximal end and a distal end, an imaging device optically coupled to the distal end of the endoscope, and a computing node comprising a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a processor of the computing node to cause the processor to perform a method where an image is recorded. The image includes an object, a first plurality of markers disposed on the object, a second plurality of markers disposed on the object, and a third plurality of markers disposed on the object. A first depth is computed using the image and the first plurality of markers. A second depth is computed using the image and the second plurality of markers. A third depth is computed using the image and the third plurality of markers. A first weight is assigned to the first depth, a second weight is assigned to the second depth, and a third weight is assigned to the third depth. A weighted average depth is computed based on the first depth, second depth, third depth, first weight, second weight, and third weight.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an exemplary image of a surface having fiducial markers in which the image may be used as a baseline image according to embodiments of the present disclosure.

FIG. 2 illustrates an exemplary image of a surface having a matrix of structured light markers overlaying the baseline image according to embodiments of the present disclosure.

FIG. 3A illustrates an exemplary image of simulated biological tissue according to embodiments of the present disclosure.

FIG. 3B illustrates an exemplary image of a depth map of simulated biological tissue according to embodiments of the present disclosure.

FIG. 4A illustrates an exemplary image of simulated biological tissue having a contrast agent applied to the surface according to embodiments of the present disclosure.

FIG. 4B illustrates an exemplary image of a depth map of simulated biological tissue having a contrast agent applied to the surface according to embodiments of the present disclosure.

FIG. 5 illustrates a 3D surface imaging system imaging a tissue according to embodiments of the present disclosure.

FIG. 6 shows a diagram illustrating a 3D surface imaging system according to embodiments of the present disclosure.

FIG. 7 shows an exemplary flowchart of a method for determining a three-dimensional coordinate on an object according to embodiments of the present disclosure.

FIG. 8 shows a table of analyzed sensors and their specifications according to embodiments of the present disclosure.

FIGS. 9A, 9B, and 9C illustrate graphs of the results of sensor bias according to embodiments of the present disclosure.

FIGS. 10A, 10B, and 10C illustrate graphs of the results of sensor precision according to embodiments of the present disclosure.

FIG. 11 shows a table of lateral noise of various sensors according to embodiments of the present disclosure.

FIGS. 12A, 12B, 12C, and 12D illustrate graphs of the precision ratios for different materials and lighting conditions (lower is better) according to embodiments of the present disclosure.

FIG. 13A illustrates a graph of the precision and FIG. 13B illustrates a graph of nan ratios (lower is better) in multi-sensor setups, where the indices represent the distance to the target, according to embodiments of the present disclosure.

FIGS. 14A, 14B, and 14C illustrate graphs of the influence of additional sensors according to embodiments of the present disclosure.

FIG. 15 shows a schematic of an exemplary computing node according to embodiments of the present disclosure.

DETAILED DESCRIPTION

The ability to accurately discern three-dimensional position information (X, Y, Z) of target objects (e.g., biological tissue) is a critical requirement of an automated surgical robotic system. One approach is to use fiducial markers of a known size and shape directly attached to a surface of an object to determine positional information about the surface; however, the spatial resolution of any method using fiducial markers is limited by the number of fiducials applied to the tissue. Fiducial markers must be large enough for computer vision systems to detect, but also small enough to maximize spatial resolution of the surface to which they are attached. Because of these conflicting requirements, there is an upper bound to the spatial resolution provided by fiducial markers, especially in surgical settings where automated surgical robot systems may be operating in small, confined spaces.

Many surgical maneuvers (e.g., suturing) require highly dexterous and highly accurate motion of surgical tools to achieve a satisfactory surgical outcome. In fully automated robotic surgical procedures having no active human control, the accuracy of the surgical tools controlled by the robot is highly dependent on the spatial resolution of the computer vision system. Because surgical outcomes are heavily dependent on the positional accuracy of the computer vision systems guiding the robotic tools, spatial resolution of the surgical sites is even more important in fully automated robotic surgical procedures. Solely using fiducial markers to guide fully automated surgical robots does not provide adequate spatial resolution of surgical sites to ensure satisfactory outcomes.

Accordingly, a need exists for a system and method to accurately and reliably sense positional information with a high resolution, which enables accurate surgical planning and execution and thereby enables robotic-assisted surgery.

Embodiments of the present disclosure generally relate to multi-modal sensing of three-dimensional position information of a surface of an object. In particular, the present disclosure describes multiple visualization modalities used to collect distinctive positional information of the surface of the object that is then combined using weighting factors to compute a final three-dimensional position. While the present disclosure generally focuses on sensing three-dimensional position with respect to automated surgical robots, the systems, methods, and computer program products are suitable for use in other fields that employ computer vision techniques to identify three-dimensional position, such as virtual reality or augmented reality applications.

A system for determining a three-dimensional coordinate on a surface of an object (e.g., a biological tissue) generally includes a first imaging system used to establish a baseline image of the object. The baseline image may be established using, e.g., a series of fiducial markers affixed to the surface of the object, to generate positional information for the surface of the object. For example, fiducial markers may be placed on the surface of a tissue via a spray applicator (e.g., spray catheter). In general, fiducial markers are special markers that may be recognized by a computer vision system to determine specific position information about the surface to which they are affixed. Non-limiting examples of fiducial markers may include symbols (e.g., alphanumeric), patterns (e.g., QR codes), liquid (e.g., infrared ink), or physical shapes (2D or 3D). This position information may be used to map the surface of the object and create a computer simulation of that surface in three dimensions. The fiducial markers may be affixed to the object in a particular pattern (e.g., a grid pattern) or no particular pattern (e.g., randomized placement).

In various embodiments, the fiducial marker is applied to target tissue in a liquid state through a syringe needle. Applying a liquid marker to target tissue has a number of advantages. First, the marker can be mixed onsite, which improves the stability of the marker. Second, a liquid marker allows precise control over location and application to target tissue. Third, the marker can be applied as any irregular shape. When a liquid marker is applied with a syringe, the irrigated surgical field causes an exothermic reaction that solidifies the marker in a circular shape on the target tissue. A circular marker may be beneficial for tracking single points of interest on target tissue during a surgical procedure.

In various embodiments, a marking tip such as a syringe needle or felt nib may be used to dispense the fiducial marker in a linear pattern. By applying the fiducial marker as a continuous line, one can use the marker to define boundaries on target tissue. Defining boundaries may be useful to identify regions of diseased tissue or regions where a surgical procedure should not be performed. In yet another embodiment, the liquid marker may be sprayed onto the target tissue to create a speckled pattern when polymerized. A speckled pattern may be of interest to distinguish large regions of tissue from each other. In one example, background tissue may be speckled to distinguish it from foreground tissue. Other components in a robotic or semi-autonomous workflow may use background and foreground information to plan or control their motions or suggestions.

In other embodiments, the liquid marker may be applied through a predefined mask to apply the marker in any arbitrary and predefined shape on target tissue.

To acquire the position information of the surface of the object using fiducial markers, the first imaging system may include one or more cameras (e.g., one, two, three, four, or five). In various embodiments, the one or more cameras may include a stereoscopic camera. In various embodiments, the stereoscopic camera may be implemented by two separate cameras. In various embodiments, the two separate cameras may be disposed at a predetermined distance from one another. In various embodiments, the stereoscopic camera may be located at a distal-most end of a surgical instrument (e.g., laparoscope, endoscope, etc.). The camera(s) may cross-reference detected positions for each of the fiducial markers against a known reference (e.g., the known size and shape of the fiducial) to determine a positional information (e.g., depth) for each of the fiducial markers. Positional information, as used herein, may generally be defined as (X, Y, Z) in a three-dimensional coordinate system.
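
As a minimal sketch of how a known fiducial size could yield a depth, the following assumes an ideal pinhole camera and a marker that faces the camera roughly head-on; neither assumption is stated in the disclosure. The function name `marker_depth` and its parameters (focal length in pixels, marker diameter in meters and in pixels) are hypothetical and chosen only for illustration.

```python
def marker_depth(real_diameter_m, pixel_diameter, focal_px):
    """Estimate the depth (Z, in meters) of a fiducial marker of known size.

    Assumes a pinhole camera model and a fronto-parallel marker, so that
    pixel_diameter = focal_px * real_diameter_m / Z.
    """
    if pixel_diameter <= 0:
        raise ValueError("marker must span at least a fraction of a pixel")
    return focal_px * real_diameter_m / pixel_diameter


# Example: a 2 mm marker that appears 40 pixels wide through a lens with a
# focal length of 1000 pixels is estimated to be about 0.05 m away.
depth = marker_depth(0.002, 40, 1000)
```

A tilted marker would appear foreshortened, so a practical system would likely fit the marker's full shape rather than a single diameter.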

The one or more cameras may be, for example, infrared cameras, that emit infrared radiation and detect the reflection of the emitted infrared radiation. In other embodiments, the one or more cameras may be digital cameras as are known in the art. In other embodiments, the one or more cameras may be plenoptic cameras. The one or more cameras may be connected to a computing node as described in more detail below.

The present disclosure improves on single-mode approaches that employ solely fiducial markers by incorporating other visualization modalities in addition to fiducial marker tracking to improve the accuracy of the resulting positional information. A second imaging system may be used to generate position information for the surface of the object either individually or in combination with the other imaging systems described herein (e.g., after a baseline image is recorded using the first imaging system and positional information is acquired for each of the fiducial markers). For example, the second imaging system may include a structured light source (e.g., a projector) that projects a specific structured pattern (e.g., a matrix of dots or a series of stripes) onto the surface of the object. The structured pattern projected from the structured light source may change shape, size, and/or spacing of pattern features when projected on a surface. The second imaging system may detect these changes and determine positional information based on the changes to the structured light pattern given a known pattern stored by the second imaging system. A projected pattern of lines produces lines of illumination that appear distorted from perspectives other than that of the source, and these lines can be used for geometric reconstruction of the surface shape, thus providing positional information about the surface of the object.

The second imaging system may include one or more cameras (e.g., one, two, three, four, or five) capable of detecting the projected pattern from the source of structured light. The one or more cameras may be digital camera(s) as are known in the art and may be the same or different camera(s) as used with the first imaging system. The one or more cameras may be connected to a computing node as described in more detail below. Using the images from the one or more cameras, the computing node may compute positional information (X, Y, Z) for any suitable number of points along the surface of the object to thereby generate a depth map of the surface.

A third imaging system may be used to generate additional position information for the surface of the object. The third imaging system may include one or more cameras, such as a light-field camera (e.g., a plenoptic camera), and may be the same or different camera(s) as the camera(s) used for the first imaging system and the second imaging system. The plenoptic camera may be used to generate accurate positional information for the surface of the object by having appropriate zoom and focus depth settings.

One type of light-field (e.g., plenoptic) camera that may be used according to the present disclosure uses an array of micro-lenses placed in front of an otherwise conventional image sensor to sense intensity, color, and directional information. Multi-camera arrays are another type of light-field camera. The “standard plenoptic camera” is a standardized mathematical model used by researchers to compare different types of plenoptic (or light-field) cameras. By definition the “standard plenoptic camera” has micro lenses placed one focal length away from the image plane of a sensor. Research has shown that its maximum baseline is confined to the main lens entrance pupil size which proves to be small compared to stereoscopic setups. This implies that the “standard plenoptic camera” may be intended for close range applications as it exhibits increased depth resolution at very close distances that can be metrically predicted based on the camera's parameters. Other types/orientations of plenoptic cameras may be used, such as focused plenoptic cameras, coded aperture cameras, and/or stereo with plenoptic cameras.

Once positional information is generated using the first imaging system, the second imaging system and the third imaging system, a combined position may be calculated by computing a weighted average of the three imaging systems. As shown below in Equation 1, a combined pixel depth may be calculated by a weighted average of the depth generated from each of the three imaging systems.

$\begin{matrix} \text{pixel depth} = \frac{C_{M}}{C_{M} + C_{SL} + C_{P}} \cdot \text{Depth}_{M} + \frac{C_{SL}}{C_{M} + C_{SL} + C_{P}} \cdot \text{Depth}_{SL} + \frac{C_{P}}{C_{M} + C_{SL} + C_{P}} \cdot \text{Depth}_{P} & \text{(Eqn. 1)} \end{matrix}$

In Equation 1, C_M represents the weight assigned to the first imaging system (e.g., the marker-based system), C_SL represents the weight assigned to the second imaging system (e.g., the structured light-based system), C_P represents the weight assigned to the third imaging system (e.g., the plenoptic, or light-field, camera-based system), Depth_M represents the depth of the pixel generated from the first imaging system, Depth_SL represents the depth of the pixel generated from the second imaging system, and Depth_P represents the depth of the pixel generated from the third imaging system. In various embodiments, each of the weights may be a value between zero (0) and one (1), and the sum of all weight values may add up to unity (1).
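
The weighted average of Equation 1 can be sketched in a few lines. This is an illustrative implementation only; the function name `fuse_depth` and its argument names are assumptions that mirror the symbols of Equation 1, and the guard against a zero weight sum is an added safeguard not described in the disclosure.

```python
def fuse_depth(depth_m, depth_sl, depth_p, c_m, c_sl, c_p):
    """Combine three per-pixel depth estimates as in Equation 1.

    depth_m, depth_sl, depth_p: depths from the marker-based, structured
    light-based, and third imaging systems; c_m, c_sl, c_p: their weights.
    """
    total = c_m + c_sl + c_p
    if total <= 0:
        # Added safeguard (assumption): Equation 1 is undefined here.
        raise ValueError("at least one weight must be positive")
    return (c_m * depth_m + c_sl * depth_sl + c_p * depth_p) / total


# Example: depths of 10, 12, and 14 mm with weights 0.5, 0.3, and 0.2
# combine to 0.5*10 + 0.3*12 + 0.2*14 = 11.4 mm (up to rounding).
combined = fuse_depth(10.0, 12.0, 14.0, 0.5, 0.3, 0.2)
```

Because each term is divided by the sum of the weights, the result is unchanged if all three weights are scaled by a common factor; when the weights already add up to unity, the division has no effect.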

In various embodiments, the weight C_M assigned to the first imaging system may be equal to the weight C_SL assigned to the second imaging system and the weight C_P assigned to the third imaging system. In other embodiments, the weight C_SL assigned to the second imaging system is greater than the weight C_M assigned to the first imaging system and/or the weight C_P assigned to the third imaging system. In yet another embodiment, the weight C_P assigned to the third imaging system is greater than the weight C_M assigned to the first imaging system and/or the weight C_SL assigned to the second imaging system.

In various embodiments, weight for each variable in Equation 1 may be determined based on one or more factors selected based on the type of imaging system(s) used. For example, if light field imaging is used, factors may include: (1) amount of contrast in the image, (2) number of saturated pixels (which may be used to measure light intensity), and (3) localized change in depth of a specific area of the image. A high weight value may correspond to an image having high contrast within a scene, little to no saturated pixels, and low local change in depth.

In another example, if structured light imaging is used, factors may include: (1) amount of pattern recognized and (2) number of saturated pixels. A high weight value may correspond to an image having most or all of a pattern recognized and little to no saturated pixels.

In yet another example, if fiducial markers are used, factors may include (1) number of saturated pixels, (2) ability to recognize the shape/size of fiducial marker(s), and (3) ability to discern the fiducial marker(s) from the surrounding environment. A high weight value may correspond to an image having little to no saturated pixels, ability to recognize most or all of the fiducial markers, and the ability to discern the fiducials from the surrounding environment.
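
The disclosure lists the factors that may influence a weight but does not give a formula. Purely as a hypothetical illustration, the sketch below scores a structured light image from two of the listed factors (amount of pattern recognized and number of saturated pixels) and then rescales a set of weights so that they add up to unity. The multiplicative scoring rule, the function names, and the parameter names are all assumptions.

```python
def clamp01(value):
    """Clamp a number to the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


def structured_light_weight(features_found, features_expected,
                            saturated_pixels, total_pixels):
    # Hypothetical scoring rule (not from the disclosure): the fraction of
    # the projected pattern that was recognized, scaled by the fraction of
    # pixels that are not saturated.
    recognized = features_found / features_expected
    unsaturated = 1.0 - saturated_pixels / total_pixels
    return clamp01(recognized * unsaturated)


def normalize(weights):
    """Scale non-negative weights so they sum to one (all zeros stay zeros)."""
    total = sum(weights)
    if total == 0:
        return [0.0 for _ in weights]
    return [w / total for w in weights]


# Example: 90 of 100 pattern features recognized and no saturated pixels.
w_sl = structured_light_weight(90, 100, 0, 1000)
c_m, c_sl, c_p = normalize([0.6, w_sl, 0.3])
```

Analogous scores could be built for the light field and fiducial marker factors; how the individual factors are combined is a design choice that the text leaves open.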

In various embodiments, any combination of two imaging modalities described herein may be used to compute first and second depths of a surface of an object. In this embodiment, each of the two imaging modalities may have a respective weighting factor that is applied to the depth determined by that particular modality. In various embodiments, the two weighting factors may add up to unity. In various embodiments, the pixel depth function is computed in a similar manner to that described above in Equation 1, but in contrast, the pixel depth for two modalities is dependent on only two weighted depth computations (instead of three).
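
As an illustration only, taking the marker-based and structured light-based modalities as the chosen pair (any other pairing would follow the same form), the two-modality analogue of Equation 1 may be written as:

$\text{pixel depth} = \frac{C_{M}}{C_{M} + C_{SL}} \cdot \text{Depth}_{M} + \frac{C_{SL}}{C_{M} + C_{SL}} \cdot \text{Depth}_{SL}$

When the two weighting factors add up to unity, both denominators equal one and the expression reduces to C_M * Depth_M + C_SL * Depth_SL.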

In various embodiments, the weights associated with each imaging system may be dependent on the overall quality of the particular imaging system. For example, one particular imaging system may provide more accurate data overall than another imaging system. In this example, the data received from the imaging system with the higher accuracy would be given a higher weight than the data received from the imaging system with the lower accuracy. In various embodiments, the accuracy and/or precision of various imaging systems may be dependent on the distance away from the object to be imaged, the material being imaged, and/or the lighting of the operating environment. In various embodiments, the accuracy and/or precision of various imaging systems may be dependent on a location in the field of view of the imaging system; for example, a first imaging system may have high accuracy at the center of the field of view with a rapid decline towards the edges, while another imaging system may have a consistent accuracy across the field of view.

A discussion of how various sensors perform in different situations can be found in “An Empirical Evaluation of Ten Depth Cameras” by Halmetschlager-Funek et al., which is hereby incorporated by reference in its entirety. FIG. 8 shows a table of analyzed sensors in the Halmetschlager-Funek paper. FIGS. 9A, 9B, 9C, 10A, 10B, 10C, 11, 12A, 12B, 12C, 12D, 13A, 13B, 14A, 14B, and 14C illustrate various graphs from the Halmetschlager-Funek paper regarding the bias, precision, lateral noise, effects of materials/lighting/distance, and effects of additional sensors. In particular, regarding bias (shown in FIGS. 9A, 9B, and 9C), the paper describes that while the Kinectv2 offers low bias over the whole range, a significant increase of the bias for sensors using structured light was observed starting from d>3 m. While all three structured light sensors and the two active stereo cameras (ZR300 and D435) offer a lower bias than the Kinectv2 for distances d<1 m, three sensors (ZR300, Orbbec, and Structure IO) offer an even lower bias for depth values d<2.5 m. A quadratic increase of the bias was observed for all sensors [full range: d=0-8 m, FIG. 9A; zoom in: d=0-3 m, FIG. 9B]. The near-range sensors, F200 and SR300 [FIG. 9C], show a slightly higher bias than their far-range counterparts, while the Ensenso N35 provides a low bias over the whole measurement range.

As for precision (as shown in FIGS. 10A, 10B, and 10C), a quadratic decrease of precision was found in all far-range sensors [full range: d=0-8 m, FIG. 10A; zoom in: d=0-3 m, FIG. 10B], but the structured light sensors differ in scale compared to the Kinectv2. Overall, the R200 and ZR300 sensors have the worst performance, while the Structure IO and Orbbec sensors perform very similarly. At distances d<2 m, all structured light sensors were observed to generate less noisy measurements than the Kinectv2. Moreover, the D435 was able to gather more precise results than the Kinectv2 at distances d<1 m. The precision results for the D435 were observed to be more scattered than for the other sensors. The near-range sensors [FIG. 10C] experience noise levels up to 0.0007 m. In the ranges specified by the manufacturers, precision values under 0.004 m were able to be obtained.

As for lateral noise (FIG. 11), the analysis shows similar results for the three far-range structured light sensors and distances. For d<3 m, the noise level was independent of the distance, with three pixels for the structured light sensors and one for the Kinectv2. Two active stereo sensors (D435 and ZR300) offer a low lateral noise level similar to that of the Kinectv2. The R200 achieves a lower lateral noise of two pixels for distances closer than 2 m. Among the near-range sensors, the Ensenso N35 achieves the highest lateral noise value.

As for materials/lighting/distance (FIGS. 12A, 12B, 12C, and 12D), a total of 384 data points were gathered to determine how the sensors' precision was influenced by the reflection and absorption properties of six different materials in combination with four different lighting conditions from 4.2 to 535.75 lux. The tests reveal that the Structure IO sensor best handles the varying object reflectances and lighting conditions. Although it has a lower precision compared to the other sensors for distances of d>1.5 m, it was able to gather information for high-reflective surfaces, such as aluminum, and under bright lighting conditions. While the Structure IO sensor gives a dense depth estimation, the Xtion was not able to determine a depth value. The Orbbec may fail to gather depth information for four of the six surfaces under bright lighting conditions. The Kinectv2 may fail to gather reliable depth data for aluminum at distances of d=1 m and d=1.5 m and under bright lighting conditions. The F200 and SR300 sensors may have a significantly lower precision for bright lighting conditions. During the setup of the experiments, the active stereo cameras (Ensenso and R200) were expected to be able to handle different lighting conditions better than the structured light sensors due to the nature of their technology. In FIGS. 12A, 12B, 12C, and 12D, a precision of zero indicates that the sensor is not able to gather any depth information.

As for noise induced by additional sensors (FIGS. 13A, 13B, 14A, 14B, and 14C), the results (FIGS. 13A and 13B) reveal that the far-range structured light sensors can handle noise induced by one and two additional sensors. An exception occurs when the distance to the target is d=1.5 m and two additional sensors are introduced to the scene. A similar effect was not observed for the Kinectv2, which may give stable results for precision independent of one or two additional sensors. The near-range sensors F200 and SR300 may be less precise with an additional sensor, and the Ensenso N35 is only slightly affected by a third observing sensor. The high nan ratio for the close-range devices can be partially attributed to the experimental setup, in which half of the scene is out of the sensor's range (FIGS. 14A, 14B, and 14C). To summarize, the first experiment with one sensor provides a baseline for the measurements with two and three sensors observing the scene. The first differences may be visible if only one sensor is added. In particular, the SR300 and F200 sensors may have a significant increase in the nan ratio if another Realsense device is added to the scene. For a closer analysis, the corresponding depth images are shown in FIGS. 14A, 14B, and 14C, where it is clear that the depth extraction is heavily influenced by an additional sensor. The Ensenso and Kinectv2 sensors may be unaffected by the additional sensors.

In various embodiments, as described above, depth data received from one or more cameras may be of higher quality (e.g., more reliable) than depth data from other cameras in the imaging system. In various embodiments, the quality of the depth data may be dependent on supporting features that are external to the imaging system. For example, depth data may be of higher quality and therefore given a higher weight when a camera (e.g., infrared camera) can clearly read a predetermined number of fiducial markers on a tissue. In various embodiments, if the camera cannot read the predetermined number of markers, the depth data may be of a lower quality and therefore depth data from the camera may be given a lower weight. In a similar example, when a camera can clearly read a structured light pattern from a structured light projector, the depth data resulting from the structured light may be of higher quality and therefore given a higher weight.

In various embodiments, the weights associated with each imaging system may be dependent on the confidence of the depth and/or the quality of each pixel. In various embodiments, because some imaging systems have one or more “sweet-spot” in an image with higher quality image data and one or more “dead-zone” with lower quality image data, each of the weights associated with the imaging system(s) may be parameterized at the pixel-level of an image. In various embodiments, one or more (e.g., all) of the weights may be a function of 2-dimensional points (x, y) representing pixels in an image. In various embodiments, pixels in an image may be assigned coordinate points in any suitable way as is known in the art. For example, the bottom left corner of an image may be assigned a coordinate of (0, 0) and the top right corner of the image may be assigned the maximum number of pixels in each respective axis (max x pixels, max y pixels). In an example, one imaging system (e.g., stereoscopic camera) may have high-quality image data in the center of an image and low-quality image data on the periphery. In this particular example, a higher weight may be assigned to pixels in the center of the image and the weight may decrease as the pixels move radially away from the center of the image. In various embodiments, the parametric function may be a continuous function. In various embodiments, the parametric function may be a discontinuous function (e.g., piece-wise function). In various embodiments, the parametric function may include a linear function. In various embodiments, the parametric function may include an exponential function.
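
One possible parametric weight of the kind described above, a continuous linear function that is highest at the image center and decreases radially, can be sketched as follows. The function name `radial_weight`, the default center and edge weights, and the choice of normalizing by the distance to the image corner are all assumptions made for illustration.

```python
import math


def radial_weight(x, y, width, height, w_center=1.0, w_edge=0.2):
    """Per-pixel weight that falls off linearly with distance from center.

    (x, y) is a pixel coordinate with (0, 0) at one corner of a
    width-by-height image. w_center and w_edge are hypothetical defaults.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    r = math.hypot(x - cx, y - cy)      # distance of the pixel from center
    r_max = math.hypot(cx, cy)          # distance from center to a corner
    if r_max == 0:
        return w_center                 # degenerate 1x1 image
    return w_center + (w_edge - w_center) * (r / r_max)


# Example on a 5x5 image: the center pixel gets the full weight and a
# corner pixel gets approximately the edge weight.
center = radial_weight(2, 2, 5, 5)
corner = radial_weight(0, 0, 5, 5)
```

An exponential or piece-wise fall-off could be substituted by changing only the final expression.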

In various embodiments, when an imaging system cannot compute a depth at a particular pixel, that particular pixel may be assigned a weight of zero for the particular imaging system (i.e., the particular imaging system will not contribute to the determination of depth at that particular pixel).
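
A minimal sketch of this zero-weight rule, assuming a missing depth is reported as `None` or as a floating-point NaN (the disclosure does not specify a representation). The function name `fuse_pixel` is hypothetical.

```python
import math


def fuse_pixel(depths, weights):
    """Weighted average depth at one pixel, skipping unavailable depths.

    A depth of None or NaN is treated as having weight zero, so that
    imaging system does not contribute at this pixel. Returns None if no
    imaging system produced a usable depth.
    """
    numerator = 0.0
    denominator = 0.0
    for depth, weight in zip(depths, weights):
        if depth is None or math.isnan(depth):
            continue  # equivalent to assigning this system a zero weight
        numerator += weight * depth
        denominator += weight
    if denominator <= 0:
        return None
    return numerator / denominator


# Example: the second system has no depth at this pixel, so the result is
# the weighted average of the first and third systems only.
fused = fuse_pixel([10.0, None, 14.0], [0.5, 0.3, 0.2])
```

Dividing by the sum of the remaining weights renormalizes them, so the contributing systems keep their relative influence.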

In various embodiments, the imaging system may include stereoscopic depth sensing. In various embodiments, stereoscopic depth sensing may work best when there are one or more uniquely identifiable features in an image (or video frame). In various embodiments, stereoscopic depth sensing may be performed using two cameras (e.g., digital cameras). In various embodiments, the cameras may be calibrated with one another. For example, the imaging system may be calibrated based on latency, frame rate, three-dimensional distance between the two cameras, various distances away from the imaging system, various lighting levels, marker types/shapes/colors, etc. In various embodiments, software known in the art may be used to control the two cameras and implement stereoscopic depth sensing. In various embodiments, a first image (or frame of a video) is captured at a first camera and a second image (or frame of a video) is captured at a second camera that is located at a predetermined distance away from the first camera. In various embodiments, a pixel disparity is computed between the first image (or frame of a video) and the second image (or frame of a video). In various embodiments, a depth may be determined from the pixel disparity value. In various embodiments, closer objects have a higher pixel disparity value and further objects have a lower pixel disparity value. In various embodiments, three-dimensional coordinates (x, y, z) may be computed from the determined depth and the camera calibration parameters. In various embodiments, stereoscopic depth sensing may be used with fiducial markers to determine depth.
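As a non-limiting sketch, the relationship between pixel disparity, depth, and three-dimensional coordinates may be expressed for a rectified pair of pinhole cameras as Z = f·B/d, where f is the focal length in pixels, B is the distance between the cameras, and d is the disparity. The rectified pinhole model, the parameter names, and the principal point (cx, cy) are assumptions of this sketch; the disclosure does not specify a particular camera model.

```python
def disparity_to_point(u, v, disparity_px, focal_px, baseline_mm, cx, cy):
    """Convert a pixel (u, v) and its disparity into (x, y, z) in millimeters.

    Assumes a rectified stereo pair with identical pinhole cameras.
    Returns None when the disparity is not positive (no depth available).
    """
    if disparity_px <= 0:
        return None
    z = focal_px * baseline_mm / disparity_px  # larger disparity -> closer
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)
```

For example, with a hypothetical focal length of 800 pixels and a 5 mm baseline, a disparity of 20 pixels corresponds to a depth of 200 mm, and doubling the disparity halves the depth.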

In various embodiments, the imaging system may include active stereoscopic depth sensing. In various embodiments, a projector may project a pattern that is unique on a local scale. In various embodiments, any suitable pattern may be used and the pattern does not have to be known to the imaging system in advance. In various embodiments, the pattern may change over time. In various embodiments, active stereoscopic depth sensing with a projector may provide depth information for featureless images in unstructured environments.

In various embodiments, a static mask may be projected onto a surface of an object (e.g., a tissue) in a scene. For example, a physical pattern (e.g., wire mesh) may be positioned in front of a source of light and lenses may be used to focus the light pattern onto the surface.

In various embodiments, a digital micromirror (DMD) projector may be used to project a pattern on the surface of the object. In this embodiment, light shines onto an array of micromirrors (e.g., 1,000,000 mirrors arranged in a rectangle). The mirrors may be controlled to either allow or prevent the light from entering and illuminating the scene. Lenses may be used to focus the light pattern onto the scene. In various embodiments, the DMD projector may allow for programmable patterns (e.g., QR code, letter, circle, square, etc.). It will be appreciated that a similar effect may be obtained using optical metasurfaces in place of a DMD.

In various embodiments, a scanned laser projector may be used to project a pattern on the surface of the object. In this embodiment, one or more laser sources are used to project a single pixel on the surface. A high definition image may be created by shining one pixel at a time at a high frequency. In various embodiments, focusing of a pattern may not be required with a scanned laser projector. In various embodiments, the scanned laser projector may allow for programmable patterns (e.g., QR code, letter, circle, square, etc.).

In various embodiments, custom algorithms may be developed for the stereoscopic camera to detect the known programmable pattern and to determine depth data from a surface onto which the pattern is projected. In various embodiments, the depth data is computed by determining a disparity value between a first image (or video frame) from the first camera and a second image (or video frame) from the second camera.

In various embodiments, a predetermined wavelength of light may be projected onto a surface of an object depending on the material of the surface. Different materials may have different absorption and/or reflectance properties across a continuum of wavelengths of light. In various embodiments, a wavelength is selected such that light reflects off of the outer-most surface of the object. In various embodiments, if a wavelength of light is selected that penetrates the surface of the object, the resulting image may have a washed out appearance resulting in inaccurate depth data (e.g., lower accuracy, high spatiotemporal noise).

In various embodiments, the imaging system may include an interferometer. In various embodiments, a light source may illuminate a scene with an object and a sensor may measure the phase difference between the emitted and reflected light. In various embodiments, depth may be computed directly from the sensor measurement. In various embodiments, this approach may have low computational resource requirements, faster processing, work on featureless scenes, and/or work at various illumination levels.
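The disclosure does not specify how the measured phase difference is converted to depth. One commonly used relation, shown here only as an assumed example, applies to amplitude-modulated continuous-wave illumination: depth = c·Δφ / (4π·f), where f is the modulation frequency. The modulation frequency parameter and the function names below are assumptions of this sketch.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def depth_from_phase(phase_rad, modulation_hz):
    """Depth in meters from the phase difference (radians) between emitted
    and reflected light, assuming amplitude-modulated illumination.

    The factor 4*pi (rather than 2*pi) accounts for the round trip.
    """
    return SPEED_OF_LIGHT_M_S * phase_rad / (4.0 * math.pi * modulation_hz)


def unambiguous_range(modulation_hz):
    """Largest depth in meters measurable before the phase wraps at 2*pi."""
    return SPEED_OF_LIGHT_M_S / (2.0 * modulation_hz)
```

Because the depth follows from a single arithmetic expression per pixel, a sketch of this kind is consistent with the low computational requirements noted above.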

In various embodiments, the resulting depth map including the computed depths at each pixel may be post-processed. Depth map post-processing refers to processing of the depth map such that it is useable for a specific application. In various embodiments, depth map post-processing may include accuracy improvement. In various embodiments, depth map post-processing may be used to speed up performance and/or for aesthetic reasons. Many specialized post-processing techniques exist that are suitable for use with the systems and methods of the present disclosure. For example, if the imaging device/sensor is run at a higher resolution than is technically necessary for the application, sub-sampling of the depth map may decrease the size of the depth map, leading to throughput improvement and shorter processing times. In various embodiments, subsampling may be biased. For example, subsampling may be biased to remove the depth pixels that lack a depth value (e.g., not capable of being calculated and/or having a value of zero). In various embodiments, spatial filtering (e.g., smoothing) can be used to decrease the noise in a single depth frame, which may include simple spatial averaging as well as non-linear edge-preserving techniques. In various embodiments, temporal filtering may be performed to decrease temporal depth noise using data from multiple frames. In various embodiments, a simple or time-biased average may be employed. In various embodiments, holes in the depth map can be filled in, for example, when the pixel shows a depth value inconsistently. In various embodiments, temporal variations in the signal (e.g., motion in the scene) may lead to blur and may require processing to decrease and/or remove the blur. In various embodiments, some applications may require a depth value present at every pixel. For such situations, when accuracy is not highly valued, post processing techniques may be used to extrapolate the depth map to every pixel. In various embodiments, the extrapolation may be performed with any suitable form of extrapolation (e.g., linear, exponential, logarithmic, etc.).
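Two of the post-processing steps described above, biased subsampling and a time-biased average, may be sketched as follows. The sketch assumes that a depth value of zero marks a pixel with no depth, and the function names and the `alpha` parameter are hypothetical; it is one possible arrangement rather than a required implementation.

```python
def subsample_biased(depth, factor):
    """Shrink a depth map by `factor`, averaging only the valid (non-zero)
    pixels in each factor x factor block. Blocks with no valid pixel stay 0.0.
    """
    height, width = len(depth), len(depth[0])
    out = []
    for y0 in range(0, height, factor):
        row = []
        for x0 in range(0, width, factor):
            valid = [depth[y][x]
                     for y in range(y0, min(y0 + factor, height))
                     for x in range(x0, min(x0 + factor, width))
                     if depth[y][x] > 0.0]
            row.append(sum(valid) / len(valid) if valid else 0.0)
        out.append(row)
    return out


def temporal_filter(previous, current, alpha=0.5):
    """Time-biased average of two depth frames; `alpha` weights the newer
    frame. A pixel missing in one frame takes the other frame's value, which
    fills holes where a pixel reports depth inconsistently.
    """
    out = []
    for prev_row, cur_row in zip(previous, current):
        row = []
        for p, c in zip(prev_row, cur_row):
            if c <= 0.0:
                row.append(p)
            elif p <= 0.0:
                row.append(c)
            else:
                row.append(alpha * c + (1.0 - alpha) * p)
        out.append(row)
    return out
```

Applying `temporal_filter` repeatedly, with each output used as the next `previous` frame, yields an exponentially weighted average over multiple frames.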

In various embodiments, the first imaging system, the second imaging system, and the third imaging system use the same one or more cameras (e.g., plenoptic cameras) connected to a computing node. The computing node may process a single recorded image to extract the fiducial markers, the structured light pattern, and the light-field data as separate components. Each of the separate components may be used to compute positional information (e.g., a depth map) of a surface of the object. A weighting factor may be applied to each set of computed positional information to compute a weighted average depth.

In various embodiments, systems can use any combination of the above-mentioned imaging modalities/systems to determine positional information about the surface of a tissue. In various embodiments, the systems may determine that a weight value in Equation 1 is zero (0). In this case, a system uses multiple imaging modalities/systems to acquire positional data, but determines at least one of those imaging modalities/systems does not provide reliable positional data and thus disregards the particular imaging modality/system(s) that does not provide reliable data when applying Equation 1.

In some embodiments, a stereoscopic camera may be used as an imaging system either by itself or in combination with any of the above-mentioned imaging systems.

The object from which positional information is obtained may be any suitable biological tissue. For example, the object may be an internal bodily tissue, such as esophageal tissue, stomach tissue, small/large intestinal tissue, and/or muscular tissue. In other embodiments, the object may be external tissue, such as dermal tissue on the abdomen, back, arm, leg, or any other external body part. Moreover, the object may be a bone, internal organ, or other internal bodily structure. The systems and methods of the present disclosure would similarly work for animals in veterinary applications.

In various embodiments, the systems and methods described herein may be used in any suitable application, such as, for example, diagnostic applications and/or surgical applications. As an example of a diagnostic application, the systems and methods described herein may be used in colonoscopy to image a polyp in the gastrointestinal tract and determine dimensions of the polyp. Information such as the dimensions of the polyp may be used by healthcare professionals to determine a treatment plan for a patient (e.g., surgery, chemotherapy, further testing, etc.). In another example, the systems and methods described herein may be used to measure the size of an incision or hole when extracting part or all of an internal organ. As an example of a surgical application, the systems and methods described herein may be used in handheld surgical applications, such as, for example, handheld laparoscopic surgery, handheld endoscopic procedures, and/or any other suitable surgical applications where imaging and depth sensing may be necessary. In various embodiments, the systems and methods described herein may be used to compute the depth of a surgical field, including tissue, organs, thread, and/or any instruments. In various embodiments, the systems and methods described herein may be capable of making measurements in absolute units (e.g., millimeters).

Various embodiments may be adapted for use in gastrointestinal (GI) catheters, such as an endoscope. In particular, the endoscope may include an atomized sprayer, an IR source, a camera system and optics, a robotic arm, and an image processor.

In various embodiments, a contrast agent may be applied to the surface of the object, such as the surface of a biological tissue, to provide contrast to the surface of which three-dimensional positional information is to be generated by a computer vision system. When using some visualization modalities where precision is directly proportional to contrast and texture (e.g., light-field imaging), the contrast agent may be utilized to provide contrast to the surface. In various embodiments, where soft tissue is being imaged, the surface may be substantially uniform in color and have very little texture. In this case, a contrast agent, such as an atomized dye that adheres to the tissue (e.g., the serous membrane), may be applied to the tissue. The dye may be fluoresced and provide an artificial contrast to greatly improve the level of precision in the light-field imaging system.

When contrast is used on the surface of the tissue, a calibration may be obtained prior to the application of the contrast agent to determine depth information.

FIG. 1 illustrates an exemplary image 100 of a surface 102 having fiducial markers 104 in which the image may be used as a baseline image. In FIG. 1, fiducial markers 104 are provided on the surface 102 in the form of liquid markers. The fiducial markers 104 are painted in a matrix format such that a computer vision system running on a computing node can recognize the fiducial markers 104 and compute a three-dimensional surface from the image. The computer vision system may include one or more cameras that record images of the object and provide the images to the computing node running computer vision software.

In various embodiments, the computer vision system generates three-dimensional position information (X, Y, Z) for each of the fiducial markers 104. The computer vision system may further interpolate positional information between the fiducial markers 104 or may extrapolate to generate a three-dimensional model of the surface 102 of the object.
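The interpolation of positional information between markers may take many forms. As one hedged example, the sketch below uses inverse-distance weighting, in which the depth at a query point is an average of the marker depths weighted by the inverse of their distance to that point. Inverse-distance weighting, the function name, and the `power` parameter are choices made for this sketch and are not named in the disclosure.

```python
import math


def interpolate_depth(markers, x, y, power=2.0):
    """Estimate the depth z at (x, y) from sparse markers.

    markers is a list of (x, y, z) tuples, one per detected marker.
    Nearer markers contribute more; a query exactly on a marker returns
    that marker's depth.
    """
    numerator = denominator = 0.0
    for mx, my, mz in markers:
        dist = math.hypot(x - mx, y - my)
        if dist == 0.0:
            return mz
        w = 1.0 / dist ** power
        numerator += w * mz
        denominator += w
    if denominator == 0.0:
        raise ValueError("at least one marker is required")
    return numerator / denominator
```

Evaluating `interpolate_depth` on a regular grid of (x, y) points would produce a dense surface estimate from the sparse marker positions.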

FIG. 2 illustrates an exemplary image 200 of a surface 202 having a matrix of structured light markers 206 overlaying the baseline image 100 of FIG. 1. The matrix of structured light markers 206 is in the form of a grid of dots. The structured light markers 206 are projected onto the surface 202 of the object from a source of structured light (e.g., a laser) such that a computer vision system running on a computing node can recognize the structured light markers 206 and compute a three-dimensional surface from the image. The computer vision system may include one or more cameras that record images of the structured light markers 206 projected onto the object and provide the images to the computing node running computer vision software. The computer vision software may analyze the structured light markers 206 from images taken at different visual angles and perform geometric reconstruction to generate positional information of the surface 202. As shown in FIG. 2, the matrix of structured light markers 206 has more markers projected onto the surface 202 than the fiducial markers 104 shown in FIG. 1. Thus, three-dimensional positional information will be more accurate using the structured light markers 206 as there are more data points from which the computer vision software can generate the three-dimensional model of the surface 202.

FIG. 3A illustrates an exemplary image of simulated biological tissue 310 while FIG. 3B illustrates an exemplary image of a depth map 315 of the same simulated biological tissue 310. As shown in FIG. 3A, the simulated biological tissue 310 (e.g., a serous membrane) is substantially uniform in color, is not textured, and has no artificial markers. The depth map 315 shown in FIG. 3B represents a depth map produced by light-field imaging of the simulated tissue 310. As shown in FIG. 3B, the depth map 315 has very little to no depth data in areas of little contrast—namely, the areas of the tissue 310 away from the edges. Depth data exists at the edges because of the contrast between the simulated tissue 310 and the background.

FIG. 4A illustrates an exemplary image of simulated biological tissue 410 having a contrast agent applied to the surface while FIG. 4B illustrates an exemplary image of a depth map 415 of the same simulated biological tissue 410 having the contrast agent. As shown in FIG. 4A, a contrast agent (e.g., an atomized blue dye) is applied to the simulated biological tissue 410 (e.g., a serous membrane). The depth map 415 shown in FIG. 4B represents a depth map produced by light-field imaging of the simulated tissue 410 having the contrast agent. As shown in FIG. 4B, the depth map 415 has much more data than the depth map 315 shown in FIG. 3B because of the contrast agent applied to the surface of the tissue. Based on the depth map 415, a computer vision system would recognize that the tissue 410 has a curved surface.

FIG. 5 illustrates a 3D surface imaging system 500 imaging a tissue according to embodiments of the present disclosure. The imaging system 500 includes an endoscope 520 having cameras 521a, 521b that, when used together, generate stereoscopic images of a tissue 502 (e.g., stomach). In various embodiments, the endoscope 520 may optionally, or additionally, include an infrared camera. The tissue 502 has fiducial markers 504 disposed thereon such that a camera (e.g., infrared camera) can detect the markers 504 against the background of the tissue 502. In various embodiments, the imaging system 500 further includes a projector 522. In various embodiments, the projector 522 may be configured to project structured light 506 (e.g., a dot pattern) onto the tissue 502. In various embodiments, the projector is configured to project infrared light. The imaging system 500 further includes a light-field (e.g., plenoptic) camera 524. In various embodiments, the tissue 502 may be sprayed with a contrast liquid as described above to allow the imaging system 500 to determine depth of the tissue 502.

FIG. 6 shows a diagram illustrating a 3D surface imaging system. The system combines three visualization modalities to improve the 3D imaging resolution. The system includes a camera system that can be moved by a robotic arm. For each of the visualization modalities, the camera system captures images of target tissue through a light guide in an endoscope and an optics mechanism. The images are processed by an image processor to determine a virtually constructed 3D surface.

In one visualization modality, the camera system includes a light-field (e.g., plenoptic) camera for capturing a plenoptic image of the target tissue. The image processor uses standard techniques to determine 3D surface variation and shape from the plenoptic image.

In a second visualization modality, the system uses an IR (infrared) source/projector for generating an IR spot pattern, which is projected on the target tissue via the optics mechanism and a light guide in the endoscope. The spot pattern can be predefined or random. The camera system includes an IR sensor that captures an image of the IR spots on the target tissue. The image is transmitted to the image processor, which detects distortions in the spot pattern projected on the target tissue to determine 3D surface variation and shape.

In a third visualization modality, the system uses an atomizer/sprayer in the endoscope to apply an atomized liquid dye to selected areas of the target tissue to increase the number of fiducial spots. The atomized dye adheres to the target tissue in a random spot pattern with a higher spot concentration than the IR spot pattern. The dye can be fluoresced to provide an augmented contrast with the tissue to improve precision of the imaging system.

The image processor determines which visualization modality data is most appropriate in a given situation, and combines the data where appropriate to further improve the 3D imaging resolution. The data can be combined using a weighting algorithm. The system thereby accurately and reliably senses depth with a high resolution, which is needed for accurate robotic surgical planning and execution.

FIG. 7 shows a flowchart 700 of a method for determining a three-dimensional coordinate on an object. At 702, the method includes recording an image, the image comprising an object, a first plurality of markers disposed on the object, a second plurality of markers disposed on the object, and a third plurality of markers disposed on the object. At 704, the method includes computing a first depth using the image and the first plurality of markers. At 706, the method includes computing a second depth using the image and the second plurality of markers. At 708, the method includes computing a third depth using the image and the third plurality of markers. At 710, the method includes assigning a first weight to the first depth, a second weight to the second depth, and a third weight to the third depth. At 712, the method includes computing a weighted average depth based on the first depth, second depth, third depth, first weight, second weight, and third weight.
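Steps 710 and 712 of the flowchart may be sketched as follows for a single point on the object. The sketch assumes that the weights are non-negative and normalizes by their sum, so that weights which already sum to one are used unchanged; the function name and the error handling are assumptions of this sketch.

```python
def weighted_average_depth(depths, weights):
    """Weighted average of the depths computed at steps 704, 706, and 708.

    depths and weights are equal-length sequences (three entries for the
    method of FIG. 7). A weight of zero excludes the corresponding depth.
    """
    if len(depths) != len(weights):
        raise ValueError("each depth needs exactly one weight")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(d * w for d, w in zip(depths, weights)) / total
```

For example, depths of 10.0, 12.0, and 20.0 with weights of 0.5, 0.5, and 0.0 give a weighted average depth of 11.0, with the third depth disregarded.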

Referring now to FIG. 15, a schematic of an exemplary computing node is shown that may be used with the computer vision systems described herein. Computing node 10 is only one example of a suitable computing node and is not intended to suggest any limitation as to the scope of use or functionality of embodiments described herein. Regardless, computing node 10 is capable of being implemented and/or performing any of the functionality set forth hereinabove.

In computing node 10 there is a computer system/server 12, which is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 12 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.

Computer system/server 12 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computer system/server 12 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

As shown in FIG. 15, computer system/server 12 in computing node 10 is shown in the form of a general-purpose computing device. The components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 coupling various system components including system memory 28 to processor 16.

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Computer system/server 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 12, and it includes both volatile and non-volatile media, removable and non-removable media.

System memory 28 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus 18 by one or more data media interfaces. As will be further depicted and described below, memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.

Program/utility 40, having a set (at least one) of program modules 42, may be stored in memory 28 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment. Program modules 42 generally carry out the functions and/or methodologies of embodiments described herein.

Computer system/server 12 may also communicate with one or more external devices 14 such as a keyboard, a pointing device, a display 24, etc.; one or more devices that enable a user to interact with computer system/server 12; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 12 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 22. Still yet, computer system/server 12 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 20. As depicted, network adapter 20 communicates with the other components of computer system/server 12 via bus 18. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server 12. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.

In other embodiments, the computer system/server may be connected to one or more cameras (e.g., digital cameras, light-field cameras) or other imaging/sensing devices (e.g., infrared cameras or sensors).

The present disclosure includes a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In various embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.

Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In various alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The descriptions of the various embodiments of the present disclosure have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A system for depth sensing, the system comprising: one or more imaging devices configured to obtain a plurality of images of an object in a surgical scene using a plurality of different imaging modalities, wherein the plurality of images comprises a plurality of markers on or near the object, and wherein the plurality of imaging modalities comprises at least two of: RGB imaging, infrared imaging, depth imaging, fiducial marker imaging, structured light pattern imaging, or light field imaging; and a processor configured to: (a) compute a plurality of depth measurements for at least a portion of the object based on the plurality of images; and (b) determine positional information for at least the portion of the object based on the plurality of depth measurements, wherein one or more weights are assigned to the plurality of depth measurements, wherein each of the one or more weights has a value between zero (0) and one (1), wherein a sum of the one or more weight values assigned to the plurality of depth measurements equals one (1), and wherein the plurality of depth measurements are weighted based at least in part on a type of imaging modality used to obtain one or more images of the plurality of images.
2. The system of claim 1, wherein the positional information comprises (i) a three-dimensional position of at least the portion of the object or (ii) one or more three-dimensional coordinates for at least the portion of the object.
3. The system of claim 1, wherein the plurality of depth measurements are weighted based at least in part on (i) a quality or a property of the one or more images or (ii) a reliability, an accuracy, or a precision of one or more depth measurements of the plurality of depth measurements.
4. The system of claim 1, wherein the plurality of depth measurements are weighted based at least in part on an imaging performance, an imaging condition, or an imaging parameter of the one or more imaging devices.
5. The system of claim 1, wherein the one or more weights are parameterized at a pixel-level for the one or more images.
6. The system of claim 1, wherein the plurality of markers comprises different types of markers that are detectable using different imaging modalities.
7. The system of claim 1, wherein the plurality of images comprises (i) a baseline image comprising at least a subset of the plurality of markers and (ii) an additional image comprising a different subset of the plurality of markers.
8. The system of claim 1, wherein the plurality of markers comprises one or more fiducials that are physically applied to the object or the surgical scene.
9. The system of claim 8, wherein the one or more fiducials comprise a symbol, a pattern, a shape, a marker, a liquid, an ink, or a dye.
10. The system of claim 1, wherein the plurality of markers comprises one or more fiducials that are projected onto the object or the surgical scene, wherein the one or more fiducials comprise one or more structured light markers or optical markers.
11. The system of claim 10, further comprising a light source configured to project a pattern onto a surface of the object, wherein the pattern comprises or corresponds to the one or more structured light markers or optical markers.
12. The system of claim 11, wherein the processor is configured to (i) detect one or more changes in a size, a shape, or a configuration of the pattern when the pattern is projected on the surface of the object and (ii) determine the positional information for the object based on the one or more detected changes.
13. The system of claim 11, wherein the processor is configured to geometrically reconstruct a portion of the surface of the object based on a comparison between (i) the pattern projected onto the surface of the object and (ii) a known or predetermined pattern.
14. The system of claim 1, wherein the processor is configured to generate a depth map or a three-dimensional map of a surface of the object based on the positional information.
15. The system of claim 14, wherein the processor is configured to post-process the depth map or the three-dimensional map of the surface of the object by implementing one or more subsampling, spatial filtering, temporal filtering, blur removal, time-biased averaging, or extrapolation operations or techniques.
16. The system of claim 1, wherein the processor is configured to determine or measure one or more dimensions of the object based on the positional information.
17. The system of claim 1, further comprising an interferometer configured to measure a phase difference between (i) light emitted or transmitted to the surgical scene and (ii) light reflected from the surgical scene to obtain one or more additional depth measurements usable to determine or update the positional information.
18. The system of claim 1, wherein the one or more imaging devices comprise an RGB camera, an infrared camera, a stereoscopic camera, a light-field camera, a plenoptic camera, or a structured light detection unit.
19. The system of claim 1, wherein the object comprises a biological material, a tissue, an organ, an internal bodily structure, or an external bodily structure.
20. A system for depth sensing, the system comprising: one or more imaging devices configured to obtain a plurality of images of an object in a surgical scene using a plurality of different imaging modalities, wherein the plurality of images comprises a plurality of markers on or near the object, and wherein the plurality of imaging modalities comprises at least of: RGB imaging, infrared imaging, depth imaging, fiducial marker imaging, structured light pattern imaging, or light field imaging; and a processor configured to: (a) compute a plurality of depth measurements for at least a portion of the object based on the plurality of images; and (b) determine positional information for at least the portion of the object based on the plurality of depth measurements, wherein one or more weights are assigned to the plurality of depth measurements, wherein the one or more normalized weights have values between a predefined range, and wherein the plurality of depth measurements are weighted based on a type of imaging modality used to obtain one or more images of the plurality of images.
21. A system for depth sensing, the system comprising: one or more imaging devices configured to obtain a plurality of images of an object in a surgical scene using a plurality of different imaging modalities, wherein the plurality of images comprises a plurality of markers on or near the object, and wherein the plurality of imaging modalities comprises at least two of: RGB imaging, infrared imaging, depth imaging, fiducial marker imaging, structured light pattern imaging, or light field imaging; and a processor configured to: (a) compute a plurality of depth measurements for at least a portion of the object based on the plurality of images; and (b) determine positional information for at least the portion of the object based on the plurality of depth measurements, wherein one or more weights are assigned to the plurality of depth measurements, wherein each of the one or more weights has a value between a first value and a second value that is greater than the first value, wherein a sum of the one or more weight values assigned to the plurality of depth measurements sums to unity, and wherein the plurality of depth measurements are weighted based at least in part on a type of imaging modality used to obtain one or more images of the plurality of images.
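
By way of non-limiting illustration only, the weighting recited in the claims above (per-modality depth measurements combined using weights that each lie between zero and one and that sum to one, optionally parameterized per pixel) may be sketched as follows. The function names `normalize_weights`, `fuse_depths`, and `fuse_depth_maps`, and the use of raw non-negative reliability scores as inputs, are assumptions introduced for this sketch and do not appear in the disclosure; the disclosure does not prescribe how the weights are derived.

```python
from typing import List, Sequence


def normalize_weights(raw: Sequence[float]) -> List[float]:
    """Scale non-negative scores so each lies in [0, 1] and they sum to 1.

    The raw scores are a hypothetical stand-in for whatever reliability
    estimate is associated with each imaging modality.
    """
    if any(w < 0 for w in raw):
        raise ValueError("weights must be non-negative")
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in raw]


def fuse_depths(depths: Sequence[float], raw_weights: Sequence[float]) -> float:
    """Return the weighted average of one depth measurement per modality."""
    if len(depths) != len(raw_weights):
        raise ValueError("need exactly one weight per depth measurement")
    weights = normalize_weights(raw_weights)
    return sum(w * d for w, d in zip(weights, depths))


def fuse_depth_maps(
    depth_maps: Sequence[Sequence[Sequence[float]]],
    weight_maps: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Pixel-level fusion; both inputs are indexed as [modality][row][col]."""
    rows = len(depth_maps[0])
    cols = len(depth_maps[0][0])
    fused = []
    for r in range(rows):
        fused_row = []
        for c in range(cols):
            pixel_depths = [dm[r][c] for dm in depth_maps]
            pixel_weights = [wm[r][c] for wm in weight_maps]
            fused_row.append(fuse_depths(pixel_depths, pixel_weights))
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    # Hypothetical depths (in millimeters) from three modalities at one point,
    # with hypothetical reliability scores favoring the first modality.
    print(fuse_depths([100.0, 104.0, 98.0], [2.0, 1.0, 1.0]))
```

In this sketch the weights are normalized independently at each pixel, so a modality that is unreliable in one image region (for example, a score of zero where its markers are occluded) contributes nothing there while still contributing elsewhere.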

CROSS-REFERENCE

This application is a continuation of U.S. patent application Ser. No. 17/150,701, filed on Jan. 15, 2021 and issued as U.S. Pat. No. 11,179,218, which application is a continuation application of International Application No. PCT/US2019/042647 filed on Jul. 19, 2019, which claims priority to U.S. Provisional Patent Application No. 62/700,700 filed on Jul. 19, 2018, which applications are incorporated herein by reference in their entirety for all purposes.

Related Publications (1)

	Number	Date	Country
	20220117689 A1	Apr 2022	US

Provisional Applications (1)

	Number	Date	Country
	62700700	Jul 2018	US

Continuations (2)

	Number	Date	Country
Parent	17150701	Jan 2021	US
Child	17512543		US
Parent	PCT/US2019/042647	Jul 2019	US
Child	17150701		US
