A number of different techniques are known for generating three-dimensional (3D) images of a spatial scene in real time. For example, 3D images of a spatial scene may be generated using triangulation based on multiple two-dimensional (2D) images. However, a significant drawback of such a technique is that it generally requires very intensive computations, and can therefore consume an excessive amount of the available computational resources of a computer or other processing device.
Other known techniques include directly generating a 3D image using a 3D imager such as a structured light (SL) camera or a time of flight (ToF) camera. Cameras of this type are usually compact, provide rapid image generation, emit low amounts of power, and operate in the near-infrared part of the electromagnetic spectrum in order to avoid interference with human vision. As a result, SL and ToF cameras are commonly used in image processing system applications such as gesture recognition in video gaming systems or other systems requiring a gesture-based human-machine interface.
Unfortunately, the 3D images generated by SL and ToF cameras typically have very limited spatial resolution. For example, SL cameras have inherent difficulties with precision in an x-y plane because they implement light pattern-based triangulation in which pattern size cannot be made arbitrarily fine-granulated to achieve high resolution. Also, in order to avoid eye injury, both overall emitted power across the entire pattern as well as spatial and angular power density in each pattern element (e.g., a line or a spot) are limited. The resulting image therefore exhibits low signal-to-noise ratio and provides only a limited quality depth map, potentially including numerous depth artifacts.
Although ToF cameras are able to determine x-y coordinates more precisely than SL cameras, ToF cameras also have issues with regard to spatial resolution. For example, depth measurements in the form of z coordinates are typically generated in a ToF camera using techniques requiring very fast switching and temporal integration in analog circuitry, which can limit the achievable quality of the depth map, again leading to an image that may include a significant number of depth artifacts.
Embodiments of the invention provide image processing systems that process depth maps or other types of depth images in a manner that allows depth artifacts to be substantially eliminated or otherwise reduced in a particularly efficient manner. One or more of these embodiments involve applying a super resolution technique that utilizes at least one 2D image of substantially the same scene, but possibly from another image source, in order to reconstruct depth information associated with one or more depth artifacts in a depth image generated by a 3D imager such as an SL camera or a ToF camera.
In one embodiment, an image processing system comprises an image processor configured to identify one or more potentially defective pixels associated with at least one depth artifact in a first image, and to apply a super resolution technique utilizing a second image to reconstruct depth information of the one or more potentially defective pixels. Application of the super resolution technique produces a third image having the reconstructed depth information. The first image may comprise a depth image and the third image may comprise a depth image corresponding generally to the first image but with the depth artifact substantially eliminated. The first, second and third images may all have substantially the same spatial resolution. An additional super resolution technique may be applied utilizing a fourth image having a spatial resolution that is greater than that of the first, second and third images. Application of the additional super resolution technique produces a fifth image having increased spatial resolution relative to the third image.
Embodiments of the invention can effectively remove distortion and other types of depth artifacts from depth images generated by SL and ToF cameras and other types of real-time 3D imagers. For example, potentially defective pixels associated with depth artifacts can be identified and removed, and the corresponding depth information reconstructed using a first super resolution technique, followed by spatial resolution enhancement of the resulting depth image using a second super resolution technique.
Embodiments of the invention will be illustrated herein in conjunction with exemplary image processing systems that include image processors or other types of processing devices and implement super resolution techniques for processing depth maps or other depth images to detect and substantially eliminate or otherwise reduce depth artifacts. It should be understood, however, that embodiments of the invention are more generally applicable to any image processing system or associated device or technique in which it is desirable to substantially eliminate or otherwise reduce depth artifacts.
The image sources 104 comprise, for example, 3D imagers such as SL and ToF cameras as well as one or more 2D imagers configured to generate 2D infrared images, gray scale images, color images or other types of 2D images, in any combination. Another example of one of the image sources 104 is a storage device or server that provides images to the image processor 102 for processing.
The image destinations 106 illustratively comprise, for example, one or more display screens of a human-machine interface, or at least one storage device or server that receives processed images from the image processor 102.
Although shown as being separate from the image sources 104 and image destinations 106 in the present embodiment, the image processor 102 may be at least partially combined with one or more image sources or image destinations on a common processing device. Thus, for example, one or more of the image sources 104 and the image processor 102 may be collectively implemented on the same processing device. Similarly, one or more of the image destinations 106 and the image processor 102 may be collectively implemented on the same processing device.
In one embodiment the image processing system 100 is implemented as a video gaming system or other type of gesture-based system that processes images in order to recognize user gestures. The disclosed techniques can be similarly adapted for use in a wide variety of other systems requiring a gesture-based human-machine interface, and can also be applied to applications other than gesture recognition, such as machine vision systems in robotics and other industrial applications.
The image processor 102 in the present embodiment is implemented using at least one processing device and comprises a processor 110 coupled to a memory 112. Also included in the image processor 102 are a pixel identification module 114 and a super resolution module 116. The pixel identification module 114 is configured to identify one or more potentially defective pixels associated with at least one depth artifact in a first image received from one of the image sources 104. The super resolution module 116 is configured to utilize a second image received from possibly a different one of the image sources 104 in order to reconstruct depth information of the one or more potentially defective pixels, so as to thereby produce a third image having the reconstructed depth information.
In the present embodiment, it is assumed without limitation that the first image comprises a depth image of a first resolution from a first one of the image sources 104 and the second image comprises a 2D image of substantially the same scene and having a resolution substantially the same as the first resolution from another one of the image sources 104 different than the first image source. For example, the first image source may comprise a 3D image source such as a structured light or ToF camera, and the second image source may comprise a 2D image source configured to generate the second image as an infrared image, a gray scale image or a color image. As indicated above, in other embodiments the same image source supplies both the first and second images.
The super resolution module 116 may be further configured to process the third image utilizing a fourth image in order to produce a fifth image having increased spatial resolution relative to the third image. In such an arrangement, the first image illustratively comprises a depth image of a first resolution from a first one of the image sources 104 and the fourth image comprises a 2D image of substantially the same scene and having a resolution substantially greater than the first resolution from another one of the image sources 104 different than the first image source.
Exemplary image processing operations implemented using pixel identification module 114 and super resolution module 116 of image processor 102 will be described in greater detail below in conjunction with
The processor 110 and memory 112 in the
The pixel identification module 114 and the super resolution module 116 or portions thereof may be implemented at least in part in the form of software that is stored in memory 112 and executed by processor 110. A given such memory that stores software code for execution by a corresponding processor is an example of what is more generally referred to herein as a computer-readable medium or other type of computer program product having computer program code embodied therein, and may comprise, for example, electronic memory such as random access memory (RAM) or read-only memory (ROM), magnetic memory, optical memory, or other types of storage devices in any combination. As indicated above, the processor may comprise portions or combinations of a microprocessor, ASIC, FPGA, CPU, ALU, DSP or other image processing circuitry.
It should also be appreciated that embodiments of the invention may be implemented in the form of integrated circuits. In a given such integrated circuit implementation, identical die are typically formed in a repeated pattern on a surface of a semiconductor wafer. Each die includes image processing circuitry as described herein, and may include other structures or circuits. The individual die are cut or diced from the wafer, then packaged as an integrated circuit. One skilled in the art would know how to dice wafers and package die to produce integrated circuits. Integrated circuits so manufactured are considered embodiments of the invention.
The particular configuration of image processing system 100 as shown in
Referring now to the flow diagram of
In step 202, one or more potentially defective pixels associated with at least one depth artifact in the depth image D are identified. These potentially defective pixels are more specifically referred to in the context of the present embodiment and other embodiments herein as “broken” pixels, and should be generally understood to include any pixels that are determined with a sufficiently high probability to be associated with one or more depth artifacts in the depth image D. Any pixels that are so identified may be marked or otherwise indicated as broken pixels in step 202, so as to facilitate removal or other subsequent processing of these pixels. Alternatively, only a subset of the broken pixels may be marked for removal or other subsequent processing based on thresholding or other criteria.
In step 204, the “broken” pixels identified in step 202 are removed from the depth image D. It should be noted that in other embodiments, the broken pixels need not be entirely removed. Instead, only a subset of these pixels could be removed, based on thresholding or other specified pixel removal criteria, or certain additional processing operations could be applied to at least a subset of these pixels so as to facilitate subsequent reconstruction of the depth information. Accordingly, explicit removal of all pixels identified as potentially defective in step 202 is not required.
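By way of illustration only, the removal of step 204 can be sketched in Python as shown below. The function name `remove_broken_pixels`, the per-pixel score array `broken_prob` and the cut-off `threshold` are hypothetical and do not appear in the embodiments described herein. The sketch assumes that step 202 has produced a score for each pixel, and it represents a removed pixel by a NaN value so that the gap in the pixel grid remains visible to later steps.

```python
import numpy as np

def remove_broken_pixels(depth, broken_prob, threshold=0.5):
    """Return a copy of `depth` with selected pixels invalidated, plus the mask used.

    depth       : (M, N) float array, the depth image D
    broken_prob : (M, N) float array, hypothetical per-pixel score from step 202
                  (higher means more likely to belong to a depth artifact)
    threshold   : hypothetical cut-off; only pixels scoring above it are removed
    """
    depth = np.asarray(depth, dtype=float)
    removed = np.asarray(broken_prob) > threshold
    modified = depth.copy()          # leave the input image untouched
    modified[removed] = np.nan       # NaN marks a gap in the pixel grid
    return modified, removed

# Small example: one strongly suspected pixel and one weakly suspected pixel.
D = np.array([[1.0, 1.1, 1.0],
              [1.0, 9.0, 1.1],
              [1.1, 1.0, 1.0]])
score = np.zeros_like(D)
score[1, 1] = 0.9   # above the threshold, removed
score[0, 2] = 0.3   # below the threshold, kept
modified, removed = remove_broken_pixels(D, score)
```

Raising or lowering `threshold` in this sketch corresponds to removing a smaller or larger subset of the pixels identified in step 202.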
In step 206, a super resolution technique is applied to the modified depth image D using a second image 208 illustratively referred to in this embodiment as a regular image from another origin. Thus, for example, the second image 208 may be an image of substantially the same scene but provided by a different one of the image sources 104, such as a 2D imager, and will therefore generally not include depth artifacts of the type found in the depth image D. The second image 208 in this embodiment is assumed to have the same resolution as the depth image D, and is therefore an M×N image, but comprises a regular image as contrasted to a depth image. However, in other embodiments, the second image 208 may have a higher resolution than the depth image D. Examples of regular images that may be used in this embodiment and other embodiments described herein include infrared images, gray scale images or color images generated by a 2D imager.
Accordingly, step 206 in the present embodiment generally utilizes two different types of images, a depth image with broken pixels removed and a regular image, both having substantially the same size.
Application of the super resolution technique in step 206 utilizing regular image 208 serves to reconstruct depth information of the broken pixels removed from the image in step 204, producing a third image 210. For example, depth information for the broken pixels removed in step 204 may be reconstructed by combining depth information from neighboring pixels in the depth map D with intensity data from an infrared, gray scale or color image corresponding to the second image 208.
This operation may be viewed as recovering from depth glitches or other depth artifacts associated with the removed pixels, without increasing the spatial resolution of the depth image D. The third image 210 in this embodiment comprises a depth image E of resolution M×N that does not include the broken pixels but instead includes the reconstructed depth information. The super resolution technique of step 206 should be capable of dealing with non-regular sets of depth points, as the corresponding pixel grid includes gaps where broken pixels at random positions were removed in step 204.
As will be described in more detail below, the super resolution technique applied in step 206 may be based at least in part, for example, on a Markov random field model. It is to be appreciated, however, that numerous other super resolution techniques suitable for reconstructing depth information associated with removed pixels may be used.
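A minimal sketch of how depth information might be reconstructed in step 206 is given below in Python. It is a simplified stand-in and not the specific technique of any embodiment: it minimizes a quadratic Markov random field style energy by Jacobi iteration, with a data term that holds each surviving pixel near its measured depth and a smoothness term whose weights are taken from the regular image, so that depth is propagated within regions of similar intensity. The function name `mrf_fill` and the parameters `k`, `c` and `iters` are assumptions, removed pixels are assumed to be marked by NaN, and the regular image is assumed to be normalized to the range 0 to 1.

```python
import numpy as np

def mrf_fill(depth, intensity, k=10.0, c=50.0, iters=200):
    """Fill NaN gaps in `depth` guided by `intensity` (same M x N size).

    Minimizes  sum_i k*(y_i - z_i)^2 over surviving pixels
             + sum_(i,j) w_ij*(y_i - y_j)^2 over 4-connected neighbors,
    where w_ij = exp(-c*(I_i - I_j)^2). All parameter values are hypothetical.
    """
    depth = np.asarray(depth, dtype=float)
    I = np.asarray(intensity, dtype=float)
    M, N = depth.shape
    known = ~np.isnan(depth)
    z = np.where(known, depth, 0.0)
    y = np.where(known, depth, depth[known].mean())   # initial guess for gaps

    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    I_pad = np.pad(I, 1, mode="edge")
    inside = np.pad(np.ones((M, N)), 1, mode="constant")  # 0 outside the image
    weights = []
    for dy, dx in shifts:
        I_n = I_pad[1 + dy:1 + dy + M, 1 + dx:1 + dx + N]
        v = inside[1 + dy:1 + dy + M, 1 + dx:1 + dx + N]
        w = np.maximum(np.exp(-c * (I - I_n) ** 2), 1e-12)  # floor avoids 0/0
        weights.append(w * v)                               # no weight off-image

    for _ in range(iters):
        y_pad = np.pad(y, 1, mode="edge")
        num = k * known * z
        den = k * known.astype(float)
        for (dy, dx), w in zip(shifts, weights):
            num = num + w * y_pad[1 + dy:1 + dy + M, 1 + dx:1 + dx + N]
            den = den + w
        y = num / den                                       # Jacobi update
    return y

# Two flat surfaces; the regular image shows an edge between columns 2 and 3.
D = np.full((4, 6), 1.0)
D[:, 3:] = 2.0
G = np.full((4, 6), 0.2)
G[:, 3:] = 0.8
D[1, 2] = np.nan            # a broken pixel removed next to the edge
E = mrf_fill(D, G)
```

Because the intensity difference across the edge makes the corresponding weight very small, the gap in this example is filled from the near surface rather than averaged across both surfaces. Note that the data term in this sketch is soft, so surviving pixels may also shift slightly.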
Also, the steps 202, 204 and 206 may be iterated in order to locate and substantially eliminate additional depth artifacts.
In the
The depth image E generated by the
Exemplary techniques for identifying potentially defective pixels in the depth image D in step 202 of the
Other techniques for identifying potentially defective pixels in the depth image D include detecting areas of contiguous potentially defective pixels, as illustrated in
Referring now to
|mean{di: pixel i is in the area}−mean{dj: pixel j is in the border}|>dT
where dT is a threshold value. If such unexpected depth areas are detected, all pixels inside each of the detected areas are marked as broken pixels. Numerous other techniques may be used to identify an area of contiguous potentially defective pixels corresponding to a given depth artifact in other embodiments. For example, the above-noted inequality can be more generally expressed to utilize a statistic as follows:
|statistic{di: pixel i is in the area}−statistic{dj: pixel j is in the border}|>dT
where statistic can be a mean as given previously, or any of a wide variety of other types of statistics, such as a median, or a p-norm distance metric. In the case of a p-norm distance metric, the statistic in the above inequality may be expressed as follows:
∥x∥p=(Σi|xi|^p)^(1/p),
where xi in this example more particularly denotes an element of a vector x associated with a given pixel, and where p≥1.
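By way of a non-limiting illustration, the area-versus-border comparison expressed by the inequalities above can be sketched in Python as follows. The function name `is_unexpected_area` and the use of boolean masks to describe the area and its border are assumptions made for this sketch; how the candidate area and its border are obtained is outside its scope. The statistic is passed in as a callable so that a mean, a median or another statistic can be substituted.

```python
import numpy as np

def is_unexpected_area(depth, area_mask, border_mask, d_T, statistic=np.mean):
    """Return True if the area's depth statistic differs from the border's by more than d_T.

    depth       : (M, N) float array, the depth image D
    area_mask   : (M, N) bool array, True for pixels inside the candidate area
    border_mask : (M, N) bool array, True for pixels on the border of that area
    d_T         : threshold value
    statistic   : callable reducing a 1-D array of depths to a single number
    """
    d_area = depth[area_mask]
    d_border = depth[border_mask]
    return bool(abs(statistic(d_area) - statistic(d_border)) > d_T)

# A 3x3 area at depth 5.0 surrounded by a one-pixel border at depth 1.0.
D = np.full((5, 5), 1.0)
D[1:4, 1:4] = 5.0
area = np.zeros((5, 5), dtype=bool)
area[1:4, 1:4] = True
border = ~area

artifact_mean = is_unexpected_area(D, area, border, d_T=1.0)
artifact_median = is_unexpected_area(D, area, border, d_T=1.0, statistic=np.median)
flat = is_unexpected_area(np.full((5, 5), 1.0), area, border, d_T=1.0)

# Pixels of a detected area are then marked as broken.
broken = np.zeros((5, 5), dtype=bool)
if artifact_mean:
    broken[area] = True
```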
By way of example, the neighborhood of pixels for the particular pixel p illustratively comprises a set Sp of n neighbors of pixel p:
Sp={p1, . . . , pn},
where the n neighbors each satisfy the inequality:
∥p−pi∥<d,
where d is a threshold or neighborhood radius and ∥·∥ denotes Euclidean distance between pixels p and pi in the x-y plane, as measured between their respective centers. Although Euclidean distance is used in this example, other types of distance metrics may be used, such as a Manhattan distance metric, or more generally a p-norm distance metric of the type described previously. An example of d corresponding to a radius of a circle is illustrated in
Again by way of example, a given particular pixel p can be identified as a potentially defective pixel and marked as broken if the following inequality is satisfied:
|zp−m|>kσ,
where zp is the depth value of the particular pixel, m and σ are the mean and standard deviation, respectively, of the depth values of the respective pixels in the neighborhood of pixels, and k is a multiplying factor specifying a degree of confidence. As one example, the confidence factor in some embodiments is given by k=3. A variety of other distance metrics may be used in other embodiments.
The mean m and standard deviation σ in the foregoing example may be determined using the following equations:
It is to be appreciated, however, that other definitions of σ may be used in other embodiments.
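The per-pixel test described above can be sketched in Python as follows, purely as an illustration. The function name `find_broken_pixels` is hypothetical. The sketch assumes that the neighborhood excludes the pixel under test, that pixel centers lie on an integer grid, and that σ is the population standard deviation (dividing by n); as noted above, other definitions of σ may be used.

```python
import numpy as np

def find_broken_pixels(depth, d=1.5, k=3.0):
    """Mark pixel p as broken when |z_p - m| > k * sigma over its neighborhood.

    depth : (M, N) float array, the depth image D
    d     : neighborhood radius; neighbors satisfy ||p - p_i|| < d (Euclidean)
    k     : multiplying factor specifying the degree of confidence
    """
    depth = np.asarray(depth, dtype=float)
    M, N = depth.shape
    r = int(np.floor(d))
    # Offsets to every grid position strictly closer than d, excluding p itself.
    offsets = [(dy, dx)
               for dy in range(-r, r + 1)
               for dx in range(-r, r + 1)
               if (dy, dx) != (0, 0) and np.hypot(dy, dx) < d]
    broken = np.zeros((M, N), dtype=bool)
    for y in range(M):
        for x in range(N):
            vals = [depth[y + dy, x + dx] for dy, dx in offsets
                    if 0 <= y + dy < M and 0 <= x + dx < N]
            if not vals:              # d too small to include any neighbor
                continue
            m = np.mean(vals)
            sigma = np.std(vals)      # population standard deviation (assumed)
            if abs(depth[y, x] - m) > k * sigma:
                broken[y, x] = True
    return broken

# A flat surface with one speckle-like spike at the center.
D = np.full((5, 5), 1.0)
D[2, 2] = 9.0
broken = find_broken_pixels(D)
```

With d=1.5 the neighborhood in this sketch is the familiar set of eight surrounding pixels. The neighbors of the spike are not flagged, because the spike inflates σ within their own neighborhoods.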
Individual potentially defective pixels identified in the manner described above may correspond, for example, to depth artifacts comprising speckle-like noise attributable to physical limitations of the 3D imager used to generate depth map D.
Although the thresholding approach for identifying individual potentially defective pixels may occasionally mark and remove pixels from a border of an object, this is not problematic as the super resolution technique applied in step 206 can reconstruct the depth values of any such removed pixels.
Also, multiple instances of the above-described techniques for identifying potentially defective pixels can be implemented serially in step 202, possibly with one or more additional filters, in a pipelined implementation.
As noted above, the
The super resolution technique applied in step 212 in the present embodiment is generally a different technique than that applied in step 206. For example, as indicated above, the super resolution technique applied in step 206 may comprise a Markov random field based super resolution technique or another super resolution technique particularly well suited for reconstruction of depth information. Additional details regarding an exemplary Markov random field based super resolution technique that may be adapted for use in an embodiment of the invention can be found in, for example, J. Diebel et al., “An Application of Markov Random Fields to Range Sensing,” NIPS, MIT Press, pp. 291-298, 2005, which is incorporated by reference herein. In contrast, the super resolution technique applied in step 212 may comprise a super resolution technique particularly well suited for increasing spatial resolution of a low resolution image using a higher resolution image, such as a super resolution technique based at least in part on bilateral filters. An example of a super resolution technique of this type is described in Q. Yang et al., “Spatial-Depth Super Resolution for Range Images,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007, which is incorporated by reference herein.
The above are just examples of super resolution techniques that may be used in embodiments of the invention. The term “super resolution technique” as used herein is intended to be broadly construed so as to encompass techniques that can be used to enhance the resolution of a given image, possibly by using one or more other images.
Application of the additional super resolution technique in step 212 produces a fifth image 216 having increased spatial resolution relative to the third image. The fourth image 214 is a regular image having a spatial resolution or size in pixels of M1×N1 pixels, where it is assumed that M1>M and N1>N. The fifth image 216 is a depth image generally corresponding to the first image 200 but with one or more depth artifacts substantially eliminated and the spatial resolution increased.
Like the second image 208, the fourth image 214 is a 2D image of substantially the same scene as the first image 200, illustratively provided by a different imager than the 3D imager used to generate the first image. For example, the fourth image 214 may be an infrared image, a gray scale image or a color image generated by a 2D imager.
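As a rough illustration of how a higher resolution regular image can guide the resolution increase of step 212, the following Python sketch applies a simple joint bilateral upsampling. This is a swapped-in, simplified bilateral-filter-based approach and is not the technique of the reference cited above. The function name `joint_bilateral_upsample` and the parameters `radius`, `sigma_s` and `sigma_r` are hypothetical, the regular image is assumed to be normalized to the range 0 to 1, and a simple corner-aligned mapping between the two pixel grids is assumed.

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, guide_hi, radius=1, sigma_s=1.0, sigma_r=0.1):
    """Upsample an M x N depth image to the M1 x N1 size of `guide_hi`.

    Each output pixel is a weighted average of nearby low-resolution depths.
    The weight combines spatial closeness (sigma_s, in low-res pixels) with
    similarity of the guidance intensities (sigma_r).
    """
    depth_lo = np.asarray(depth_lo, dtype=float)
    guide_hi = np.asarray(guide_hi, dtype=float)
    M, N = depth_lo.shape
    M1, N1 = guide_hi.shape
    sy, sx = M / M1, N / N1            # high-res to low-res coordinate scale
    out = np.zeros((M1, N1))
    for Y in range(M1):
        for X in range(N1):
            cy, cx = Y * sy, X * sx    # this pixel in low-res coordinates
            y0, x0 = int(round(cy)), int(round(cx))
            num = 0.0
            den = 0.0
            for y in range(max(y0 - radius, 0), min(y0 + radius, M - 1) + 1):
                for x in range(max(x0 - radius, 0), min(x0 + radius, N - 1) + 1):
                    # Guidance value at the high-res position of low-res pixel (y, x).
                    gy = min(int(round(y / sy)), M1 - 1)
                    gx = min(int(round(x / sx)), N1 - 1)
                    w_s = np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma_s ** 2))
                    w_r = np.exp(-(guide_hi[Y, X] - guide_hi[gy, gx]) ** 2
                                 / (2 * sigma_r ** 2))
                    num += w_s * w_r * depth_lo[y, x]
                    den += w_s * w_r
            out[Y, X] = num / den
    return out

# A 2x4 depth image with a depth step, and a 4x8 regular image with a sharp edge.
E_lo = np.array([[1.0, 1.0, 2.0, 2.0],
                 [1.0, 1.0, 2.0, 2.0]])
G_hi = np.zeros((4, 8))
G_hi[:, 4:] = 1.0
F_hi = joint_bilateral_upsample(E_lo, G_hi)
```

In this example the depth step in the output follows the edge location in the higher resolution regular image, which reflects the emphasis on lateral precision rather than on depth values at this stage.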
As noted above, different super resolution techniques are generally used in steps 206 and 212. For example, a super resolution technique used in step 206 to reconstruct depth information for removed broken pixels may not provide sufficiently precise results in the x-y plane. Accordingly, the super resolution technique applied in step 212 may be optimized for correcting lateral spatial errors. Examples include super resolution techniques based on bilateral filters, as mentioned previously, or super resolution techniques that are configured so as to be more sensitive to edges, contours, borders and other features in the regular image 214 than they are to features in the depth image E. Depth errors are not particularly important at this step of the
The dashed arrow from the M1×N1 regular image 214 to the M×N regular image 208 in
In the
It should also be noted that the
The embodiment of
In these and other embodiments, distortion and other types of depth artifacts are effectively removed from depth images generated by SL and ToF cameras and other types of real-time 3D imagers.
It should again be emphasized that the embodiments of the invention as described herein are intended to be illustrative only. For example, other embodiments of the invention can be implemented utilizing a wide variety of different types and arrangements of image processing circuitry, pixel identification techniques, super resolution techniques and other processing operations than those utilized in the particular embodiments described herein. In addition, the particular assumptions made herein in the context of describing certain embodiments need not apply in other embodiments. These and numerous other alternative embodiments within the scope of the following claims will be readily apparent to those skilled in the art.
Number | Date | Country | Kind
---|---|---|---
2012145349 | Oct 2012 | RU | national

Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/US13/41507 | 5/17/2013 | WO | 00 | 1/10/2014