The present invention relates generally to image processing and, more particularly, to apparatuses, systems, and methods for foreground biased depth map refinement for Depth Image Based Rendering ("DIBR") view synthesis.
The video-plus-depth format is an efficient way to represent 3D video. This format typically includes 2D color texture video and a depth map with per-pixel depth information. It is a very compact format, which has made it especially suitable for mobile 3D video applications. Moreover, the video-plus-depth format is well suited to rendering views with a variable baseline by DIBR. Thus, stereo video and multiview video can be generated for stereoscopic or auto-stereoscopic 3D display devices using such methods.
Synthesizing new views using DIBR involves three major steps: (1) depth map preprocessing, (2) 3D image warping, and (3) hole filling. One challenge in synthesizing high quality virtual views is to reconstruct the large disoccluded areas after the 3D image warping process. For example, as illustrated in
For example, as shown in
Common methods for filling disoccluded regions include linear interpolation and depth-aided horizontal extrapolation. Unfortunately, both of these methods generally leave artifacts or unwanted degradation of the image, which can be very annoying to a viewer. Other hole-filling methods include multidirectional extrapolation and image inpainting. These methods analyze the surrounding texture information in the image and use that information to fill the holes in the synthesized views. Unfortunately, these hole-filling methods also produce annoying artifacts. The main reason is that the disoccluded regions normally involve large depth discontinuities. Thus, hole-filling techniques that consider only the planar image information cannot solve the problem.
Artifacts in the synthesized views using depth map information are mainly due to low depth map quality associated with incorrect depth values, especially for texture edge pixels that include foreground and background color pixels. In addition, object edges may be fuzzy and contain transitional edge pixels. Consequently, unprocessed depth maps usually cause artifacts after the hole filling process. These artifacts are commonly due to the fact that transitional edge pixels are mapped to background regions in the image warping process, and these pixels' information is then used to fill the holes.
One approach to depth map improvement is to use smoothing filters such as average filtering, Gaussian filtering, asymmetric filtering, and/or adaptive filtering to blur the boundaries of the depth map in order to eliminate holes or reduce the sizes of the large holes. The artifacts created in such hole-filling processes may be reduced, but the depth map may be highly degraded. A highly degraded depth map may cause poor 3D perception of the synthesized view.
Another approach, called the reliability-based approach, uses reliable warping information from other views to fill holes and remove the artifacts. This method requires more than one view and is therefore not suitable for view synthesis from a single texture video, such as in video-plus-depth based DIBR applications.
The present embodiments include methods, systems, and apparatuses for foreground biased depth map refinement in which the horizontal gradient of the texture edge in the color image is used to guide the shifting of the foreground depth pixels around large depth discontinuities, so that all texture edge pixels are assigned foreground depth values. In such an embodiment, only background information may be used in the hole-filling process. Such embodiments may significantly improve the quality of the synthesized view by avoiding incorrect use of foreground texture information in hole-filling. Additionally, the depth map quality may not be significantly degraded when such methods are used for hole-filling.
Embodiments of a method for foreground biased depth map refinement for use in DIBR view synthesis are presented. In one embodiment, the method includes receiving texture information associated with a plurality of pixels in a video frame. The method may also include receiving depth information associated with the plurality of pixels in the video frame. Additionally, the method may include computing a gradient value associated with a change in the texture between a subset of the plurality of pixels in the video frame, and refining the depth information associated with the subset of the plurality of pixels in the video frame in response to the gradient value. In a further embodiment, refining the depth information may include adjusting the depth information to correspond to a value associated with a foreground portion of the video frame.
The method may also include calculating a depth difference value between two or more of the plurality of pixels in the video frame and comparing the depth difference value with a depth difference threshold, wherein computing the gradient value is performed in response to a determination that the depth difference value is greater than the depth difference threshold. Calculating the depth difference value may be performed for each of a plurality of pixels in a horizontal line of pixels in the video frame and for each of the plurality of pixels in a set of horizontal lines comprising the video frame.
Embodiments of the method may also include comparing the gradient value with a gradient threshold, wherein refining the depth information for each pixel is performed in response to a determination that the gradient value is greater than the gradient threshold. The texture information may include one or more color components. Alternatively, the texture information may comprise one or more grayscale components. The depth information may comprise a depth pixel in a depth map.
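As an illustration, the gradient computation and threshold comparisons described above might be sketched as follows. The function names, the use of a simple horizontal finite difference as the gradient, and the sample threshold values are assumptions for illustration, not the claimed implementation.

```python
# Sketch of the per-pixel refinement test: a large depth discontinuity
# combined with a texture edge that is still changing indicates a
# transitional edge pixel whose depth should be biased to the foreground.

def horizontal_gradient(texture_line, i):
    """Absolute horizontal texture change between pixel i and pixel i+1
    (using a single grayscale or color component)."""
    return abs(texture_line[i + 1] - texture_line[i])

def needs_refinement(depth_line, texture_line, i,
                     depth_threshold, gradient_threshold):
    """Return True when pixel i sits on a large depth discontinuity
    whose corresponding texture edge gradient exceeds the threshold."""
    depth_difference = abs(depth_line[i] - depth_line[i + 1])
    if depth_difference <= depth_threshold:
        return False  # small discontinuity: creates only very small holes
    return horizontal_gradient(texture_line, i) > gradient_threshold
```

In the claimed method this test would be applied for each pixel of each horizontal line of the video frame, with the gradient computed only after the depth difference test succeeds.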
Embodiments of a system for foreground biased depth map refinement for use in DIBR view synthesis are also presented. In one embodiment, the system includes an input device configured to receive texture information and depth information associated with a plurality of pixels in a video frame. The system may also include a processor coupled to the input device. The processor may compute a gradient value associated with a change in the texture between a subset of the plurality of pixels in the video frame, and refine the depth information associated with the subset of the plurality of pixels in the video frame in response to the gradient value.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. The novel features which are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present invention.
For a more complete understanding of the present invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawing, in which:
The characteristics of a depth map and a natural color image differ in many respects. A depth map represents the distance between an object and the camera as a grayscale value, with large homogeneous regions within scene objects and sudden changes of depth values at object boundaries. Thus, the edges of a depth map are typically very sharp. However, most of the edges in a color image change smoothly over a transition region. To illustrate these differences,
To illustrate this phenomenon, the example of
Based on the above observations, if the depth map can be refined in the preprocessing stage to fix the misalignment problem, with the foreground region extended to cover the whole transitional region of the texture edges, the annoying hole-filling artifacts should be significantly reduced in the synthesized views. Based on this idea, a foreground biased depth map refinement is disclosed that shifts the sharp depth edge positions toward the background regions based on the horizontal gradient of the corresponding edges in the color image.
For example, as illustrated in
If the depth map can be refined in the preprocessing stage with the foreground region extended to cover the whole transitional region of the texture edges, the annoying hole-filling artifacts may be significantly minimized in the synthesized views. In such an embodiment, the depth values of these transitional edge pixels are refined so that they become foreground pixels as shown in
In one embodiment, only the depth values of transitional edge pixels with a large depth discontinuity are refined. Although boundary artifacts appear around object boundaries, gradually changing depth values do not generate annoying artifacts, since small depth discontinuities create only very small holes in the warped image. Artifacts are only observed in the large holes. In one embodiment, a pre-defined depth threshold is used to trigger the refinement process, and this depth discontinuity threshold is derived from the relationship between the hole's size and the depth value difference. The relationship between the hole's size and the depth value difference between two horizontally adjacent pixels, based on the shift-sensor model for DIBR, may be derived as
where Δd is the depth value difference between two horizontally adjacent depth pixels, and tc and f are the baseline distance and the focal length, respectively. The values zn and zf represent the nearest and the farthest distance in the scene. In the proposed algorithm, hole sizes greater than or equal to 3 (h≥3) are classified as large holes. Thus, the pre-defined depth discontinuity threshold Td is given by
For any absolute depth value difference larger than Td, the hole's size in the warped image will be 3 pixels or more, and the proposed foreground biased depth refinement will be performed on the neighboring depth pixels.
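The equations referenced above are not reproduced in this text. Based on the symbol definitions given and the common shift-sensor model with 8-bit quantized depth, one plausible reconstruction is the following; this is an assumption consistent with the surrounding description, not necessarily the exact expressions of the original:

```latex
% Hole size h (in pixels) produced by a depth-value difference \Delta d
% between two horizontally adjacent pixels (shift-sensor model,
% 8-bit quantized depth):
h = \frac{t_c \, f}{255}\left(\frac{1}{z_n} - \frac{1}{z_f}\right)\Delta d

% Classifying holes with h \ge 3 as large then gives the pre-defined
% depth discontinuity threshold:
T_d = \frac{3 \times 255}{t_c \, f \left(\frac{1}{z_n} - \frac{1}{z_f}\right)}
```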
The proposed refinement method is a line-by-line process that aims to extend the foreground depth values to cover the whole transitional region of the texture edges based on the horizontal gradient at the color edges. The refinement process is triggered when the horizontal depth values change from low to high with a difference larger than the pre-defined depth threshold Td (di−di+1<−Td), similar to the sharp depth edge on the left side of
The refinement process is also triggered when the horizontal depth values change from high to low with a depth difference larger than the pre-defined depth threshold Td (di−di+1>Td), similar to the sharp depth edge on the right side of
One possible implementation of the proposed foreground biased depth refinement algorithm can be summarized in the following steps:
(9) If j<M, then input the next horizontal line of the depth map and go to Step (2)
(10) End of the process
In the pseudo code described above, i is the horizontal pixel position on the line, j is the line index in the image, and j=0 is the first line. D is a depth difference value, and di represents the depth value of each pixel in the line. Step 5 describes a left-hand side shifting method, and step 6 describes a right-hand side shifting method. Td is the depth threshold; it is negative in Step 5 because the depth values are moving from low to high. The variable k is an index for shifting, and it defines the number of shifts of the depth map needed to cover the entire foreground. The variable Gj,i+k represents the horizontal texture gradient at line j, pixel i+k, and Gh is the gradient threshold; if the gradient is larger than the threshold, the depth value is shifted. Optionally, the amount of shifting may be limited by setting a shifting window W to avoid over-shifting. N is the number of pixels on each line, which can be used to determine whether the last pixel in the line has been reached. M is the total number of lines in the image, which can be used to determine whether the last line in the image has been reached.
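Using the variables defined above, the line-by-line process might be sketched as follows. The function name, the list-based data layout, the use of a single texture component, and the exact form of the gradient test are illustrative assumptions, not the claimed implementation.

```python
def refine_depth_line(depth, texture, Td, Gh, W):
    """Foreground biased refinement of one horizontal line (a sketch).

    depth, texture -- per-pixel depth values and texture (e.g. grayscale)
                      values for the same line of N pixels
    Td -- pre-defined depth discontinuity threshold
    Gh -- horizontal texture gradient threshold
    W  -- shifting window limiting the number of shifts k
    """
    N = len(depth)
    refined = depth[:]  # refined copy; the original depth drives the triggers
    for i in range(N - 1):
        diff = depth[i] - depth[i + 1]
        if diff < -Td:
            # Low-to-high depth edge (di - di+1 < -Td): foreground on the
            # right. Shift the foreground depth leftwards over transitional
            # texture pixels while the horizontal gradient exceeds Gh.
            k = 1
            while (k <= W and i - k >= 0
                   and abs(texture[i - k + 1] - texture[i - k]) > Gh):
                refined[i - k + 1] = depth[i + 1]  # assign foreground depth
                k += 1
        elif diff > Td:
            # High-to-low depth edge (di - di+1 > Td): foreground on the
            # left. Shift the foreground depth rightwards.
            k = 1
            while (k <= W and i + k < N
                   and abs(texture[i + k] - texture[i + k - 1]) > Gh):
                refined[i + k] = depth[i]  # assign foreground depth
                k += 1
    return refined
```

A full-frame refinement would apply this function to each of the M lines in turn, matching Steps (9) and (10) above.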
To further simplify the depth map refinement process, only the low-to-high or the high-to-low refinement process is applied for synthesizing the virtual left or right view in the DIBR process. The proposed method can be extended to the general case of the 3D image warping process with a small modification: the detection of large holes by comparing the depth difference with a threshold can be replaced by checking the depth values of neighboring pixels that would create a large hole in the warping process. Thus, the proposed method can be easily integrated into DIBR based 3D image/video systems.
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present invention, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.