The invention relates to a method of generating a depth map comprising depth values representing distances to a viewer, for respective pixels of an image.
The invention further relates to a depth map generating unit for generating a depth map comprising depth values representing distances to a viewer, for respective pixels of an image.
The invention further relates to an image processing apparatus comprising:
The invention further relates to a computer program product to be loaded by a computer arrangement, comprising instructions to generate a depth map comprising depth values representing distances to a viewer, for respective pixels of an image, the computer arrangement comprising processing means and a memory.
In order to generate a 3D impression on a multi-view display device, images from different virtual view points have to be rendered. This requires either multiple input views or some 3D or depth information to be present. This depth information can be recorded, generated from multiview camera systems or generated from conventional 2D video material. For generating depth information from 2D video, several types of depth cues can be applied, such as structure from motion, focus information, geometric shapes and dynamic occlusion. The aim is to generate a dense depth map, i.e. a depth value per pixel. This depth map is subsequently used in rendering a multi-view image to give the viewer a depth impression. The article “Synthesis of multi viewpoint images at non-intermediate positions” by P. A. Redert, E. A. Hendriks, and J. Biemond, in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. IV, ISBN 0-8186-7919-0, pages 2749-2752, IEEE Computer Society, Los Alamitos, Calif., 1997, discloses a method of extracting depth information and of rendering a multi-view image on the basis of the input image and the depth map.
It is an object of the invention to provide a method of the kind described in the opening paragraph, which is based on a new depth cue.
This object of the invention is achieved in that the method comprises:
The invention is based on the following observation. Objects in a scene to be imaged have different sizes, luminances, and colors and have a certain spatial disposition. Some of the objects occlude other objects in the image. Differences between luminance and/or color values of pixels in an image are primarily related to the differences between optical characteristics of the surfaces of the objects and to the spatial positions of the objects relative to light sources within the scene. Optical characteristics of surfaces comprise e.g. color and reflectiveness. Hence, a relatively large transition in luminance and/or color, i.e. a relatively big difference between pixel values of neighboring pixels, corresponds to a transition between a first image segment and a second image segment, whereby the first image segment corresponds to a first object and the second image segment corresponds to a second object in the scene being imaged. By determining for the pixels of the image the number and extent of transitions in luminance and/or color, i.e. differences between pixel values on a path from the respective pixels to a predetermined location of the image, respective measures related to the spatial disposition of the objects in the scene can be obtained. These measures, i.e. cost values, are subsequently translated into depth values. This translation is preferably a multiplication of the cost value with a predetermined constant. Alternatively, this translation corresponds to a mapping of the respective cost values to a predetermined range of depth values by means of normalization.
It should be noted that the background also forms one or more objects, e.g. the sky or a forest or a meadow.
The depth value which is based on the luminance and/or color transitions can be directly used as depth value for rendering a multi-view image, e.g. as described in the cited article. Preferably, the depth value according to the invention is combined with other depth values, which are based on alternative depth cues as mentioned above.
In an embodiment of the method according to the invention, a first one of the differences is equal to a difference between respective values of neighboring pixels, which are disposed on the path. Computing a difference between two adjacent pixels is relatively easy. Alternatively, a difference is based on more than two pixel values. In a further alternative, a difference is computed between two pixels which are both on the path but which are not adjacent, e.g. the difference is computed between a minimum and a maximum pixel value, whereby the minimum and maximum pixel values correspond to respective pixels which are located within a predetermined distance. Preferably, an absolute difference between respective values of pixels, which are disposed on the path, is computed.
In an embodiment of the method according to the invention, the cost value for the first one of the pixels is computed by accumulating the differences between the values of the pixels, which are disposed on the path. Accumulation, i.e. integration, summation or addition of differences is relatively easy to implement. Preferably, only the differences which are larger than a predetermined threshold are combined by means of accumulation. An advantage of applying a threshold is that the depth value determination is less sensitive to noise within the image.
In another embodiment of the method according to the invention, the cost value for the first one of the pixels is computed by accumulating products of differences between the values of the pixels, which are disposed on the path, and respective weighting factors for the differences. By applying weighting factors, it is possible to control the contributions of pixel value differences to the computation of depth values corresponding to the respective pixels. For example, a first one of the weighting factors, which is related to a difference between a value of a particular pixel and a value of its neighboring pixel, is based on a distance between the particular pixel and the first one of the pixels. The first one of the weighting factors is typically relatively low if the distance between the particular pixel and the first one of the pixels is relatively high. For example, a second one of the weighting factors, which is related to a difference between a value of a particular pixel and a value of its neighboring pixel, is based on the location of the neighboring pixel relative to the particular pixel. E.g. the second one of the weighting factors is relatively high if the neighboring pixel is located above the particular pixel and is relatively low if the neighboring pixel is located below the particular pixel. Alternatively, the second one of the weighting factors is related to the angle between a first vector and a second vector, whereby the first vector corresponds to the location of the neighboring pixel relative to the particular pixel and the second vector corresponds to the location of the first one of the pixels relative to the second one of the pixels.
An embodiment according to the invention, further comprises:
In this embodiment according to the invention, the first one of depth values is based on a particular selection of multiple values related to multiple paths, i.e. the optimum path from the first one of the pixels to the second one of the pixels. Notice that the second one of pixels and the third one of the pixels may be mutually equal, i.e. the same. An alternative type of selection or combination of the cost values related to respective paths is advantageous. For instance an average of cost values related to multiple paths can be computed.
Another embodiment of the method according to the invention further comprises computing a second cost value for a third one of the pixels on basis of the cost value for the first one of the pixels. Making reuse of already computed cost values results in a computing efficient implementation. Typically, computing the second cost value is performed by combining the cost value of the first one of the pixels with a difference between further values of further pixels which are disposed on a second path from the third one of the pixels to the first one of the pixels.
In an embodiment of the method according to the invention, whereby cost values corresponding to respective pixels of the image are successively computed on basis of further cost values being computed for further pixels, a first scan direction of successive computations of cost values for a first row of pixels of the image is opposite to a second scan direction of successive computations of cost values for a second row of pixels of the image. Typically, a depth value has to be computed for each of the pixels of the image. Preferably, usage is made of cost values already computed for other pixels when computing a particular cost value for a particular pixel. The order in which the successive pixels are processed, i.e. in which the depth values are computed, is relevant. Preferably, the order is such that the pixels of the image are processed row-by-row or alternatively column-by-column. If the pixels are processed row-by-row then it is advantageous to process subsequent rows in opposite directions, e.g. the even rows from left to right and the odd rows from right to left or vice versa. The inventors have observed that this zigzag type of processing yields much better results than a processing whereby all rows are processed in the same scan direction. The quality of the depth map created on basis of this zigzag type of processing is comparable with results from more expensive methods of determining cost values for respective paths. More expensive here means that more paths are evaluated in order to determine the optimal path.
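The zigzag processing order described above can be illustrated with a minimal sketch. The function name `zigzag_scan_order` and the choice of scanning even rows from left to right are assumptions made for illustration only; the opposite assignment is equally covered by the description.

```python
def zigzag_scan_order(height, width):
    """Yield (row, col) pairs, reversing the scan direction on every row.

    Assumption for this sketch: even rows run left to right and odd rows
    run right to left.
    """
    for row in range(height):
        if row % 2 == 0:
            cols = range(width)               # left to right
        else:
            cols = range(width - 1, -1, -1)   # right to left
        for col in cols:
            yield row, col


if __name__ == "__main__":
    # For a 2x3 image the visiting order is:
    # (0,0) (0,1) (0,2) (1,2) (1,1) (1,0)
    print(list(zigzag_scan_order(2, 3)))
```

Because consecutive rows end and start at the same side of the image, the last pixel processed in one row is adjacent to the first pixel processed in the next row.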
It is a further object of the invention to provide a depth map generating unit of the kind described in the opening paragraph, which is based on a new depth cue.
This object of the invention is achieved in that the generating unit comprises:
It is a further object of the invention to provide an image processing apparatus comprising a depth map generating unit of the kind described in the opening paragraph, which is arranged to generate a depth map based on a new depth cue.
This object of the invention is achieved in that the generating unit comprises:
It is a further object of the invention to provide a computer program product of the kind described in the opening paragraph, which is based on a new depth cue.
This object of the invention is achieved in that the computer program product, after being loaded, provides said processing means with the capability to carry out:
Modifications of the depth map generating unit and variations thereof may correspond to modifications and variations thereof of the image processing apparatus, the method and the computer program product, being described.
These and other aspects of the depth map generating unit, of the image processing apparatus, of the method and of the computer program product, according to the invention will become apparent from and will be elucidated with respect to the implementations and embodiments described hereinafter and with reference to the accompanying drawings, wherein:
Same reference numerals are used to denote similar parts throughout the Figures.
In general, a pixel in an image is connected to 8 neighboring pixels, i.e. 2 pixels being horizontally located relative to the pixel, 2 pixels being vertically located relative to the pixel and 4 pixels being diagonally located relative to the pixel. Pairs of pixels of the path 112 are mutually located in one of these 3 ways, i.e. horizontally, vertically or diagonally relative to each other.
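As an illustration of this 8-connectivity, the following sketch enumerates the neighbor offsets and checks whether a list of pixel coordinates forms a connected path. The names `NEIGHBOR_OFFSETS`, `classify_step` and `is_connected_path`, and the representation of a path as a list of (x, y) tuples, are hypothetical choices for this example.

```python
# The 8 offsets (dx, dy) from a pixel to its neighbors.
NEIGHBOR_OFFSETS = [
    (dx, dy)
    for dy in (-1, 0, 1)
    for dx in (-1, 0, 1)
    if (dx, dy) != (0, 0)
]


def classify_step(dx, dy):
    """Classify a neighbor offset as horizontal, vertical or diagonal."""
    if (dx, dy) not in NEIGHBOR_OFFSETS:
        raise ValueError("not an 8-connected neighbor offset")
    if dy == 0:
        return "horizontal"
    if dx == 0:
        return "vertical"
    return "diagonal"


def is_connected_path(path):
    """Return True if every consecutive pixel pair on the path is 8-connected."""
    return all(
        (x1 - x0, y1 - y0) in NEIGHBOR_OFFSETS
        for (x0, y0), (x1, y1) in zip(path, path[1:])
    )


if __name__ == "__main__":
    kinds = [classify_step(dx, dy) for dx, dy in NEIGHBOR_OFFSETS]
    # 2 horizontal, 2 vertical and 4 diagonal neighbors
    print(kinds.count("horizontal"), kinds.count("vertical"), kinds.count("diagonal"))
```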
In
The depth map 122 is generated on basis of the method according to the invention. For the generation of the depth value 130 corresponding to the first pixel 108 the following steps are performed:
The second pixel 110 belongs to a predetermined subset of the pixels of the image 100. In this case the predetermined subset comprises pixels at the border of the image. In alternative embodiments the subset comprises pixels of a part of the border, e.g. only the pixels of the upper border of the image or the lower border of the image. In a further alternative the subset comprises a central pixel of the image.
As explained above, the assigned depth value for the first pixel 108 is related to a cost function for the first pixel 108. The cost function is based on transitions, i.e. the cost value increases when there are more and/or bigger transitions on the path from the first pixel 108 to the second pixel 110. The assigned depth value can be based on one of the following approaches:
To summarize, there is a relation between the cost value and the location of the second pixel 110 and a relation between the assigned depth value and the cost value.
Table 1 shows a number of possible relations between these quantities. In the cases as listed in Table 1, it is assumed that the first pixel 108 is located at the center of the image.
A relatively low depth value means that the first pixel is relatively close to the viewer of the multi-view image being generated on basis of the image and a relatively high depth value means that the first pixel is relatively far removed from the viewer of the multi-view image.
Preferably, the computation of the cost value V(x′,y′) is based on an accumulation of pixel value differences, which are allocated to pixels being located on a path Pi from the first pixel to the second pixel, with i being an index to indicate a particular one of the paths from the pixel with coordinates (x′,y′).
V(x′,y′)=Σ{E(x,y)|(x,y)∈Pi} (1)
A first example of the computation of a pixel value difference E(x,y) is given in Equation 2:
E(x,y)=|I(x,y)−I(x−a,y−b)| (2)
with I(x,y) the luminance value of a pixel with coordinates x and y of the image, and −1≤a≤1 and −1≤b≤1.
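Equations 1 and 2 can be combined into a short sketch that accumulates absolute luminance differences between consecutive pixels on a path. The representation of the image as a list of rows indexed as `image[y][x]`, the representation of the path as a list of (x, y) tuples, and the name `path_cost` are assumptions of this example.

```python
def path_cost(image, path):
    """Accumulate |I(x,y) - I(x-a,y-b)| over consecutive pixels of a path.

    image: list of rows of luminance values, indexed as image[y][x].
    path:  list of (x, y) tuples; consecutive entries are assumed to be
           neighboring pixels, so that -1 <= a <= 1 and -1 <= b <= 1.
    """
    cost = 0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        cost += abs(image[y1][x1] - image[y0][x0])
    return cost


if __name__ == "__main__":
    luminance = [
        [10, 10, 10],
        [10, 80, 10],
        [10, 80, 10],
    ]
    # Path from the bottom-center pixel straight up to the upper border.
    path = [(1, 2), (1, 1), (1, 0)]
    print(path_cost(luminance, path))  # 0 + 70 = 70
```

In this example the path crosses one luminance transition of 70, so the cost value of the starting pixel is 70, whereas a pixel whose path stays within the uniform background would have cost 0.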
Alternatively, a pixel value difference E(x,y) is computed on basis of color values:
E(x,y)=|C(x,y)−C(x−a,y−b)| (3)
with C(x,y) a color value of a pixel with coordinates x and y of the image. Equation 4 gives a further alternative for the computation of a pixel value difference E(x,y), based on the three different color components R (Red), G (Green) and B (Blue).
E(x,y)=max(|R(x,y)−R(x−a,y−b)|,|G(x,y)−G(x−a,y−b)|,|B(x,y)−B(x−a,y−b)|) (4)
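Equation 4 can be sketched as follows. The representation of a pixel as an (R, G, B) tuple and the name `color_difference` are assumptions of this example.

```python
def color_difference(pixel, neighbor):
    """Return the largest absolute per-channel difference, as in Equation 4.

    pixel, neighbor: (R, G, B) tuples of the two pixels being compared.
    """
    return max(abs(p - q) for p, q in zip(pixel, neighbor))


if __name__ == "__main__":
    # Channel differences are 10, 30 and 5, so the result is 30.
    print(color_difference((200, 100, 50), (190, 130, 55)))
```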
Optionally, the pixel value difference signal E is filtered by clipping all pixel value differences, which are below a predetermined threshold, to a constant, e.g. zero.
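A minimal sketch of this optional filtering step is given below. The function name `clip_small_differences` and the parameter name `floor_value` are hypothetical; the threshold value used in the example is arbitrary.

```python
def clip_small_differences(differences, threshold, floor_value=0):
    """Replace every difference below the threshold by a constant.

    Differences equal to or above the threshold are kept unchanged.
    """
    return [d if d >= threshold else floor_value for d in differences]


if __name__ == "__main__":
    # Small differences, e.g. caused by noise, no longer contribute.
    print(clip_small_differences([1, 5, 12, 3, 40], threshold=5))
    # [0, 5, 12, 0, 40]
```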
As said, preferably the computation of the cost value V(x′,y′) is based on an accumulation of pixel value differences being allocated to pixels being located on a path Pi from the first pixel to the second pixel. There are several approaches to select this path Pi from a set of paths.
Instead of computing the cost value V(x′,y′) on basis of a single path, the cost value can be based on a combination of paths, e.g. the average of the cost values of the paths may be computed.
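The selection or combination of cost values of several paths can be sketched as follows. The function name `combine_path_costs` and the `mode` parameter are hypothetical; the sketch assumes that the cost values of the individual paths have already been computed.

```python
def combine_path_costs(costs, mode="min"):
    """Combine the cost values of several paths into a single cost value.

    mode="min":  select the cost of the optimum (lowest-cost) path.
    mode="mean": use the average of the cost values of all paths.
    """
    if not costs:
        raise ValueError("at least one path cost is required")
    if mode == "min":
        return min(costs)
    if mode == "mean":
        return sum(costs) / len(costs)
    raise ValueError("unknown mode: " + repr(mode))


if __name__ == "__main__":
    costs = [70, 10, 40]
    print(combine_path_costs(costs, "min"))   # 10
    print(combine_path_costs(costs, "mean"))  # 40.0
```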
A further alternative for computing the cost value V(x′,y′) is based on weighting factors for the various pixel value differences.
V(x′,y′)=Σ{W(j)E(x,y)|(x,y)∈Pi} (5)
This weighting factor W(j) is preferably related to a spatial distance j between one of the pixels of the pixel pair for which a pixel value difference E(x,y) is being computed and the first pixel. Typically, this weighting factor W(j) is lower for bigger spatial distances.
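Equation 5 with a distance-dependent weighting factor can be sketched as follows. The specific weighting function W(j)=1/(1+j), the use of the Euclidean distance for j, and the names `distance_weight` and `weighted_path_cost` are assumptions of this example; the description only requires that the weight is typically lower for bigger distances.

```python
import math


def distance_weight(j):
    """Hypothetical weighting function: decreases as the distance j grows."""
    return 1.0 / (1.0 + j)


def weighted_path_cost(image, path, weight=distance_weight):
    """Accumulate W(j) * E(x,y) along a path, as in Equation 5.

    image: list of rows of luminance values, indexed as image[y][x].
    path:  list of (x, y) tuples starting at the first pixel.
    j is taken here as the Euclidean distance between the first pixel and
    the far pixel of each pixel pair (an assumption of this sketch).
    """
    x_first, y_first = path[0]
    cost = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        j = math.hypot(x1 - x_first, y1 - y_first)
        cost += weight(j) * abs(image[y1][x1] - image[y0][x0])
    return cost


if __name__ == "__main__":
    row_image = [[0, 10, 30]]
    path = [(0, 0), (1, 0), (2, 0)]
    # Differences of 10 at distance 1 and 20 at distance 2.
    print(weighted_path_cost(row_image, path))
```

With this choice of W(j), transitions far away from the first pixel contribute less to its cost value than transitions close to it.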
Alternatively, the weighting factor W(j) is related to an angle between two vectors.
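An angle-dependent weighting factor could, for instance, be sketched as follows. The description does not specify how the angle is mapped to a weight, so the mapping (1+cos θ)/2 used here, the choice of the two vectors as a step between neighboring pixels and an overall path direction, and the name `angle_weight` are all assumptions of this example.

```python
import math


def angle_weight(step, direction):
    """Hypothetical weight based on the angle between two 2D vectors.

    step:      (dx, dy) from a pixel to its neighboring pixel.
    direction: (dx, dy) describing the overall direction of the path.
    Returns (1 + cos(theta)) / 2, i.e. 1 for aligned vectors,
    0.5 for perpendicular vectors and 0 for opposite vectors.
    """
    norm = math.hypot(*step) * math.hypot(*direction)
    if norm == 0:
        raise ValueError("both vectors must be non-zero")
    dot = step[0] * direction[0] + step[1] * direction[1]
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return 0.5 * (1.0 + cos_theta)


if __name__ == "__main__":
    upward = (0, -1)  # image rows are assumed to grow downward
    print(angle_weight((0, -1), upward))  # step in the path direction
    print(angle_weight((1, 0), upward))   # step perpendicular to it
    print(angle_weight((0, 1), upward))   # step against it
```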
The multi-view image generation unit 400 comprises:
The depth map generating unit 401 for generating depth maps comprising depth values representing distances to a viewer, for respective pixels of the images, comprises:
The computing unit 402 is arranged to provide a cost value signal VF=V(x′,y′,n), with x′ and y′ the coordinates of a pixel of the image at time n, which represents the cost value per pixel.
After the computation of the cost value signal VF the depth map is determined. This is specified in Equation 6:
D(x′,y′,n)=F(VF) (6)
with D(x′,y′,n) the depth value of a pixel with coordinates x′ and y′ of the image at time n and the function F(j) being a linear or non-linear transformation of a cost value V(x′,y′,n) into a depth value D(x′,y′,n). This function F(j) is preferably a simple multiplication of the cost value V(x′,y′,n) with a predetermined constant:
D(x′,y′,n)=α·V(x′,y′,n) (7)
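The translation of cost values into depth values can be sketched in two variants: the multiplication of Equation 7 and, as an alternative, a normalization to a predetermined range of depth values. The function names, the representation of the cost map as a list of rows, and the default range of 0 to 255 are assumptions of this example.

```python
def depth_by_scaling(cost_map, alpha):
    """Equation 7: multiply every cost value by a predetermined constant."""
    return [[alpha * v for v in row] for row in cost_map]


def depth_by_normalization(cost_map, depth_min=0, depth_max=255):
    """Map the cost values linearly onto the range [depth_min, depth_max].

    The range 0..255 is an assumed default, not a value from the description.
    """
    flat = [v for row in cost_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant cost map carries no depth variation.
        return [[depth_min for _ in row] for row in cost_map]
    span = depth_max - depth_min
    return [
        [depth_min + (v - lo) * span / (hi - lo) for v in row]
        for row in cost_map
    ]


if __name__ == "__main__":
    costs = [[0, 50], [100, 25]]
    print(depth_by_scaling(costs, alpha=0.5))
    print(depth_by_normalization(costs))
```

The scaling variant preserves the absolute magnitude of the cost values, whereas the normalization variant always uses the full available range of depth values for each image.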
It should be noted that for the computation of the cost value for a particular pixel the computed cost value for a neighboring pixel could be applied. In other words, the computation of cost values is preferably performed in a recursive way. See also the description in connection with
The cost value computing unit 402, the depth value assigning unit 404 and the rendering unit 406 may be implemented using one processor. Normally, these functions are performed under control of a software program product. During execution, normally the software program product is loaded into a memory, like a RAM, and executed from there. The program may be loaded from a background memory, like a ROM, hard disk, or magnetic and/or optical storage, or may be loaded via a network like the Internet. Optionally an application specific integrated circuit provides the disclosed functionality.
It should be noted that, although the multi-view image generation unit 400 as described in connection with
The video signal may be a broadcast signal received via an antenna or cable but may also be a signal from a storage device like a VCR (Video Cassette Recorder) or Digital Versatile Disk (DVD). The signal is provided at the input connector 510. The image processing apparatus 500 might e.g. be a TV. Alternatively the image processing apparatus 500 does not comprise the optional display device but provides the output images to an apparatus that does comprise a display device 506. Then the image processing apparatus 500 might be e.g. a set top box, a satellite-tuner, a VCR player, a DVD player or recorder. Optionally the image processing apparatus 500 comprises storage means, like a hard-disk or means for storage on removable media, e.g. optical disks. The image processing apparatus 500 might also be a system being applied by a film-studio or broadcaster.
With processing a particular pixel is meant:
Computing the particular cost value is based on already computed cost values for other pixels. The following example is provided to illustrate that. Suppose that the depth values corresponding to pixels 604-614 of the first and second row have already been determined, and hence the respective cost values corresponding to respective paths are known. Besides that a number of pixels 602 of the third row have also been processed. Next the depth value for a particular pixel with reference number 600 has to be determined. Preferably, this is done by evaluating the following set of candidate cost values:
After determining the minimum cost value from the set of candidate cost values the path starting from the particular pixel is known, the corresponding cost value is known and the corresponding depth value can be assigned.
It will be clear that sets of candidate cost values typically depend on the scan direction. For instance, in the case of a scan direction from the right to the left, the sets of candidate cost values may comprise a candidate cost value which is based on the cost value of a pixel located to the right of the particular pixel under consideration. The sets of candidate cost values may comprise additional cost values. Alternatively, the sets of candidate cost values comprise fewer cost values.
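The recursive computation of cost values with a scan-direction-dependent candidate set can be sketched as follows. Several choices in this sketch are assumptions made for illustration: the predetermined subset is taken to be the upper border of the image (cost value zero), the candidate set consists of the up to three neighbors in the previous row plus the previously processed pixel in the current row, even rows are scanned from left to right and odd rows from right to left, and the pixel value difference is the absolute luminance difference of Equation 2. The name `cost_map_zigzag` is hypothetical.

```python
def cost_map_zigzag(image):
    """Compute a cost value per pixel by reusing already computed cost values.

    image: list of rows of luminance values, indexed as image[y][x].
    Returns a cost map of the same size.
    """
    height, width = len(image), len(image[0])
    # Assumption: pixels of the upper border form the predetermined subset,
    # so their cost value is zero.
    cost = [[0] * width for _ in range(height)]
    for y in range(1, height):
        left_to_right = (y % 2 == 0)
        xs = range(width) if left_to_right else range(width - 1, -1, -1)
        previous_dx = -1 if left_to_right else 1  # already processed neighbor
        for i, x in enumerate(xs):
            candidates = []
            # Candidates based on neighbors in the previous (processed) row.
            for dx in (-1, 0, 1):
                xn = x + dx
                if 0 <= xn < width:
                    diff = abs(image[y][x] - image[y - 1][xn])
                    candidates.append(cost[y - 1][xn] + diff)
            # Candidate based on the previously processed pixel of this row;
            # its position depends on the scan direction.
            if i > 0:
                xp = x + previous_dx
                diff = abs(image[y][x] - image[y][xp])
                candidates.append(cost[y][xp] + diff)
            cost[y][x] = min(candidates)
    return cost


if __name__ == "__main__":
    luminance = [
        [10, 10, 10, 10],
        [10, 80, 80, 10],
        [10, 80, 80, 10],
    ]
    for row in cost_map_zigzag(luminance):
        print(row)
    # [0, 0, 0, 0]
    # [0, 70, 70, 0]
    # [0, 70, 70, 0]
```

In this example the pixels of the bright object obtain a cost value of 70, because every path from them to the upper border crosses one luminance transition, while the background pixels keep a cost value of 0.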
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word ‘comprising’ does not exclude the presence of elements or steps not listed in a claim. The word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means can be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera does not indicate any ordering. These words are to be interpreted as names.
Number | Date | Country | Kind |
---|---|---|---|
04100625 | Feb 2004 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2005/050482 | 2/7/2005 | WO | 00 | 8/15/2006 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2005/083631 | 9/9/2005 | WO | A |
Number | Date | Country |
---|---|---|
20070146232 A1 | Jun 2007 | US |