The invention relates to the technical field of encoding visual data in a layered depth format.
Layered depth image (LDI) is a way to encode information for rendering of three dimensional images. Similarly, layered depth video (LDV) is a way to encode information for rendering of three dimensional videos.
LDI/LDV uses a foreground layer and at least one background layer for conveying information. The background layer is also called the occlusion layer. The foreground layer comprises a main colour image/video frame with an associated main depth map. The at least one background layer comprises a background colour image/video frame with an associated background depth map. Commonly, the occlusion layer is sparse in that it only includes image content which is covered by foreground objects in the main layer, together with the corresponding depth information of the image content occluded by the foreground objects.
A way to generate LDI or LDV is to capture the same scene with two or more cameras from different viewpoints. The images/videos captured by the two cameras are then warped, i.e. shifted, and fused to generate the main image/video, which depicts the same scene from a central viewpoint located in between the different viewpoints.
Further, the main depth map associated with the main image/video frame can be generated using the two captured images/video frames. The main depth map assigns a depth value, a disparity value or a scaled value homogeneous with disparity to each pixel of the main image/video frame, wherein the assigned disparity value is inversely proportional to the distance, from a main image plane, of the object to which the respective pixel belongs.
According to the prior art, the foreground layer and the background layer are of the same horizontal width. The inventors recognized that this equal size does not allow all the information provided in the images/videos captured by the at least two cameras to be conveyed.
Therefore, the inventors propose a non-transitory storage medium carrying at least one encoded layered depth image/video frame, wherein at least one occlusion layer of the layered depth image/video frame has a greater horizontal width than a foreground layer of the layered depth image/video frame, and wherein the horizontal width of the occlusion layer is proportional to a maximum disparity value comprised in lateral boundary areas of a main depth map comprised in the foreground layer, the lateral boundary areas consisting of a predetermined number of outermost columns of the main depth map.
And, the inventors propose a method for layered depth image/video frame encoding, said method comprising encoding at least one occlusion layer of the layered depth image/video frame with a greater horizontal width than a foreground layer of the layered depth image/video frame, wherein the horizontal width of the occlusion layer is proportional to a maximum disparity value comprised in lateral boundary areas of a main depth map comprised in the foreground layer, the lateral boundary areas consisting of a predetermined number of outermost columns of the main depth map.
Similarly, a device for layered depth image/video frame encoding is proposed, said device being adapted for encoding at least one occlusion layer of the layered depth image/video frame with a greater horizontal width than a foreground layer of the layered depth image/video frame, wherein the horizontal width of the occlusion layer is proportional to a maximum disparity value comprised in lateral boundary areas of a main depth map comprised in the foreground layer, the lateral boundary areas consisting of a predetermined number of outermost columns of the main depth map.
The additional horizontal width can be used for conveying that part of the information which is provided in the images/videos captured by the at least two cameras but is not comprised in the foreground layer.
The features of further advantageous embodiments are specified in the dependent claims.
Exemplary embodiments of the invention are illustrated in the drawings and are explained in more detail in the following description. The exemplary embodiments are explained only for elucidating the invention, and not for limiting the invention's disclosure, scope or spirit as defined in the claims.
The invention may be realized on any electronic device comprising a processing device correspondingly adapted. For instance, the invention may be realized in a mobile phone, a personal computer, a digital still camera system, or a digital video camera system.
In LDI/LDV, an exemplary depth map Mdm is associated with an exemplary image. For each pixel in the exemplary image there is a value in the exemplary depth map. The set of map and image is called a layer. If the layer is the foreground layer, also called the main layer, the image is called the foreground image and is fully populated with pixels. The associated depth map is called the main depth map Mdm in the following.
In an exemplary embodiment, the main depth map Mdm and the associated foreground image CV result from processing of two views LV, RV captured by two cameras with parallel optical axes, as shown in the drawings. Under these conditions, the disparity d of an object located at depth z is given by:
d=h−f*b/z (1)
where f is the focal length, b is the baseline between the two cameras, and h emulates the sensor shift required to tune the position of the convergence plane. If no processing is applied, the convergence plane is located at an infinite distance and h is equal to zero. For a convergence plane located at a finite distance z_conv, as exemplarily depicted in the drawings,
h=f*b/z_conv (2)
In case the main depth map Mdm comprises a scaled value D homogeneous with disparity d, the relation between the two can be
D=255*(d_max−d)/(d_max−d_min) (3)
In case scaled values are comprised in the main depth map, either the parameters d_max and d_min are transmitted as metadata, or the corresponding depth values z_near and z_far are transmitted, wherein
z_near=f*b/(h−d_max) (4)
and
z_far=f*b/(h−d_min) (5)
in accordance with equation (1).
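Purely for illustration, equations (1) to (5) may be expressed as the following Python sketch; the function names and the numerical values in the usage example are hypothetical and not part of the invention:

```python
# Illustrative sketch of equations (1) to (5); function names and the
# example values for f, b and z_conv are hypothetical.

def disparity(z, f, b, h):
    """Equation (1): disparity d of an object located at depth z."""
    return h - f * b / z

def sensor_shift(f, b, z_conv):
    """Equation (2): shift h placing the convergence plane at z_conv."""
    return f * b / z_conv

def scaled_value(d, d_max, d_min):
    """Equation (3): scaled value D homogeneous with disparity d."""
    return 255 * (d_max - d) / (d_max - d_min)

def depth_range(f, b, h, d_max, d_min):
    """Equations (4) and (5): z_near and z_far matching d_max and d_min."""
    return f * b / (h - d_max), f * b / (h - d_min)

# Usage example: with a convergence plane at a distance of 5 units, an
# object at a depth of 2 units lies in front of the convergence plane
# and therefore receives a negative disparity.
f, b = 1000.0, 0.065            # hypothetical focal length and baseline
h = sensor_shift(f, b, 5.0)     # equation (2)
print(disparity(2.0, f, b, h))  # prints -19.5
```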
The exemplary embodiment is chosen only for explaining the gist of the invention. The invention can be applied to multi-camera systems with cameras with non-parallel optical axes, for instance by transforming the images captured by such cameras into corresponding virtual images virtually captured by virtual cameras with parallel optical axes. Furthermore, the invention can be adapted to non-rectified views and/or more than two cameras. The invention further does not relate to how the foreground layer image or the main depth map has been determined.
The exemplary embodiment comprises determining, within neighbourhood areas Nkl, Nkr of the lateral borders vbl, vbr of the main depth map Mdm, the closest object, which corresponds to determining the smallest disparity min(d). Since disparity is negative for objects located in front of the convergence plane, this corresponds to determining the largest absolute value among the negative disparities in the neighbourhood areas of the lateral borders.
In case the main depth map Mdm comprises scaled values homogeneous with disparity, |min(d)| can be determined from the maximum scaled value max(D) in the main depth map Mdm using the parameters transmitted as metadata. In case d_max and d_min are transmitted, this is done according to:
|min(d)|=|d_max−max(D)*(d_max−d_min)/255| (6)
In case z_near and z_far are transmitted, |min(d)| can be determined using equations (4), (5) and (6).
In case z_conv is undetermined, |min(d)−h| is determined instead.
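For illustration, the determination of |min(d)| from the maximum scaled value max(D) and the transmitted metadata may be sketched in Python as follows; the function names are hypothetical:

```python
# Illustrative sketch of equation (6); function names are hypothetical.

def abs_min_disparity(max_D, d_max, d_min):
    """Equation (6): |min(d)| recovered from the maximum scaled value."""
    return abs(d_max - max_D * (d_max - d_min) / 255)

def d_range_from_z_range(f, b, h, z_near, z_far):
    """In case z_near and z_far are transmitted instead, d_max and d_min
    follow from inverting equations (4) and (5)."""
    return h - f * b / z_near, h - f * b / z_far
```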
The largest absolute value determined among the negative disparities in the neighbourhood areas Nkl, Nkr of the two lateral borders vbl, vbr is the additional width by which the occlusion layer image EOV and/or the occlusion layer depth map has to be extended on both sides in order to allow all information not comprised in the foreground image but provided by the two views to be conveyed.
The width of the neighbourhood areas can be chosen differently. For instance, the neighbourhood areas can consist of the outermost columns C[0], C[n] only. Or, for the sake of robustness, the neighbourhood areas can consist of eight columns on each side, C[0], ..., C[7] and C[n−7], ..., C[n]. Or, for the sake of exhaustiveness, the neighbourhood areas are chosen such that they cover the entire main depth map, such that the largest absolute value among all negative disparities comprised in the main depth map is determined.
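A possible implementation of the boundary analysis may be sketched as follows; it assumes the main depth map Mdm is a two-dimensional array of scaled values D, that k is one of the neighbourhood widths discussed above, that the result is rounded up to an integer number of pixel columns, and that the function name is hypothetical:

```python
import math
import numpy as np

# Illustrative sketch of the boundary analysis; Mdm is assumed to be a
# 2-D array of scaled values D, k is the neighbourhood width per side.
def occlusion_extension(Mdm, d_max, d_min, k=8):
    """Additional width, per side, of the occlusion layer."""
    borders = np.concatenate([Mdm[:, :k], Mdm[:, -k:]], axis=1)
    max_D = float(borders.max())                   # closest object near a border
    min_d = d_max - max_D * (d_max - d_min) / 255  # smallest disparity, cf. eq. (6)
    # Only negative disparities (objects in front of the convergence
    # plane) call for an extension; rounding up is an assumption here.
    return math.ceil(-min_d) if min_d < 0 else 0
```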
In the latter case, instead of the determined largest absolute value, a reduced value can be used. The reduced value compensates the largest absolute value among the negative disparities by the distance, from the respective nearest lateral border, of the column in which it was found. That is, given that the largest absolute value among the negative disparities is |min(d)| and was found in column j of a main depth map of width n, the occlusion layer is extended on both sides by (|min(d)|−min(j;n+1−j)). So, the width of the occlusion layer image EOV and/or the occlusion layer depth map is n+2*(|min(d)|−min(j;n+1−j)), as exemplarily depicted in the drawings.
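The compensated variant may be sketched as follows; the column distance term min(j, n+1−j) is taken literally from the formula above with 1-based column indices, and the exact off-by-one behaviour depends on the indexing convention chosen:

```python
import math
import numpy as np

# Illustrative sketch of the reduced value; Mdm is again assumed to be
# a 2-D array of scaled values D covering the entire main depth map.
def reduced_extension(Mdm, d_max, d_min):
    n = Mdm.shape[1]                               # width of the main depth map
    col_max_D = Mdm.max(axis=0)                    # per-column maximum of D
    d = d_max - col_max_D * (d_max - d_min) / 255  # per-column smallest disparity
    j = int(np.argmin(d)) + 1                      # 1-based column holding min(d)
    if d[j - 1] >= 0:
        return 0                                   # no negative disparity at all
    # |min(d)| compensated by the distance of column j from the nearest
    # lateral border, following the formula above.
    return max(0, math.ceil(abs(d[j - 1]) - min(j, n + 1 - j)))
```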
In case of LDV, the occlusion extension can be determined for each frame independently. Or, groups of frames or the entire video are analysed for the largest absolute value among the negative disparities in the neighbourhood areas of the lateral borders of the respective frames, and the determined largest absolute value is then used to extend the occlusion layers of the respective group of frames or of the entire video.
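The per-group analysis for LDV may be sketched using the hypothetical occlusion_extension function sketched above, assuming frames is an iterable of main depth maps, one per video frame:

```python
# Illustrative sketch of the per-group analysis for LDV: the common
# extension is the maximum of the per-frame extensions.
def group_extension(frames, d_max, d_min, k=8):
    """One common extension for a group of frames or the entire video."""
    return max(occlusion_extension(Mdm, d_max, d_min, k) for Mdm in frames)
```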
The analysis for the largest absolute value among the negative disparities in the neighbourhood areas of the lateral borders can be performed at the decoder side in the same way as at the encoder side for correct decoding of the occlusion layer. Or, side information about the extension is provided. The former is more efficient in terms of encoding, while the latter requires less computation at the decoder side.