The invention relates to a method and device for providing a layered depth model of a three dimensional scene as well as to a signal comprising a layered depth model of a scene.
Display devices suitable for displaying three-dimensional images are receiving increasing interest in research. In addition, substantial research is being undertaken to establish how to provide end-users with a satisfying, high-quality viewing experience.
Three dimensional (3D) displays add a third dimension to the viewing experience by providing each of a viewer's eyes with a different view of the scene being watched. This can be achieved by having the user wear glasses to separate the two displayed views. However, as glasses may be considered inconvenient to the user, it is in many scenarios preferred to use autostereoscopic displays, which use means at the display (such as lenticular lenses or barriers) to separate the views and send them in different directions, where they may individually reach the user's eyes. Stereo displays require two views, whereas autostereoscopic displays typically require more views (e.g. nine views).
In order to effectively support 3D presentation it is important that a suitable data representation of the generated 3D content is used. For example, for different stereo displays the two views are not necessarily the same and an optimal viewing experience typically requires an adaptation of the content data for the particular combination of screen size and viewer distance. The same considerations tend to apply to autostereoscopic displays.
A popular approach for representing three dimensional images is to use one or more layered two dimensional images plus a depth representation. For example, a foreground and a background image, each with associated depth information, may be used to represent a three dimensional scene. Within the context of the application, the term depth information is used to indicate information indicative of the distance of respective image elements to a viewpoint, or information indicative of the disparity of respective image elements between two respective viewpoints. As disparity, i.e. the apparent displacement of image elements between the left and right eye views, is inversely proportional to depth, either representation can be used as input to render views of a layered depth model.
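By way of illustration of this inverse relation, the following is a minimal sketch assuming a simple parallel-camera stereo model; the focal length and baseline values are hypothetical and not taken from the present text:

```python
def disparity_from_depth(depth_m, focal_px=1000.0, baseline_m=0.06):
    """Disparity in pixels for a parallel stereo rig: d = f * B / Z.

    Because disparity is inversely proportional to depth, either
    quantity can serve as the per-pixel value in a layered depth model.
    """
    return focal_px * baseline_m / depth_m


def depth_from_disparity(disparity_px, focal_px=1000.0, baseline_m=0.06):
    """Inverse of the above: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px
```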
Using an image and depth representation has several advantages: it allows two dimensional views to be rendered with relatively low complexity and provides an efficient data representation compared to storage of multiple views, thereby reducing e.g. storage and communication resource requirements for three dimensional image (and video) signals. The approach also allows two dimensional images to be generated with viewpoints and viewing angles different from those of the two dimensional images included in the three dimensional representation. Furthermore, the representation may easily be adapted to, and support, different display configurations.
When rendering a view from a different viewing angle than that represented by the layered images, foreground pixels are shifted depending on their depth. This leads to regions becoming visible that are occluded for the original viewing angle. These regions are then filled using the background layer or, if suitable background layer data is not available, by e.g. repeating pixels of the foreground image. However, such pixel replication may result in visible artifacts. The background information is typically only required around edges of foreground image objects and is accordingly highly compressible for most content.
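A minimal single-scanline sketch of such rendering is given below. The forward-mapping scheme, the `shift_scale` parameter and the assumption that the background layer covers every disoccluded pixel are simplifications for illustration only:

```python
import numpy as np

def render_scanline(fg, fg_depth, bg, shift_scale=0.1):
    """Render one scanline for a shifted viewing angle (illustrative).

    Foreground pixels are displaced in proportion to their depth; the
    disoccluded pixels are then filled from the background layer.
    (Repeating the nearest foreground pixel would be the fallback when
    no suitable background data is available.)
    """
    w = fg.shape[0]
    out = np.full(w, -1.0)                   # -1.0 marks a hole
    for x in range(w):
        nx = x + int(round(shift_scale * fg_depth[x]))
        if 0 <= nx < w:
            out[nx] = fg[x]                  # no z-buffering, kept simple
    holes = out < 0
    out[holes] = bg[holes]                   # fill disocclusions
    return out
```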
It is known that a layered depth model of a scene can be simplified from multiple layers into two layers, namely a top layer and an occlusion layer. In such a model the occlusion layer is used to avoid visual artifacts related to pixel repetition or background interpolation. In prior art systems such layers are provided for some or all possible occluded regions in view of a set number of viewing angles.
A problem exists in that downscaling such a layered depth model may result in visual artifacts during rendering. Likewise, encoding a layered depth model can result in similar artifacts. This is particularly true in situations where the use of occlusion information is triggered implicitly by the content material itself, for example as a result of the presence of depth transitions in the downscaled layered depth model, rather than by means of explicitly coded metadata (which cannot be properly downscaled using textbook scaling techniques).
It is an aim of the present invention to mitigate such artifacts resulting from downscaling. This aim is achieved by means of a method of providing a layered depth model of a scene, the layers of the depth model comprising primary view information for a primary view of the scene from a primary viewing direction and occlusion information associated with the primary view information for use in rendering other views of the scene, wherein: the primary view information comprises layer segments of the model which are depth-wise closest with respect to the primary viewing direction, and the occlusion information comprises further layer segments of the model and wherein the occlusion information comprises a safety region adjacent to a depth transition for which occlusion information is provided, and wherein the safety region comprises corresponding segments of the primary view information, and wherein the safety region is located on that side of the respective depth transition which is depth-wise farthest away with respect to the primary viewing direction.
Following resizing or compression of a layered depth model comprising occlusion data, e.g. for storage or transmission reasons, the representation of the layered depth model, and in particular the occlusion data, will be affected. As a result, a rendering device may not be able to detect a depth variation properly. In particular, when the detection of such depth transitions is used to trigger the use of occlusion information, this can result in clearly visible artifacts.
These artifacts in themselves can be more annoying than those of conventional techniques such as pixel repetition, as they are usually not time-stable and as a result stand out due to their “flickering”. Alternatively, resizing may result in the rendering device detecting a depth variation where there was no such transition prior to scaling. As a rule, however, the artifacts resulting from the latter error are less prominent.
The present invention effectively ensures that in the safety regions, that is, in regions outside and adjacent to depth transitions for which occlusion information is available, the occlusion information is made identical to the primary view information. This improves the detection of depth transitions in the downscaled signal, which generally involves depth values comprised in both the primary view information and the occlusion information. As a result, the reliability of triggering the use of occlusion data on the video content itself is improved.
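As a minimal sketch of this idea for a single scanline, the following copies top-layer (primary view) data into the occlusion information on the background side of each depth transition. The depth convention (larger values are closer to the viewer), the `jump` threshold and the `width` parameter are assumptions made for illustration:

```python
import numpy as np

def add_safety_regions(top_depth, top_tex, occ_depth, occ_tex,
                       jump=16, width=7):
    """Copy top-layer (primary view) data into the occlusion layer in a
    safety region on the background side of each depth transition.

    Assumes larger depth values are closer to the viewer; `jump` and
    `width` are illustrative values, not prescribed by the text.
    """
    occ_depth, occ_tex = occ_depth.copy(), occ_tex.copy()
    for x in range(1, len(top_depth)):
        step = int(top_depth[x]) - int(top_depth[x - 1])
        if step >= jump:        # transition up: background on the left
            lo, hi = max(0, x - width), x
        elif step <= -jump:     # transition down: background on the right
            lo, hi = x, min(len(top_depth), x + width)
        else:
            continue
        occ_depth[lo:hi] = top_depth[lo:hi]
        occ_tex[lo:hi] = top_tex[lo:hi]
    return occ_depth, occ_tex
```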
Typically both the primary view information and the occlusion information comprise both image information, e.g. texture data, and depth information.
Using the occlusion information, textures which become visible from another viewing angle can be rendered by appropriately displacing the respective occlusion texture based on the viewing angle and the depth information.
In an embodiment a layered depth model comprising more than two layers is collapsed into a two layer model. Such a model is in its own right again a layered depth model. In the latter layered depth model a top layer comprises the primary view information and an occlusion layer comprises the occlusion information. A clear advantage is that the top layer can in fact be used on legacy two-dimensional display devices without further processing. In case the original layered depth model also comprises transparency, the top layer can be a composite layer comprising data of multiple other layers.
In an embodiment the occlusion layer comprises segments of the top layer wherever there are no further layer segments of layers of the model that are, with respect to their depth, closest to the top layer. As a result the occlusion layer represents a further full image for use in rendering, and there is no need for signaling the presence of occlusion data in the occlusion layer, or for run-time composition of a full occlusion layer.
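A minimal sketch of such composition, assuming a boolean mask marking where genuine occlusion data exists (the mask-based interface is an assumption of this sketch):

```python
import numpy as np

def compose_full_occlusion(top, occ, has_occ):
    """Build an occlusion layer that is itself a complete image: genuine
    occlusion data where the `has_occ` mask is True, and a copy of the
    top layer everywhere else, so no presence signaling is needed."""
    return np.where(has_occ, occ, top)
```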
In an embodiment the size of the respective segments of occlusion information in the occlusion layer is based on a maximum deviation from the viewing angle of the primary view and the depth information in the scene. In another embodiment the size of the respective segments is based on a predetermined width on the inside of simultaneous depth transitions in both the top layer and the composite layer. In yet another embodiment the size of the respective segments is based on a predetermined width on the inside and a further predetermined width on the outside of a depth transition for which occlusion data is available. Each of these embodiments provides an advantage: the first enables coding of an occlusion layer that allows proper rendering within viewing angle and depth limitations; the second enables a more storage friendly implementation; the third allows prevention of compression artifacts such as mosquito noise when coding occlusion information.
In a further embodiment the size of the safety region is based on a predetermined maximum downscale factor. This feature allows a content provider to determine a range of resolutions over which downscaling of occlusion information should result in minimal scaling artifacts. For example, consider content distributed at a resolution of 1920×1080 pixels for which downscaling resilience is required down to a resolution of 640×480. In this case, in order to preserve a 2 pixel wide safety region at the lower resolution, a 7 pixel wide safety region is introduced. Likewise, the size of the safety region can be based on the granularity of the encoder used to encode the occlusion information.
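A minimal sketch of this computation, with the formula inferred from the worked example above; the function name and parameters are hypothetical:

```python
import math

def safety_width(src_width, min_width, preserved_px=2, kernel_margin=1):
    """Safety-region width needed so that `preserved_px` pixels survive
    downscaling from `src_width` to `min_width`; `kernel_margin` guards
    against e.g. asymmetric filter kernels."""
    factor = src_width / min_width
    return math.ceil(preserved_px * factor) + kernel_margin

print(safety_width(1920, 640))  # 2 pixels * factor 3 + 1 -> 7
```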
As downscaling protection using the invention may be applied differently in the x and y directions, the safety region may be chosen differently for each respective direction based on the desired robustness.
In a preferred embodiment the layered depth model further comprises transparency information, at least for the primary view information. In accordance with the embodiment, the transparency values in the safety region adjacent to the depth transition are substantially transparent, and the transparency values at the other end of the safety region are substantially non-transparent. Strictly speaking, transparency values are not required in the safety region for this embodiment, as the primary view information and occlusion information are there identical. However, to make the layered depth model more robust to scaling, a smooth transition, with or without an additional safety zone, is preferably applied. In a further, more preferable embodiment the safety region comprises a predetermined number of consecutive transparent pixels adjacent to the depth transition, possibly followed by a gradient from substantially transparent (adjacent to the consecutive transparent pixels) to substantially non-transparent at the end of the safety region removed from the depth transition. By this particular choice of transparency values any opacity/transparency/alpha values of the top layer may be substantially preserved in the downscaling. Similarly, a predetermined number of consecutive non-transparent pixels may be used at the end of the safety region after the gradient to ensure that values just outside the safety region are not “pulled down” when downscaling.
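A minimal sketch of such a transparency profile across a safety region, with illustrative pixel counts and an 8-bit alpha convention (0 = transparent, 255 = opaque):

```python
import numpy as np

def safety_alpha_profile(width, n_transparent=2, n_opaque=2):
    """Alpha values across a safety region, walking away from the depth
    transition: a run of fully transparent pixels, a gradient, then a
    run of fully opaque pixels shielding the values just outside."""
    n_ramp = max(0, width - n_transparent - n_opaque)
    ramp = np.linspace(0, 255, n_ramp + 2)[1:-1]   # gradient, ends excluded
    return np.concatenate([np.zeros(n_transparent), ramp,
                           np.full(n_opaque, 255.0)])

print(safety_alpha_profile(7))  # [0, 0, 63.75, 127.5, 191.25, 255, 255]
```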
Alternatively, instead of using a gradient between the consecutive transparent and non-transparent pixels, it may also be possible to use a sharp transition, provided that it is placed on a codec block boundary. The consecutive transparent and/or consecutive non-transparent pixels may be used to properly position the transition on such a boundary. The resulting aligned transition may be coded more efficiently, which in turn may help prevent the introduction of coding artifacts, resulting in improved robustness.
In a further advantageous embodiment a depth hole protection region is provided in the primary view information for a region comprising a depth hole, i.e. a depth transition down followed by a depth transition up, not more than a first threshold number of pixels apart. For such a depth hole the primary view depth information is preferably set to one of the high edge depth values of the depth hole, the average of the two high edge depth values, a gradient between the high edge depth values, or an interpolated segment based on both high edge depth values of the depth hole. In addition, the transparency values between the adjacent depth jumps are preferably set to substantially transparent. As a result the depth hole is better protected against downscaling.
In a further advantageous embodiment a depth spike protection region is provided in the primary view information for a region comprising a depth spike, i.e. a depth transition up followed by a depth transition down, not more than a second threshold number of pixels apart. For such a depth spike the primary view depth information is preferably set to the depth value of the top of the depth spike within the depth spike protection region, and the transparency values in the depth spike protection region outside the depth spike are set to substantially transparent. As a result the depth spike is better protected against downscaling.
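A minimal sketch of the depth hole case is given below; the depth spike case is the mirror image (the region is set to the spike's top depth value and the surrounding pixels are made substantially transparent). The thresholds and the alpha convention (0 = transparent) are illustrative assumptions:

```python
import numpy as np

def protect_depth_holes(depth, alpha, jump=16, max_gap=4):
    """Fill narrow depth holes (a transition down then up, at most
    `max_gap` pixels apart) with a gradient between the high edges and
    make the filled pixels substantially transparent (alpha 0)."""
    depth, alpha = depth.astype(float), alpha.copy()
    x = 1
    while x < len(depth):
        if depth[x] - depth[x - 1] <= -jump:             # falling edge
            for r in range(x + 1, min(x + max_gap + 1, len(depth))):
                if depth[r] - depth[r - 1] >= jump:      # rising edge found
                    depth[x:r] = np.linspace(depth[x - 1], depth[r], r - x)
                    alpha[x:r] = 0                       # substantially transparent
                    x = r
                    break
        x += 1
    return depth, alpha
```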
The method according to the present invention can be advantageously applied in a method of processing a three dimensional model of a scene, by receiving a three dimensional model of a scene, providing a layered depth model of a scene according to the present invention and processing the three dimensional model of the scene.
The present invention further relates to a signal comprising a layered depth model of a scene, the layers of the depth model comprising: encoded primary view information for a primary view of the scene from a primary viewing direction, the primary view information comprising layer segments of the model which are depth-wise closest with respect to the primary viewing direction, and encoded occlusion information associated with the primary view information for use in rendering other views of the scene, wherein the occlusion information comprises further layer segments of the model and wherein the occlusion information comprises a safety region adjacent to a depth transition for which occlusion information is provided, and wherein the safety region comprises corresponding segments of the primary view information, and wherein the safety region is located on that side of the respective depth transition which is depth-wise farthest away with respect to the primary viewing direction.
The present invention further relates to a device for providing a layered depth model of a scene, the layers of the depth model comprising primary view information for a primary view of the scene from a primary viewing direction and occlusion information associated with the primary view information for use in rendering other views of the scene, the device arranged to provide: the primary view information such that it comprises layer segments of the model which are with respect to their depth closest to the viewpoint of the primary view, and the occlusion information such that it comprises further layer segments of the model, wherein the occlusion information comprises a safety region adjacent to a depth transition for which occlusion information is provided, and wherein the safety region comprises corresponding segments of the primary view information, and wherein the safety region is located on that side of the respective depth transition which is depth-wise farthest away with respect to the primary viewing direction.
These and other aspects, features and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.
Embodiments of the invention will be described, by way of example only, with reference to the drawings, wherein like numerals refer to elements with like function, in which
The use of layered depth representations for rendering new views has attracted researchers over time. In “Layered Depth Images” by Shade et al., published in Proceedings of ACM SIGGRAPH 1998, the storage of a three dimensional model based on a multi-layer depth image is described, as well as the rendering of content based thereon.
In “High-quality video view interpolation using a layered representation” by Zitnick et al., published in Proceedings of ACM SIGGRAPH 2004, the generation of a layered depth model of a scene is disclosed, wherein the layered depth model comprises occlusion information in the form of boundary color, boundary depth and boundary alpha (opacity). In addition, a method of rendering views from the layered depth model is presented.
The inventors of the present invention realized that in practice, when content is distributed using a layered depth model, it may be required to downscale and/or compress the layered depth model. However, when software or hardware scalers that are agnostic of the problem(s) addressed by the present invention are used for doing so, this generally results in visible artifacts. An example of such artifacts is shown in
The above described problem particularly holds for situations wherein depth information from the layered depth model is being used to trigger the use of occlusion information. In such a situation downscaling of the layered depth model may affect the primary view information and occlusion information in such a manner that depth transitions are no longer properly recognized, or alternatively, are recognized where they were not present prior to the downscaling.
A possible heuristic for coding occlusion information is provided below; a code sketch of this heuristic follows after the list. It is noted that this is merely exemplary and other trigger mechanisms may be employed. However, from this description it will be clear that downscaling of the depth information in the primary view and/or occlusion information may affect the proper triggering of the use of occlusion information.
Occlusion information should be provided for a depth transition when:
The low pixels of the primary view information and occlusion information are similar; e.g. a fixed threshold may be used,
The high pixel in the primary view information of the transition must be significantly larger than the pixel of the occlusion information; e.g. a further fixed threshold may be used and
The high pixel of the primary view information must be significantly larger than the low pixel of the primary view information. (This avoids detecting transitions when the primary view information is uniform and the occlusion information suddenly “drops”.)

The present invention addresses this downscaling issue in that it provides a solution that allows rendering of the model after downscaling in a manner that results in fewer artifacts, as is depicted in
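A minimal sketch of the three-condition trigger listed above; the threshold values and parameter names are illustrative, not prescribed by the text:

```python
def occlusion_triggered(pv_low, pv_high, occ_low, occ_high,
                        t_similar=8, t_jump=16):
    """Apply the three conditions above to depth values around a
    transition: `pv_*` from the primary view, `occ_*` the co-located
    occlusion values, on the low/high side of the transition."""
    low_sides_similar = abs(pv_low - occ_low) < t_similar     # condition 1
    high_above_occlusion = pv_high - occ_high > t_jump        # condition 2
    genuine_transition = pv_high - pv_low > t_jump            # condition 3
    return low_sides_similar and high_above_occlusion and genuine_transition
```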
When the layered depth image as presented in
For determining the top layer, the camera model used to generate the occlusion layer is a camera placed at infinity with a primary viewing direction indicated by PVD. As a result, the top layer in the layered depth image is the layer that, at a certain position (x,y), is depth-wise closest to the camera, in this case the highest layer.
With regard to the occlusion layer OL it is noted that although in
The above clearly shows that heuristics may play an important role in determining what is included in an occlusion layer and what is not.
The occlusion layers and the depth layers presented in the
It will be clear to the skilled person that the layered depth model comprising the primary view information and the occlusion information OL2 as shown in
The occlusion layer OL as presented in
Although the invention has been described for providing a safety region along a horizontal direction, i.e. along the direction of the x-axis, the present invention can also be advantageously applied along a vertical direction.
For example, when downscaling content provided at a maximum resolution of 1920×1080 which has to be downscalable to 640×480, that is, a downscaling factor of 3 in the horizontal direction, a 7 pixel wide safety region is preferably introduced in the horizontal direction. It is preferred to have a 2 pixel safety region to prevent coding from affecting layer boundaries. As a result it would theoretically suffice to use a 6 pixel safety region; in practice, as a result of e.g. asymmetric filter kernels, a 7 pixel wide safety region is preferred. As artifacts in the vertical direction are less relevant for many current applications, a 3 pixel safety region is applied there.
The rationale of using safety regions for the protection of occlusion data is illustrated in
An occlusion layer as described hereinabove may be advantageously used for storing or distributing content. As a result, content provided with a so-processed occlusion layer may be downscaled using downscaling tools commonly available at the priority date. Such content is also more resilient to coding/decoding. Moreover, there is no need to tune such scaling algorithms to cater for particular idiosyncrasies of the actual occlusion format used; instead, the occlusion information may typically be downscaled in a manner similar to conventional two-dimensional images. Alternatively, the method described herein may be applied just before downscaling (not necessarily at the content creation side). Or it may be applied at the content creation side with safety margins large enough to protect against coding, but not necessarily against large downscale factors, and then again later with larger safety margins tailored to an upcoming downscale operation.
In
In
The method further comprises providing 510 a layered depth model of the scene, wherein the layered depth model comprises primary view information for a primary view of the scene from a primary viewing direction and occlusion information associated with the primary view information for use in rendering other views of the scene. The primary view information in turn comprises layer segments of the model which are depth-wise closest with respect to the primary viewing direction. The occlusion information comprises further layer segments of the model. The occlusion information comprises a safety region adjacent to a depth transition for which occlusion information is provided. The safety region in turn comprises corresponding segments of the primary view information, and the safety region is located on that side of the respective depth transition which is depth-wise farthest away with respect to the primary viewing direction.
Optionally the method depicted in
The device 600 further comprises a device 610 for providing a layered depth model 605 of a scene. The layers of the depth model 605 as provided comprise primary view information for a primary view of the scene from a primary viewing direction and occlusion information associated with the primary view information for use in rendering other views of the scene. The device 610 is arranged to provide the primary view information such that it comprises layer segments of the model which are, with respect to their depth, closest to the viewpoint of the primary view, and the occlusion information such that it comprises further layer segments of the model. The occlusion information comprises a safety region adjacent to a depth transition for which occlusion information is provided; the safety region comprises corresponding segments of the primary view information and is located on that side of the respective depth transition which is depth-wise farthest away with respect to the primary viewing direction.
Optionally the device further comprises processing means 680 which can be e.g. a general purpose processor, an ASIC or other processing platform, for processing the layered depth model 605. The processing may comprise e.g. downscaling, coding, storing, transmitting and/or alternatively rendering.
The device 610 as presented in
Although the device 610 has been described for processing a three dimensional model of a scene, the three dimensional model of a scene may also be a layered depth model, which in a particularly advantageous case can be a two layer model comprising a top layer 625, which may also be referred to as a composite layer, and an occlusion layer 635 as shown in
It will be appreciated that the above description has, for clarity, described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units or processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate units, processors or controllers may be performed by the same processor or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization.
The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.
Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term comprising does not exclude the presence of other elements or steps.
Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also, the inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate. Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked, and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order. In addition, singular references do not exclude a plurality. Thus references to “a”, “an”, “first”, “second” etc. do not preclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.
| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| 08167120.8 | Oct 2008 | EP | regional |
| Filing Document | Filing Date | Country | Kind | 371c Date |
| --- | --- | --- | --- | --- |
| PCT/IB09/54543 | 10/15/2009 | WO | 00 | 4/14/2011 |