The invention relates to remapping of a depth map that corresponds to a two-dimensional (2D) content image. The 2D image and the depth map form the basis for rendering a three-dimensional (3D) image that is to be viewed on a 3D display. The remapping maps the depth map from an input depth range to an output depth range of the 3D display.
The paper ‘Disparity remapping to ameliorate visual comfort of stereoscopic video’ (Sohn et al., Proc. SPIE 8648, Stereoscopic Displays and Applications XXIV, 86480Y) describes a method for remapping a disparity map. The disparity map is part of a three-dimensional (3D) image that also comprises a two-dimensional (2D) image corresponding to the disparity map. The disparity map is remapped into a new disparity map such that the 3D image (based on the new disparity map) can be viewed on a 3D display. The remapping is established as follows. First, the method establishes a global remapping curve for mapping the disparity map from an input disparity range to an output disparity range (of the 3D display). Second, the method identifies local salient features based on disparity transitions that cause visual discomfort when viewing the 3D image on the 3D display. Third, the global remapping curve is adapted to the local salient features in order to reduce said visual discomfort. The disparity map is then remapped according to the adapted global remapping curve.
US2012/0314933 discloses image processing that includes estimating an attention region to which a user is paying attention on a stereoscopic image, detecting a parallax of the stereoscopic image and generating a parallax map indicating a parallax of each region of the stereoscopic image, setting conversion characteristics for correcting a parallax of the stereoscopic image based on the attention region and the parallax map, and correcting the parallax map based on the conversion characteristics. Different conversion functions may be used for the attention region and the background.
US2013/0141422 describes a system for altering a property associated with a portion of a three dimensional stereoscopic image. The method includes determining that a portion of a virtual object in a three dimensional image resides at a predetermined position along a first axis relative to the display based on a difference between a left eye image of the portion of the virtual object and a right eye image of the portion of the virtual object. The first axis is perpendicular to a plane of the display.
WO2009/034519 describes receiving depth related information for image data, including receiving metadata relating to a mapping function used in generation of depth-related information.
US2012/0306866 describes 3D-image conversion for adjusting depth information. The conversion includes generating depth information with regard to an input image; detecting an object having parallax exceeding a preset range; and adjusting depth information of the object by adjusting the parallax of the detected object to be within a preset range. Metadata, for example genre or viewing age, may be analyzed in order to adjust generated depth information to be within a predetermined range.
A disadvantage of the prior art is that the adaptability of the global disparity remapping (or ‘retargeting’) to the local features is limited, because all adaptations to the local features need to be accommodated by the same (adapted) global remapping. It is an aim of the invention to overcome this disadvantage of the prior art by providing a depth remapping that accurately selects and adapts an object in the image without adapting the depth remapping in other parts of the image.
An image processing device is disclosed, arranged for remapping a depth map of a three-dimensional image, the three-dimensional image comprising the depth map and a two-dimensional content image, the depth map having depth pixels configured in a two-dimensional array at locations corresponding to locations of image pixels in the content image, each of the depth pixels having a depth value, the remapping comprising a global remapping function for mapping of depth values of the depth map to new depth values of the depth map, the image processing device comprising a receiving unit for receiving a signal comprising the three-dimensional image and metadata coupled to the three-dimensional image,
the metadata comprising selection criteria based on at least location and depth value for selecting depth pixels corresponding to at least one object in the three-dimensional image, and a processing unit comprising a selection function configured for retrieving, from the metadata, the selection criteria and selecting depth pixels that correspond to at least one object in the three-dimensional image using the selection criteria; a determining function configured for determining a local remapping function for mapping depth values of the selected depth pixels to new depth values; and a mapping function configured for remapping the depth map using the local remapping function for remapping the selected depth pixels and using the global remapping function for depth pixels other than the selected depth pixels.
The three-dimensional (3D) image includes a depth map and a corresponding content image. The depth map comprises depth pixels in a 2D array at respective locations along X and Y axes, each depth pixel having a depth value. Each pixel of the depth map corresponds to a pixel at a corresponding location in the content image. Such a 3D image format is commonly known as ‘image-plus-depth’ or ‘2D+Z’.
Remapping the depth map implies mapping of depth values of respective depth pixels of the depth map to respective new depth values. The remapping comprises at least a global remapping function for remapping the depth map.
The selection function is configured for selecting depth pixels that correspond to an object in the three-dimensional image, using selection criteria based at least on location and depth value. For example, the selection criteria comprise boundaries in depth and location that include depth pixels corresponding to a foreground object: the selection function selects depth pixels corresponding to the foreground object by selecting the depth pixels residing within the boundaries. Selecting the object based on location and depth value enables accurate selection of the object, i.e. selecting a high percentage of depth pixels corresponding to that object while selecting a low percentage of depth pixels not corresponding to that object.
Optionally, the selection function comprises an automated process for determining (foreground) objects in the 3D image.
The determining function is configured for determining a local remapping function for remapping the selected depth pixels. The local remapping function is a different remapping function than the global remapping function.
Optionally, the determining function is configured for retrieving the local remapping function from metadata coupled to the 3D image. Optionally, the determining function comprises an automated process for determining the local remapping function, such that depth contrast between the object and another object and/or the background improves.
The mapping function is configured for remapping the depth map using both the local remapping function and the global remapping function. The local remapping function is used for remapping the selected depth pixels, whereas the global remapping function is used for remapping the remaining (non-selected) depth pixels.
A method is disclosed for remapping a depth map of a three-dimensional image, the three-dimensional image comprising the depth map and a two-dimensional content image, the depth map having depth pixels configured in a two-dimensional array at locations corresponding to locations of image pixels in the content image, each of the depth pixels having a depth value, the remapping comprising a global remapping function for mapping of depth values of the depth map to new depth values, the method comprising receiving a signal comprising the three-dimensional image and metadata coupled to the three-dimensional image, the metadata comprising selection criteria based on at least location and depth value for selecting depth pixels corresponding to at least one object in the three-dimensional image, retrieving, from the metadata, the selection criteria, selecting depth pixels corresponding to an object in the three-dimensional image, using the selection criteria; and determining a local remapping function for mapping depth values of the selected depth pixels to new depth values; and remapping the depth map using the local remapping function for remapping the selected depth pixels and using the global remapping function for depth pixels other than the selected depth pixels.
A signal is disclosed for use in the image processing device as described above for remapping a depth map, the signal comprising a three-dimensional image and metadata coupled to the three-dimensional image, the three-dimensional image comprising the depth map and a content image, the depth map having depth pixels configured in a two-dimensional array, each of the depth pixels having a depth value and having a location in the two dimensional array corresponding to a location in the content image, the metadata comprising the selection criteria based on at least location and depth value for selecting the depth pixels corresponding to at least one object in the three-dimensional image for mapping depth values of the selected depth pixels to new depth values.
An image encoding method is disclosed for generating metadata for use in the above signal, the method comprising the steps of generating metadata comprising selection criteria based on at least location and depth value for selecting depth pixels corresponding to at least one object in a three-dimensional image for mapping depth values of the selected depth pixels to new depth values, and coupling the metadata to the three-dimensional image.
The invention does not have said disadvantage of the prior art, because the metadata enables accurately selecting the depth pixels corresponding to the object by using both location and depth value. The accurate selection of the object consequently enables a local remapping to be applied accurately to the object while a global remapping is maintained for other parts of the image.
Note that the term ‘accurately’ in this context refers to selecting a high percentage of depth pixels corresponding to that object while selecting a low percentage of depth pixels not corresponding to that object. For example, the high percentage refers to 95-100%, and the low percentage refers to 0-5%. The effect of the invention is that the depth remapping adapts accurately to an (local) object in the 3D image while maintaining a global remapping for other parts of the 3D image.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
In the drawings,
It should be noted that items that have the same reference numbers in different figures have the same structural features and the same functions. Where the function and/or structure of such an item has been explained, there is no necessity for repeated explanation thereof in the detailed description.
Note that the term ‘remapping a depth map’ means that depth values of the depth map are mapped to respective new depth values.
The depth map MAP is formatted as said 2D array of depth pixels. The depth map MAP comprises depth pixels and is coupled to a (2D) content image comprising content pixels representing content. For example, the content image shows a natural scene and is a photograph or a video frame of a movie. The combination of the content image and the depth map 101 constitutes a three-dimensional (3D) image format that is commonly known as ‘2D+Z’ or ‘2D+depth’.
A depth pixel at a location in the 2D array corresponds to a pixel at a corresponding location in the (2D) content image. If the depth map has the same resolution as the content image, then a content pixel at a certain location in the content image corresponds to a depth pixel at the same certain location in the depth map. If the depth map has a different resolution than the content image, then the content pixel at the location in the content image corresponds to a depth pixel at the same location in the scaled depth map, which is the result of scaling the depth map to the resolution of the content image. Therefore, in the context of this document, referring to a location (or region) in the content image is equivalent to a location in the depth map MAP.
Optionally, the image processing device 100 includes a receiving unit RECVR 150 for receiving a signal comprising a 3D image and metadata, to provide the depth map MAP to the processing unit 199. The receiving unit RECVR may receive the 3D image having a depth map and the metadata comprising selection criteria, e.g. from an optical disc, and provide the depth map and the selection criteria to the processing unit 199. Having the receiving unit RECVR, the image processing device 100 may act as an optical disc unit.
Optionally, the image processing device 100 includes a display DISP 160 that receives the remapped depth map MAPNEW from the processing unit 199 and renders the 3D image for viewing on the display DISP, based on the remapped depth map MAPNEW. Having the display DISP, the image processing device 100 may act as a 3D TV.
The selection function SELFUN selects, from the depth map MAP, depth pixels that meet the selection criteria CRT. Selection function SELFUN obtains the selection criteria CRT, for example, from metadata coupled to the 3D image, and selects the depth pixels accordingly. The selection criteria CRT are based on (at least) depth and location.
The selected (depth) pixels typically correspond to an object in the 3D image. An object is naturally confined to a region of the 3D image. For example, the object corresponds to a floating ball near the camera that captured the 3D image. When viewing the 3D image on a 3D display, the ball is in the foreground and floats in front of the rest of the scene in the 3D image. The ball is not only confined to a region in the depth map MAP, but is also confined to a limited depth range. The ball can thus be selected using selection criteria that define a 3D bounding box having three extents: (1) a first extent along the horizontal dimension of the 2D location, (2) a second extent along the vertical dimension of the 2D location and (3) a third extent along the depth dimension. Effectively, the 3D bounding box is defined in a 3D mathematical space, being a ‘location-depth’ space. Selecting the ball is done by selecting depth pixels residing inside the bounding box. The advantage of selecting an object, like the ball, on the basis of both depth and location is further explained in what follows.
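Purely as an illustration (not part of the original disclosure), the following minimal sketch shows how such a bounding-box selection in XYD space could be implemented; the array layout, the value ranges and the function name are assumptions.

```python
import numpy as np

def select_in_xyd_box(depth_map, x_range, y_range, d_range):
    """Return a boolean mask selecting the depth pixels that reside inside a
    3D bounding box defined in the location-depth (XYD) space.

    depth_map : 2D array of depth values, indexed as depth_map[y, x]
    x_range, y_range, d_range : (min, max) boundaries along X, Y and depth D
    """
    h, w = depth_map.shape
    ys, xs = np.mgrid[0:h, 0:w]                                   # per-pixel 2D locations
    inside_x = (xs >= x_range[0]) & (xs <= x_range[1])            # horizontal extent
    inside_y = (ys >= y_range[0]) & (ys <= y_range[1])            # vertical extent
    inside_d = (depth_map >= d_range[0]) & (depth_map <= d_range[1])  # depth extent
    return inside_x & inside_y & inside_d

# Example: select the depth pixels of a foreground ball that is known to be
# confined to a limited region and a limited depth range.
depth_map = np.random.randint(0, 256, size=(480, 640))
ball_mask = select_in_xyd_box(depth_map,
                              x_range=(200, 320), y_range=(100, 220),
                              d_range=(180, 255))
```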
Foreground object A is surrounded by a circular boundary 221xy, whereas foreground object B is surrounded by a bounding box 231xy. Depth pixels corresponding to foreground object A may be selected by selecting depth pixels that reside within the circular boundary 221xy. However, such a selection will be inaccurate in the sense that not only depth pixels corresponding to object A will be selected, because part of the background C and part of the foreground object B are also included by the circle 221xy. Likewise, bounding box 231xy will also be inadequate for accurately selecting depth pixels corresponding to foreground object B, because bounding box 231xy also includes a part of the background C and a part of the foreground object A. Overlap area 250 is a region where (object A's) boundary 220 also includes a part of object B and where (object B's) boundary 230 also includes a part of object A. Therefore, selection criteria such as the boundaries 221xy and 231xy, which are purely based on location, are not adequate for accurately selecting objects A and B in the content image. Note that ‘accurate selection of an object’ in this context refers to selecting a high percentage of depth pixels corresponding to that object while selecting a low percentage of depth pixels not corresponding to that object. For example, the high percentage refers to 95-100%, and the low percentage refers to 0-5%.
Foreground object A is surrounded by an elliptical boundary 221xd, whereas foreground object B is surrounded by a bounding box 231xd (rectangular boundary). Depth pixels corresponding to foreground object A can be selected accurately using the elliptical boundary 221xd, because only pixels of foreground object A are included in the ellipse 221xd. Thus, by selecting depth pixels that reside inside ellipse 221xd, only depth pixels corresponding to foreground object A are selected. Likewise, depth pixels corresponding to foreground object B can be selected accurately using the bounding box 231xd, because only pixels of foreground object B are included in the bounding box 231xd. Thus, by selecting depth pixels that reside inside bounding box 231xd, only depth pixels corresponding to foreground object B are selected. Selection criteria, such as the boundaries 221xd and 231xd, which are based on both location and depth value, are thus adequate for accurately selecting an object in the 3D image.
In the generalized, three-dimensional XYD space, an object is thus selected using a 3D boundary. For accurately selecting the foreground object A, the selection criteria comprise a 3D ellipsoid. Provided that the ellipsoid includes object A in the D-Y plane (not shown) in a similar manner as in the D-X plane, depth pixels corresponding to object A are accurately selected by selecting the depth pixels residing inside the ellipsoid.
The previous paragraphs describe an example of a general case, wherein accurate selection requires selection criteria based on both the 2D location and depth value. However, two particular cases may occur wherein accurate selection does not require the 2D location or requires only one dimension of the 2D location.
In a first particular case, foreground objects A and B and background C are separated in depth value, such that accurate selection of objects A and B requires only selection criteria based on depth value, and the 2D location is not needed.
In a second particular case, in analogy to the first particular case, accurate selection of objects A and B requires only criteria based on depth value and one dimension (X or Y) of the location. A requirement for this second particular case would be that objects A and B and background C are separated in depth value and in one dimension (X or Y) of the 2D location.
In contrast, as explained above, it is not possible to accurately select depth pixels of object A (or B) based on location only in a typical case, wherein the boundary 221xy (or 231xy) surrounds object A (or B) with some margin, as illustrated above.
In summary: in the general case, accurate selection requires selection based on depth value and 2D location; in the first particular case, accurate selection requires selection based on only depth; in the second particular case, accurate selection requires selection based on depth and one dimension of the location.
Various shapes may be used for selecting an object.
Note that, in principle, any shape being a closed volume in the XYD space may be used for selecting an object.
Note that margins between an object and its selection boundaries are preferably not too small but also not too large. A small margin corresponds to a ‘tight fit’ of the selection boundaries around an object, and therefore carries a risk that not all depth pixels of the object are included in the boundary, so that some depth pixels of the object may not be selected. A large margin corresponds to a ‘loose fit’ of the selection boundaries around the object (e.g. ellipsoid 331) and carries a risk that depth pixels of other objects or of the background are included in the boundary and are therefore selected erroneously.
Again, note that graph 360 presents a two-dimensional view, and that the generalized case applies to the three-dimensional XYD space, as described above.
Optionally, the selection function SELFUN uses an automated process for determining objects A and B without using boundaries in XYD space retrieved from metadata. The automated process uses a clustering algorithm to determine groups of depth pixels forming large clusters in the XYD space. The depth pixels that form a cluster have, by definition, similar positions in the XYD space. In the depth map described above, objects A and B form such clusters in the XYD space.
The clustering algorithm used in the selection function may be a text-book clustering algorithm, such as the so-called K-means clustering algorithm (e.g. J. A. Hartigan (1975), ‘Clustering algorithms’, John Wiley & Sons, Inc.). Other commonly known clustering algorithms for searching clusters in a multi-dimensional space may also be used.
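As an illustration only, a minimal sketch of such a clustering step is given below; the use of scikit-learn's KMeans, the number of clusters and the feature scaling are assumptions, and colour or structure features (see the next paragraph) could be appended as extra columns.

```python
import numpy as np
from sklearn.cluster import KMeans  # any standard K-means implementation may be used

def cluster_depth_pixels(depth_map, n_clusters=3):
    """Group depth pixels into clusters in the XYD space.

    Returns a label map of the same shape as the depth map, in which each
    depth pixel carries the index of the cluster it was assigned to
    (e.g. object A, object B or the background).
    """
    h, w = depth_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # One feature vector (x, y, d) per depth pixel; the dimensions are scaled
    # to comparable ranges (an assumption of this sketch).
    features = np.column_stack([xs.ravel() / float(w),
                                ys.ravel() / float(h),
                                depth_map.ravel() / 255.0])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    return labels.reshape(h, w)

# Example: determine three clusters, expected to correspond to objects A and B
# and the background C.
label_map = cluster_depth_pixels(np.random.randint(0, 256, size=(120, 160)))
```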
In addition to said similar position, the clustering technique may also determine a cluster using additional properties, such as similarity in color or structure. The color or structure associated to a depth pixel at a location in the depth map is retrieved from a corresponding location in the (content) image. For example, if object A corresponds to a smooth red ball then depth pixels of object A will not only be confined to a limited XYD space in the depth map, but the corresponding pixels in the content image will also be red and be part of a smooth region. (Note that by using two-dimensional location, depth, color and structure, the clustering algorithm effectively searches clusters in a five-dimensional space). Using the additional properties improves the accuracy and robustness of the clustering algorithm.
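The sketch below (again illustrative only) shows one way the feature vectors of the previous sketch could be extended with colour and structure; reducing colour to a grey-value summary and structure to a local variance are assumptions of this sketch, not requirements of the description.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def xyd_colour_structure_features(depth_map, content_image):
    """Build one five-dimensional feature vector (x, y, depth, colour, structure)
    per depth pixel, so that the clustering can also exploit colour similarity
    and smoothness of the corresponding region in the content image.

    The content image is assumed to be an aligned H x W x 3 RGB array.
    """
    h, w = depth_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    grey = content_image.astype(float).mean(axis=2)                   # colour summary
    local_mean = uniform_filter(grey, size=5)
    structure = uniform_filter(grey ** 2, size=5) - local_mean ** 2   # local variance
    return np.column_stack([xs.ravel() / float(w),
                            ys.ravel() / float(h),
                            depth_map.ravel() / 255.0,
                            grey.ravel() / 255.0,
                            structure.ravel() / (structure.max() + 1e-6)])
```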
Note that the previous embodiment, using an automated process for selecting depth pixels, is consistent with the previous embodiments, in the sense that depth pixels are selected using selection criteria based on location and depth value. Clusters of depth pixels are determined in the XYD space, or ‘location-depth’ space, and are thus based on location and depth value. Depth pixels are selected if they meet the criterion of belonging to the determined cluster in the XYD space.
The global remapping function 440 maps the background C from the input depth range 411 onto the lower end of the output depth range 412. In contrast, local remapping function 420 maps object A to the far upper end of the output depth range 412. Local remapping function 430 maps foreground object B to an intermediate part of the output depth range 412. The local remapping functions 420 and 430 are applied to the accurately selected depth pixels that correspond to objects A and B, respectively. The global remapping function 440 is applied to the depth pixels that correspond to background C, which are all depth pixels in depth map 210 excluding the selected depth pixels of objects A and B.
Determining function DETFUN may determine the local remapping functions 420 and 430 by retrieving data in the form of remapping parameters from metadata coupled to the 3D image. The remapping parameters define the local remapping functions 420 and 430. For example, remapping parameters that define the local remapping function 420 are the depth range 421 and the slope of the straight line 420.
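Purely as an illustration of how such remapping parameters might be applied, the following sketch remaps the selected depth pixels with local functions and all other depth pixels with the global function; the depth range and slope mentioned above are expressed here as an input range and an output range (which together fix the slope), and the linear parameterisation and all numeric values are assumptions.

```python
import numpy as np

def linear_remap(depth, d_in_min, d_in_max, d_out_min, d_out_max):
    """Map depth values linearly from an input depth range to an output depth range."""
    scale = (d_out_max - d_out_min) / float(d_in_max - d_in_min)
    return d_out_min + (depth - d_in_min) * scale

def remap_depth_map(depth_map, global_params, selections):
    """Remap a depth map: the global remapping is applied to all depth pixels,
    after which each local remapping overrides the result for its selected pixels.

    global_params : (d_in_min, d_in_max, d_out_min, d_out_max) of the global remapping
    selections    : list of (mask, local_params) pairs, one per selected object
    """
    new_map = linear_remap(depth_map.astype(float), *global_params)
    for mask, local_params in selections:
        new_map[mask] = linear_remap(depth_map[mask].astype(float), *local_params)
    return new_map

# Hypothetical example loosely following the description: the background is mapped
# to the lower end of the output range, object A to the upper end.
depth_map = np.random.randint(0, 256, size=(480, 640))
mask_A = depth_map > 200
new_map = remap_depth_map(depth_map,
                          global_params=(0, 200, 0, 80),
                          selections=[(mask_A, (200, 255, 200, 255))])
```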
Various types of curves may represent a local or global remapping function. The curve may be linear, as in the examples described above, but other curve shapes may also be used.
The remapping functions 420-430 may be created in an artistic off-line process by video editing experts who design the remapping functions such that the depth perception is aesthetically pleasing when viewing the 3D image on a 3D display.
Alternatively, the remapping functions are determined by an automated process that is performed by the determining function DETFUN running on (the processing unit 199 of) the image processing device 100. The automated process for determining the local remapping functions 420 and 430 may work according to an algorithm that increases the depth contrast between object A, object B and background C, as sketched below. Having received the selected depth pixels from the preceding selection function SELFUN (the selected depth pixels corresponding to objects A and B, and the remaining depth pixels corresponding to background C), the algorithm assesses the depth ranges that include objects A and B and background C, respectively. As a result, the algorithm determines that object A, object B and background C are included in depth ranges 421, 431 and 441, respectively. Next, the algorithm maps the depth ranges 421, 431 and 441 onto the output depth range 412, using the full output depth range 412 while creating maximum depth contrast between object A, object B and background C. To that end, object A is remapped to the upper end of the output range 412, and object B is remapped to an intermediate range in between (a) the lower part of the output range 412 that includes the remapped background C and (b) the upper part of the output range 412 that includes the remapped object A. In this example, the slope of the remapping curves 420, 430 and 440 is kept the same.
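A minimal sketch of such a contrast-increasing algorithm is given below; the placement of the gaps, the equal unit slope and the assumption that the object masks are given in far-to-near order are simplifications that are not prescribed by the description.

```python
import numpy as np

def determine_linear_remappings(depth_map, object_masks, out_min=0.0, out_max=255.0):
    """Determine linear remapping parameters that spread the background and the
    selected objects over the full output depth range, thereby increasing the
    depth contrast between them.

    object_masks : boolean masks of the selected objects, ordered from far to near;
                   depth pixels in no mask are treated as background.
    Returns one (d_in_min, d_in_max, d_out_min, d_out_max) tuple per region,
    background first.
    """
    background = ~np.any(object_masks, axis=0)
    regions = [background] + list(object_masks)
    in_ranges = [(float(depth_map[m].min()), float(depth_map[m].max())) for m in regions]
    widths = [hi - lo for lo, hi in in_ranges]
    # Keep the slope of every remapping curve the same (unity, as in the example) and
    # distribute the unused part of the output range as gaps between the regions.
    gap = (out_max - out_min - sum(widths)) / max(len(regions) - 1, 1)
    params, cursor = [], out_min
    for (lo, hi), width in zip(in_ranges, widths):
        params.append((lo, hi, cursor, cursor + width))
        cursor += width + gap
    return params
```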
The depth contrast between, for example, object A and background C can be quantified as follows.
Before remapping, depth values (of depth pixels) corresponding to object A are in depth range 421. The depth pixels of object A have depth values that are, on average, at approximately 0.7 (70%) of the input depth range 411. Likewise, depth values corresponding to background C are in depth range 441 and are, on average, at approximately 0.1 (10%) of the input depth range 411. Consequently, the depth contrast between object A and background C before remapping is 0.7 − 0.1 = 0.6.
After remapping, the situation is as follows. Depth values of object A are remapped by local remapping function 420 to the output depth range 412: the new depth values of object A are, on average, at approximately 0.9 (90%) of the output depth range 412. Likewise, the new depth values of background C (remapped using the global remapping function 440) are, on average, at approximately 0.1 (10%) of the output depth range 412. Consequently, the depth contrast between object A and background C after remapping is 0.9 − 0.1 = 0.8. The depth contrast between object A and background C has thus increased from 0.6 to 0.8 as a result of the remapping.
A similar quantification holds for the depth contrast between object B and background C and for the depth contrast between object B and object A. One can infer from the remapping functions described above that these depth contrasts also increase as a result of the remapping.
As a variant to the previous embodiment, the automated process (performed by the determining function) determines a local remapping function for remapping object A such that the depth contrast between object A and background C increases by a fixed factor, for example by 15% (a factor of 1.15). The depth contrast after remapping then becomes 1.15 × 0.6 = 0.69. As mentioned above, the new depth values of background C are at about 0.1 of the output depth range 412. The local remapping function 420 then needs to be shifted vertically such that the new depth values of object A are, on average, at approximately 0.1 + 0.69 = 0.79 of the output depth range 412.
Optionally, the global remapping function is also determined by the automated process. For example, in the case that depth pixels corresponding to the background have depth values not only in input depth range 441 but also in depth range 431 (i.e. the depth range of object B), the global remapping function 440 may be adapted such that it has a lower slope, so that the entire background depth range (441 and 431 combined) is still mapped onto the lower part of the output depth range 412.
Note that, in the context of the current invention, ‘remapping an object’ refers to ‘remapping the depth values of the depth pixels corresponding to the object’. Likewise, ‘remapping the depth pixels’ refers to ‘remapping the depth values of the depth pixels’.
An application of the image processing device 100 is remapping of the depth map in order to prepare the 3D image for being viewed on a 3D display. The 3D display is, for example, a multi-view autostereoscopic display. The 3D display typically has a limited disparity range. Depth and disparity are similar in a qualitative sense.
Disparity is defined as follows: a large disparity corresponds to an object appearing near a viewer, and a small disparity corresponds to an object appearing far away from the viewer (zero disparity corresponds to infinitely far away). Thus, when shown on the 3D display, an object appearing in front of the plane of the display corresponds to large disparity values, and an object appearing behind the plane of the 3D display corresponds to small disparity values. The plane of the 3D display corresponds to a specific disparity value, which will be referred to as the ‘display disparity value’ below.
For rendering the 3D image on the 3D display, the depth map needs to be converted to disparity. The conversion is based on a number of definitions relating depth and disparity. The definitions concern zero depth, minimum and maximum depth, and the position of a viewer relative to the plane of the 3D display. A common choice is to define zero depth as corresponding to the plane of the 3D display, so that a positive depth value corresponds to a position in front of the plane of the 3D display and a negative depth value corresponds to a position behind the plane of the 3D display. The relation between depth and disparity is further defined by choosing a maximum and a minimum disparity that correspond to the maximum and minimum depth, respectively. A common definition for the position of the viewer relative to the plane of the 3D display is a typical viewer position (for example, a viewer in a living room watching a 3D display having a 55″ diagonal typically sits at 3 to 4 meters in front of the 3D display). Finally, depth is converted to disparity based on a curve defined by these definitions.
When the 3D image is to be rendered for viewing on a 3D display, the depth map thus needs to be converted to a disparity map, using a curve as described in the previous paragraph. This depth-to-disparity conversion may be combined with remapping a depth map according to three scenarios: (1) the depth map is remapped, and the remapped depth map is then converted to a disparity map, or (2) the curves for the depth remapping and for the depth-to-disparity conversion are integrated into a single curve, or (3) the depth map is converted to a disparity map, and the disparity map is subsequently remapped according to a disparity remapping curve. The disparity remapping curve may be derived by applying the depth-to-disparity conversion to the depth remapping curve itself.
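For illustration, the sketch below shows scenario (1): an entirely assumed, linear depth-to-disparity curve applied to an already remapped depth map; the disparity range of 0 to 30 is a placeholder, not a value from the description, and an actual conversion curve may be non-linear and depends on the display size and viewer position described above.

```python
import numpy as np

def depth_to_disparity(depth, d_min, d_max, disp_min, disp_max):
    """Convert depth values to disparity values using a linear curve (assumed here):
    the maximum depth (the nearest point) maps to the maximum disparity of the
    3D display and the minimum depth to the minimum disparity."""
    t = (np.asarray(depth, dtype=float) - d_min) / (d_max - d_min)
    return disp_min + t * (disp_max - disp_min)

# Scenario (1): remap the depth map first, then convert the remapped depth map to
# a disparity map that fits the limited disparity range of the 3D display.
remapped_map = np.random.randint(0, 256, size=(480, 640))   # stands in for MAPNEW
disparity_map = depth_to_disparity(remapped_map, d_min=0, d_max=255,
                                   disp_min=0.0, disp_max=30.0)
```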
When the 3D display has a limited disparity range, an object may appear ‘flattened’ in the depth direction when shown on the 3D display. This occurs when a relatively large depth range is mapped to a relatively small disparity range. For example, a ball defined as a perfectly round ball in the location-depth space would then appear on the 3D display as a ball squashed in the depth direction, becoming an ellipsoid rather than a sphere. The local remapping function used to remap the depth values of the ball may be defined to compensate for the flattening. For example, object A may be remapped by a local remapping function having a relatively large slope, so that the depth range of object A is stretched and the flattening is compensated.
As an example, object B corresponds to a logo in the content image. For the purpose of legibility, object B is to be remapped such that it is viewed in the plane of the 3D display. To that end, the determining function determines the local remapping function 430 such that object B is remapped to depth values near zero (corresponding, in this case, to the plane of the 3D display). The latter is actually the case in the example described above, where local remapping function 430 maps object B to an intermediate part of the output depth range 412, near the plane of the 3D display.
The global remapping function may be established in different ways. Optionally, the processing unit 199 applies a pre-determined global remapping function. Optionally, the global remapping function is included in metadata coupled to the 3D image. Optionally, both the global remapping function and the local remapping functions are included in metadata coupled to the 3D image.
Optionally, the image processing device 100 receives the 3D image from an image encoding device via a network link. The image encoding device sends a signal comprising the 3D image to the image processing device 100. Optionally, the signal further comprises metadata containing selection criteria for selecting, for example, object A in the 3D image. The metadata is thus coupled to the 3D image. For example, the metadata comprises a 3D bounding box (i.e. in XYD space) for selecting object A. Optionally, the signal further comprises the local remapping function 420 for remapping the depth pixels corresponding to object A. Note that the image processing device 100 effectively acts as an image decoding device by receiving and using the signal from the image encoding device.
Optionally, the signal sent by the image encoding device comprises a 3D video sequence, i.e. a 3D movie. The 3D video sequence comprises (3D) video frames, wherein each video frame comprises a 3D image. Optionally, the signal comprises, for each 3D image (thus each video frame), metadata coupled to the 3D image, in a similar way as described in the previous paragraph.
Optionally, the signal comprises the metadata only once every N video frames, wherein N = 12 for example. As above, the metadata may comprise a 3D bounding box for selecting object A. However, object A is generally not static but may move throughout the 3D video sequence, i.e. the location of object A changes. In order to select and remap object A for each video frame, a 3D bounding box is needed for each video frame. To obtain a 3D bounding box for each video frame, (the processing unit 199 of) the image processing device 100 tracks object A by using motion vectors that describe the movement of object A between the video frames or between every N video frames; a sketch of this tracking follows below. Knowing the location of the 3D bounding box at the first of the N video frames, the bounding box for the next frames is obtained by moving (the location of) the bounding box according to the motion vectors. Optionally, the motion vectors are also included in the signal comprising the 3D video sequence. Optionally, the motion vectors are obtained by applying a motion estimator to the video sequence. Optionally, the motion vectors indicate 3D motion in the XYD space, thus in terms of location as well as the depth dimension.
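A minimal sketch of this motion-vector-based tracking is given below; the tuple representation of the bounding box and motion vectors, and the constant motion in the example, are assumptions.

```python
def track_bounding_box(box, motion_vectors):
    """Propagate a 3D bounding box (in XYD space) through the video frames between
    two metadata updates by accumulating per-frame motion vectors.

    box            : ((x_min, x_max), (y_min, y_max), (d_min, d_max)) at the first frame
    motion_vectors : one (dx, dy, dd) displacement per subsequent frame
    Returns the list of bounding boxes for those subsequent frames.
    """
    (x0, x1), (y0, y1), (d0, d1) = box
    boxes = []
    for dx, dy, dd in motion_vectors:
        x0, x1 = x0 + dx, x1 + dx   # shift along the horizontal dimension
        y0, y1 = y0 + dy, y1 + dy   # shift along the vertical dimension
        d0, d1 = d0 + dd, d1 + dd   # shift along the depth dimension
        boxes.append(((x0, x1), (y0, y1), (d0, d1)))
    return boxes

# Example with N = 12: one 3D motion vector per frame, here a constant motion.
boxes = track_bounding_box(((200, 320), (100, 220), (180, 255)),
                           [(4, 0, -2)] * 11)
```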
As an alternative to using motion vectors, the processing unit 199 may apply alpha blending between two subsequent bounding boxes to obtain a bounding box at each video frame. This works as follows. The processing unit 199 first retrieves from the signal two subsequent 3D bounding boxes of the 3D video sequence: one bounding box corresponding to video frame 1 and the second bounding box corresponding to video frame N+1. Both 3D bounding boxes correspond to the same object, but at different video frames. If a specific corner of the 3D bounding box has coordinate R_1 = (X_1, Y_1, D_1) at frame 1 and coordinate R_{N+1} = (X_{N+1}, Y_{N+1}, D_{N+1}) at frame N+1, it then has coordinate R_k = α·R_1 + (1−α)·R_{N+1} at an intermediate frame k, where α = (N+1−k)/N and 1 < k < N+1. Note that the coordinates are in the three-dimensional XYD space. The same alpha blending needs to be applied to the other corners of the 3D bounding box in order to obtain the coordinates of all corners of the 3D bounding box at frame k. Note that the coordinates of the 3D bounding box are thus effectively interpolated between frames.
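The interpolation above can be sketched directly in code; the coordinate values in the example are arbitrary.

```python
def interpolate_corner(r_1, r_n_plus_1, k, n):
    """Alpha-blend one corner of the 3D bounding box to obtain its XYD coordinate
    at an intermediate frame k, given the coordinate r_1 at frame 1 and the
    coordinate r_n_plus_1 at frame N+1 (the formula above)."""
    alpha = (n + 1 - k) / float(n)
    return tuple(alpha * a + (1 - alpha) * b for a, b in zip(r_1, r_n_plus_1))

# Example with N = 12: a corner at frame 1 and at frame 13, interpolated at frame 7.
corner_at_frame_7 = interpolate_corner((100, 80, 200), (148, 80, 176), k=7, n=12)
# -> (124.0, 80.0, 188.0), halfway between the two corners, since alpha = 0.5
```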
Analogously, the processing unit 199 may also use alpha blending to obtain a global remapping function at the intermediate frame k. For example, if the global remapping function at frame 1 is G_1(D) and at frame N+1 is G_{N+1}(D), then at frame k it is G_k(D) = α·G_1(D) + (1−α)·G_{N+1}(D), where α and k are as above, and the variable D represents depth. An analogous procedure may obviously be applied to interpolate a local remapping function.
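Again only as an assumed sketch, the remapping functions themselves can be interpolated in this way when they are represented as lookup tables over the depth values; the example curves are placeholders.

```python
import numpy as np

def interpolate_remapping(g_1, g_n_plus_1, k, n):
    """Alpha-blend two remapping functions, each represented here as a lookup table
    over the depth values 0..255 (a representation assumed for this sketch), to
    obtain the remapping function at an intermediate frame k."""
    alpha = (n + 1 - k) / float(n)
    return (alpha * np.asarray(g_1, dtype=float)
            + (1 - alpha) * np.asarray(g_n_plus_1, dtype=float))

# Example: interpolate between two global remapping lookup tables with N = 12.
depths = np.arange(256)
g_frame_1 = 0.5 * depths                 # global remapping at frame 1
g_frame_13 = 0.8 * depths + 10.0         # global remapping at frame N+1
g_frame_7 = interpolate_remapping(g_frame_1, g_frame_13, k=7, n=12)
```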
Note that the previous embodiments use a bounding box for selecting objects. Other shapes or combinations of shapes may also be used for selecting objects, as mentioned above in this description.
Optionally, in the case (above) of the signal comprising a 3D video sequence, the signal includes for each video frame (or for each N video frames) multiple bounding boxes for selecting respective multiple objects, respective multiple local remapping functions, and a global remapping function.
Optionally, the image encoding device applies a video compression technique to encode the 3D video sequence. The compression technique may be based on H.264, H.265, MPEG-2 or MPEG-4, for example. The encoded 3D video sequence may be configured in so-called GOP-structures (Group Of Pictures). Each GOP structure includes boundaries for selecting foreground objects and local and global remapping functions for remapping the foreground objects and the background, respectively. The image processing device 100 (in particular its processing unit 199) is arranged to receive and decode the encoded 3D video sequence and retrieve the 3D image, the boundaries and the local/global remapping functions.
Optionally, the image encoding device composes the signal by generating metadata for a given three-dimensional image. For example, the boundaries for selecting an object at a decoder side (e.g. the image processing device 100) are determined by the image encoding device by (a) automatically determining a foreground object and (b) fitting a shape like a bounding box or an ellipsoid around the determined object. Automatically determining the foreground object (and selecting the corresponding depth pixels) may be done using an embodiment described above, wherein an automated process using a clustering algorithm determines a foreground object. Fitting, for example, a bounding box around the selected depth pixels may be done by determining the ranges of the selected depth pixels (in the X, Y and D dimensions) and fitting the bounding box based on the ranges, as sketched below.
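A minimal sketch of the fitting step (b) is shown below for a bounding box; the margin value and the mask-based interface are assumptions.

```python
import numpy as np

def fit_xyd_bounding_box(depth_map, mask, margin=2):
    """Fit a 3D bounding box in XYD space around the depth pixels selected by `mask`,
    adding a small margin so that the fit is neither too tight nor too loose."""
    ys, xs = np.nonzero(mask)
    ds = depth_map[mask]
    return ((int(xs.min()) - margin, int(xs.max()) + margin),   # X range
            (int(ys.min()) - margin, int(ys.max()) + margin),   # Y range
            (int(ds.min()) - margin, int(ds.max()) + margin))   # depth range
```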
Optionally, the image encoding device generates metadata including a local and/or global remapping function. The local/global remapping functions may be determined by the automated process described above, which is based on increasing the depth contrast between foreground object(s) and the background.
Combining the previous two paragraphs, the image encoding device may thus automatically determine boundaries for selecting foreground objects and the background, automatically determine the local/global remapping functions, include the determined boundaries and the determined local/global remapping functions in the metadata, and include the metadata in the signal.
Alternatively, the image encoding device composes the signal by wrapping the given three-dimensional image and corresponding given metadata together in the signal.
An image processing method is disclosed in analogy to the image processing device 100. The image processing method performs the selecting, the determining and the remapping in the same manner as performed by the selection function, the determining function and the mapping function of the image processing device 100, respectively.
Furthermore, an image encoding method is disclosed in analogy to the image encoding device as described above: the image encoding method performs the steps of the image encoding device for generating the signal, in particular the metadata.
This image processing method and/or image encoding method may be used in the form of a computer program that instructs a processor to perform the steps of the respective method. The computer program may be stored on a data carrier, such as a DVD, a CD, or a USB stick. The computer program may run on a personal computer, a notebook, (as an app on) a smartphone, or on an authoring system.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb “comprise” and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Number | Date | Country | Kind
---|---|---|---
13188429.8 | Oct 2013 | EP | regional
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2014/071948 | 10/14/2014 | WO | 00