The invention relates to the domain of image or video processing and more specifically to the processing of three-dimensional (3D) images and/or video comprising an embedded object. The invention also relates to the domain of estimation of disparity and image interpolation.
According to the prior art, it is known to add information to a video stream of images generated by capture using a camera or by image synthesis via computer. The added information corresponds for example to a logo appearing in a given part of the images of the video stream, to subtitles transcribing the dialogue between the characters of the video stream, to text describing the content of the images of the video stream, or to the score of a match. This information is generally added in post-production by embedding it on the original images, that is to say on the images originally captured using the camera or generated via image synthesis. This information is advantageously embedded in such a way that it is visible when the video stream is displayed on a display device, that is to say that the video information of the pixels of the original images is modified with an item of video information enabling the information to be embedded to be displayed.
In the case of a 3D image video stream, for example a video stream of stereoscopic images, each stereoscopic image is composed of a left image representing the scene filmed or synthesized according to a first viewpoint and a right image representing the same scene but filmed or synthesized according to a second viewpoint offset along a horizontal axis by a few centimetres (for example 6.5 cm) with respect to the first viewpoint. When information must be embedded (or inlaid or encrusted) for display in the stereoscopic image, the information is embedded in the right image and the same information is embedded in the left image, replacing the video information of pixels of the left and right images with video information enabling the information to be embedded to be displayed. Generally, the information to be embedded is added to the stereoscopic image in such a way that it is displayed in the image plane during the display of the stereoscopic image, so that this embedded information is clearly visible to all spectators. To do this, the information to be embedded is embedded (or inlaid or encrusted) in the left and right images of the stereoscopic image with a null disparity between the left image and the right image, that is to say that the pixels for which the video information is modified to display the information to be embedded are identical in the left image and the right image, that is to say that they have the same coordinates in each of the left and right images according to a reference common to the left and right images. One of the problems engendered by such an embedding (or inlaying or encrusting) is that the embedded information may replace pixels, in each of the left and right images, associated with a video content, that is to say with an object of the stereoscopic image, for which the disparity is for example negative, that is to say for which the disparity is such that the object will be displayed in the foreground during the display of the stereoscopic image. In fact, during the display of the stereoscopic image, the embedded information for which the associated disparity is null will appear in front of an object for which the associated disparity is negative, whereas, considering purely and simply the disparities associated with the embedded information and with the object, the object should appear in front of the embedded information. This problem more specifically causes errors when processes of disparity estimation or image interpolation are applied to the stereoscopic image.
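By way of illustration only, the following Python/NumPy sketch shows what such a null-disparity embedding amounts to: the same pixel region of the left and right images is overwritten with the video information of the object to be embedded. The function name, array layout and coordinates are illustrative assumptions and not part of the invention.

```python
import numpy as np

def embed_with_null_disparity(left, right, overlay, x, y):
    """Embed an object (logo, subtitle, score, ...) into both views of a
    stereoscopic pair at identical coordinates, i.e. with a null disparity,
    by replacing the video information of the covered pixels.

    left, right : H x W x 3 arrays, the original left and right images.
    overlay     : h x w x 3 array, the video information of the object to embed.
    (x, y)      : column and row of the object's upper left pixel, identical in
                  the two views, which places the object in the screen plane.
    """
    h, w = overlay.shape[:2]
    for image in (left, right):
        # The original video information is lost, which is the source of the
        # depth/occlusion conflict described above.
        image[y:y + h, x:x + w] = overlay
    return left, right
```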
Such a conflict between the video information associated with the embedded information and the associated disparity is shown in
The left 220 and right 230 images shown in
A conflict problem occurs if the object 20 is simply embedded, by superposition onto the content of the images so as to be always visible, and if it is placed at the same positions as previously in the two images, that is to say so that it appears further away than the object 21. As a result, it appears in front of the object 21, as it occludes it, but behind this object in terms of distance.
The left 221 and right 231 images respectively show the left viewpoint 22 and the right viewpoint 23 of the 3D environment 2. In this case, there is a conflict between the disparity information associated with the objects and the video information associated with these same objects. The depth associated with the embedded object is greater than the depth associated with the first object, the disparity associated with the embedded object 20 being null (as appears clearly with respect to the images 221 and 231, as the position of the embedded object 200 is identical in each of these images, that is to say the position of the pixels associated with the representation of the embedded object 200 along the horizontal axis is identical in the two images, there being no horizontal spatial offset between the representation of the embedded object 200 in the left image 221 and the representation of the embedded object 200 in the right image 231) and the disparity associated with the first object 21 being negative (as appears clearly with respect to images 221 and 231, as the position of the first object 210 is offset along a horizontal axis between the left image 221 and the right image 231, that is to say the position of the pixels associated with the representation of the first object 210 along the horizontal axis is not identical in the two images, the first object appearing more to the right in the left image 221 than in the right image 231). Regarding the video information associated with the pixels of the left and right images, it appears clearly that the pixels associated with the embedded object 200 carry the video information of the embedded object 200 itself, without taking account of the disparity information. The embedded object thus appears in the foreground of the left image 221 and of the right image 231 and partially occludes the first object. At the display of the stereoscopic image comprising the left 221 and right 231 images, there is a display fault, as the disparity information associated with the first object 21 and with the embedded object 20 is not coherent with the video information associated with these same objects. Such an implementation example also poses problems when the disparity between the left image and the right image is estimated based on a comparison of the video values associated with the pixels of the left image and the pixels of the right image, the objective being to match any pixel of the left image with a pixel of the right image (or inversely) in order to deduce the horizontal spatial offset representative of the disparity between the two matched pixels.
The purpose of the invention is to overcome at least one of these disadvantages of the prior art.
More specifically, the purpose of the invention is particularly to reduce the display faults of an object embedded in a stereoscopic image and to render coherent the video information displayed with the disparity information associated with the embedded object.
The invention relates to a method for processing a stereoscopic image, the stereoscopic image comprising a first image and a second image, the stereoscopic image comprising an embedded object, the object being embedded onto the first image and onto the second image by modifying the initial video content of pixels of the first image and of the second image associated with the embedded object. In order to reduce the display faults of the embedded object and provide coherence between the video information and the depth associated with the embedded object, the method comprises steps for:
Advantageously:
According to an additional characteristic, membership of the group to the embedded object is determined by comparison of at least one property associated with the group with at least one property associated with the pixels of the embedded object, the at least one property belonging to a set of properties comprising:
According to a particular characteristic, the method comprises a step of detection of the position of the embedded object based on the stationary aspect of the embedded object over a determined time interval.
Advantageously, the method comprises a step of detection of the position of the embedded object based on the at least one property associated with the embedded object, the at least one property associated with the embedded object belonging to a set of properties comprising:
Advantageously, the method also comprises a step of determination of an item of disparity information representative of disparity between the first image and the second image on at least one part of the first and second images comprising said embedded object.
According to another characteristic, the assigning of a depth to the embedded object is carried out via horizontal translation of the pixels associated with the embedded object in at least one of the first and second images, an item of video information and an item of disparity information being associated with the pixels of the at least one of the first and second images uncovered by the horizontal translation of the pixels associated with the embedded object, by spatial interpolation of the video information and disparity information associated with the pixels neighbouring the uncovered pixels.
The invention also relates to a module for processing a stereoscopic image, the stereoscopic image comprising a first image and a second image, the stereoscopic image comprising an embedded object, the object being embedded onto the first image and onto the second image by modifying the initial video content of pixels of the first image and of the second image associated with the embedded object, the module comprising:
The invention also relates to a display device comprising a module for processing a stereoscopic image.
The invention will be better understood, and other specific features and advantages will emerge upon reading the following description, the description making reference to the annexed drawings wherein:
in which
Zp is the perceived depth (in meters, m),
P is the parallax between the left and right images (in meters, m),
d is the transmitted disparity information (in pixels),
te is the inter-ocular distance (in meters, m),
Zs is the distance between the spectator and the screen (in meters, m),
Ws is the width of the screen (in meters, m),
Ncol is the number of columns of the display device (in pixels).
Equation 2 enables a disparity (in pixels) to be converted into parallax (in metres).
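The following sketch, given as a non-limiting illustration, converts a disparity expressed in pixels into a parallax in metres using the variables listed above, then into a perceived depth. The form assumed for equation 1 is the usual stereoscopic viewing geometry relation and should be read as an assumption consistent with the definitions above rather than as a reproduction of the equation itself.

```python
def disparity_to_parallax(d, Ws, Ncol):
    """Equation 2: convert a disparity d (in pixels) into a parallax P (in metres),
    Ws being the screen width (metres) and Ncol the number of display columns."""
    return d * Ws / Ncol

def perceived_depth(P, Zs, te=0.065):
    """Assumed form of equation 1 (standard viewing geometry): perceived depth Zp
    from the parallax P, the spectator-to-screen distance Zs and the inter-ocular
    distance te (all in metres).
    P < 0 gives Zp < Zs (in front of the screen), P = 0 gives Zp = Zs (screen plane)."""
    return Zs * te / (te - P)

# Example: a disparity of -10 pixels on a 1 m wide, 1920-column display viewed
# from 2 m corresponds to a point perceived about 0.15 m in front of the screen.
P = disparity_to_parallax(-10, Ws=1.0, Ncol=1920)   # ~ -0.0052 m
Zp = perceived_depth(P, Zs=2.0)                      # ~ 1.85 m
```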
Advantageously, the analysis is based on the stationary aspect 411 of the embedded object 200, that is to say that the analysis consists in searching the images 221, 231 for parts that do not vary in time, that is to say the pixels of the images for which the associated video information does not vary in time. The analysis is carried out over a determined time interval or on a number (greater than 2) of temporally successive left images and on a number (greater than 2) of temporally successive right images (corresponding to a temporal filtering 413 over a plurality of images). The number of successive images (left or right) or the time interval during which the embedded object is searched for advantageously depends on the type of object embedded. For example, if the embedded object is of logo type (for example the logo of a television channel broadcasting stereoscopic images), the analysis is carried out on a high number of successive images (for example 100 images) or over a significant duration (for example 4 seconds), as a logo is generally intended to be displayed permanently. According to another example, if the embedded object is of subtitle type, that is to say an object for which the content varies rapidly in time, the analysis is carried out over a time interval shorter than that used for a logo (for example 2 seconds) or on a number of images (for example 50) smaller than the number of images used for a logo.
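As a non-limiting illustration of such a temporal analysis, the following sketch flags the pixels whose video information remains (almost) constant over a set of temporally successive images; the threshold value and the function name are illustrative assumptions.

```python
import numpy as np

def detect_stationary_mask(frames, threshold=5):
    """Flag the pixels whose video information does not vary (or varies very
    little) over a set of temporally successive images, as a logo or a score
    overlay typically does.

    frames    : list of H x W (grey level) or H x W x 3 (colour) arrays, for
                example 100 successive left images for a logo, about 50 for subtitles.
    threshold : maximum variation tolerated, e.g. 5 on a scale of 255 grey levels.
    Returns a boolean H x W mask, True where the content is stationary.
    """
    stack = np.stack([f.astype(np.int16) for f in frames])
    variation = stack.max(axis=0) - stack.min(axis=0)   # per-pixel peak-to-peak variation
    if variation.ndim == 3:                              # colour input: all channels must be stable
        variation = variation.max(axis=2)
    return variation <= threshold
```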
According to a variant, the analysis is based on metadata 412 associated with the left and right images, metadata added for example by an operator during the embedding of the object in the original left and right images. The metadata comprise information providing indications to the video analysis engine to target its search, the indications being relative to properties associated with the embedded object, for example information on the approximate position of the embedded object (for example of the type upper left corner of the image, lower part of the image, etc.), information on the precise position of the embedded object in the image (for example coordinates of a reference pixel of the embedded object, for example the upper left pixel), information on the form, the colour and/or the transparency associated with the embedded object.
Once the position of the embedded object is detected, masks 414 of left and right images are advantageously generated, the mask of the left image comprising for example a part of the left image comprising the embedded object and the mask of the right image comprising for example a part of the right image comprising the embedded object.
Then, during step 42, the disparity between the left image and the right image (or conversely between the right image and the left image) is estimated. In an advantageous but non-restrictive way, the disparity between the two images is estimated over only a part of the left image and a part of the right image, that is to say a part surrounding the embedded object 200 (for example a box of n×m pixels surrounding the embedded object). Carrying out the estimation over only a part of the images containing the embedded object offers the advantage of limiting the calculations. Carrying out the estimation over the totality of the images offers the assurance of not losing information, that is to say offers the assurance of having an estimation of the disparity for all of the pixels associated with the embedded object and with other objects of the stereoscopic image occluded or partly occluded by the embedded object. The disparity estimation is carried out according to any method known to those skilled in the art, for example by pairing pixels of the left image with pixels of the right image and comparing the video levels associated with each of the pixels, a pixel of the left image and a pixel of the right image having a same video level being paired, the spatial offset between them along the horizontal axis (in number of pixels) supplying the disparity information associated with the pixel of the left image (if the disparity map of the left image with respect to the right image is of interest, for example). Once the estimation of disparity has been carried out, one or several disparity maps 421 are obtained, for example the disparity map of the left image with respect to the right image (providing disparity information representative of the disparity between the left image and the right image) and/or the disparity map of the right image with respect to the left image (providing disparity information representative of the disparity between the right image and the left image) and/or one or several partial disparity maps providing disparity information between the part of the left image (respectively the part of the right image) comprising the embedded object with respect to the part of the right image (respectively the part of the left image) comprising the embedded object.
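As an illustration of one possible estimation method among those known to those skilled in the art, the following sketch pairs pixels of the left and right images by block matching on their video levels within a box surrounding the embedded object; the cost function, search range and patch size are illustrative assumptions, and the box is assumed to lie away from the image borders.

```python
import numpy as np

def estimate_disparity(left, right, box, max_disp=64, patch=4):
    """Estimate, for each pixel of a box of the left image surrounding the embedded
    object, the horizontal offset of the best-matching pixel in the right image by
    block matching on the video levels (sum of absolute differences).

    left, right : H x W grey-level arrays.
    box         : (x0, y0, x1, y1), assumed to lie away from the image borders.
    Returns a disparity map (in pixels) of the box, left with respect to right.
    """
    x0, y0, x1, y1 = box
    disp = np.zeros((y1 - y0, x1 - x0), dtype=np.int32)
    for y in range(y0, y1):
        for x in range(x0, x1):
            ref = left[y - patch:y + patch + 1, x - patch:x + patch + 1].astype(np.float32)
            best_d, best_cost = 0, np.inf
            for d in range(-max_disp, max_disp + 1):
                xr = x + d
                if xr - patch < 0 or xr + patch + 1 > right.shape[1]:
                    continue                      # candidate window outside the right image
                cand = right[y - patch:y + patch + 1, xr - patch:xr + patch + 1].astype(np.float32)
                cost = np.abs(ref - cand).sum()   # comparison of the video levels
                if cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y - y0, x - x0] = best_d         # horizontal offset of the paired pixel
    return disp
```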
Then, during a step 43, the occlusions in the left image and in the right image are detected.
During a step 44, the disparity information associated with the pixels of the parts occluded in the left image and/or the right image is estimated. The estimation of the disparity to be associated with the pixels occluded in the left image and/or the right image is obtained according to any method known to those skilled in the art, for example by propagating the disparity information associated with the pixels neighbouring the occluded pixels to these occluded pixels. The determination and association of disparity information with the occluded pixels of the left and right images is advantageously realised based on the disparity maps 421 estimated previously and on occlusion maps clearly identifying the occluded pixels in each of the left and right images. New disparity maps 441 (called enriched disparity maps), more complete than the disparity maps 421 as they contain an item of disparity information associated with each pixel of the left and right images, are thus obtained.
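The following sketch illustrates, in a deliberately simple form, the propagation of disparity information from neighbouring pixels to the occluded pixels along each line; the nearest-neighbour rule used here is an assumption, other propagation or interpolation strategies being equally possible.

```python
import numpy as np

def fill_occluded_disparity(disparity, occlusion_mask):
    """Enrich a disparity map by propagating, to each pixel flagged as occluded, the
    disparity of its nearest non-occluded neighbour along the same horizontal line.

    disparity      : H x W disparity map (in pixels), valid outside the occlusions.
    occlusion_mask : H x W boolean array, True where the pixel is occluded.
    Returns an 'enriched' map holding a disparity value for every pixel.
    """
    enriched = disparity.astype(np.float32).copy()
    height, width = disparity.shape
    for y in range(height):
        for x in np.flatnonzero(occlusion_mask[y]):
            # Nearest valid neighbours to the left and to the right of the occluded pixel.
            xl = next((c for c in range(x - 1, -1, -1) if not occlusion_mask[y, c]), None)
            xr = next((c for c in range(x + 1, width) if not occlusion_mask[y, c]), None)
            if xl is not None and (xr is None or x - xl <= xr - x):
                enriched[y, x] = enriched[y, xl]
            elif xr is not None:
                enriched[y, x] = enriched[y, xr]
            # A common refinement is to propagate the background (greater depth)
            # disparity rather than the nearest one; this is kept simple here.
    return enriched
```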
During a step 45, the stereoscopic image, that is to say the left and/or right image composing it, is synthesized by modifying the disparity associated with the embedded object 200, that is to say by modifying the depth associated with the embedded object 200. This is obtained based on the mask or masks 414 and on the disparity maps 421 or the enriched disparity map or maps 441. To do this, the smallest depth value is found in the box surrounding the embedded object, which is the same as determining the smallest disparity value, that is to say the negative disparity for which the absolute value is maximal in the surrounding box. Advantageously, the determination of the smallest depth value is realised on the disparity map providing an item of disparity information between the part of the left image (respectively the part of the right image) comprising the embedded object with respect to the part of the right image (respectively the part of the left image) comprising the embedded object. According to a variant, the determination of the smallest depth value is carried out on the disparity map providing an item of disparity information between the part of the left image comprising the embedded object with respect to the part of the right image comprising the embedded object and on the disparity map providing an item of disparity information between the part of the right image comprising the embedded object with respect to the part of the left image comprising the embedded object. According to this variant, the smallest depth value corresponds to the smallest depth determined by comparing the two disparity maps on which the determination was carried out. Once the smallest depth value is determined, a depth value lower than this smallest determined depth value is assigned to the pixels of the embedded object 200, that is to say a negative disparity value less than the negative disparity value corresponding to the smallest determined depth value is assigned to the pixels of the embedded object, so as to display the embedded object 200 in the foreground, that is to say in front of all the objects of the 3D scene of the stereoscopic image, during the display of the stereoscopic image on a display device. The modification of the depth associated with the embedded object enables coherence to be re-established between the depth associated with the embedded object and the video information associated with the pixels of the embedded object in the left and right images of the stereoscopic image. Thus, during the display of the stereoscopic image, there will be coherence between the object displayed in the foreground and the displayed video content, the object displayed in the foreground being that for which the associated video content is displayed.
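As a minimal illustration of this step, the following sketch determines, from a (partial) disparity map covering the surrounding box, the disparity value to assign to the pixels of the embedded object so that it is displayed in front of the other objects of the box; the margin parameter is an illustrative assumption.

```python
def foreground_disparity(disparity_box, margin=1):
    """Determine the disparity to assign to the pixels of the embedded object so
    that it is displayed in front of every object of the surrounding box.

    disparity_box : array of disparity values (in pixels) over the box surrounding
                    the embedded object (e.g. a partial map 421 or an enriched map 441).
    margin        : extra offset (in pixels) guaranteeing a strictly smaller disparity.
    """
    smallest = disparity_box.min()    # smallest disparity <=> smallest perceived depth
    return smallest - margin          # strictly smaller, hence displayed in the foreground
```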
Modifying the depth (that is to say the disparity) associated with the embedded object 200 is the same as repositioning the embedded object in the left image and/or the right image. Advantageously, the position of the embedded object is modified in only one of the two (left and right) images. For example, if the position of the embedded object 200 is modified in the left image 221, this is equivalent to offsetting the embedded object 200 towards the right along the horizontal axis in the left image. If for example the disparity associated with the embedded object is increased (in absolute value) by 5 pixels, this is equivalent to associating the video information corresponding to the embedded object 200 with the pixels situated to the right of the embedded object over a width of 5 pixels, which has the effect of replacing the video content of the left image over a width of 5 pixels to the right of the embedded object 200 (over the height of the embedded object 200). The embedded object being offset to the right, it is then necessary to determine the video information to assign to the pixels of the left image uncovered by the repositioning of the embedded object 200, a band of 5 pixels in width over the height of the object being "uncovered" on the left part previously occupied by the embedded object in its initial position. The missing video information is advantageously determined by spatial interpolation using the video information associated with the pixels surrounding the pixels for which the video information is missing due to the horizontal translation of the embedded object to the right. If however the position of the embedded object 200 is modified in the right image 231, the reasoning is identical except that in this case the embedded object 200 is offset to the left, the part uncovered by the horizontal translation of the embedded object 200 being situated in a zone corresponding to the right part of the embedded object (taken in its initial position) over a width corresponding to the number of pixels by which the disparity is increased.
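The following sketch illustrates such a repositioning in one of the two images: the pixels of the embedded object are translated horizontally and the uncovered band is filled by a simple horizontal spatial interpolation of the neighbouring video information; the mask representation and the interpolation rule are illustrative assumptions.

```python
import numpy as np

def shift_embedded_object(image, obj_mask, shift):
    """Translate the embedded object horizontally in one view (e.g. 'shift' pixels
    to the right in the left image) and fill the uncovered band by a simple
    horizontal spatial interpolation of the neighbouring video information.

    image    : H x W x 3 array (one view of the stereoscopic image), modified in place.
    obj_mask : H x W boolean mask of the embedded object in this view.
    shift    : signed horizontal offset in pixels (positive towards the right).
    """
    source = image.copy()
    uncovered = obj_mask.copy()
    ys, xs = np.nonzero(obj_mask)
    # Write the object's video information at its new, offset position.
    for y, x in zip(ys, xs):
        xn = x + shift
        if 0 <= xn < image.shape[1]:
            image[y, xn] = source[y, x]
            uncovered[y, xn] = False             # this destination pixel is not uncovered
    # The pixels of the initial position that received no video information are
    # 'uncovered': fill them from the valid pixels bounding the band on the same line.
    for y, x in zip(*np.nonzero(uncovered)):
        xl, xr = x - 1, x + 1
        while xl >= 0 and uncovered[y, xl]:
            xl -= 1
        while xr < image.shape[1] and uncovered[y, xr]:
            xr += 1
        neighbours = [image[y, c] for c in (xl, xr) if 0 <= c < image.shape[1]]
        if neighbours:
            image[y, x] = np.mean(neighbours, axis=0).astype(image.dtype)
    return image
```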
According to a variant, the position of the embedded object is modified in the left image and in the right image, for example by offsetting the embedded object in the left image by one or several pixels to the right along the horizontal axis and by offsetting the embedded object 200 in the right image by one or several pixels to the left along the horizontal axis. According to this variant, it is necessary to re-calculate the video information for the pixels uncovered by the repositioning of the embedded object in each of the left and right images. This variant however has the advantage that the uncovered zones in each of the images are less wide than in the case where the position of the embedded object is modified in only one of the left and right images, which reduces possible errors engendered by the spatial interpolation calculation of the video information to be associated with the uncovered pixels. In fact, the bigger the number of pixels to be interpolated in the image, the greater the risk of assigning erroneous video information, particularly for the pixels situated at the centre of the zone for which the video information is missing, these pixels being relatively far from the pixels of the periphery for which video information is available.
The processing unit 5 comprises the following elements:
A first signal L 501 representative of a first image (for example the left image 221) and a second signal R 502 representative of a second image (for example the right image 231), for example acquired respectively by a first acquisition device and a second acquisition device, are provided at the input of the processing unit 5 to an embedded object detector 51. The embedded object detector advantageously detects the position of one or several embedded objects contained in each of the first and second images, basing the analysis on the search for stationary objects and/or objects having particular properties (for example a determined form and/or a determined colour and/or a determined level of transparency and/or a determined position). One or several masks are found at the output of the embedded object detector, for example a mask for the first image and a mask for the second image, each mask corresponding to a part of the first image (respectively the second image) comprising the detected embedded object(s) (corresponding for example to a zone of the first image (respectively the second image) of m×n pixels surrounding each embedded object). According to a variant, at the output of the embedded object detector 51 are found the first image 501 and the second image 502, with each image being associated an item of information representative of the position of the detected embedded object (corresponding for example to the coordinates of a reference pixel of the detected embedded object (for example the upper left pixel of the embedded object) as well as the width and height, expressed in pixels, of the embedded object or of a zone comprising the embedded object).
The disparity estimator 52 determines the disparity between the first image and the second image and/or between the second image and the first image. According to an advantageous variant, the estimation of disparity is only carried out on the parts of the first and second images comprising the embedded object(s). At the output of the disparity estimator 52 are found one or several total disparity maps (if the disparity estimation is carried out over the totality of the first and second images) or one or several partial disparity maps (if the disparity estimation is carried out on only a part of the first and second images).
Using the disparity information from the disparity estimator 52, a view synthesizer 53 determines the minimal depth value corresponding to the smallest disparity value (that is to say the negative disparity value for which the absolute value is maximal) present in the disparity map(s) received, in a zone surrounding and comprising the embedded object (for example a zone surrounding the object with a margin of 2, 3, 5 or 10 pixels above and below the embedded object and a margin of 1, 10, 20 or 50 pixels to the left and right of the embedded object). The view synthesizer 53 modifies the depth associated with the embedded object in such a way that, with the new depth value associated with it, the embedded object is displayed in the foreground of the zone of the stereoscopic image that comprises it during the display of the stereoscopic image formed from the first image and the second image. The view synthesizer 53 consequently modifies the video content of the first image and/or the second image, offsetting the embedded object in one direction along the horizontal axis in the first image and/or offsetting the embedded object along the horizontal axis in the second image in the direction opposite to that of the first image, so as to increase (in absolute value) the disparity associated with the embedded object in order to display it in the foreground. At the output of the view synthesizer 53 are found a modified first image L′ 531 and the second source image R 502 (in the case where the position of the embedded object was only offset in the first source image L 501), or the first source image L 501 and a modified second image R′ 532 (in the case where the position of the object was only offset in the second source image R 502), or the modified first image L′ 531 and the modified second image R′ 532 (in the case where the position of the embedded object was modified in the two source images). Advantageously, the view synthesizer comprises a first interpolator enabling the disparity to be associated with the pixels of the first image and/or the second image "uncovered" during the modification of the position of the embedded object in the first image and/or the second image to be estimated. Advantageously, the view synthesizer comprises a second interpolator enabling the video information to be associated with the pixels of the first image and/or the second image "uncovered" during the modification of the position of the embedded object in the first image and/or the second image to be estimated.
According to an optional variant corresponding to a particular embodiment of the invention, the processing unit 5 comprises an occlusion estimator 54 to determine the pixels of the first image that are occluded in the second image and/or the pixels of the second image that are occluded in the first image. Advantageously, the determination of the occluded pixels is carried out in the neighbouring area of the embedded object only, being based on the information on the position of the embedded object provided by the embedded object detector. According to this variant, one or several occlusion maps comprising information on the pixel or pixels of one image occluded in the other of the two images are transmitted to the view synthesizer 53. Using this information, the view synthesizer 53 launches the process of modification of the depth assigned to the embedded object if and only if the position of the pixels occluded in the first image and/or in the second image corresponds to a determined model, the determined model belonging for example to a library of models stored in a memory of the processing unit 5. This variant has the advantage of validating the presence of an embedded object in the stereoscopic image comprising the first and second images before launching the calculations necessary for the modification of the position of the embedded object at the level of the view synthesizer. According to another variant, the comparison between the position of the occluded pixels and the determined model or models is realised by the occlusion estimator 54, the result of the comparison being transmitted to the embedded object detector to validate or invalidate the embedded object. In the case of invalidation, the detector 51 recommences the detection process. Advantageously, the detector recommences the detection process a determined number of times (for example 3, 5 or 10 times) before stopping the search for an embedded object.
According to an advantageous variant, the processing unit 5 comprises one or several memories (for example of RAM (Random Access Memory) or flash type) able to store one or several first source images 501 and one or several second source images 502, and a synchronisation unit enabling the transmission of one of the source images (for example the second source image) to be synchronised with the transmission of a modified image (for example the first modified image) for the display of the new stereoscopic image, for which the depth associated with the embedded object was modified.
During an initialisation step 60, the different parameters of the processing unit are updated, for example the parameters representative of the localisation of an embedded object, the disparity map or maps generated previously (during a previous processing of a stereoscopic image or of a previous video stream).
Then, during a step 61, the position of an embedded object in the stereoscopic image is detected, the embedded object being for example an object added in post-production to the initial content of the stereoscopic image. The position of the embedded object is advantageously detected in the first image and in the second image that compose the stereoscopic image, the display of the stereoscopic image being obtained by the display of the first image and the second image (for example sequential display), the brain of a spectator looking at the display device making the synthesis of the first image and the second image to arrive at the display of the stereoscopic image with 3D effects. The determination of the position of the embedded object is obtained by analysis of the video content (that is to say the video information associated with the pixels of each image, that is to say for example a grey level value coded for example on 8 bits or 12 bits for each primary colour R, G, B or R, G, B, Y (Y being yellow) associated with each pixel of each first and second image). The information representative of the position of the embedded object is for example formalised by an item of information on the coordinates of a particular pixel of the embedded object (for example the upper left or upper right pixel, or the pixel situated at the centre of the embedded object). According to a variant, the information representative of the position of the embedded object also comprises an item of information on the width and the height of the object embedded in the image, expressed in number of pixels.
The detection of the position of the embedded object is advantageously obtained by searching for the fixed parts in the first image and in the second image, that is to say the parts for which the associated video content is fixed (or varies little, that is to say with a minimal variation of the video information associated with the pixels, that is to say a variation less than a threshold value, for example a variation less than a level equal to 5, 7 or 10 on a scale of 255 grey levels). To do this, the video content of several temporally successive first images is compared, as well as the content of several temporally successive second images. The zone or zones of the first and second images for which the video content associated with the pixels of these zones varies little or not at all advantageously correspond to an embedded object. Such a method enables any embedded object for which the content varies little or not at all over time to be detected, that is to say any embedded object stationary in an image, such as for example the channel logo of a television channel broadcasting the stereoscopic image or the score of a sporting match or any element giving information on the displayed content (such as for example the recommended age limit for viewing the displayed content). Such a detection of the embedded object is thus based on the stationary aspect of the embedded object over a determined time interval, corresponding to the duration of the display of several first images and several second images.
According to a variant, the detection of the position of the embedded object is obtained by searching for pixels having one or several specific properties, this property or these properties being associated with the embedded object. The specific property or properties advantageously belong to a list of properties comprising:
According to another variant, the detection of the embedded object is carried out by combining the search for fixed part(s) in the first and second images with the search for pixels having one or several specific properties.
Then, during a step 62, an item of disparity information representative of the disparity between the first image and the second image is estimated, over at least a part of the first and second images comprising the embedded object for which the position was detected in the preceding step. The estimation of disparity is for example carried out on a part of the first and second images surrounding the embedded object, for example on a bounding box or on a wider part comprising the embedded object and a part surrounding the embedded object of a given width (for example 50, 100 or 200 pixels around the peripheral limits of the embedded object). The estimation of disparity is carried out according to any method known to those skilled in the art. According to a variant, the estimation of disparity is carried out on the entire first image with respect to the second image. According to another variant, the estimation of disparity is carried out on all or part of the first image with respect to the second image and on all or part of the second image with respect to the first image. According to this other variant, two disparity maps are obtained, a first associated with the first image (or with a part of the first image according to the case) and a second associated with the second image (or a part of the second image according to the case).
Then during a step 63, a minimal depth value corresponding to the smallest depth value in the part of the first image (and/or of the second image) comprising the embedded object is determined according to the disparity information estimated previously (see equations 1 and 2 explaining the relationship between depth and disparity with respect to
Finally, during a step 64, a new depth is assigned to the embedded object, the value of the new assigned depth being less than the minimal depth value determined in the zone of the first image and/or the second image comprising the embedded object. Modifying the depth associated with the embedded object so that it is displayed in the foreground of the zone of the image that contains it enables coherence to be restored with the displayed video information, which is that of the embedded object whatever the depth associated with it, as the object has been embedded in the first and second images of the stereoscopic image by replacing the video information of the pixels concerned with video information corresponding to the embedded object.
According to an embodiment, the pixels of the first image that are occluded in the second image and the pixels of the second image that are occluded in the first image are determined, for example according to the method described with respect to
Steps 61 to 64 are advantageously reiterated for each stereoscopic image of a video sequence comprising several stereoscopic images, each stereoscopic image being formed of a first image and a second image. According to a variant, steps 61 to 64 are reiterated every n stereoscopic images, for example every 5, 10 or 20 stereoscopic images.
During an initialisation step 70, the different parameters of the processing unit are updated, for example the disparity map or maps generated previously (during a previous processing of a stereoscopic image or of a previous video stream).
Then, during a step 71, the pixel or pixels of the first image (221) that are occluded in the second image (231) are determined, for example as described in respect of
Then, during a step 72, a possible embedding error of the embedded object is detected. To do this, for at least one horizontal line of pixels of the first image comprising a group of pixels occluded in the second image, it is determined whether or not the group of occluded pixels corresponds to the embedded object. The depth values associated with the pixels of the line that bound the group of occluded pixels and are adjacent to the occluded pixels are also compared with each other, that is to say the depth values associated with the pixels adjacent to the group of occluded pixels situated to the right of the group are compared with the depth values associated with the pixels adjacent to the group of occluded pixels situated to the left of the group. According to the result of the comparison of the depth values and the membership (or non-membership) of the group of occluded pixels to the embedded object, an error linked to the embedding of the object is detected or not. By group of occluded pixels is understood a set of adjacent pixels of the first image occluded in the second image along a horizontal line of pixels. According to a variant, the group of occluded pixels only comprises a single pixel of the first image occluded in the second image. An embedding error of the embedded object advantageously corresponds to the detection of a conflict between depth and occlusion, between the embedded object and the original content of the stereoscopic image (that is to say the content before embedding). This conflict is for example due to the fact that the embedded object partially occludes another object of the stereoscopic image that is moreover situated closer to the observer (or to the cameras). In other words, this other object has a lesser depth than the embedded object and is nevertheless partially occluded by it, as shown with respect to
An embedding error associated with the embedded object is for example detected in the following case:
Some examples are shown with respect to
Finally, during a step 73, a new depth is assigned to the embedded object if an embedding error is detected, the value of the new assigned depth being less than a minimal depth value. The minimal depth value advantageously corresponds to the smallest depth value associated with the pixels bounding the group of occluded pixels and adjacent to it, so as to return the embedded object to the foreground, coherently with the video information associated with the pixels of the first and second images at the level of the embedded object.
Advantageously, the membership of the group of occluded pixels to the embedded object is determined by comparison of at least one property associated with the group of occluded pixels with at least one property associated with the pixels of the embedded object. The properties of the pixels correspond for example to the video information (that is to say the colour) associated with the pixels and/or to a motion vector associated with the pixels. An occluded pixel belongs to the embedded object if its colour is identical or almost identical to that of the pixels of the embedded object and/or if its associated motion vector is identical or almost identical to that associated with the pixels of the embedded object.
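Purely by way of illustration, and since the exact decision rule may vary from one embodiment to another, the following sketch combines the membership test with the comparison of the depths adjacent to a group of occluded pixels on one horizontal line; the colour tolerance, the single-group assumption and the conflict condition used here are assumptions, not a reproduction of the detection cases listed above.

```python
import numpy as np

def detect_embedding_error(line_depth, line_grey, occluded, obj_grey, obj_depth,
                           grey_tol=10):
    """Check one horizontal line of the first image for a depth/occlusion conflict
    around a group of pixels occluded in the second image.

    line_depth : per-pixel depth values along the line.
    line_grey  : per-pixel grey level along the line (video information).
    occluded   : boolean array, True where the pixel of the line is occluded.
    obj_grey, obj_depth : grey level and depth associated with the embedded object.
    A single group of adjacent occluded pixels per line is assumed.
    """
    idx = np.flatnonzero(occluded)
    if idx.size == 0:
        return False
    start, end = idx[0], idx[-1]                 # the group of adjacent occluded pixels
    # Membership test: does the group carry video information close to that of the object?
    group = line_grey[start:end + 1].astype(int)
    belongs_to_object = np.all(np.abs(group - int(obj_grey)) <= grey_tol)
    # Depth values of the pixels adjacent to the group, on its left and on its right.
    depth_left = line_depth[start - 1] if start > 0 else np.inf
    depth_right = line_depth[end + 1] if end + 1 < line_depth.size else np.inf
    # Conflict: the occlusion is attributed to the embedded object although the
    # adjacent original content is situated closer to the spectator (smaller depth).
    return bool(belongs_to_object and min(depth_left, depth_right) < obj_depth)
```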
The determination of the occluded pixel(s) is advantageously realised on the part of the image comprising the embedded object, the position of the embedded object being known (for example thanks to metadata associated with the stereoscopic image) or determined as described in step 61 of
Advantageously, a disparity map is associated with each first and second image and received with the video information associated with each first and second image. According to a variant, the disparity information is determined on at least the part of the first and second images that comprises the embedded object.
Steps 71 to 73 are advantageously reiterated for each stereoscopic image of a video sequence comprising several stereoscopic images, each stereoscopic image being formed of a first image and a second image. According to a variant, steps 71 to 73 are reiterated every n stereoscopic images, for example every 5, 10 or 20 stereoscopic images.
Naturally, the invention is not limited to the embodiments previously described.
In particular, the invention is not restricted to a method for processing images but extends to the processing unit implementing such a method and to the display device comprising a processing unit implementing the image processing method.
The invention also is not limited to the embedding of an object in the plane of the stereoscopic image but extends to the embedding of an object at a determined depth (in the foreground, that is to say with a negative disparity or in the background, that is to say with a positive disparity), a conflict appearing if another object of the stereoscopic image is positioned in front of the embedded object (that is to say with a depth less than that of the embedded object) and if the video information associated with the embedded object is embedded on left and right image of the stereoscopic image without taking account of the depth associated with the embedded object.
Advantageously, the stereoscopic image to which is added the embedded object comprises more than two images, for example three, four, five or ten images, each image corresponding to a different viewpoint of the same scene, the stereoscopic image being then adapted to an auto-stereoscopic display.
Advantageously, the invention is implemented on the transmission side, upon transmission of the stereoscopic image or images comprising the embedded object to a receiver adapted to decode the image for display, or on the reception side, where the stereoscopic images comprising the embedded object are received, for example in the display device or in a set-top box associated with the display device.