The present invention relates to a method for estimating a disparity distribution between a left image and a right image of a stereoscopic 3D picture, each image having an array of pixels. The invention also relates to an apparatus for estimating a disparity distribution as well as a television apparatus for displaying stereoscopic 3D pictures. Moreover, the invention relates to an apparatus for recording, processing and/or displaying 3D pictures, and a computer program product.
The principles of stereoscopic 3D cinematography have been known for a long time, but stereoscopic 3D has recently become very popular, and the demand for devices able to display stereoscopic 3D content is increasing rapidly. In particular, the entertainment industry has begun to develop television devices with a stereoscopic 3D capability. However, a principal problem with the display of stereoscopic 3D content is the frequently occurring discrepancy between shooting condition and viewing condition with respect to the depth impression perceived by the viewer. Factors for the depth impression are the display screen size of the television set, the distance and position of the viewer in front of the display, and the individual interocular (eye) distance. While the eye distance is considered to vary little among adults, a problem exists specifically for children. In most cases, the viewing condition is not known when the content is produced. On the other hand, metadata describing the shooting condition could be attached to the content, but this is not standardized. This problem is of particular interest because of the variety of display screen sizes of television sets and viewer distances and positions compared to the conditions in a movie theatre.
The display of 3D content can therefore present problems for the viewer. Common problems are the experience of eye divergence when looking at a far point of the scene, or the conflict between eye vergence and eye accommodation when fixating an object with too great an apparent distance from the display screen.
In the prior art, the use of a so-called “comfort zone” has been established. The “comfort zone” defines an area before and behind the screen or display plane of a TV set in which the viewer can fixate an object without any eye vergence and eye accommodation problems. In other words, the comfort zone describes the depth range relative to the display screen which should be used for displaying objects.
This comfort zone, which defines a depth range around the screen or display plane, is closely related to the disparity between left and right view. A method to change the perceived depth for the viewer is therefore to change the disparity between left and right image. In the simplest form, this can be achieved by a horizontal scale and shift operation of left and right image when presented on the display. The scale operation applied equally to both images will scale the disparity range by the same amount. The horizontal shift of left vs. right image will reposition the plane of zero disparity, i.e. a specific depth plane in the scene can be positioned in the plane of the display screen in order to adjust the scene depth within the comfort zone of the display.
In other words, one of the main problems of displaying 3D content is to bring the depth range used in the delivered stereoscopic 3D content into the comfort zone of the display device, for example a television set. This is achieved by scaling the depth range such that the maximum depth range of the delivered content substantially corresponds to the depth range of the comfort zone. Further, the depth range of the delivered content may also be shifted relative to the display screen plane.
More detailed information about 3D cinematography fundamentals, like the 3D comfort zone, may be found in “3D Movie Making, Stereoscopic Digital Cinema from Script to Screen” Bernard Mendiburu, Focal Press, ISBN 978-0-240-81137-6, particularly Chapter 5, the content of which is incorporated by reference herewith.
In order to derive the proper parameters for scale and shift operations, the range of disparity must be known beforehand. In this context, the range of disparity is defined as representing at least the minimum and maximum disparity present in the content. Preferably, the distribution of disparity levels between these extremes is also known. This information is usually not available from metadata attached to the content and must be recovered from the image content itself.
A naïve approach to generate a disparity distribution is the estimation of a dense disparity map in which a disparity value is assigned to each pixel position in the input images. Then, a histogram is computed from the dense disparity map. The disadvantage of this method is the inefficiency of first searching for localized depth information, and then discarding it.
It is an object of the present invention to provide an efficient method that is able to estimate the global distribution of disparity between a left and a right view image. It is a further object of the present invention to provide an apparatus which is able to estimate the global distribution of disparity in an efficient manner.
According to an aspect of the present invention there is provided a method for estimating a disparity distribution between a left image and a right image of a stereoscopic 3D picture, each image having an array of pixels, comprising the steps of
Providing a maximum range of disparity;
Correlating a left image area with a right image area, with one of both image areas being shifted by a disparity shift value, wherein the result of the correlation is an indication of the pixel match between both images;
Repeating the correlating step for a set of disparity shift values within the maximum range of disparity;
Deriving the disparity distribution from the results of the correlation.
That is in other words that one of the left or right image areas is compared with the other image area shifted by a disparity shift value in order to determine how many pixels between both images match. If for example all pixels of the one image area completely match with the shifted other image area, the whole content lies in the same depth plane with a disparity (which is an indication of the position of the depth plane relative to the display plane) corresponding to the used disparity shift value.
This correlating step is repeated for a number of disparity shift values within the given maximum range of disparity. At the end, there is a correlation result for every used disparity shift value; the results are then combined into the disparity distribution.
This disparity distribution can be employed for further image processing used to bring the stereoscopic 3D content into the comfort zone.
The core principle of the proposed inventive method is hence based on a non-linear correlation of left and right image. One of the two images is horizontally shifted against the other by d pixel columns (i.e. a disparity shift value), and a correlation operation is performed for the same area in the first image and the shifted version of the other image. This method for providing the proper parameters for scale and shift operations, namely a disparity distribution, is very efficient because of the simple pixel operations necessary.
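This shift-and-correlate operation can be sketched in a few lines, assuming grayscale images stored as 2-D integer arrays; the function name `match_count`, the NumPy formulation, and the default threshold are illustrative choices, not prescribed by the application:

```python
import numpy as np

def match_count(left, right, d, threshold=1):
    """Count near-identical pixels between `left` and `right`
    with the right image shifted horizontally by `d` columns,
    i.e. left(x) is compared with right(x - d).
    Both images are 2-D grayscale arrays of equal shape."""
    h, w = left.shape
    # Use only the horizontal overlap of the two windows.
    if d >= 0:
        l, r = left[:, d:], right[:, :w - d]
    else:
        l, r = left[:, :w + d], right[:, -d:]
    # Cast to int so unsigned pixel types cannot wrap around.
    diff = np.abs(l.astype(int) - r.astype(int))
    return int(np.count_nonzero(diff <= threshold))
```

At the true disparity of a fronto-parallel object, the count peaks, because the shifted windows then line up pixel for pixel.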
According to a preferred embodiment, the set of disparity shift values comprises all integer values within the maximum range of disparity, wherein the unit of the disparity shift value as well as the maximum range of disparity is a pixel.
That is in other words that the correlating step is carried out for every disparity shift value within the given maximum range of disparity. This maximum range of disparity is defined by a minimum disparity value and a maximum disparity value. Both disparity values may be equal in magnitude but with different signs, so that the described range is symmetrical to zero. However, both values may also be selected asymmetrically in case respective information is available. Generally, the maximum range of disparity defines the expected maximum depth range of the delivered stereoscopic 3D content, or in other words the maximum expected disparity contained in the content. The disparity values mentioned may also be defined on the basis of computational resource constraints, or as a compromise between the expected disparity and computational resource constraints.
In a preferred embodiment, the image area used for correlating is the overlapping area of the one image area and the shifted other image area. More preferably, the left and the right image areas for correlating are trimmed at the left and right borders by a value preferably corresponding to the maximum range of disparity.
This measure avoids that the correlation area crosses the boundaries of either image.
In a further preferred embodiment, the correlating step comprises the steps of comparing both image areas with each other pixelwise, and
increasing a counter in response to the result of the comparison, wherein the counter indicates the match of the pixel values for both image areas, one of which is shifted by the disparity shift value. More preferably, the step of comparing both image areas pixelwise comprises the step of subtracting the value of each pixel of one image area from the value of the respective pixel of the other image area. More preferably, the counter is increased if the absolute value of the result of the comparison is below a predetermined threshold, preferably one.
In other words, the correlating step comprises a simple subtraction operation between two pixel values, and if the absolute value of the result of this subtraction is below a predetermined threshold, a counter is increased by one. Hence, every time a pixel matches the respective pixel in the shifted image, the counter is increased. The higher the counter, the greater the number of matching pixels.
However, there is to be noted that the result of the correlation does not comprise any spatial information about the matching pixels. In other words, the correlating step does not supply any information about a certain disparity value and the respective region within the image area. This makes this method so efficient.
According to a further preferred embodiment, the image areas are shifted horizontally relative to each other.
In a further preferred embodiment, the left and right image areas are divided into a number of subareas, and the correlating step is carried out for each subarea separately, so that a disparity distribution is derived for every image subarea. Preferably, the disparity distributions of the subareas are combined into a single distribution. More preferably, the number of subareas is nine.
The inventors have noted that the disparity distribution derived by the above-mentioned method has the property that it is very smooth and that peaks correspond to large objects in the stereoscopic input. In order to avoid masking of peaks corresponding to smaller objects at different depth planes, the inventors have found out that using multiple correlation areas is advantageous.
In a further preferred embodiment, each subarea is analyzed to determine whether it contains any structured elements. Preferably, a weight factor for each subarea is determined depending on the analysis result, wherein the weight factor is used for the combination of the disparity distributions.
Because disparity can only be estimated when the image content exposes some minimum structure, each subarea is tested to determine whether it contains structure or only flat or uniform color values. A computationally efficient test can be performed on the distribution obtained from the correlation, observing that sufficient structure in the content results in sharply located, pronounced peaks. In case of only weak or no structure, peaks become weak as well, possibly extending over the whole search range. Preferably, the peak curvature is evaluated using its second derivative to determine a weight factor that is used in the subsequent combination step.
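The curvature test could, for example, use a discrete second derivative at the peak of the distribution; the exact weight formula below is an assumption for illustration:

```python
import numpy as np

def structure_weight(dist):
    """Weight a subarea distribution by the sharpness of its main peak,
    measured as the negated discrete second derivative at the maximum.
    Flat or weakly structured areas yield a weight near zero."""
    dist = np.asarray(dist, dtype=float)
    k = int(np.argmax(dist))
    if k == 0 or k == len(dist) - 1:
        return 0.0  # peak at the border of the search range: unreliable
    # Discrete second derivative; strongly negative at a sharp peak.
    curvature = dist[k - 1] - 2.0 * dist[k] + dist[k + 1]
    return max(0.0, -curvature)
```

A sharp, isolated peak thus receives a large weight, while a flat distribution contributes little to the combination.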
In a further preferred embodiment, a non-linear transfer function is applied to each subarea disparity distribution before combining the subarea disparity distributions to enhance large peaks and attenuate small peaks and noise.
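A simple power law on the peak-normalized distribution is one possible transfer function with this property (an illustrative choice; the application does not prescribe a specific function):

```python
import numpy as np

def enhance_peaks(dist, gamma=2.0):
    """Attenuate small values more than large ones by normalizing the
    distribution to its maximum and applying a monotonic power law
    (gamma > 1), then denormalizing back to the original scale."""
    dist = np.asarray(dist, dtype=float)
    peak = dist.max()
    if peak == 0.0:
        return dist  # empty distribution: nothing to enhance
    return peak * (dist / peak) ** gamma
```

Because the map is monotonic, peak positions are preserved; only the relative weight of small values and noise is reduced.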
In a further preferred embodiment, a set of subarea disparity distributions is combined. Preferably, the set of subarea disparity distributions only comprises those relating to subareas located at the image border, preferably the top and bottom image borders.
Hence, another aspect of the proposed method is that the combination of subarea distributions can comprise different subsets instead of the full image area. For example, the distribution of all subareas located at the top and bottom image borders can be combined to obtain a disparity distribution of the border area. Such a distribution could be used to search for border violation of scene content, i.e. when an object that is located at a depth plane nearer to the viewer is cut by the image border located in the display plane.
In a further preferred embodiment, the proposed method is suitable for stereoscopic material that contains rectified left and right views, i.e. material in which the epipolar lines of the inherent view geometry are aligned with the image rows. Furthermore, the left and right views should have equal exposure or brightness. While these requirements ensure the best portrayal on a stereoscopic display, they are still violated by most of today's content.
The proposed method can therefore be extended to include also preprocessing means to first compensate global illumination differences between left and right view. Secondly, a vertical shift between left and right correlation area is determined for each correlation area. Finally, the horizontal distribution may be estimated as described above.
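The illumination compensation could, in its simplest form, match the global mean brightness of the two views; the sketch below is one illustrative realization and assumes 8-bit grayscale views:

```python
import numpy as np

def equalize_brightness(left, right):
    """Compensate a global brightness difference between the views by
    shifting the right image's mean to that of the left image
    (a simple illustrative form of the proposed preprocessing)."""
    offset = float(left.mean()) - float(right.mean())
    # Clip to the valid 8-bit range after the global offset.
    return left, np.clip(right.astype(float) + offset, 0, 255)
```

More elaborate schemes (e.g. histogram matching) are conceivable; the point is only that the pixel-difference correlation presumes comparable brightness.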
According to a further aspect of the present invention there is provided an apparatus for estimating a disparity distribution between a left image and a right image of a stereoscopic 3D picture, each image having an array of pixels, comprising:
an estimation device adapted to correlate a left image area with a right image area, with one of both image areas being shifted by a disparity shift value, wherein the result of the correlation is an indication of the pixel match between both images, repeat the correlation for a set of disparity shift values within a given maximum range of disparity; derive the disparity distribution from the results of the correlation; and output the derived disparity distribution.
The inventive apparatus has the same advantages as mentioned above with respect to the inventive method; reference is therefore made to the respective description above. Further, the apparatus has similar and/or identical preferred embodiments as described with respect to the method. A repetition of these embodiments and the corresponding advantages is therefore omitted.
Finally, according to a further aspect of the present invention there is provided an apparatus for playing stereoscopic 3D pictures, preferably a television set, which comprises the inventive apparatus mentioned above.
To sum up, the present invention proposes a method which is computationally more efficient than the mentioned naïve approach. Further, the inventive method is less complex than the naïve approach. Therefore, it can be implemented more easily in hardware (e.g. ASIC) or in software for processors with vectorized computational units (e.g. VLIW, CELL). Further, the inventive method is more robust than the naïve approach for content that exposes periodic structures.
Apart from changing the depth impression of an image pair carried out on the basis of the disparity distribution provided by the inventive method, other potential applications of the inventive method are conceivable, in particular across the full range of devices from the lens to the living room, and may include:
a) On-the-fly metadata generation, e.g. to find the depth distance nearest to the viewer in order to place subtitles or an on-screen menu properly in front of the scene;
b) a still picture or video camera device,
c) a content post-production system for home video or as used by a broadcaster,
d) a media playing device based on a computer product or gaming console using packaged media like Blu-Ray or streaming media from internet,
e) a display device, not restricted to a TV apparatus but also including pure stereoscopic monitor devices and projection systems.
While the application for case e) focuses on the control of perceived depth based on the display/viewer conditions as described below, a potential application for cases b) and c) could be an interactive feedback to the photographer or production operator indicating an ill-conditioned shooting situation, where a too high disparity range is known to cause problems in the downstream processing chain. For cases c), d), and e) an application is the depth positioning of captions or subtitles as well as the positioning of the on-screen menu with which such devices are controlled. For cases c) or d) the information could be used to improve the codec efficiency regarding inter-view prediction, in terms of computational effort and/or picture quality of the stream.
It is to be understood that the features mentioned above and those yet to be explained below can be used not only in the respective combinations indicated, but also in other combinations or in isolation, without leaving the scope of the present invention.
These and other aspects of the present invention will be apparent from and explained in more detail below with reference to the embodiments described hereinafter.
Before going into a detailed description of preferred embodiments, some general background information about stereoscopic 3D principles is given first.
In particular, these general remarks also serve to define terms which are used below in order to avoid any ambiguities which could arise because certain terms are used with different meanings in the literature.
As is generally known, each 3D image comprises a right image and a left image which are displayed alternately. The observer typically wears, for example, shutter glasses synchronized with the display so that the observer sees the left image with the left eye only and the right image with the right eye only.
For a disparity of zero, meaning that the object in the right image is displayed in the same position on the display as the object in the left image, the observer perceives the object in the display plane 10.
Due to the fact that the display of a TV set is pixel-based, the unit of disparity is hereinafter a pixel. That is in other words that a disparity of one means that the left image is shifted in the horizontal direction by one pixel relative to the right image.
Although in theory the distance z of the perceived object relative to the display plane may take any value between zero and the observer's distance Z for a positive disparity and from zero to infinity for a negative disparity, it has turned out that certain disparities cause disturbing effects to the observer. In particular, the observer may get a headache if the disparity d becomes too large.
Due to this knowledge, a so-called “comfort zone” has been established. The comfort zone defines a depth range before and behind the display plane which does not cause any disturbing effects to the observer if a perceived object lies within this zone.
In the following, it is assumed that zmin is a negative value and zmax a positive value. Further, it is assumed that the absolute values of zmin and zmax are equal, meaning that the comfort zone is symmetrical to the display plane. However, it is to be noted that the absolute values of zmin and zmax may also be unequal. The comfort zone depends on the viewing geometry, which includes certain parameters of the used TV set, like the display size, and the viewer's position and individual interpupillary distance.
Due to this dependency between comfort zone and TV set parameters, it is nearly impossible for a movie broadcaster, for example, to supply information by means of metadata defining the comfort zone. Hence, there is a demand and necessity to process the supplied images and to adapt the disparity to the comfort zone. That is in other words that the TV set has the task to shift all objects lying outside the comfort zone into the comfort zone. Since the depth z is a monotonic function of the disparity d such an image processing may be based on the disparity as an input argument. In particular, a disparity distribution between a left and a right image is used as an input argument. A disparity distribution, for example, provides the minimum and maximum disparities in an image and hence the maximum depth range of the image which has to be scaled into the comfort zone.
Hence, in order to avoid any disturbing effects to the observer, the image has to be processed to bring the disparity distribution into the comfort zone. This processing requires a shifting step to bring the center of the distribution onto the center of the comfort zone and a scaling step to scale the disparity range d1 to d2 to the disparity range of the comfort zone Dmin to Dmax.
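The shift-and-scale mapping can be written down directly; the function below is an illustrative sketch that linearly maps the measured disparity range [d1, d2] onto the comfort-zone range [Dmin, Dmax]:

```python
def comfort_zone_map(d, d1, d2, dmin, dmax):
    """Linearly map a disparity d from the measured range [d1, d2]
    onto the display's comfort-zone range [dmin, dmax]:
    the distribution is centered, then scaled."""
    scale = (dmax - dmin) / (d2 - d1)      # scaling step
    center_in = 0.5 * (d1 + d2)            # center of measured range
    center_out = 0.5 * (dmin + dmax)       # center of comfort zone
    return center_out + scale * (d - center_in)
```

By construction, d1 is mapped onto Dmin and d2 onto Dmax, so the full measured depth range falls inside the comfort zone.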
As an argument for the image transformation, the image transformation means 42 receives a disparity distribution Pin(d) as an input. For calculating this disparity distribution, the image processor 40 comprises a disparity analysis means 44 which also receives as an input the original left image and the original right image.
The subject of the present application is the provision of the disparity distribution Pin(d) processed by the disparity analysis means 44. The image transformation is part of Japanese patent application 2009-199139 of the assignee (Sony reference number 09900660), the content of which is incorporated by reference herewith, and will therefore not be described further hereinafter.
In the following, the disparity analysis means 44 and in particular its functionality will be described.
It comprises center cut elements 52, one for processing the left image and one for processing the right image. The center cut element 52 serves to cut or trim the supplied image to reduce the image width. In other words, the center cut element 52 cuts off a left and a right margin of the image, the width of this margin being indicated by Dmax. The output of the center cut element 52 is an image whose width is reduced by 2×Dmax relative to the original width W.
The disparity analysis means 44 further comprises a horizontally shifting element 53 for shifting the right image horizontally by the disparity shift value Δd.
The disparity analysis means also comprises a correlating element 54 which receives as input the center cut left image and the center cut and horizontally shifted right image. The correlating element 54 is adapted to compare the left and right images pixelwise. The result of the pixelwise comparison is then compared with a threshold. If the absolute value of the comparison result is smaller than or equal to the threshold, a counter signal is generated. Otherwise, that is if the absolute value of the comparison result is greater than the threshold, no counter signal is generated. The counter signal is supplied to a counter element 56 which increases a counter by one each time it receives a counter signal. The output of the counter element 56 is a disparity distribution value for the particular disparity Δd.
A detailed description of the method carried out by the disparity analysis means 44 now follows.
First, some parameters are set to their initial values. In block 60, the disparity shift value is set to Dmin. This value Dmin is generally a negative value and is selected on the basis of the expected minimum disparity in the images. In parallel to the value Dmin, a maximum disparity value Dmax is also provided. This value is determined on the basis of the maximum expected disparity in the images and usually has a positive sign. In a preferred embodiment, Dmin is set to −Dmax, so that the absolute values of Dmin and Dmax are equal and the range defined by both values Dmin, Dmax is symmetrical to zero.
Further, a counter value is set to zero (block 61). The counter value is used in the counter element 56. Further, in block 61, the index values x, y, describing a particular pixel in a two-dimensional pixel array of the image, are set. The y index is set to zero and the x index is set to a value of doff. This value doff determines the width of the cut off margin (indicated as Dmax in
In the next step (block 62) a correlation step is carried out. This correlation step comprises the subtraction of the pixel value p(x,y) of the left image and the pixel value p(x−Δd, y) of the right image. Since the sign of the difference is not to be considered, the absolute value is calculated and used in the following steps. The absolute value of the difference Δp of the subtraction indicates the extent of the pixel match of the left and the right images. In other words, if the difference Δp is zero, both pixels in the image pair are equal. If the absolute value of the difference Δp is greater than a predetermined threshold THR, which is one in the preferred embodiment, both pixels do not match.
In block 63, the absolute value of the difference Δp is evaluated, and if it is below a threshold THR, the counter is increased by 1 (block 64). Otherwise, i.e. if the pixels do not match, the counter is not increased.
Next, in block 65, the x index is increased by one and then compared with the value W−doff, wherein W is the width of the image (block 66). If the index x is smaller than or equal to W−doff, the correlation step is repeated for the next pixel in the same pixel row (i.e. the y index remains unchanged).
After having compared all pixels in a row of the pixel array, the same above-mentioned steps are repeated for the next row of the image's pixel array. Therefore, the x index is again set to doff and the y index is increased by one (block 67). Then, all pixels in the new row are correlated and if a pixel match is determined, the counter is again increased by one.
As soon as all pixel rows of the image have been processed (block 68), the value of the counter is stored in the disparity distribution array P(Δd) for the array index Δd (block 69). Then, the disparity shift value is increased by one and the counter is reset to zero. Then, the above described process is repeated for the new disparity shift value Δd.
As soon as the process has been carried out for every value Δd within the range Dmin to Dmax, the process is terminated (block 70) and the disparity distribution array P (Dmin to Dmax) is output for further processing (block 71).
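The flow of blocks 60 to 71 translates almost literally into code. The sketch below assumes grayscale images as 2-D arrays and uses illustrative names; the trimmed loop bounds follow from a margin doff = max(|Dmin|, |Dmax|) so that the shifted window never leaves the image:

```python
import numpy as np

def disparity_histogram(left, right, d_min, d_max, thr=1):
    """Estimate the disparity distribution P(d) for d in [d_min, d_max]
    by counting matching pixels at each horizontal shift.
    The correlation area is trimmed by d_off columns on both sides,
    corresponding to the margins cut off by the center cut element."""
    h, w = left.shape
    d_off = max(abs(d_min), abs(d_max))
    hist = {}
    for d in range(d_min, d_max + 1):           # outer loop over shifts
        count = 0                               # block 61: reset counter
        for y in range(h):                      # loop over pixel rows
            for x in range(d_off, w - d_off):   # loop within trimmed row
                # block 62/63: pixel difference against threshold THR
                if abs(int(left[y, x]) - int(right[y, x - d])) <= thr:
                    count += 1                  # block 64
        hist[d] = count                         # block 69: store P(Δd)
    return hist
```

A hardware or vectorized implementation would replace the two inner loops by array operations, but the counting logic stays the same.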
The first example shows a situation with a disparity shift value of Δd=Dmax. As already mentioned before, only a trimmed image area is taken for the correlation. The left image is hence trimmed by margins 73 so that only a center cut area 74 of the image is employed. The width of the margin 73 is indicated with doff.
The right image which is employed for the correlation is shifted by Dmax, which is in this embodiment a positive value. Hence, the image area, having the same size as the image area of the left image 74, is shifted to the left.
It is apparent from this Figure that the width of the margin doff has to be greater than or equal to the absolute value of Dmax. Otherwise, a part of the shifted image area 75 would lie outside of the valid area.
In the second example, the disparity shift value Δd is zero. Hence, the left image area 74 and the right image area 75 used for the correlation are identical with respect to the position within the whole image. In other words, the image area 75 of the right image is not shifted.
In the third example, the disparity shift value Δd is Dmin, which is a negative value. Here, the image area 75 used for the correlation or match is shifted to the right by Dmin pixels.
It is also to be noted that the width of the margin doff has to be greater than or equal to the absolute values of Dmax and Dmin. Otherwise, a portion of the shifted area 75 of the right image would lie outside of the valid area.
The result is then a disparity distribution for all disparity values between Dmin and Dmax.
The result of the described correlation is the disparity distribution P(d) which is supplied as the disparity distribution Pin(d) to the image transformation means 42 (see
It is apparent from the foregoing detailed description that the correlation is a pixel based operation only using a subtraction of two pixel values. As a consequence, the correlation method for determining the disparity distribution may be implemented very efficiently.
In order to increase the accuracy of the correlation, the above-mentioned correlation can be modified as follows.
In order to avoid masking of peaks corresponding to smaller objects at different depth planes, which may happen when the correlation is carried out for the whole image area 74, 75, the image area 74, 75 is divided into a plurality of subareas or sub-windows.
The advantage of using image subareas is, for example, that the individual subarea disparity distributions can be weighted differently when combining them into the total disparity distribution supplied to the image transformation means 42.
A further advantage of using image subareas is that so-called object frame violations, i.e. objects located in front of the image plane but cut by the image border, may be detected on the basis of the respective subarea disparity distributions of the top row and/or bottom row subareas.
Each subarea disparity distribution is first normalized by a normalizing element 81. The normalized disparity distributions Plin,k for the subareas are then supplied to a non-linear mapping element 82, where each normalized disparity distribution is transformed by a non-linear monotonic function that effectively attenuates small pseudo-probability values more than large pseudo-probability values.
The output Pnl,k of the non-linear mapping element 82 is then supplied to a denormalizing element 83. This element denormalizes the disparity distribution Pnl,k by an inversion of the normalization performed by the normalizing element 81. The result is output as the disparity distribution Pnw,k(d) for each image subarea.
The post-processed disparity distributions Pnw,k(d) for the subareas are then combined by a combining element 85, which is preferably a summing element 86. The result output by the combining element 85 is a single distribution Pim(d) that represents the estimated disparity distribution for the stereoscopic input image pair and which is supplied to the image transformation means 42.
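The post-processing chain of elements 81, 82, 83 and 85 can be condensed into one function; the power-law map and the optional weights below are illustrative assumptions consistent with the description above:

```python
import numpy as np

def combine_subareas(dists, gamma=2.0, weights=None):
    """Combine N subarea disparity distributions into a single one.
    Each distribution is normalized to its peak, passed through a
    monotonic power-law map that suppresses small values, denormalized,
    optionally weighted, and the results are summed."""
    dists = [np.asarray(d, dtype=float) for d in dists]
    if weights is None:
        weights = [1.0] * len(dists)
    total = np.zeros_like(dists[0])
    for dist, wgt in zip(dists, weights):
        peak = dist.max()
        if peak > 0.0:
            # normalize -> non-linear map -> denormalize -> weight
            total += wgt * peak * (dist / peak) ** gamma
    return total
```

The weights could come from the structure test described above, so that flat subareas contribute little to the combined distribution.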
As mentioned before, the non-linear mapping element 82 uses a non-linear monotonic function.
It has been pointed out above that the image area used for correlation is trimmed at the left and right borders.
The described method for estimating the disparity distribution of an image pair is suitable for stereoscopic material that contains rectified left and right views, i.e. material in which the epipolar lines of the inherent view geometry are aligned with the image rows. Furthermore, the left and right views should have equal exposure or brightness. While these requirements ensure the best portrayal on a stereoscopic display, they are still violated by most of today's content.
The proposed and above described method can therefore be extended to include also preprocessing means to first compensate global illumination differences between left and right views. Secondly, a vertical shift between left and right correlation image areas is determined for each correlation area. Finally, the horizontal distribution is estimated as described above.
To summarize the main advantages of the invention, it is computationally more efficient than the mentioned naïve approach. Further, it is less complex than the naïve approach. Therefore, it can be implemented more easily in hardware (e.g. ASIC) or in software for processors with vectorized computational units (e.g. VLIW, CELL). Finally, the inventive method is more robust than the naïve approach for content that exposes periodic structures.
The invention has been illustrated and described in detail in the drawings and foregoing description, but such illustration and description are to be considered illustrative or exemplary and not restrictive. The invention is not limited to the disclosed embodiments. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims.
In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Any reference signs in the claims should not be construed as limiting the scope.
Number | Date | Country | Kind |
---|---|---|---|
10 155 576.1 | Mar 2010 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2011/053204 | 3/3/2011 | WO | 00 | 8/10/2012 |