This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2013-135983, filed Jun. 28, 2013, the contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to a picture (or image) processing apparatus, a picture processing method and a picture processing program for converting a two-dimensional (2D) picture into a three-dimensional (3D) picture for the purpose of achieving stereoscopic vision.
2. Description of the Related Art
In recent years, 3D video contents such as 3D movies and 3D broadcasting have come into wide use. To allow a viewer or observer to have a stereoscopic view, a right-eye picture and a left-eye picture having a parallax between them are required. When 3D video pictures are displayed, the right-eye picture and the left-eye picture are displayed in a time-division manner, and they are separated from each other by use of video separating glasses such as shutter glasses or polarization glasses. This allows the viewer to see the right-eye pictures with the right eye only and the left-eye pictures with the left eye only.
The production of 3D videos may be roughly classified into two methods. One is a method where a right-eye picture and a left-eye picture are simultaneously taken by two cameras. The other is a method where a 2D picture captured by a single camera is later edited so as to produce a parallax picture. The present invention relates to the latter method and relates to a technology for converting 2D pictures into 3D pictures.
When a 3D picture is produced as described above, pixels of a first 2D picture are shifted using a depth map, and a second 2D picture having a different viewpoint relative to the first 2D picture is thereby generated (see Reference (1) in the following Related Art List). This pixel shifting generates one or more missing pixels within the second 2D picture having the different viewpoint.
(1) Japanese Unexamined Patent Application Publication (Kokai) No. 2009-44722.
In general, the missing pixels caused by the pixel shifting are interpolated using their peripheral pixels. If the difference in depth values at an object boundary within a screen (hereinafter also referred to as "level difference of depth") is large, the pixel shift amount at the boundary portion will be large, too. Thus the number of missing pixels, namely the area of the missing pixel region, will be large as well. As described above, those missing pixels are interpolated using their peripheral pixels. However, as the area of the missing pixel region grows, the number of positions where the discrepancy between the interpolated pixels and the correct pixels is large tends to increase.
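By way of illustration only, the following Python sketch shows one simple form of such interpolation, filling each missing pixel from its nearest non-missing neighbor on a scanline; the function name and the boolean-mask representation are assumptions made for this sketch and are not taken from Reference (1).

```python
import numpy as np

def fill_missing_pixels(row: np.ndarray, missing: np.ndarray) -> np.ndarray:
    """Fill missing pixels in one scanline from the nearest valid neighbor.

    row     -- 1-D array of pixel values after the pixel shifting
    missing -- boolean mask, True where the shift left no source pixel
    """
    filled = row.copy()
    width = len(row)
    for x in np.flatnonzero(missing):
        # Search outward for the closest pixel that is not missing.
        for offset in range(1, width):
            if x - offset >= 0 and not missing[x - offset]:
                filled[x] = row[x - offset]
                break
            if x + offset < width and not missing[x + offset]:
                filled[x] = row[x + offset]
                break
    return filled
```

The wider the missing region, the farther the copied neighbor lies from the true pixel, which is precisely the degradation described above.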
The present invention has been made in view of the foregoing circumstances, and a purpose thereof is to provide a technology by which the picture quality of a boundary portion of an object is enhanced when a 3D picture is generated from a 2D picture.
In order to resolve the above-described problems, a picture processing apparatus (100) according to one embodiment of the present invention includes: a depth map generator (10) configured to generate a depth map of an input picture; a depth map correction unit (20) configured to correct the depth map generated by the depth map generator; and a picture generator (30) configured to shift pixels of the input picture, based on the depth map corrected by the depth map correction unit, so as to generate a picture having a different viewpoint. The depth map correction unit (20) includes: a level difference detector (21) for detecting a difference in depth values of pixels in a horizontal direction of the depth map; and a low-pass filter unit (23) for applying a low-pass filter to part of the depth map generated by the depth map generator (10), in response to the detected difference in the depth values.
Another embodiment of the present invention relates to a picture processing method. The method includes: generating a depth map of an input picture; detecting a difference in depth values of pixels in a horizontal direction of the depth map; correcting the depth map by applying a low-pass filter to part of the depth map, in response to the detected difference of the depth values; and shifting pixels of the input picture, based on the corrected depth map, so as to generate a picture having a different viewpoint.
Optional combinations of the aforementioned constituting elements, and implementations of the invention in the form of methods, apparatuses, systems, recording media, computer programs and so forth may also be practiced as additional modes of the present invention.
Embodiments will now be described by way of examples only, with reference to the accompanying drawings, which are meant to be exemplary, not limiting, and wherein like elements are numbered alike in the several Figures.
The invention will now be described by reference to the preferred embodiments. This does not intend to limit the scope of the present invention, but to exemplify the invention.
A general pixel interpolation method is given before a description of the embodiments.
As a method for avoiding this degradation of image quality, one conceivable approach is to apply a low-pass filter to the depth map. When a low-pass filter is applied to the depth map, the level difference of depth at the boundary portion changes gradually, and as a result the interpolation quality is improved.
Although applying a low-pass filter to a depth map is an effective way, the low-pass filter is applied to the entire region of the depth map, and therefore parts of the depth map where no low-pass filter needs to be applied are also subjected to it. For example, the low-pass filter is also applied to a region on a right side of a boundary portion of the person shown in the figure.
In a picture whose pixels have been shifted using a depth map to which the low-pass filter has been applied, the boundary of the object appears blurry. In still pictures and moving pictures to be 2D-to-3D converted, the object to be viewed by viewers is often located in the foreground, so any blur at an edge of a foreground object is likely to be conspicuous. It is therefore desirable that the low-pass filter be applied only to the object boundary portion on the side of the object where the missing pixels occur.
The depth map generator 10 analyzes a 2D picture inputted thereto and generates a pseudo-depth map. A description is hereunder given of specific examples of generating the depth map. Based on an inputted 2D picture and a depth model, the depth map generator 10 generates a depth map of this 2D picture. The depth map is a grayscale picture where a depth value is expressed by a luminance value. The depth map generator 10 estimates a scene structure and then generates the depth map using a depth model best suited to the scene structure. More specifically, the depth map generator 10 combines a plurality of basic depth models and uses the combined model in the generation of the depth map, varying the composition ratio of the basic depth models according to the scene structure of the 2D picture.
The top-screen high-frequency-component evaluating unit 11 calculates the ratio of pixels having high-frequency components at the top of a screen of a 2D picture to be processed. The calculated ratio thereof is set as a high-frequency-component evaluation value of a top part of the screen. The ratio of the top part of the screen to the entire screen is preferably set to about 20%. The lower-screen high-frequency-component evaluating unit 12 calculates the ratio of pixels having high-frequency components at a lower part of a screen of said 2D picture. The calculated ratio thereof is set as a high-frequency-component evaluation value of the lower part of the screen. The ratio of the lower part of the screen to the entire screen is preferably set to about 20%.
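By way of illustration, the evaluation may be sketched as follows in Python; the use of a Laplacian magnitude with a fixed threshold to decide which pixels count as "high-frequency" is an assumption of this sketch, not a requirement of the embodiment.

```python
import numpy as np
from scipy.ndimage import laplace

def high_freq_ratio(gray: np.ndarray, rows: slice, threshold: float = 8.0) -> float:
    """Ratio of high-frequency pixels within a horizontal band of the screen.

    gray      -- 2-D luminance picture
    rows      -- row slice selecting the band (e.g. the top or lower 20%)
    threshold -- Laplacian magnitude above which a pixel is counted as
                 high-frequency (an illustrative value)
    """
    band = gray[rows].astype(np.float64)
    return float((np.abs(laplace(band)) > threshold).mean())

# Top and lower parts of the screen, each about 20% of its height:
# h = gray.shape[0]
# top_evaluation = high_freq_ratio(gray, slice(0, h // 5))
# lower_evaluation = high_freq_ratio(gray, slice(h - h // 5, h))
```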
The first basic depth model frame memory 14 stores a first basic depth model. Similarly, the second basic depth model frame memory 15 stores a second basic depth model, and the third basic depth model frame memory 16 stores a third basic depth model. The first basic depth model is a model where the top part of the screen and the lower part of the screen are each a concave spherical surface. The second basic depth model is a model where the top part of the screen is a cylindrical surface having an axis line in a vertical direction and where the lower part of the screen is a concave spherical surface. The third basic depth model is a model where the top part of the screen is a planar surface and where the lower part of the screen is a cylindrical surface having an axis line in a horizontal direction.
The composition ratio determining unit 13 determines composition ratios k1, k2, and k3 of the first basic depth model, the second basic depth model, and the third basic depth model (where k1+k2+k3=1), respectively, based on the high-frequency-component evaluation values of the top part and the lower part of the screen calculated by the top-screen high-frequency-component evaluating unit 11 and the lower-screen high-frequency-component evaluating unit 12, respectively. The combining unit 17 multiplies the first basic depth model, the second basic depth model, and the third basic depth model by k1, k2, and k3, respectively, and then adds up the respective multiplication results. This calculation result is used as a combined basic depth model.
If, for example, the high-frequency-component evaluation value of the top part of the screen is small, the composition ratio determining unit 13 recognizes that there is a scene where the sky or a flat wall is present in the top part of the screen. As a result, the ratio of the second basic depth model, where the depth of the top part of the screen has been made larger, is increased. If the high-frequency-component evaluation value of the lower part of the screen is small, the scene is recognized as one where a flat ground surface or water surface extending continuously in front is present in the lower part of the screen, and the ratio of the third basic depth model is increased. In the third basic depth model, the top part of the screen is plane-approximated as the background; in the lower part of the screen, the depth is made smaller toward the bottom.
The adder 18 superimposes a red component (R) signal of the aforementioned 2D picture on the combined basic depth model generated by the combining unit 17 and thereby generates a depth map. The R signal is used here based on the following rule of experience. That is, in a condition where the environment is close to direct light and the brightness of textures does not differ greatly, it is highly probable that the magnitude of the R signal coincides with the recesses and projections of an object. Another reason is that red and other warm colors are advancing colors in chromatics: an advancing color such as red is perceived as being more frontward than a cold color and thus emphasizes the stereoscopic effect.
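A minimal sketch of the combining and superimposing steps is given below; the weight r_gain applied to the R signal is an illustrative assumption, as the embodiment does not specify the superimposition weight.

```python
import numpy as np

def combine_depth(model1, model2, model3, k1, k2, k3, picture_rgb, r_gain=0.3):
    """Blend the basic depth models and superimpose the R signal.

    model1..model3 -- 2-D arrays holding the three basic depth models
    k1, k2, k3     -- composition ratios satisfying k1 + k2 + k3 == 1
    picture_rgb    -- input 2D picture of shape (H, W, 3), channel 0 = red
    r_gain         -- weight of the R component (an illustrative value)
    """
    combined = k1 * model1 + k2 * model2 + k3 * model3
    red = picture_rgb[..., 0].astype(np.float64)
    # Reddish (advancing-color) textures are pushed toward the viewer.
    return combined + r_gain * red
```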
Now refer back to the figure.
First, consider a case where a 2D picture having an original viewpoint is set to the right-eye picture and then a left-eye picture, where the viewpoint has been shifted to the left, is generated. In this case, when a texture is to be viewed stereoscopically in a pop-out direction relative to the viewer, the texture of the 2D picture having the original viewpoint is moved to the right of the screen according to the depth value. Conversely, when the texture is to be viewed stereoscopically in a depth direction relative to the viewer, the texture thereof is moved to the left of the screen according to the depth value.
Next, consider a case where the 2D picture having the original viewpoint is set to a left-eye picture and then a right-eye picture, where the viewpoint has been shifted to the right, is generated. In this case, when a texture is to be viewed stereoscopically in a pop-out direction relative to the viewer, the texture of the 2D picture having the original viewpoint is moved to the left of the screen according to the depth value. Conversely, when the texture is to be viewed stereoscopically in a depth direction relative to the viewer, the texture thereof is moved to the right of the screen according to the depth value.
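The two cases above reduce to a simple sign rule. By way of illustration only, the following Python sketch summarizes it; shift_direction is a hypothetical helper introduced for this sketch.

```python
def shift_direction(eye: str, depth_value: float) -> str:
    """Direction in which a texture of the original-viewpoint picture moves.

    eye         -- the picture being generated: "left" or "right"
    depth_value -- positive = pop-out direction, negative = depth direction
                   (zero is treated here as the depth direction)
    """
    if eye == "left":
        # Generating the left-eye picture from a right-eye original.
        return "right" if depth_value > 0 else "left"
    # Generating the right-eye picture from a left-eye original.
    return "left" if depth_value > 0 else "right"
```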
The 3D picture generator 30 outputs the 2D picture having the original viewpoint and another 2D picture having a different viewpoint as a 3D picture. It is to be noted here that the detailed descriptions of generating the depth map by the depth map generator 10 and generating the 3D picture by the 3D picture generator 30 are disclosed in the aforementioned Reference (1) filed by the same applicant as that of the present patent specification.
Hereinafter, a depth map generated by the depth map generator 10 will be denoted by “depth map dpt”. Assume herein that each depth value constituting a depth map dpt takes a value ranging from −255 to 255. And assume also that when the depth value is positive, it indicates a pop-out direction, whereas when the depth value is negative, it indicates a depth direction. The setting is not limited to this and may be reversed instead, namely, when the depth value is negative, it may indicate a pop-out direction, whereas when the depth value is positive, it may indicate a depth direction.
The depth map correction unit 20 corrects the depth map dpt generated by the depth map generator 10 so as to generate a corrected depth map dpt_adj. The 3D picture generator 30 generates a 3D picture, based on the input 2D picture and the corrected depth map dpt_adj corrected by the depth map correction unit 20. In the present embodiment, the input 2D picture is directly outputted as a right-eye picture, and the picture generated by the pixel shifting is outputted as a left-eye picture.
The level difference detector 21 detects a level difference of the depth map dpt in the horizontal direction. For example, the level difference detector 21 detects a difference in depth values of pixels adjacent in the horizontal direction. To reduce the processing load, the difference in depth values may be detected at intervals of a predetermined number of pixels. The level difference determining unit 22 compares the difference in depth values, detected by the level difference detector 21, against a set threshold value and thereby detects an object boundary in the input 2D picture. The low-pass filter unit 23 applies a low-pass filter to part of the depth map dpt in response to the detected difference in the depth values. More specifically, the low-pass filter is applied to an object boundary portion on the side where missing pixels are caused by the pixel shifting.
The missing pixel region caused by the pixel shifting occurs on a left side of a boundary of a foreground object when the left-eye picture is generated by the pixel shifting, and on a right side of the boundary of the foreground object when the right-eye picture is generated. The left side of the boundary of the foreground object is a rising edge, where the detected level difference rises sharply to a positive value; the right side thereof is a falling edge, where the detected level difference falls sharply to a negative value. Conversely, the left side of a boundary of a background object is a falling edge, and the right side thereof is a rising edge.
When a left-eye picture is generated by the pixel shifting, the low-pass filter unit 23 applies a low-pass filter with the boundary pixel position on the left side of the foreground object, namely the rising edge position where the level difference in the depth values rises sharply to a positive value, as the reference point. That is, the low-pass filter is applied to the region containing the pixels from the rising edge position (starting position) up to the pixel position located a preset number of pixels to the left of the starting position. When a right-eye picture is generated by the pixel shifting, the low-pass filter is applied to the region containing the pixels from the boundary pixel position on the right side of the foreground object, namely the falling edge position where the level difference in the depth values falls sharply to a negative value, up to the pixel position located a preset number of pixels to the right of the falling edge position.
If a left-eye picture is generated by the pixel shifting and if a foreground object and a background object or a background are related according to the following condition (1) or (2), a missing pixel or pixels occurs/occur.
(1) When the background object or the background is located on a left side of the foreground object, [the depth value of the foreground object]>0 and [the depth value of the foreground object]>[the depth value of the left-side background object].
(2) When the background object is located on a left side of the foreground object, [the depth value of the foreground object]<0 and [the depth value of the foreground object]>[the depth value of the left-side background object].
The condition (1) indicates a case where the foreground object is pixel-shifted to the right (in a pop-out direction) with the result that a missing pixel or pixels occurs/occur on a left side of the rising edge position.
The condition (2) indicates a case where the foreground object is pixel-shifted to the left (in a depth direction) and the background object is pixel-shifted to the left (in the depth direction) to a much greater degree as compared with the foreground object with the result that a missing pixel or pixels occurs/occur on a left side of the rising edge position.
If a right-eye picture is generated by the pixel shifting and if a foreground object and a background object or a background are related according to the following condition (3) or (4), a missing pixel or pixels occurs/occur.
(3) When the background object or the background is located on a right side of the foreground object, [the depth value of the foreground object]>0 and [the depth value of the foreground object]>[the depth value of the right-side background object].
(4) When the background object is located on a right side of the foreground object, [the depth value of the foreground object]<0 and [the depth value of the foreground object]>[the depth value of the right-side background object].
The condition (3) indicates a case where the foreground object is pixel-shifted to the left (in a pop-out direction) with the result that a missing pixel or pixels occurs/occur on a right side of the falling edge position.
The condition (4) indicates a case where the foreground object is pixel-shifted to the right (in a depth direction) and the background object is pixel-shifted to the right (in the depth direction) to a much greater degree as compared with the foreground object with the result that a missing pixel or pixels occurs/occur on a right side of the falling edge position.
When the left-eye picture is to be generated by the pixel shifting, the filter characteristic of the low-pass filter unit 23 is set such that the low-pass filter has coefficients on the right side of the center and no coefficients on the left side thereof.
A description is given hereunder using specific examples. The level difference detector 21 calculates a difference value of depth values between adjacent pixels, based on the following Equation (1), and outputs the result as a depth map edge level dpt_edge. Here "dpt(x, y)" denotes the depth value at horizontal position "x" and vertical position "y" of the input picture. Although, in the present embodiment, the difference between adjacent pixels is used to calculate the depth map edge level dpt_edge, this should not be considered as limiting. For example, the edge level may be calculated by applying high-pass filter processing to the depth map.
dpt_edge(x, y) = dpt(x+1, y) − dpt(x, y) Equation (1)
The level difference determining unit 22 compares the depth map edge level dpt_edge against threshold values th1 and th2 and thereby converts it into a depth map edge level determining value dpt_jdg that takes one of three values. The threshold values th1 and th2 are set to values determined by the designer based on experiments, simulation runs, empirical rules or the like.
dpt_edge≧th1, dpt_jdg=1 Inequality (2)
th1>dpt_edge>th2, dpt_jdg=0 Inequality (3)
th2≧dpt_edge, dpt_jdg=−1 Inequality (4)
(th1>0, th2<0)
The pixel position where the depth map edge level determining value dpt_jdg=1 indicates a rising edge position of the depth map, while the pixel position where dpt_jdg=−1 indicates a falling edge position of the depth map. In the present embodiment, a region for which dpt_jdg=0 is provided so that a small, negligible level difference within the same object is not detected. This allows only a level difference between objects to be detected as an edge.
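The following Python sketch transcribes Equation (1) and Inequalities (2) to (4); padding the last column of dpt_edge with zero, where no right-hand neighbor exists, is an assumption of this sketch.

```python
import numpy as np

def edge_level_and_judgement(dpt: np.ndarray, th1: float, th2: float):
    """Compute dpt_edge per Equation (1) and dpt_jdg per Inequalities (2)-(4).

    dpt -- 2-D depth map; th1 > 0 and th2 < 0 are designer-chosen thresholds.
    """
    # dpt_edge(x, y) = dpt(x+1, y) - dpt(x, y); the last column has no
    # right-hand neighbor and is padded with 0 here (an assumption).
    dpt_edge = np.zeros(dpt.shape, dtype=np.float64)
    dpt_edge[:, :-1] = dpt[:, 1:].astype(np.float64) - dpt[:, :-1]

    dpt_jdg = np.zeros(dpt.shape, dtype=np.int8)
    dpt_jdg[dpt_edge >= th1] = 1   # rising edge position of the depth map
    dpt_jdg[dpt_edge <= th2] = -1  # falling edge position of the depth map
    return dpt_edge, dpt_jdg
```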
The low-pass filter unit 23 comprises a horizontal low-pass filter whose filter coefficients can be varied. While varying the filter coefficients based on the depth map edge level determining value dpt_jdg supplied from the level difference determining unit 22, the low-pass filter unit 23 applies low-pass filter processing to the depth map dpt. More specifically, when a left-eye picture is to be generated by the pixel shifting, the low-pass filter changes its shape to one having coefficients on the right side only; when a right-eye picture is to be generated, it changes to one having coefficients on the left side only. The number of taps in the horizontal low-pass filter is 2N+1 (N being a natural number). A description is now given of specific exemplary operations.
In the present embodiment, the left-eye picture is generated by the pixel shifting, and therefore missing pixels occur, on a left side of a rising edge of the depth map dpt, in the pixel-shifted picture of the input 2D picture. The low-pass filter unit 23 applies the low-pass filter processing to the depth map dpt such that the low-pass filter is applied only to a region on the left side of each rising edge (the portion encircled by the dotted line denoted by "A4" in the figure).
The aforementioned “N” is set to a value determined by the designer based on experiments, simulation runs, experimental rules or the like. The aforementioned “N” may be a fixed or varying value. Where “N” is a varying value, it is varied proportionally to the level difference of boundary pixels detected by the level difference detector 21. In other words, the larger the level difference becomes, the larger the pixel shift amount will be; as a result, the low pass filter will be applied to a wider area.
A description has been given of the case where the left-eye picture is generated by the pixel shifting; the case where a right-eye picture is generated is symmetrical thereto, with the low-pass filter applied on the right side of the falling edge position using coefficients on the left side only.
It goes without saying that the picture processing described above can be implemented by hardware in transmitting, storing and receiving apparatuses. The picture processing can also be accomplished by firmware stored in Read Only Memory (ROM), flash memory or the like, or realized by software running on a computer or the like. A firmware program or a software program may be recorded on a recording medium readable by a computer or the like and made available, or may be made available from a server via a wired or wireless network. Further, the firmware program and the software program may be provided through data broadcasting by terrestrial or satellite digital broadcasting.
As described above, the present embodiments can prevent the deterioration of picture quality in the missing pixel region caused by the level difference of depth when a pixel-shifted picture is generated based on the depth map in the 2D-to-3D conversion. In other words, the rising edge position and the falling edge position of the depth map are identified by detecting the edge level of the depth map and determining the detected edge level. Then low-pass filter processing using a low-pass filter with an asymmetrical frequency response is applied only to the peripheral regions of the rising edge position and the falling edge position. This allows the low-pass filter processing to be adaptively applied only to the regions of the depth map where missing pixels can occur in the generation of the pixel-shifted picture. Hence the quality of the pixel-shifted picture can be improved. Moreover, since the low-pass filter processing is not applied to regions where no low-pass filter needs to be applied in the first place, blurring and thinning of the object can be prevented. If the low-pass filter processing is accomplished by software, the amount of calculation can also be reduced.
The present invention has been described based on the embodiments. The embodiments are intended to be illustrative only, and it is understood by those skilled in the art that various modifications to constituting elements or an arbitrary combination of each process could be further developed and that such modifications are also within the scope of the present invention.
For example, in the above-described embodiments, a description has been given of a case where one picture of a 3D picture pair is generated based on an input 2D picture and its depth map while the input 2D picture is directly used as the other picture of the pair. In this regard, both the right-eye and left-eye pictures that constitute the 3D picture may be generated based on the input 2D picture and its depth map.
Assume that a positive depth value of the depth map dpt indicates a pop-out direction and a negative depth value indicates a depth direction. Then, for example, the pixels of the object in the input 2D picture are shifted to the right (left) by [dpt/2] pixels based on the depth map dpt so as to generate a left-eye picture (right-eye picture), and shifted to the right (left) by [−dpt/2] pixels so as to generate a right-eye picture (left-eye picture).
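A sketch of this variant follows; forward mapping with rounding, and returning a mask of missing pixels for later interpolation, are simplifying assumptions of this sketch.

```python
import numpy as np

def shift_half(picture: np.ndarray, dpt: np.ndarray, sign: int):
    """Shift each pixel horizontally by sign * dpt/2 (forward mapping).

    sign = +1 yields the left-eye picture and sign = -1 the right-eye
    picture, under the convention that a positive depth value indicates
    the pop-out direction. Returns the shifted picture and a mask of the
    missing pixels to be interpolated afterwards. Occlusion ordering is
    ignored in this simplified sketch.
    """
    height, width = dpt.shape
    out = np.zeros_like(picture)
    filled = np.zeros((height, width), dtype=bool)
    for y in range(height):
        for x in range(width):
            nx = x + int(round(sign * dpt[y, x] / 2.0))
            if 0 <= nx < width:
                out[y, nx] = picture[y, x]
                filled[y, nx] = True
    return out, ~filled

# left_eye, left_missing = shift_half(picture, dpt_adj, +1)
# right_eye, right_missing = shift_half(picture, dpt_adj, -1)
```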