1. Field
Example embodiments of the following disclosure relate to an apparatus and method for encoding and decoding, and more particularly, to a method and apparatus for encoding and decoding a three-dimensional (3D) video based on depth transition data.
2. Description of the Related Art
A three-dimensional (3D) video system may effectively perform 3D video encoding using a depth image based rendering (DIBR) system.
However, a conventional DIBR system may generate distortions in rendered images, and these distortions may degrade the quality of a video system. Specifically, distortion of a compressed depth image may lead to erosion artifacts along object boundaries, and the erosion artifacts may degrade screen quality.
Therefore, there is a need for improved encoding and decoding of 3D video.
The foregoing and/or other aspects are achieved by providing an apparatus for encoding a three-dimensional (3D) video, including: a transition position calculator to calculate a depth transition for each pixel position according to a view change; a quantizer to quantize a position of the calculated depth transition; and an encoder to encode the quantized position of the depth transition.
The transition position calculator may calculate depth transition data based on a view transition position where a foreground-to-background transition or a background-to-foreground transition occurs.
The transition position calculator may calculate depth transition data based on pixel positions where a foreground-to-background transition or a background-to-foreground transition occurs between neighboring reference views.
The 3D video encoding apparatus may further include a foreground and background separator to separate a foreground and a background based on depth values of foreground objects and background objects in a reference video.
The foreground and background separator may separate the foreground and the background based on a global motion of the background objects and a local motion of the foreground objects in the reference video.
The foreground and background separator may separate the foreground and the background based on an edge structure in the reference video.
The transition position calculator may calculate depth transition data by measuring a transition distance from a given pixel position to a pixel position where a foreground-to-background transition or a background-to-foreground transition occurs.
The transition position calculator may calculate depth transition data based on intrinsic camera parameters or extrinsic camera parameters.
The quantizer may perform quantization based on a rendering precision of a 3D video decoding system.
The foregoing and/or other aspects are achieved by providing an apparatus for decoding a three-dimensional (3D) video, including: a decoder to decode quantized depth transition data; an inverse quantizer to perform inverse quantization of the depth transition data; and a distortion corrector to correct a distortion with respect to a synthesized image based on the decoded depth transition data.
The decoder may perform entropy decoding for a pixel position where a foreground-to-background transition or a background-to-foreground transition occurs.
The 3D video decoding apparatus may further include a foreground and background separator to separate a foreground and a background based on depth values of foreground objects and background objects in a reference video.
The distortion corrector may correct a distortion by detecting pixels with a distortion greater than a reference value, based on the depth transition data.
The 3D video decoding apparatus may further include a foreground area detector to calculate local averages of a foreground area and a background area based on a foreground and background map generated from the depth transition data, and to detect a distorted pixel value through a comparison with the calculated local averages.
The distortion corrector may replace the detected pixel value with the local average of the foreground area or the background area that includes a corresponding pixel, based on the depth transition data.
The distortion corrector may replace the detected pixel value with a nearest pixel value belonging to the same foreground area or to the background area based on the depth transition data.
The foregoing and/or other aspects are achieved by providing a method of encoding a three-dimensional (3D) video, including: calculating a depth transition for each pixel position according to a view change; quantizing a position of the calculated depth transition; and encoding the quantized position of the depth transition.
The calculating may include calculating depth transition data based on a view transition position where a foreground-to-background transition or a background-to-foreground transition occurs.
The foregoing and/or other aspects are achieved by providing a method of decoding a three-dimensional (3D) video, including: decoding quantized depth transition data; performing inverse quantization of the depth transition data; and enhancing a quality of an image generated based on the decoded depth transition data.
The decoding may include performing entropy decoding for a pixel position where a foreground-to-background transition or a background-to-foreground transition occurs.
Example embodiments may provide an enhanced three-dimensional (3D) encoding and decoding apparatus and method by adding depth transition data to video-plus-depth data.
Example embodiments may correct a depth map distortion since the depth transition data indicates where a transition between a foreground and a background occurs.
Example embodiments may provide depth map information with respect to all the reference views by providing depth transition data applicable to multiple views at an arbitrary position.
Example embodiments may significantly decrease the erosion artifacts caused by a depth map distortion by employing depth transition data, and may thereby significantly enhance the quality of a rendered view.
Example embodiments may enhance the absolute and relative 3D encoding and decoding quality by applying depth transition data to a rendered view.
Additional aspects of embodiments will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the disclosure.
These and/or other aspects will become apparent and more readily appreciated from the following description of embodiments, taken in conjunction with the accompanying drawings.
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. Embodiments are described below to explain the present disclosure by referring to the figures.
Hereinafter, an apparatus and method for encoding and decoding a three-dimensional (3D) video based on depth transition data, according to example embodiments, will be described with reference to the accompanying drawings.
A depth image based rendering (DIBR) system may render a view between available reference views. To enhance the quality of the rendered view, a depth map may be provided together with a reference video.
The reference video and the depth map may be compressed and coded into a bitstream. A distortion occurring in coding the depth map may cause relatively significant quality degradation, particularly due to erosion artifacts along a foreground object boundary. Accordingly, an approach is proposed that may decrease erosion artifacts by providing additional information for each intermediate rendered view.
For example, an encoder may generally synthesize views and transmit a residue between each synthesized view and the original captured video. This process may be unattractive since the overhead increases with the number of interpolated views to be supported.
Accordingly, example embodiments of the present disclosure may provide auxiliary data, e.g., depth transition data, which may complement depth information and may provide enhanced rendering of multiple intermediate views.
For example, a pixel position may belong to the foreground in a left reference view and to the background in a right reference view. The depth transition data may be generated by recording, for each pixel position, the view position at which this transition occurs. When an arbitrary view is positioned to the left of the transition position, the corresponding pixel belongs to the foreground; when the arbitrary view is positioned to the right of the transition position, the corresponding pixel belongs to the background. Accordingly, a foreground and background map for an arbitrary view position may be generated from the depth transition data, as sketched below. When depth maps for intermediate views are available, the depth transition data may be generated by applying, to each intermediate view, the same binary-map equation applied to the reference views, and a transition may then be traced easily. However, depth maps are not always available for a target view at an arbitrary view position. Accordingly, a method of estimating the camera position where a depth transition occurs, based on camera parameters, is derived below.
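As an illustration of this per-pixel rule (a minimal sketch; the function name, array layout, and the use of np.inf for "no transition" are assumptions, not from the source), the following Python fragment derives a foreground and background map for an arbitrary view position from recorded transition positions:

```python
import numpy as np

def fg_bg_map_at_view(transition_pos, left_is_fg, view_pos):
    """Build a binary foreground(1)/background(0) map for an arbitrary view.

    transition_pos: per-pixel view position where the foreground/background
                    transition occurs (np.inf where no transition occurs
                    between the reference views).
    left_is_fg:     boolean map, True where the pixel belongs to the
                    foreground in the left reference view.
    view_pos:       position of the arbitrary view to be rendered.
    """
    # To the left of the transition position a pixel keeps its left-view
    # label; to the right of it, the label flips.
    keeps_left_label = view_pos < transition_pos
    return np.where(keeps_left_label, left_is_fg, ~left_is_fg).astype(np.uint8)
```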
The depth transition data may be represented using camera parameters, as shown in Table 1. [Table 1 is marked as missing or illegible in the filed document.]
Camera coordinates (x, y, z) and world coordinates (X, Y, Z) may be related according to Equation 1, shown below.
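The equation itself is not reproduced in the filed text; the following is a reconstruction consistent with the common depth-image-based rendering formulation, in which world coordinates are projected through the extrinsic matrix $M = [R \mid T]$ and the intrinsic matrix $A$ of Equation 5:

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = A\, M \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = A\left( R \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + T \right) \tag{1}$$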
In Equation 1, $A$ denotes an intrinsic camera matrix and $M$ denotes an extrinsic camera matrix. $M$ may include a rotation matrix $R$ and a translation vector $T$. Image coordinates $(x_{im}, y_{im})$ may be expressed according to Equation 2, shown below.
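A reconstruction of Equation 2; in the formulation above, the image coordinates follow from the perspective division by $z$:

$$x_{im} = \frac{x}{z}, \qquad y_{im} = \frac{y}{z} \tag{2}$$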
Accordingly, when each pixel depth value is known, a pixel position may be mapped to world coordinates, and then remapped to another set of coordinates corresponding to the camera position of a view to be rendered. In particular, when a $p$-th view having camera parameters $A_p$, $R_p$, and $T_p$ is mapped to a $p'$-th view having parameters $A_{p'}$, $R_{p'}$, and $T_{p'}$, camera coordinates in the $p'$-th view may be represented according to Equation 3, shown below.
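A reconstruction of Equation 3, obtained by inverting Equation 1 for the $p$-th view (using $[x\ y\ z]^T = Z\,[x_{im}\ y_{im}\ 1]^T$) and re-projecting into the $p'$-th view:

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = A_{p'}\, R_{p'} R_p^{-1} \left( Z\, A_p^{-1} \begin{bmatrix} x_{im} \\ y_{im} \\ 1 \end{bmatrix} - T_p \right) + A_{p'}\, T_{p'} \tag{3}$$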
In Equation 3, $Z$ denotes a depth value, and image coordinates in the $p'$-th view may be expressed according to Equation 4, shown below.
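A reconstruction of Equation 4, the perspective division in the $p'$-th view:

$$x'_{im} = \frac{x'}{z'}, \qquad y'_{im} = \frac{y'}{z'} \tag{4}$$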
Hereinafter, a method of calculating the camera position at which a depth transition occurs, based on the foregoing point mapping, will be described. It is assumed that the cameras are arranged in a horizontally parallel configuration, which implies an identity rotation matrix, so that $R_{p'}R_p^{-1} = I$. To calculate $A_{p'}A_p^{-1}$, the intrinsic matrix $A$ may be defined according to Equation 5, shown below.
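A reconstruction of Equation 5, the standard form of an intrinsic matrix:

$$A = \begin{bmatrix} f_x & 0 & o_x \\ 0 & f_y & o_y \\ 0 & 0 & 1 \end{bmatrix} \tag{5}$$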
In Equation 5, $f_x$ and $f_y$ denote the focal length divided by the effective pixel size in the horizontal direction and the vertical direction, respectively. $(o_x, o_y)$ denotes the pixel coordinates of the image center, that is, the principal point. The inverse of the intrinsic matrix $A$ may be calculated according to Equation 6, shown below.
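A reconstruction of Equation 6, the closed-form inverse of the matrix in Equation 5:

$$A^{-1} = \begin{bmatrix} 1/f_x & 0 & -o_x/f_x \\ 0 & 1/f_y & -o_y/f_y \\ 0 & 0 & 1 \end{bmatrix} \tag{6}$$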
When the same focal length is assumed for the two cameras at the $p$-th view and the $p'$-th view, Equation 4 may be expressed according to Equation 7, shown below.
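A reconstruction of Equation 7, additionally assuming identical principal points; with the parallel setup, $z' = z = Z$, so only the horizontal image coordinate shifts:

$$x'_{im} = x_{im} + \frac{f_x\, t_x}{Z}, \qquad y'_{im} = y_{im} \tag{7}$$

Here $t_x$ denotes the horizontal component of the translation between the two cameras, as defined after Equation 8.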
With the assumption of a parallel camera setup, there is no disparity change other than in the horizontal ($x$) direction. Accordingly, the disparity $\Delta x_{im}$ may be expressed according to Equation 8, shown below.
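A reconstruction of Equation 8, following directly from Equation 7:

$$\Delta x_{im} = x'_{im} - x_{im} = \frac{f_x\, t_x}{Z} \tag{8}$$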
In Equation 8, $t_x$ denotes the camera distance in the horizontal direction.
The relationship between an actual depth value and an 8-bit depth map value may be expressed according to Equation 9, shown below.
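A reconstruction of Equation 9, following the common 8-bit inverse-depth quantization used in depth-image-based rendering:

$$\frac{1}{Z} = \frac{L}{255}\left(\frac{1}{Z_{near}} - \frac{1}{Z_{far}}\right) + \frac{1}{Z_{far}} \tag{9}$$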
In Equation 9, $Z_{near}$ denotes the nearest depth value in a scene and $Z_{far}$ denotes the farthest depth value in the scene. In a depth map $L$, $Z_{near}$ corresponds to the value 255 and $Z_{far}$ corresponds to the value 0. Substituting Equation 9 into Equation 8 yields Equation 10, shown below.
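A reconstruction of Equation 10, obtained by substituting Equation 9 into Equation 8:

$$\Delta x_{im} = f_x\, t_x \left[\frac{L}{255}\left(\frac{1}{Z_{near}} - \frac{1}{Z_{far}}\right) + \frac{1}{Z_{far}}\right] \tag{10}$$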
Accordingly, when the camera distance $t_x$ is known, the disparity $\Delta x_{im}$ may be calculated, and when the disparity $\Delta x_{im}$ is known, the camera distance $t_x$ may be calculated. Accordingly, when the disparity is taken to be the horizontal distance from a given pixel position to the position where a depth transition occurs, the exact view position where the depth transition occurs may be found. The horizontal distance may be measured by counting the number of pixels from a given pixel to the first pixel whose depth map value differs from that of the original pixel by more than a predetermined threshold. Using this horizontal distance as the disparity $\Delta x_{im}$, the view position where the depth transition occurs may be estimated according to Equation 11, shown below.
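A reconstruction of Equation 11, obtained by solving Equation 10 for the camera distance:

$$t_x = \Delta x_{im} \bigg/ \left( f_x \left[\frac{L}{255}\left(\frac{1}{Z_{near}} - \frac{1}{Z_{far}}\right) + \frac{1}{Z_{far}}\right] \right) \tag{11}$$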
The camera distance $t_x$ obtained from Equation 11 may be quantized to a desired precision and transmitted as auxiliary data.
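As a sketch of how an encoder might combine Equations 9 and 11 and then quantize the result (the function names, the uniform quantizer, and the baseline-derived step size are assumptions; the source states only that quantization should match the decoder's rendering precision):

```python
def transition_camera_distance(disparity_px, L, fx, z_near, z_far):
    """Estimate the horizontal camera distance t_x at which the depth
    transition occurs, from a measured pixel disparity (Equations 9-11)."""
    # Equation 9: inverse of the actual depth from the 8-bit depth map
    # value L (255 corresponds to Z_near, 0 to Z_far).
    inv_z = (L / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    # Equation 11: t_x = disparity / (f_x * (1 / Z)).
    return disparity_px / (fx * inv_z)

def quantize_transition_position(t_x, baseline, num_levels=256):
    """Uniformly quantize t_x over the reference-camera baseline so the
    precision supports the minimum spacing between interpolated views."""
    step = baseline / (num_levels - 1)
    return int(round(t_x / step))
```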
Referring to FIG. 4, the 3D video encoder 400 may include a foreground and background separator 410, a transition area detector 420, a transition distance measurement unit 430, a transition position calculator 440, a quantizer 450, and an entropy encoder 460.
The foreground and background separator 410 may receive a reference video and a depth map and may separate a foreground and a background in the reference video and the depth map. That is, the foreground and background separator 410 may separate the foreground and the background based on depth values of foreground objects and background objects in the reference video. For example, the foreground and background separator 410 may separate the foreground and the background in the reference video and the depth map based on the foreground level or the background level of the depth values.
Depending on embodiments, the foreground and background separator 410 may separate the foreground and the background based on a global motion of background objects and a local motion of foreground objects in the reference video.
Depending on embodiments, the foreground and background separator 410 may separate the foreground and the background based on an edge structure in the reference video.
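A minimal sketch of the depth-value-based separation (the threshold and names are illustrative assumptions; the motion-based and edge-based cues described above are omitted):

```python
import numpy as np

def separate_fg_bg(depth_map, threshold=128):
    """Label pixels with larger 8-bit depth map values (nearer objects)
    as foreground (True) and the rest as background (False)."""
    return depth_map >= threshold
```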
The transition area detector 420 may receive, from the foreground and background separator 410, data in which the foreground and the background are separated, and may detect a transition area based on the received data. That is, the transition area detector 420 may detect, as the transition area, an area where a foreground-to-background transition or a background-to-foreground transition occurs.
The transition distance measurement unit 430 may measure a distance between transition areas. Specifically, the transition distance measurement unit 430 may measure a transition distance based on the detected transition area. For example, the transition distance measurement unit 430 may measure a transition distance from a given pixel position to a pixel position where a foreground-to-background transition or a background-to-foreground transition occurs.
The transition position calculator 440 may calculate a depth transition for each pixel position according to a view change. That is, the transition position calculator 440 may calculate depth transition data based on a view transition position where a foreground-to-background transition or a background-to-foreground transition occurs. For example, the transition position calculator 440 may calculate depth transition data based on pixel positions where the foreground-to-background transition or the background-to-foreground transition occurs between neighboring reference views.
The transition position calculator 440 may calculate the depth transition data by measuring the transition distance from the given pixel position to the pixel position where the foreground-to-background transition or the background-to-foreground transition occurs.
The transition position calculator 440 may calculate the depth transition data using intrinsic camera parameters or extrinsic camera parameters.
The quantizer 450 may quantize a position of the calculated depth transition. The quantizer 450 may perform quantization based on a rendering precision of a 3D video decoding system.
The entropy encoder 460 may perform entropy encoding of the quantized position of the depth transition.
Referring to FIG. 5, the 3D video decoder 500 may include a foreground and background separator 510, a foreground area detector 520, an entropy decoder 530, an inverse quantizer 540, a foreground and background map generator 550, and a distortion corrector 560.
The foreground and background separator 510 may separate a foreground and a background based on depth values of foreground objects and background objects in a reference video. The foreground and background separator 510 may receive reference video/depth map data and may separate the foreground and the background based on the depth values in the reference video/depth map data.
The foreground area detector 520 may calculate local averages of a foreground area and a background area by referring to a foreground and background map generated from the depth transition data. Further, the foreground area detector 520 may detect a transition area by comparing the calculated local averages.
The entropy decoder 530 may decode quantized depth transition data. That is, the entropy decoder 530 may receive a bitstream transmitted from the 3D video encoder 400, and may perform entropy decoding for a pixel position where a foreground-to-background transition or a background-to-foreground transition occurs, using the received bitstream.
The inverse quantizer 540 may perform inverse quantization of the depth transition data. The inverse quantizer 540 may perform inverse quantization of the entropy decoded depth transition data.
The foreground and background map generator 550 may generate a foreground and background map based on the transition area detected by the foreground area detector 520 and the inverse quantized depth transition data output from the inverse quantizer 540.
The distortion corrector 560 may correct a distortion by expanding a rendered view based on the inverse quantized depth transition data. That is, the distortion corrector 560 may correct the distortion by detecting pixels with a distortion greater than a predetermined reference value, based on the depth transition data. As an example, the distortion corrector 560 may replace the detected pixel value with the local average of the foreground area or the background area including a corresponding pixel, based on the depth transition data. As another example, the distortion corrector 560 may replace the detected pixel value with a nearest pixel value belonging to the same foreground area or background area, based on the depth transition data.
Referring to FIG. 6, in operation 610, the 3D video encoder 400 may separate a foreground and a background. That is, in operation 610, the 3D video encoder 400 may separate the foreground and the background in a reference video and a depth map, based on depth values of foreground objects and background objects, using the foreground and background separator 410.
In operation 620, the 3D video encoder 400 may determine a foreground area. That is, in operation 620, the 3D video encoder 400 may determine the foreground area by calculating a depth transition for each pixel position according to a view change. For example, the 3D video encoder 400 may determine the foreground area and the background area by comparing foreground and background maps of neighboring reference views using the transition area detector 420. When a pixel position belongs to the foreground in one reference view and to the background in another reference view, or vice versa, the 3D video encoder 400 may determine the pixel position to be in the transition area. For the transition area, the depth transition position, that is, the view position at which the transition occurs, may be calculated.
In operation 630, the 3D video encoder 400 may measure a transition distance. That is, in operation 630, the 3D video encoder 400 may measure, as the transition distance, a distance from a current pixel position to a transition position in a current reference view using the transition distance measurement unit 430. For example, in a 1D parallel camera model, the transition distance may be measured by counting a number of pixels from a given pixel to a first pixel for which a depth map value difference with respect to an original pixel exceeds a predetermined threshold.
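A minimal sketch of this pixel-counting measurement for one row of the depth map, under the 1D parallel camera model (the function name and threshold value are illustrative):

```python
def measure_transition_distance(depth_row, x, threshold=30, direction=1):
    """Count pixels from position x to the first pixel whose depth map
    value differs from depth_row[x] by more than threshold; returns the
    distance in pixels, or None if no transition is found in this row."""
    origin = int(depth_row[x])
    xx = x + direction
    while 0 <= xx < len(depth_row):
        if abs(int(depth_row[xx]) - origin) > threshold:
            return abs(xx - x)
        xx += direction
    return None
```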
In operation 640, the 3D video encoder 400 may calculate a transition area. That is, the 3D video encoder 400 may calculate depth transition data based on a view transition position where a foreground-to-background transition or a background-to-foreground transition occurs. For example, in operation 640, the 3D video encoder 400 may calculate the transition view position, according to Equation 11, using the transition position calculator 440.
In operation 650, the 3D video encoder 400 may quantize a position of the calculated depth transition. That is, in operation 650, the 3D video encoder 400 may obtain, using the quantizer 450, a position value quantized with a precision sufficient to support the minimum spacing between interpolated views. The interpolated views may be generated at the 3D video decoder 500.
In operation 660, the 3D video encoder 400 may encode the quantized depth transition position. For example, in operation 660, the 3D video encoder 400 may perform entropy encoding of the quantized depth transition position. The 3D video encoder 400 may compress and encode data to a bitstream, and transmit the bitstream to the 3D video decoder 500.
Referring to FIG. 7, in operation 710, the 3D video decoder 500 may separate a foreground and a background. That is, in operation 710, the 3D video decoder 500 may separate the foreground and the background in a reference video and a depth map, based on depth values, using the foreground and background separator 510.
In operation 720, the 3D video decoder 500 may determine a transition area. That is, in operation 720, the 3D video decoder 500 may determine an area where a transition between the foreground and the background occurs, based on the data in which the foreground and the background are separated, using the foreground area detector 520 in the same manner as the 3D video encoder 400.
In operation 730, the 3D video decoder 500 may perform entropy decoding of a bitstream transmitted from the 3D video encoder 400. That is, in operation 730, the 3D video decoder 500 may perform entropy decoding of depth transition data included in the bitstream using the entropy decoder 530. For example, the 3D video decoder 500 may perform entropy decoding for a pixel position where the foreground-to-background transition or the background-to-foreground transition occurs, based on the depth transition data included in the bitstream.
In operation 740, the 3D video decoder 500 may perform inverse quantization of the decoded depth transition data. That is, in operation 740, the 3D video decoder 500 may perform inverse quantization of a view transition position value, using the inverse quantizer 540.
In operation 750, the 3D video decoder 500 may generate a foreground/background map. That is, in operation 750, the 3D video decoder 500 may generate the foreground/background map for a target view using the foreground and background map generator 550. When no transition occurs between neighboring reference views, the map may take the same value as in the reference views. When a transition occurs, the inverse quantized transition position value may be used to determine whether a given position in the target view belongs to the foreground or the background.
In operation 760, the 3D video decoder 500 may correct a distortion with respect to a synthesized image based on the decoded depth transition data. That is, in operation 760, when a distortion, such as an erosion artifact, occurs in a rendered view compared to the foreground/background map, the 3D video decoder 500 may output an enhanced rendered view by correcting the distortion with respect to the synthesized image. For example, the 3D video decoder 500 may perform erosion correction, using the distortion corrector 560, for a local area where the foreground/background map for the target view is given based on the depth transition data.
Referring to FIG. 8, in operation 810, the 3D video decoder 500 may calculate a background average $\mu_{BG}$ of a local area, based on the foreground and background map.
In operation 820, the 3D video decoder 500 may classify outliers, that is, eroded pixels, by comparing each foreground pixel with the background average. When a foreground pixel value is close to the background average, the pixel may be classified as eroded, and only foreground pixels without outliers may be used in the subsequent averaging.
In operation 830, the 3D video decoder 500 may calculate a foreground average $\mu_{FG}$.
In operation 840, the 3D video decoder 500 may replace each eroded pixel value with the calculated foreground average $\mu_{FG}$.
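The following Python sketch ties operations 810 through 840 together for one local area (the closeness test with factor k is an assumption, since the source says only that a foreground pixel close to the background average is treated as eroded; the sketch also assumes the area contains both foreground and background pixels):

```python
import numpy as np

def correct_erosion(block, fg_mask, k=1.0):
    """Erosion correction for one local area, following operations 810-840."""
    out = block.astype(float).copy()
    mu_bg = out[~fg_mask].mean()              # operation 810: background average
    fg = out[fg_mask]
    # Operation 820: classify a foreground pixel as eroded when it lies
    # closer to the background average than to the overall foreground average.
    eroded = np.abs(fg - mu_bg) < k * np.abs(fg - fg.mean())
    if eroded.any() and not eroded.all():
        mu_fg = fg[~eroded].mean()            # operation 830: clean foreground average
        fg[eroded] = mu_fg                    # operation 840: replace eroded pixels
        out[fg_mask] = fg
    return out
```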
The above-described embodiments may be recorded in non-transitory computer-readable media including program instructions to implement various operations embodied by a computer. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. Hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described embodiments, or vice versa.
The embodiments can be implemented in computing hardware (computing apparatus) and/or software, such as (in a non-limiting example) any computer that can store, retrieve, process and/or output data and/or communicate with other computers. The results produced can be displayed on a display of the computing hardware. A program/software implementing the embodiments may be recorded on non-transitory computer-readable media comprising computer-readable recording media. Examples of the computer-readable recording media include a magnetic recording apparatus, an optical disk, a magneto-optical disk, and/or a semiconductor memory (for example, RAM, ROM, etc.). Examples of the magnetic recording apparatus include a hard disk device (HDD), a flexible disk (FD), and a magnetic tape (MT). Examples of the optical disk include a DVD (Digital Versatile Disc), a DVD-RAM, a CD-ROM (Compact Disc-Read Only Memory), and a CD-R (Recordable)/RW.
Further, according to an aspect of the embodiments, any combinations of the described features, functions and/or operations can be provided.
Moreover, the apparatus for encoding a 3D video may include at least one processor to execute at least one of the above-described units and methods.
Although embodiments have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined by the claims and their equivalents.
This application is a U.S. National Phase application of International Application No. PCT/KR2011/002906, filed on Apr. 22, 2011, and which claims the benefit of U.S. Provisional Application No. 61/353,821, filed on Jun. 11, 2010 in the United States Patent & Trademark Office, and Korean Patent Application No. 10-2010-0077249, filed on Aug. 11, 2010 in the Korean Intellectual Property Office, the disclosures of each of which are incorporated herein by reference.