The present disclosure relates to the field of video coding technology, and more particularly to a fast view synthesis optimization method, based on texture flatness, for depth map coding in 3D High Efficiency Video Coding (3D-HEVC).
Three dimensional (3D) video presents the viewer's left and right eyes with slightly different views of the scene, so it provides a viewing experience with depth perception that conventional 2D video cannot. At present, the most common 3D display is the stereoscopic display, which presents two views. With the development of multimedia technology, multi-view display, which offers 3D perception to the naked eye, has become increasingly popular in the multimedia industry. However, increasing the number of views multiplies the amount of video data and places a great burden on transmission and storage, so an effective coding strategy is needed. The latest 2D video coding standard, High Efficiency Video Coding (HEVC), was officially approved in 2013. Meanwhile, standardization of 3D video coding has been in progress.
The multi-view plus depth (MVD) format in 3D-HEVC includes two or three texture videos and their corresponding depth maps, as shown in
Conventional video coding methods use rate distortion optimization to make motion vector and mode decisions, choosing the vector or mode with the least rate distortion cost. The rate distortion cost is calculated by J = D + λ·R, where J is the rate distortion cost, D is the distortion between the original data and the reconstructed data, λ is the Lagrangian multiplier, and R is the number of bits used. D is usually measured as the sum of squared differences (SSD) or the sum of absolute differences (SAD) between the original and reconstructed data of the current video. Because depth maps are used only to synthesize virtual views and are never seen by the audience directly, coding them with the conventional method may not achieve satisfactory results. The distortion measure for depth maps therefore also needs to consider the distortion in the synthesized intermediate views.
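As a minimal illustration of the decision rule described above, the following Python sketch computes SSD and SAD, forms J = D + λ·R, and picks the candidate with the least cost. The function names (`ssd`, `sad`, `rd_cost`, `choose_mode`) and the candidate dictionary are hypothetical and do not correspond to the HEVC reference software.

```python
from typing import Callable, Dict, List, Tuple

Block = List[List[int]]


def ssd(orig: Block, recon: Block) -> int:
    """Sum of squared differences (SSD) between two equally sized blocks."""
    return sum((o - r) ** 2
               for row_o, row_r in zip(orig, recon)
               for o, r in zip(row_o, row_r))


def sad(orig: Block, recon: Block) -> int:
    """Sum of absolute differences (SAD) between two equally sized blocks."""
    return sum(abs(o - r)
               for row_o, row_r in zip(orig, recon)
               for o, r in zip(row_o, row_r))


def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Rate distortion cost J = D + lambda * R."""
    return distortion + lam * bits


def choose_mode(orig: Block,
                candidates: Dict[str, Tuple[Block, int]],
                lam: float,
                metric: Callable[[Block, Block], int] = ssd) -> Tuple[str, float]:
    """Return the (hypothetical) mode with the least rate distortion cost.

    `candidates` maps a mode name to its reconstructed block and bit count R.
    """
    costs = {mode: rd_cost(metric(orig, recon), bits, lam)
             for mode, (recon, bits) in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


# Usage: a distortion-free but expensive candidate versus a cheaper, lossy one.
orig = [[10, 12], [11, 13]]
candidates = {
    "A": ([[10, 12], [11, 13]], 40),  # D = 0, R = 40
    "B": ([[11, 12], [11, 12]], 10),  # D = 2, R = 10
}
print(choose_mode(orig, candidates, lam=0.5))  # ('B', 7.0)
```

At λ = 0.5 the lossy candidate "B" wins because its rate saving outweighs its distortion; with a much smaller λ (for example 0.01) the distortion-free candidate "A" is chosen instead.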
In 3D-HEVC, the Synthesized View Distortion Change (SVDC) is used as the distortion metric for rate distortion optimization in depth map coding. SVDC is defined as the distortion difference between two synthesized textures, as shown in
The original SVDC method performs warping, interpolation, hole filling and blending to obtain the synthesized views. The encoder then compares the two virtual views synthesized from the original depth maps and from the encoded depth maps, respectively. Finally, the sum of squared differences over all synthesized pixels is calculated. The whole process of the SVDC method is shown in
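The following Python sketch is a deliberately simplified, hypothetical model of the comparison described above for a single image line. It forward-warps texture pixels using a disparity derived from depth through an assumed linear mapping (`round(depth * scale)`), resolves overlaps with a z-buffer in which a larger depth value means closer to the camera, fills holes from the nearest rendered pixel on the left, and returns the SSD between the line rendered with the original depth and the line rendered with the coded depth. Integer disparities make interpolation unnecessary and a single reference view makes blending unnecessary, so both steps are omitted; this is not the 3D-HEVC renderer.

```python
from typing import List, Optional


def disparity(depth: int, scale: float) -> int:
    """Hypothetical linear depth-to-disparity mapping (integer pixels)."""
    return int(round(depth * scale))


def synthesize_line(texture: List[int], depth: List[int], scale: float) -> List[int]:
    """Render one line of a virtual view by forward warping (toy model)."""
    width = len(texture)
    out: List[Optional[int]] = [None] * width
    zbuf = [-1] * width  # larger depth value = closer to the camera
    for x in range(width):
        xt = x - disparity(depth[x], scale)  # warping
        if 0 <= xt < width and depth[x] > zbuf[xt]:
            out[xt] = texture[x]
            zbuf[xt] = depth[x]
    # Hole filling: copy the nearest rendered pixel on the left; holes at the
    # left border take the first rendered pixel of the line.
    first = next((v for v in out if v is not None), 0)
    last = first
    filled = []
    for v in out:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def synthesized_ssd(texture: List[int], depth_orig: List[int],
                    depth_coded: List[int], scale: float) -> int:
    """SSD between the lines rendered with the original and the coded depth."""
    ref = synthesize_line(texture, depth_orig, scale)
    test = synthesize_line(texture, depth_coded, scale)
    return sum((a - b) ** 2 for a, b in zip(ref, test))


texture = [10, 20, 30, 40, 50, 60]
depth_orig = [0, 0, 0, 0, 0, 0]
depth_coded = [0, 0, 0, 64, 0, 0]  # one coding error in the depth line
print(synthesized_ssd(texture, depth_orig, depth_coded, scale=1 / 64))  # 100
```

With a flat texture line such as `[50] * 6`, the same depth error produces a synthesized SSD of 0, which illustrates why skipping view synthesis in flat regions can be expected to cost little accuracy.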
The present disclosure provides a method for fast 3D video coding for HEVC that uses the pixel regularity of flat texture regions to set a threshold. With this threshold, the flat regions in depth maps can be identified, and the view synthesis of pixels in these regions can be skipped, which speeds up view synthesis during rate distortion optimization and reduces coding complexity.
To solve the technical problems noted above, the present disclosure adopts the following scheme.
(1.1) Coding information is extracted from the coded textures, including the block size (n×n), the coding mode (Mode), and the luminance value of each pixel in the reconstructed blocks (Y);
(1.2) The luminance regularity among pixels in flat texture regions is analyzed. Using the information obtained in step (1.1), blocks whose Mode is IntraDC are regarded as flat texture blocks. The sizes and pixel luminance values of all these blocks are recorded, and the luminance regularity among their pixels is analyzed with a statistical method to set the threshold T;
(1.3) During the view synthesis process of rate distortion optimization for depth map coding, the threshold T obtained in step (1.2) is used to divide the current depth block into l flat lines and m non-flat lines;
(1.4) Using the division results obtained in step (1.3), the view synthesis process is skipped for pixels in flat lines, reducing the high coding complexity caused by pixel-by-pixel rendering.
The statistical method used in step (1.2) to analyze the pixel regularity and set the threshold is realized as follows.
(2.1) The average luminance value of pixels in a flat texture block is calculated by
$$\tilde{Y} = \frac{1}{n \times n}\sum_{i=1}^{n}\sum_{j=1}^{n} Y(i,j)$$
where $\tilde{Y}$ denotes the average pixel luminance value, i and j denote the pixel coordinates, n×n denotes the block size, and Y(i,j) denotes the luminance value of the pixel at (i, j);
(2.2) The average difference of pixels in a flat texture block is calculated by
$$A = \frac{1}{n \times n}\sum_{i=1}^{n}\sum_{j=1}^{n} \left| Y(i,j) - \tilde{Y} \right|$$
where A denotes the average difference, $\tilde{Y}$ denotes the average pixel luminance value, i and j denote the pixel coordinates, n×n denotes the block size, and Y(i,j) denotes the luminance value of the pixel at (i, j);
(2.3) For each of the test sequences {newspaper, balloons, kendo, gtfly, undodancer, poznanstreet}, the A value of every flat texture block is calculated by step (2.2). These A values are then averaged and rounded down to set the threshold T.
The division process used in step (1.3) to judge flat lines is realized as follows.
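Steps (2.1) to (2.3) can be sketched in Python as follows. The sketch assumes that the average difference A is the mean absolute deviation of a block's luminance values from the block mean, that coded blocks are available as hypothetical `(mode, block)` pairs with flat blocks labelled `"IntraDC"`, and that the A values of all sequences are pooled before averaging; the function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Block = List[List[int]]


def average_luma(block: Block) -> float:
    """Step (2.1): mean luminance of an n x n block."""
    n = len(block)
    return sum(sum(row) for row in block) / (n * n)


def average_difference(block: Block) -> float:
    """Step (2.2), assumed form: mean absolute deviation from the block mean."""
    n = len(block)
    mean = average_luma(block)
    return sum(abs(y - mean) for row in block for y in row) / (n * n)


def flatness_threshold(coded_blocks: Sequence[Tuple[str, Block]]) -> int:
    """Step (2.3): floor of the mean A over all IntraDC (flat texture) blocks."""
    a_values = [average_difference(block)
                for mode, block in coded_blocks if mode == "IntraDC"]
    if not a_values:
        raise ValueError("no IntraDC (flat texture) blocks supplied")
    return math.floor(sum(a_values) / len(a_values))


coded_blocks = [
    ("IntraDC", [[10, 12], [14, 16]]),  # mean 13, A = 2.0
    ("IntraDC", [[5, 5], [5, 9]]),      # mean 6,  A = 1.5
    ("Planar", [[0, 100], [100, 0]]),   # not IntraDC, ignored
]
print(flatness_threshold(coded_blocks))  # 1
```

Rounding down (here 1.75 becomes 1) keeps T conservative, so a line is only treated as flat when its neighboring differences are at most as large as those typical of IntraDC blocks.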
(3.1) For each line of the current depth block, the absolute luminance difference between neighboring pixels is calculated by $\Delta Y = |Y_p - Y_q|$, where ΔY denotes the absolute value of the luminance difference, Y denotes the pixel luminance value, and p and q denote the horizontal coordinates (abscissas) of two neighboring pixels in the current line;
(3.2) If the difference ΔY between every pair of neighboring pixels in the line is less than the threshold T, the line is regarded as a flat line. Otherwise, it is regarded as a non-flat line.
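A minimal Python sketch of the line division in steps (3.1) and (3.2), treating each row of a block as a line (consistent with p and q being abscissas) and T as an integer; the function names `is_flat_line` and `split_lines` are hypothetical.

```python
from typing import List, Tuple


def is_flat_line(line: List[int], threshold: int) -> bool:
    """A line is flat if every neighboring difference |Yp - Yq| is below T."""
    return all(abs(p - q) < threshold for p, q in zip(line, line[1:]))


def split_lines(block: List[List[int]], threshold: int) -> Tuple[List[int], List[int]]:
    """Return the row indices of the l flat lines and the m non-flat lines."""
    flags = [is_flat_line(line, threshold) for line in block]
    flat = [r for r, f in enumerate(flags) if f]
    non_flat = [r for r, f in enumerate(flags) if not f]
    return flat, non_flat


block = [
    [50, 51, 52, 52],  # differences 1, 1, 0 -> flat
    [50, 60, 60, 60],  # difference 10 -> non-flat
    [7, 7, 7, 7],      # differences 0, 0, 0 -> flat
]
print(split_lines(block, threshold=2))  # ([0, 2], [1])
```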
The following embodiment describes the disclosure in detail but does not limit its scope.
This disclosure includes a fast view synthesis scheme during rate distortion optimization for depth map coding in 3D-HEVC. The procedure of this disclosure is shown in
The specific implementation is realized by the following steps.
Step (1), coding information is extracted from the TXT files, including the block size (n×n), the coding mode of the block (Mode), and the luminance value of each pixel in the reconstructed block (Y);
Step (2), the luminance regularity among pixels in flat texture regions is analyzed. Using the coding information extracted in Step (1), blocks whose Mode is IntraDC are regarded as flat texture blocks. The n×n sizes of these blocks and the reconstructed luminance values Y of all their pixels are recorded. The luminance regularity among the pixels of these blocks is then analyzed with a statistical method and used to set the threshold T, as described in the following steps.
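Step (2.1), the average luminance value of pixels in a flat texture block is calculated by
$$\tilde{Y} = \frac{1}{n \times n}\sum_{i=1}^{n}\sum_{j=1}^{n} Y(i,j)$$
where $\tilde{Y}$ denotes the average pixel luminance value, i and j denote the pixel coordinates, n×n denotes the block size, and Y(i,j) denotes the luminance value of the pixel at (i, j).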
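Step (2.2), the average difference of pixels in a flat texture block is calculated by
$$A = \frac{1}{n \times n}\sum_{i=1}^{n}\sum_{j=1}^{n} \left| Y(i,j) - \tilde{Y} \right|$$
where A denotes the average difference, i and j denote the pixel coordinates, n×n denotes the block size, $\tilde{Y}$ denotes the average pixel luminance value, and Y(i,j) denotes the luminance value of the pixel at (i, j).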
Step (2.3), the average difference A is calculated by step (2.2) for each flat texture block from each of the testing sequences {newspaper, balloons, kendo, gtfly, undodancer, poznanstreet}. Then these A values are averaged and rounded down to set the threshold T.
Step (3), during the view synthesis process of rate distortion optimization for depth map coding, the threshold T obtained in Step (2) is used to divide the current depth block into l flat lines and m non-flat lines, as described in the following steps.
Step (3.1), for each line of the current depth block, the absolute luminance difference between neighboring pixels is calculated by $\Delta Y = |Y_p - Y_q|$, where ΔY denotes the absolute value of the luminance difference, Y denotes the pixel luminance value, and p and q denote the horizontal coordinates (abscissas) of two neighboring pixels in the current line.
Step (3.2), if the difference ΔY between every pair of neighboring pixels in the line is less than the threshold T, the line is regarded as a flat line. Otherwise, it is regarded as a non-flat line.
Step (4), using the division results obtained in Step (3), the view synthesis process is skipped for pixels in flat lines, reducing the high coding complexity caused by pixel-by-pixel rendering.
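The following hypothetical Python sketch shows how the line division could be used in Step (4) to skip rendering. Two assumptions are made: the luminance samples used to judge flatness are taken to be the co-located reconstructed texture samples of each line, and a skipped flat line is taken to contribute zero synthesized-view distortion. The renderer is passed in as a function; `toy_render` is only a stand-in that shifts a pixel by one position where the depth is non-zero, not a real view synthesis algorithm.

```python
from typing import Callable, List, Tuple

Line = List[int]
Renderer = Callable[[Line, Line], Line]


def is_flat_line(line: Line, threshold: int) -> bool:
    """Steps (3.1)/(3.2): every neighboring difference must be below T."""
    return all(abs(p - q) < threshold for p, q in zip(line, line[1:]))


def fast_block_distortion(texture: List[Line], depth_orig: List[Line],
                          depth_coded: List[Line], threshold: int,
                          render: Renderer) -> Tuple[int, int]:
    """Synthesized-view SSD of a block, skipping rendering for flat lines.

    Returns (distortion, number of skipped lines).
    """
    total, skipped = 0, 0
    for tex, d_orig, d_coded in zip(texture, depth_orig, depth_coded):
        if is_flat_line(tex, threshold):
            skipped += 1  # Step (4): no pixel-by-pixel rendering for this line
            continue
        ref = render(tex, d_orig)
        test = render(tex, d_coded)
        total += sum((a - b) ** 2 for a, b in zip(ref, test))
    return total, skipped


def toy_render(tex: Line, depth: Line) -> Line:
    """Hypothetical stand-in renderer: shift by one pixel where depth > 0."""
    return [tex[max(x - (1 if d > 0 else 0), 0)] for x, d in enumerate(depth)]


texture = [[100, 100, 101, 101], [10, 40, 70, 100]]
depth_orig = [[0, 0, 0, 0], [0, 0, 0, 0]]
depth_coded = [[0, 0, 3, 0], [0, 0, 5, 0]]
print(fast_block_distortion(texture, depth_orig, depth_coded, 2, toy_render))  # (900, 1)
```

In the example the flat first row is skipped. Rendering it anyway would add a distortion of 1 (a total of 901 instead of 900), which is the kind of small approximation the skip trades for fewer rendering operations.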
To evaluate its performance, the method of this disclosure is simulated and compared with the original method, which uses pixel-by-pixel rendering during rate distortion optimization. Both methods are built on the 3D-HTM reference software, version 9.2. The test sequences are {newspaper, balloons, kendo, gtfly, undodancer, poznanstreet}, and the resolution is 640×480. The parameters of the test environment are shown in Table 1.
Table 2 shows the coding quality of the present disclosure compared with conventional techniques. The time savings and skip rates of the present disclosure compared with conventional techniques are illustrated in
Number | Date | Country | Kind |
---|---|---|---|
201410671188 | Nov 2014 | CN | national |
This application is a national stage application of International application number PCT/CN2014/093204, filed Dec. 6, 2014, titled “A Method for Fast 3D Video Coding for HEVC”, which claims the priority benefit of Chinese Patent Application No. 201410677118.8, filed on Nov. 23, 2014, which is hereby incorporated by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2014/093204 | 12/6/2014 | WO | 00 |