1. Field of the Invention
Embodiments of the present invention generally relate to a method and apparatus for reducing ringing artifacts.
2. Description of the Related Art
Ringing artifact reduction techniques can usually be classified into two categories: transform domain methods and spatial domain methods. Transform domain methods are usually more accurate, but they are also more complicated, and in some cases the transform coefficients are not available at all; for example, for blocks after video decoding, the transform coefficients have already been discarded, so the ringing artifacts must be detected in the spatial domain. Below we briefly describe some related work in these two categories.
In current image and video compression standards, such as JPEG for images and MPEG and H.26x for video, signals are usually divided into fixed-size blocks (usually 8×8), transformed, and quantized, followed by entropy coding of the transform coefficients. A coarser quantization step size is usually used for the high frequency coefficients in order to achieve compression gain (saving bits), but ringing artifacts can be introduced if the signal being transformed has hard transitions, as shown in the example of
As discussed above, spatial domain methods do not rely on the MPEG coding parameters. One such method is proposed in [Kim06]. In this scheme, processing is performed within each 3×3 window, separately in the vertical and horizontal directions. A threshold is used to determine the horizontal and vertical correlations within the 3×3 window, and a 3-tap or 2-tap low-pass filter is then adaptively selected according to the level of the detected spatial correlation. Although this scheme is simple to implement, it is not sufficient to remove ringing artifacts, mainly due to the small kernel size used.
Another method, from [Mae06], improves the performance of ringing artifact reduction by making use of the information in a bigger window, of size 15×15. The neighboring pixel difference within each 3×3 window is first calculated. This value is then compared to the maximum difference of neighboring pixels in each 15×15 window to determine whether an edge likely exists within the 3×3 window, outside the 3×3 window but within the 15×15 window, or nowhere within the 15×15 window. Nonlinear low-pass filtering is then applied to the pixel if there exists an edge within the 15×15 window but no edge within the 3×3 window. The scheme can achieve better performance in terms of ringing artifact reduction, but, like most de-ringing algorithms, it suffers from over-smoothing details while removing ringing artifacts.
The method in [Des06] is intended to overcome this by combining de-ringing and edge sharpening filters into a single filter. In this method, each pixel is first classified as one of three types: "visual non-edge pixel substantially surrounded by visual non-edge pixels", "visual edge pixel", and "visual non-edge pixel substantially surrounded by edge pixels". This is achieved by comparing the sum of edge strengths in each 3×3 window, calculated from gradient measurements using the Roberts operators, with a pre-defined threshold. De-ringing filtering, sharpening filtering, and no processing are applied to those three types of pixels, respectively. The de-ringing filtering output is simply the average over the non-edge pixels within the 3×3 window. The sharpening filter is a nonlinear filter, also applied within the 3×3 window. The drawback of this method is, again, over-smoothing of edges, together with poor removal of ringing artifacts. This is because the de-ringing filtering is applied within each 3×3 window, which is usually not large enough, since ringing artifacts can spread beyond it.
Another spatial-domain scheme, in [Osa06], uses a larger window of size 5×5. It also makes use of the edge information within a 16×16 MB to guide the low-pass filtering. It performs better than the above-mentioned methods. However, the edge detection algorithm used is too simple and can hardly differentiate between busy details and true edges. In addition, the horizontal and vertical filtering is performed separately, which usually does not perform as well as a non-separable filter.
Fuzzy filtering has been proposed in [Nie05], where a 5×5 kernel is used. However, the performance of the fuzzy filter heavily depends on the chosen threshold. In addition, as in [Osa06], this algorithm tends to smooth picture details due to the simple edge strength detection scheme used.
In other words, ringing artifacts in video are caused by compression: they are introduced by the removal of high-frequency information and thus appear as oscillations in an image. Therefore, there is a need for an improved method and apparatus for reducing ringing artifacts while preserving picture details.
Embodiments of the current invention generally relate to a method and apparatus for ringing artifact reduction for compressed video signals. The method includes receiving luma data at the digital signal processor; calculating the sum of gradients of the luma data; calculating the SAD of the luma data; performing pixel classification based on the calculated SAD and sum of gradients; performing erosion on a detected edge pixel indicator and on a detected flat pixel indicator; determining at least one of the strength or weakness of an edge based on the edge erosion; performing horizontal dilation on the detected edge pixel indicators and edge strength; and performing at least one of sigma or bilateral filtering on the luma data according to the detected edge pixel indicator, the flat pixel indicator, the edge strength, and the number of very flat pixels in the block.
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments. It should be noted that a computer readable medium is any medium that may be utilized by a computer to read, write, archive, store, access, and/or execute data.
Video compression is lossy compression, which means that some information of the original source is lost during compression and, as a result, some artifacts are generated. Ringing is one such compression artifact. It is introduced by the removal of high-frequency information and thus appears as oscillations in an image. It usually occurs at hard transitions/strong edges between foreground and background, as shown in
A hard transition corresponds to high frequency information. A typical example is a step function, which contains plentiful high frequency information.
This type of ideal low-pass filtering is very common in image and video compression, due to the coarser quantization used for the high frequency coefficients than for the low frequency coefficients in the transform domain. For this reason, ringing artifacts are commonly observed in compressed images and video.
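The effect can be reproduced with a short numerical experiment. The following sketch is illustrative only and is not part of the claimed method: it applies an 8-point DCT to a one-dimensional step signal, zeroes the high frequency coefficients to mimic coarse quantization, and inverse transforms; the reconstructed samples overshoot and oscillate around the step, which is exactly the ringing described above.

/* Illustration only: ringing from discarding high frequency DCT coefficients. */
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define N 8

/* Orthonormal forward DCT-II. */
static void dct(const double *x, double *X) {
    for (int k = 0; k < N; k++) {
        double s = 0.0;
        for (int n = 0; n < N; n++)
            s += x[n] * cos(M_PI * (n + 0.5) * k / N);
        X[k] = s * sqrt(2.0 / N) * (k == 0 ? 1.0 / sqrt(2.0) : 1.0);
    }
}

/* Orthonormal inverse DCT (DCT-III). */
static void idct(const double *X, double *x) {
    for (int n = 0; n < N; n++) {
        double s = X[0] / sqrt(2.0);
        for (int k = 1; k < N; k++)
            s += X[k] * cos(M_PI * (n + 0.5) * k / N);
        x[n] = s * sqrt(2.0 / N);
    }
}

int main(void) {
    double step[N] = {0, 0, 0, 0, 255, 255, 255, 255};   /* hard transition */
    double coef[N], rec[N];

    dct(step, coef);
    for (int k = 4; k < N; k++)   /* mimic coarse quantization: drop high frequencies */
        coef[k] = 0.0;
    idct(coef, rec);

    for (int n = 0; n < N; n++)   /* reconstructed samples oscillate around 0 and 255 */
        printf("%2d: original %6.1f  reconstructed %7.2f\n", n, step[n], rec[n]);
    return 0;
}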
Techniques for ringing artifact reduction usually consist of two stages: detection of the ringing artifacts and their removal. Since the removal of ringing artifacts usually takes the form of low-pass filtering, which tends to smooth texture details, we need to be very cautious when applying a low-pass filter to pictures. In other words, there is a need to accurately identify the locations of ringing artifacts and apply low-pass filtering only to such locations.
In this invention, we propose a method and apparatus to efficiently perform ringing artifacts reduction for compressed images and videos.
Ringing artifacts reduction schemes are generally performed in two stages, the detection and removal of ringing artifacts, as shown in
As discussed in the first section, ringing artifacts are caused by the removal of high frequency DCT coefficients during compression. In the MPEG-2 standard, the DCT and the coding of the DCT coefficients are performed per 8×8 block. Thus, ringing artifacts are usually restricted to each 8×8 block. However, due to motion compensation, which is performed per 16×16 MB, ringing artifacts that occur in a reference frame can be propagated to the next frame that relies on it as a reference. This can cause ringing artifacts to spread across blocks, which complicates their detection.
In one embodiment the rule is: if a MB is intra-encoded, then the ringing artifacts should be limited to one 8×8 block, since the DCT and quantization parameters of the four blocks of this MB are independent of each other. If, however, a MB is inter-encoded, we assume that the ringing artifacts can spread across blocks but are still restricted within the MB.
As discussed above, ringing artifacts usually occur around strong edges. Thus, the ringing artifact detection stage in this invention is based on the gradient information within each 16×16 MB. Based on the gradient measurement, each pixel is classified as "strong edge", "edge", "flat", "very flat", or other. In order to be robust to noise, each of these classifications is passed through morphological operations such as dilation and erosion. The ringing artifact removal stage is realized through a 5×5 bilateral filter (also called a sigma filter), whose performance heavily depends on the chosen thresholds. The thresholds used in the sigma filter are adaptively selected according to the parameters obtained from the first stage.
Sobel operators are widely used for edge detection due to their simplicity and efficiency. In this invention, we employ them to calculate the gradients. The two Sobel operators correspond to the gradients in the vertical and horizontal directions.
We use the sum of the gradients, calculated as below, to indicate the strength of gradients.
G_sum=((|Gx|+|Gy|)>>1), (3.2)
where Gx and Gy are, respectively, the outputs of applying the two Sobel operators Vx and Vy to each 3×3 window with the pixel being processed at the center. In addition, the sum of absolute differences (SAD) of the 3×3 window with respect to the center pixel is calculated as
SAD = sum over 0≤j≤2, 0≤i≤2 of |Y[j][i]−Y[1][1]|,
where Y[1][1] is the center pixel value of the 3×3 window.
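For reference, the gradient and SAD measurements described above may be sketched as follows; the Sobel kernels shown are the conventional 3×3 operators, and the window indexing follows the text.

/* Sketch of the per-pixel gradient measurement of (3.2) and the SAD above. */
#include <stdint.h>
#include <stdlib.h>

/* The two conventional 3x3 Sobel kernels (one per gradient direction). */
static const int Vx[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
static const int Vy[3][3] = { {-1, -2, -1}, { 0, 0, 0}, { 1, 2, 1} };

/* Y is the 3x3 luma window with the pixel being processed at Y[1][1]. */
static void gradient_and_sad(const uint8_t Y[3][3], int *G_sum, int *SAD)
{
    int Gx = 0, Gy = 0, sad = 0;
    for (int j = 0; j < 3; j++) {
        for (int i = 0; i < 3; i++) {
            Gx  += Vx[j][i] * Y[j][i];
            Gy  += Vy[j][i] * Y[j][i];
            sad += abs((int)Y[j][i] - (int)Y[1][1]);
        }
    }
    *G_sum = (abs(Gx) + abs(Gy)) >> 1;   /* equation (3.2) */
    *SAD   = sad;                        /* SAD w.r.t. the center pixel */
}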
Based on the sum of gradients and the SAD, we can classify each pixel into five types: very flat area, flat area, edge area, strong edge area, and the rest. The first four types correspond to four variables, Pix_veryflat1, Pix_flat1, Pix_edge1, and Pix_strongedge1, respectively.
For the three thresholds related to gradient, drn_Thr_Gsum_strongedge >= drn_Thr_Gsum_weakedge >= drn_Thr_Gsum_flat must hold. In the simulations, we set drn_Thr_Gsum_strongedge, drn_Thr_Gsum_weakedge, and drn_Thr_Gsum_flat to 256, 156, and 100, respectively, and we set drn_Thr_SAD_veryflat to 16. This classification scheme, however, is not sufficient to differentiate edges from busy details, which usually also correspond to large values of G_sum. In addition, it would be sensitive to noise if the classification of each pixel were performed independently. In this invention, we employ morphological operations to improve the accuracy and effectiveness of the classification.
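One way to express this per-pixel classification is sketched below, using the simulation threshold values quoted above; the exact comparison directions are an assumption consistent with the stated ordering constraint on the gradient thresholds.

/* Sketch of the five-way pixel classification; constants are simulation values. */
#include <stdint.h>

#define DRN_THR_GSUM_STRONGEDGE 256
#define DRN_THR_GSUM_WEAKEDGE   156
#define DRN_THR_GSUM_FLAT       100
#define DRN_THR_SAD_VERYFLAT     16

typedef struct {
    uint8_t Pix_strongedge1;  /* strong edge area */
    uint8_t Pix_edge1;        /* edge area        */
    uint8_t Pix_flat1;        /* flat area        */
    uint8_t Pix_veryflat1;    /* very flat area   */
} PixClass;                   /* all zero => "the rest" */

static PixClass classify_pixel(int G_sum, int SAD)
{
    PixClass c;
    c.Pix_strongedge1 = (G_sum >= DRN_THR_GSUM_STRONGEDGE);
    c.Pix_edge1       = (G_sum >= DRN_THR_GSUM_WEAKEDGE);
    c.Pix_flat1       = (G_sum <= DRN_THR_GSUM_FLAT);
    c.Pix_veryflat1   = (SAD   <= DRN_THR_SAD_VERYFLAT);
    return c;
}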
As discussed above, in addition to true edges, busy details and noise also tend to lead to large values of G_sum. However, in busy detail areas, the pixels with large G_sum usually have a random distribution, whereas in edge areas, pixels with large G_sum usually follow a particular pattern. For example, in an area with a horizontal edge, the pixels along the edge should all have large G_sum. This motivates us to use orientation-based erosion techniques to differentiate true edges from busy texture areas and noisy areas.
Note that the erosion process for edge pixels is performed in the two blocks “Edge Erosion” and “Strong Edge Erosion” in
For example, to perform this calculation, e00_conct, e01_conct, e02_conct, e10_conct, e12_conct, e20_conct, e21_conct, and e22_conct may each correspond to one of the eight connectivity cases (see
Each edge orientation has two associated edge connectivity cases. This is because the kernel used to calculate Pix_edge1 is a 3×3 operator, which leads to outputs of Pix_edge1 that are three pixels wide along the edge. Thus, for the case of a horizontal edge, if the edge is between the second and the third line in the 5×5 window, the distribution of Pix_edge1 corresponds to the case e01_conct; if the edge is between the third and the fourth line, the distribution corresponds to the case e21_conct. A true edge is detected (and thus the output Pix_edge is 1) only when one of the eight connectivity cases is true and the value of Pix_edge1 at the center pixel is also 1, as given below.
The eight connectivity cases are depicted graphically as shown in
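As an illustration of this erosion rule, the sketch below tests the eight connectivity cases against the 5×5 neighborhood of Pix_edge1 values. The exact pixel patterns of the cases are defined by the referenced figure, so the masks used here (rows, columns, and diagonals adjacent to the center) are only an assumed rendering. The same routine, applied to Pix_strongedge1, gives the strong edge erosion.

/* Sketch of the orientation-based edge erosion. E is the 5x5 neighborhood of
 * Pix_edge1 values with the pixel being processed at E[2][2]. The eight masks
 * below are an assumed rendering of e00_conct .. e22_conct. */
#include <stdint.h>

static const struct { int n; int pos[5][2]; } conct[8] = {
    {5, {{1,0},{1,1},{1,2},{1,3},{1,4}}},   /* e01_conct: row above center         */
    {5, {{3,0},{3,1},{3,2},{3,3},{3,4}}},   /* e21_conct: row below center         */
    {5, {{0,1},{1,1},{2,1},{3,1},{4,1}}},   /* e10_conct: column left of center    */
    {5, {{0,3},{1,3},{2,3},{3,3},{4,3}}},   /* e12_conct: column right of center   */
    {4, {{1,0},{2,1},{3,2},{4,3}}},         /* e00_conct: lower 45-degree diagonal */
    {4, {{0,1},{1,2},{2,3},{3,4}}},         /* e22_conct: upper 45-degree diagonal */
    {4, {{0,3},{1,2},{2,1},{3,0}}},         /* e02_conct: upper 135-degree diagonal*/
    {4, {{1,4},{2,3},{3,2},{4,1}}},         /* e20_conct: lower 135-degree diagonal*/
};

static uint8_t edge_erosion(const uint8_t E[5][5])
{
    if (!E[2][2])                     /* Pix_edge1 of the center pixel must be 1 */
        return 0;
    for (int c = 0; c < 8; c++) {
        int all = 1;
        for (int k = 0; k < conct[c].n; k++)
            if (!E[conct[c].pos[k][0]][conct[c].pos[k][1]])
                all = 0;
        if (all)
            return 1;                 /* Pix_edge = 1: a true edge is detected */
    }
    return 0;
}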
The outputs of the edge erosion blocks are Pix_edge and Pix_strongedge, which indicate a true edge and a strong edge per pixel, respectively. The next step is to determine the presence of a true edge per 8×8 block, edge_blck. The calculation is given below.
This step is straightforward: if the sum of Pix_strongedge in one block is greater than a predefined threshold, Thr_num_strongedge, or if the sum of Pix_edge is greater than a predefined threshold, Thr_num_edge, then this block is identified as a block with an edge, i.e., edge_blck is set to 1. Otherwise, the block is considered to have no edge, i.e., edge_blck is set to 0. In the simulations, good results were obtained when we set both Thr_num_strongedge and Thr_num_edge to 3.
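A sketch of this per-block decision, with the simulation thresholds, is given below.

/* Sketch of the edge_blck decision for one 8x8 block. */
#include <stdint.h>

#define THR_NUM_STRONGEDGE 3
#define THR_NUM_EDGE       3

static uint8_t edge_block(const uint8_t Pix_edge[8][8],
                          const uint8_t Pix_strongedge[8][8])
{
    int num_edge = 0, num_strongedge = 0;
    for (int j = 0; j < 8; j++) {
        for (int i = 0; i < 8; i++) {
            num_edge       += Pix_edge[j][i];
            num_strongedge += Pix_strongedge[j][i];
        }
    }
    /* edge_blck = 1 when either count exceeds its threshold. */
    return (num_strongedge > THR_NUM_STRONGEDGE) || (num_edge > THR_NUM_EDGE);
}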
In the "Determine edge strength per block" block, we calculate the edge strength per 8×8 block, which is the maximum of a shifted version of G_sum over the 8×8 block, as given by
This value will be used to determine the thresholds used in the sigma filter.
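A sketch of this computation is given below; the text does not specify the shift amount applied to G_sum, so the value used here is purely illustrative.

/* Sketch of edge_strength_blck: the maximum of a shifted version of G_sum
 * over the 8x8 block. GSUM_SHIFT is not specified in the text; 2 is used
 * here only for illustration. */
#define GSUM_SHIFT 2   /* assumed value */

static int edge_strength_block(const int G_sum[8][8])
{
    int strength = 0;
    for (int j = 0; j < 8; j++)
        for (int i = 0; i < 8; i++)
            if ((G_sum[j][i] >> GSUM_SHIFT) > strength)
                strength = G_sum[j][i] >> GSUM_SHIFT;
    return strength;   /* later used to pick the sigma filter thresholds */
}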
As discussed above, due to the motion compensation used in video compression, ringing artifacts occurring in one block may be propagated to other blocks in the same MB. Thus, we need to determine whether there exists an edge or a strong edge within one MB. This is achieved through dilation of the edge identification of each block, edge_blck. An MB usually refers to a 16×16 block; in this embodiment, however, the dilation is considered over an 8×16 block rather than a 16×16 block, so an MB in this invention refers to an 8×16 block. Then,
where edge_blck[1] and edge_blck[2] are the edge_blck values of the two 8×8 blocks in one 8×16 MB, edge_MB[1] and edge_MB[2] are the edge_MB values of the two 8×8 blocks in the MB after the dilation, num_veryflatpix_blck is given in (3.3), and Thr_num_faltpix is a predefined threshold. In one embodiment, Thr_num_faltpix is set to 20.
Similar to the dilation of edge_blck, dilation is employed in this block too. The edge strength per MB is the maximum of the two edge_strength_blck values of the two blocks in one MB, as given below,
where edge_strength_blck[1] and edge_strength_blck[2] are the edge_strength_blck values of the two blocks in one 8×16 MB.
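The two dilation steps over one 8×16 MB may be sketched as below. The MB edge strength (the maximum of the two block strengths) follows the text directly; the rule combining edge_blck with num_veryflatpix_blck is only partly specified, so the gating shown (a block inherits its neighbor's edge indication unless it consists mostly of very flat pixels) is an assumed reading. Block indices 0 and 1 are used in place of the [1] and [2] of the text.

/* Sketch of the horizontal dilation over one 8x16 MB (two 8x8 blocks). */
#include <stdint.h>

#define THR_NUM_FALTPIX 20   /* predefined threshold from the text */

static void dilate_mb(const uint8_t edge_blck[2],
                      const int edge_strength_blck[2],
                      const int num_veryflatpix_blck[2],
                      uint8_t edge_MB[2], int *edge_strength_MB)
{
    uint8_t any_edge = edge_blck[0] || edge_blck[1];

    /* Assumed dilation rule: spread the edge indication to the other block of
     * the MB unless that block is dominated by very flat pixels. */
    for (int b = 0; b < 2; b++)
        edge_MB[b] = (edge_blck[b] ||
                      (any_edge && num_veryflatpix_blck[b] <= THR_NUM_FALTPIX));

    /* Stated rule: the MB edge strength is the maximum of the two blocks. */
    *edge_strength_MB = edge_strength_blck[0] > edge_strength_blck[1]
                      ? edge_strength_blck[0] : edge_strength_blck[1];
}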
Similar to the edge erosion, the flat area identification needs an erosion process too, in order to be robust to noise and to accurately detect flat areas. In this invention, a pixel is identified as flat, i.e., Pix_flat = 1, only when the equation below is met,
where Pix_flat1[j][i], 0≤j≤2, 0≤i≤2, are the values for the pixels in the 3×3 window with the pixel being processed at the center. This process is depicted in
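A sketch of this erosion is given below; it assumes the condition is a plain 3×3 erosion, i.e., that every Pix_flat1 value in the window must be set.

/* Sketch of the flat-area erosion: keep the pixel as flat only if its whole
 * 3x3 neighborhood of Pix_flat1 values is set (assumed reading). */
#include <stdint.h>

static uint8_t flat_erosion(const uint8_t Pix_flat1[3][3])
{
    for (int j = 0; j < 3; j++)
        for (int i = 0; i < 3; i++)
            if (!Pix_flat1[j][i])
                return 0;
    return 1;   /* Pix_flat = 1 */
}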
As mentioned above, a bilateral filter (also called a sigma filter) is used to perform the low-pass filtering that removes ringing artifacts in this invention. In the implementation, a 5×5 two-step bilateral (sigma) filter is used. This filter is given by
where Yc=Y[2][2] is the value of the center pixel in the 5×5 window, w[j][i] are the weights for each pixel, and Thr_sigma1 and Thr_sigma2 are the two thresholds. To illustrate the advantage of the Sigma filter, we can look at
In this case, it is obvious that there is a negative 45 degree edge through the center pixel. When the sigma filter is applied to the center pixel, the weights w[j][i] for the blue pixels will be 1 or 2 (depending on how close their values are to the center pixel), while the weights for the white pixels will be 0, because the differences of those white pixels with respect to the center are large. Then, according to (3.9), the output of the sigma filter, Ysigma, will essentially be the average of the blue pixels (including the center pixel itself). By doing so, noise is reduced while the edge is essentially maintained.
The performance of the sigma filter heavily depends on how the two thresholds Thr_sigma1 and Thr_sigma2 are chosen. It is clear that smaller thresholds lead to less effective noise removal, while larger thresholds tend to remove details or smooth edges when removing noise.
In this invention, those two thresholds are determined as below
Thr_sigma1=drn_max_sigma_thr1 (3.10)
Thr_sigma2=Thr_sigma1<<1 (3.11)
where drn_max_sigma_thr1 is a predefined threshold that serves as the upper bound of Thr_sigma1, and Thr_sigma2 is defined as twice Thr_sigma1. It is preferred to adapt this parameter to the noise level of the picture to achieve the best noise reduction performance, if noise measurement can be done in the system; otherwise, this parameter has to be set as a constant. In our simulations, we found that setting drn_max_sigma_thr1 to 8 generates good results for 8-bit data.
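A sketch of the complete 5×5 sigma filter with these thresholds is given below. The weight rule, weight 2 when the absolute difference from the center is at most Thr_sigma1, weight 1 when it is at most Thr_sigma2, and weight 0 otherwise, is an assumed reading of (3.9) that is consistent with the description of weights 0, 1, and 2 and with Thr_sigma2 = Thr_sigma1 << 1.

/* Sketch of the 5x5 two-threshold sigma (bilateral) filter. */
#include <stdint.h>
#include <stdlib.h>

#define DRN_MAX_SIGMA_THR1 8   /* simulation value for 8-bit data */

static uint8_t sigma_filter(const uint8_t Y[5][5])
{
    const int Thr_sigma1 = DRN_MAX_SIGMA_THR1;   /* (3.10) */
    const int Thr_sigma2 = Thr_sigma1 << 1;      /* (3.11) */
    const int Yc = Y[2][2];                      /* center pixel */
    int acc = 0, wsum = 0;

    for (int j = 0; j < 5; j++) {
        for (int i = 0; i < 5; i++) {
            int d = abs((int)Y[j][i] - Yc);
            /* Assumed weight rule: 2, 1 or 0 depending on closeness to Yc. */
            int w = (d <= Thr_sigma1) ? 2 : (d <= Thr_sigma2) ? 1 : 0;
            acc  += w * Y[j][i];
            wsum += w;
        }
    }
    /* The center pixel always contributes (w = 2), so wsum >= 2. */
    return (uint8_t)((acc + (wsum >> 1)) / wsum);   /* rounded average Ysigma */
}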
At the last step of this invention, the sigma filter output is mixed with the original pixel value to generate the final output. The calculated gradient may be used to determine the blending factor: the previously calculated gradient G_sum is scaled to G_sum_sigma, which represents the local edge strength of the pixel being processed. A scale factor is then calculated, which is used for scaling the blending coefficient used in the soft switch. drn_bld_scale_strgedge and drn_bld_scale_weakedge are two predefined thresholds corresponding to strong and weak edges, respectively. In a practical system design, a soft switch is usually preferred since it reduces the flicker introduced by a hard switch. The blending factor is determined by the logic shown below.
With drn_bld_scale_strgedge >= drn_bld_scale_weakedge, the blending coefficient is calculated, where drn_bld_coeff_block_edge, drn_bld_coeff_QP, and drn_bld_coeff_local_edge are three predefined constants representing the blending contributions corresponding to the edge strength of the block, the effect of the quantization parameter Q, and the local edge strength, respectively. This coefficient is then scaled by the previously calculated scaling factor, bld_scale, and clipped to [0, drn_max_bld_factor], as shown from line 20 to 27. Finally, the soft switch is achieved by blending the original pixel value and the filtered pixel value based on the blending factor obtained above. The greater the value of the blending coefficient, the closer the final output is to the filtered output.
The sigma filter output is selected only when there is an edge in the MB (edge_MB >= 1), the pixel being processed is not an edge pixel (Pix_edge == 0), the pixel being processed is relatively flat (Pix_flat == 1), and there are not many very flat pixels in the block (num_veryflatpix_blck <= drn_Thr_num_flatpix_blck). Increasing the quantization parameter, Q, increases the blending coefficient, since a large quantization parameter means coarser quantization and the removal of more high frequency DCT coefficients, which may lead to more severe ringing artifacts. Usually, if the MPEG information of Q per MB is available when performing de-ringing, better performance can be achieved. If such information is not available, Q can be set as a constant, e.g., 24, or as the average Q per frame if that is available.
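The soft switch may be sketched as below. The selection conditions follow the text; the way the three predefined constants and the scale factor bld_scale are combined, the fixed-point shift, and the value of drn_max_bld_factor are assumptions standing in for the listing referenced above (its lines 20 to 27).

/* Sketch of the blending coefficient and soft switch. */
#include <stdint.h>

#define DRN_MAX_BLD_FACTOR        16   /* assumed blending range                  */
#define DRN_THR_NUM_FLATPIX_BLCK  20   /* threshold on very flat pixels per block */

static int clip(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }

static uint8_t soft_switch(uint8_t Yc, uint8_t Ysigma,
                           int edge_MB, int Pix_edge, int Pix_flat,
                           int num_veryflatpix_blck,
                           int drn_bld_coeff_block_edge, int drn_bld_coeff_QP,
                           int drn_bld_coeff_local_edge, int bld_scale)
{
    /* The filtered output is considered only under the stated conditions. */
    if (!(edge_MB >= 1 && Pix_edge == 0 && Pix_flat == 1 &&
          num_veryflatpix_blck <= DRN_THR_NUM_FLATPIX_BLCK))
        return Yc;

    /* Assumed combination: sum the three contributions, scale by bld_scale
     * (treated here as having 4 fractional bits), then clip. */
    int bld_coeff = drn_bld_coeff_block_edge + drn_bld_coeff_QP
                  + drn_bld_coeff_local_edge;
    bld_coeff = clip((bld_coeff * bld_scale) >> 4, 0, DRN_MAX_BLD_FACTOR);

    /* Soft switch: the larger bld_coeff is, the closer the output is to Ysigma. */
    return (uint8_t)((bld_coeff * Ysigma + (DRN_MAX_BLD_FACTOR - bld_coeff) * Yc
                      + (DRN_MAX_BLD_FACTOR >> 1)) / DRN_MAX_BLD_FACTOR);
}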
In one embodiment, as shown in
Due to the novel morphological processing used in edge detection, the invention reduces the ringing artifacts introduced by compression while preserving the details and edges of images. Embodiments of this invention may be implemented as a pure post-processing system without using any MPEG coding parameters. When some MPEG coding parameters, such as the quantization parameter for each MB, are available, the system performance may be enhanced. Hence, in one embodiment, a novel morphological scheme differentiates edges from busy textures, and then low-pass filtering is applied to the pixels that are affected by ringing artifacts.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
This application claims benefit of U.S. provisional patent application Ser. No. 61/077,254, filed Jul. 1, 2008, which is herein incorporated by reference.