This invention relates to characterization of an image by a spatial activity metric.
Within the field of image processing, a spatial activity metric provides a measure of the texture within a prescribed region of the image. Using this measure of texture enables many common image processing applications to exploit the spatial masking effect of the human visual system. Spatial masking occurs because the human visual system can tolerate more distortion introduced into textured regions than into smooth regions of an image. In the case of a video image that has undergone compression, distortion introduced into the image corresponds to compression artifacts caused by quantization. In the case of watermarking of video images, introduced distortion corresponds to embedded data.
Many image processing applications use the spatial activity metric to distinguish flat or low-detail regions, where introduced distortion appears more visible to the human eye, from busy or textured areas, where introduced distortion appears less visible. While existing spatial activity metrics provide a good measure for grain-free images, such as those associated with animation, or for low-resolution images having little noise, such as weak film grain, such metrics do not properly characterize the spatial activity in the presence of stronger noise. As a result, for images containing noise, spatial masking can leave portions of the image with introduced distortion that is not masked.
Existing spatial activity metrics can be classified into three categories: (1) variance-based; (2) gradient-based; and (3) DCT-based. An explanation of each appears below, all based on a 16×16 block.
The variance-based metric measures the spatial activity using the variance of luminance. A representative metric in this category is the one used in the rate control algorithm of the MPEG-2 reference software.
where $var_i$ is the variance of the ith 8×8 subblock. Using this metric, the MPEG-2 reference software allows more distortion in the textured regions and less distortion in the smooth ones, and therefore obtains higher visual quality for the entire picture at the same bit rate.
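As a rough illustration of the variance-based category, the following Python sketch computes the luminance variance of each 8×8 subblock of a 16×16 block and combines the results. The combining rule (one plus the smallest subblock variance, taken over four frame-organized subblocks) is an assumption based on the commonly described MPEG-2 Test Model 5 rate control, not a formula taken from this description, and the function names are hypothetical.

```python
from statistics import pvariance


def subblock_variances(block, size=8):
    """Return the luminance variance of each size x size subblock of a square block."""
    n = len(block)
    variances = []
    for top in range(0, n, size):
        for left in range(0, n, size):
            pixels = [block[r][c]
                      for r in range(top, top + size)
                      for c in range(left, left + size)]
            variances.append(pvariance(pixels))
    return variances


def act_var(block):
    """Assumed combining rule: 1 + smallest subblock variance (TM5-style)."""
    return 1 + min(subblock_variances(block))


# A perfectly flat block versus a pixel-level checkerboard (illustrative data only).
flat = [[128] * 16 for _ in range(16)]
checker = [[255 * ((r + c) % 2) for c in range(16)] for r in range(16)]
print(act_var(flat), act_var(checker))
```

Under this assumed rule the flat block receives the minimum possible activity, while the checkerboard receives a much larger value, which is the behavior a masking-aware encoder relies on.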
Two metrics exist in the gradient-based category. The first metric, $ACT_{gra1}$, considers the horizontal and vertical gradients:
where $I_{ij}$ is the luminance value at pixel (i,j). The second metric, $ACT_{gra2}$, takes into account the diagonal directions:
where $grad_{ij,n}$ is a local gradient computed by one of the following four 5×5 directional high-pass filters at pixel (i,j) [3]:
The DCT-based metrics use the AC components of the DCT coefficients of the luminance values. The first metric considers the absolute values of the AC coefficients:
where F(i,j) is the DCT coefficient of frequency (i,j). The second metric normalizes the AC coefficients by the DC coefficient [3]:
The above metrics provide a good measure for grain-free images or for low-resolution images having little noise, such as weak film grain. However, these metrics treat noise, such as film grain, as texture and assign a relatively high value to a smooth region that consists mainly of strong noise. Hence, these metrics will mistakenly consider smooth regions with strong noise as textured regions. Consequently, applications relying on such metrics will likely introduce more distortion than these smooth regions can mask.
Therefore, a need exists for a new spatial activity metric that effectively measures the spatial activity of a region in the presence of noise, such as film grain.
When applying a spatial activity metric in connection with spatial masking to improve the quality of a displayed image, the visual quality of the image serves as a measure of the effectiveness of the metric. In the case of video compression, decoded pictures serve as the displayed pictures. In the case of watermarking applications, the displayed pictures will possess embedded data. Measuring the effectiveness of the spatial activity metric in this way involves extensive subjective evaluation.
To reduce the evaluation time, a need exists for an objective method that assesses the performance of a spatial activity metric.
Briefly, in accordance with a preferred embodiment of the present principles, there is provided a method for establishing a spatial activity metric for characterizing an image. The method commences by first determining a spatial activity metric. Thereafter, noise within the image (which can include film grain) is estimated. The spatial activity metric is then reduced by the amount of the estimated noise so that, upon using the spatial activity metric in connection with spatial masking, the likelihood of unmasked distortion caused by the presence of noise, such as film grain, will be reduced.
In accordance with another aspect of the present principles, there is provided a method for characterizing the effectiveness of a spatial activity metric of the type used to provide a measure of the texture in an image. The method commences by determining similarity of the measures made by the spatial activity metric for regions in the image of similar texture. A determination is also made of the difference in the measures made by the spatial activity metric for regions of different texture in the image. The extent to which a spatial activity measure provides similar measures for similarly textured regions and a large spread between measures for regions of different texture reflects a high degree of performance for that metric.
While existing spatial activity metrics provide a good measure for grain-free or low-resolution images where the film grain strength is low, such metrics exhibit a strong dependency not only on spatial activity but also on brightness in the presence of film grain. For purposes of discussion, film grain generally appears within an image as a random texture generated during film development. Film grain is generally regarded as additive, signal-dependent noise, which differs in size, shape and intensity depending on the film stock, lighting conditions and development process. The intensity of film grain appears highly correlated to pixel intensity, which explains why existing spatial activity metrics strongly depend on brightness.
In accordance with the present principles, there is provided a method for establishing a spatial activity metric that has greatly reduced dependency on the brightness. The method of the present principles reduces the dependency on brightness by (1) estimating the film grain, typically through modeling, and (2) removing the film grain strength from the spatial activity metric.
Film grain can be estimated, typically by modeling, in accordance with the following relationship:
$$g(i,j) = f(i,j) + f(i,j)^{\gamma}\, n(i,j), \qquad (7)$$
where g(i,j) and f(i,j) constitute the observed and noise-free pixel values at location (i,j), respectively, γ is a constant for a given film stock and shooting condition, and n(i,j) is zero-mean, normally distributed noise. The product $f(i,j)^{\gamma}\, n(i,j)$ characterizes the film grain. Usually γ falls between 0.3 and 0.7 and, in most cases, has a value of around 0.5. For a smooth region, where the values f(i,j) lie close together, equation (7) can be approximated by:
$$g(i,j) = f(i,j) + \bar{f}^{\gamma}\, n(i,j), \qquad (8)$$
where $\bar{f}$ is the average of the noise-free values f(i,j) over the region.
Assuming n(i,j) is independent of f(i,j), the relation of the variances can be obtained as follows:
$$\sigma_g^2 = \sigma_f^2 + \bar{f}^{2\gamma}\,\sigma_n^2 = \sigma_f^2 + \sigma_{grain}^2, \qquad (9)$$
where $\sigma_g^2$, $\sigma_f^2$ and $\sigma_n^2$ are the variances of g(i,j), f(i,j) and n(i,j), respectively, and $\sigma_{grain}^2$ is the variance of the film grain.
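The film grain model can be checked numerically. The sketch below synthesizes a flat region, applies signal-dependent noise of the form g = f + f^γ·n, and compares the measured variance with the value predicted under the assumption that, for a flat region, the grain variance equals the mean noise-free value raised to the power 2γ times the noise variance. The function name, the sample count and the parameter values are illustrative assumptions.

```python
import random
from statistics import pvariance


def add_film_grain(f, gamma=0.5, sigma_n=1.0, rng=None):
    """Apply g = f + f**gamma * n, with n ~ N(0, sigma_n**2), to a list of pixel values."""
    rng = rng or random.Random(0)
    return [v + (v ** gamma) * rng.gauss(0.0, sigma_n) for v in f]


def predicted_grain_variance(f_bar, gamma, sigma_n):
    """Assumed flat-region prediction: f_bar**(2*gamma) * sigma_n**2."""
    return (f_bar ** (2 * gamma)) * sigma_n ** 2


# Flat region: every noise-free pixel has the same value, so var(f) = 0.
f_bar = 100.0
g = add_film_grain([f_bar] * 20000, gamma=0.5, sigma_n=1.0)
print(predicted_grain_variance(f_bar, 0.5, 1.0), pvariance(g))
```

With a few thousand samples the measured variance of the flat region should land close to the prediction, which illustrates why the variance of a flat region tracks brightness when grain is present.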
For images initially recorded on film and thereafter converted to high-resolution video, a large number of regions usually appear flat or almost flat, that is, they lack texture. The variances of these regions depend mostly on the film grain, i.e., $\sigma_f^2 \approx 0$ and $\sigma_g^2 \approx \sigma_{grain}^2$. For flat regions with similar brightness, the characteristics of the grain are homogeneous, which results in closely clustered values of $\sigma_g^2$ that are smaller than the variances of the textured regions. As a consequence, the histogram of the variances usually has a peak at a small variance value. Therefore, a histogram-based method of estimating the grain intensity will produce good results.
To make use of a histogram-based method, regions within the image first undergo classification into multiple groups according to separate brightness ranges. For each group i, calculation of the histogram of variances occurs to enable identification of the first peak $\sigma_{peak,i}^2$. Using the values $\sigma_{peak,i}^2$ from all brightness ranges, $\sigma_{grain}^2$ can be derived as a linear function of the brightness using linear regression.
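The histogram-based estimation can be sketched as follows. Regions are grouped by brightness range, the first local maximum of each group's variance histogram is taken as that group's grain variance, and a least-squares line is fitted through the per-group peaks. The bin width, the brightness step, the definition of "first peak" as the first local maximum, and all function names are assumptions made for illustration; real histograms would likely need smoothing before peak detection.

```python
def first_peak(variances, bin_width=1.0):
    """Return the centre of the first local maximum of the variance histogram."""
    counts = {}
    for v in variances:
        b = int(v // bin_width)
        counts[b] = counts.get(b, 0) + 1
    last = max(counts)
    for b in range(0, last + 1):
        here = counts.get(b, 0)
        if here > 0 and here >= counts.get(b - 1, 0) and here > counts.get(b + 1, 0):
            return (b + 0.5) * bin_width
    return (last + 0.5) * bin_width


def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ slope * xs + intercept (needs >= 2 distinct xs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_grain_model(regions, brightness_step=64, bin_width=1.0):
    """regions: (mean_brightness, variance) pairs, one per region.

    Returns (slope, intercept) of the grain variance as a linear function of brightness.
    """
    groups = {}
    for brightness, variance in regions:
        key = int(brightness // brightness_step)
        groups.setdefault(key, []).append((brightness, variance))
    xs, ys = [], []
    for members in groups.values():
        xs.append(sum(b for b, _ in members) / len(members))
        ys.append(first_peak([v for _, v in members], bin_width))
    return fit_line(xs, ys)


# Synthetic example: many flat regions (small variance) plus a few textured ones.
regions = ([(32, 3.5)] * 10 + [(32, 40.5)] * 2 +
           [(96, 9.5)] * 10 + [(96, 60.5)] * 3 +
           [(160, 15.5)] * 10 + [(160, 80.5)])
print(estimate_grain_model(regions))
```

In the synthetic data the textured regions sit far to the right of each histogram, so the first peak is determined by the flat regions alone, as the text describes.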
To eliminate the effect of film grain, the film grain term gets deducted from the existing variance-based metric $ACT_{var}$ in accordance with the relationship:
$$ACT_{var}^{new} = ACT_{var} - m(\sigma_{grain}^2), \qquad (10)$$
where $m(\sigma_{grain}^2)$ is a function of $\sigma_{grain}^2$. In a particular embodiment, the effect of the film grain can be regarded as the variance of film grain, i.e., $m(\sigma_{grain}^2)=\sigma_{grain}^2$.
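A minimal sketch of equation (10) follows, using the particular embodiment in which m is the identity. The linear brightness model, its coefficients and the sample activity values are hypothetical and serve only to show the effect of the subtraction.

```python
def grain_variance(brightness, slope, intercept):
    """Grain variance predicted by an assumed linear brightness model."""
    return slope * brightness + intercept


def act_var_new(act_var, sigma_grain_sq, m=lambda s: s):
    """Equation (10): subtract m(sigma_grain^2) from the variance-based metric.

    m defaults to the identity, i.e. m(sigma_grain^2) = sigma_grain^2.
    """
    return act_var - m(sigma_grain_sq)


# Hypothetical model coefficients and activity values for two regions of equal brightness.
sigma_sq = grain_variance(96, slope=0.09375, intercept=0.5)
flat_grainy = act_var_new(10.0, sigma_sq)   # flat region whose variance is mostly grain
textured = act_var_new(60.0, sigma_sq)      # genuinely textured region
print(flat_grainy, textured)
```

After the subtraction the flat but grainy region drops to a low activity value while the textured region remains high, so the two are separated more clearly than before.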
When considering noise other than film grain, similar methods can be used to model the noise and remove its effect on the spatial activity metric. Extending the approach of removing the effect of noise, such as film grain, from other existing spatial activity metrics will yield improved performance for pictures with noise.
Step 140 follows step 130 and a second iterative loop begins, depicted as Loop(2), whose loop index value j initially equals unity. During each execution of this loop, the loop index value j increases by unity. Loop(2) includes steps 150-160. Step 150 undergoes execution to calculate the spatial activity metric for the jth region. Repeated execution of step 150 enables calculation of the spatial activity metric for every region for the ith set of data. For block-based video compression applications such as MPEG-4 AVC, a region refers to a 16×16 macroblock.
Calculation of the spatial activity metric during step 150 begins by initially establishing a metric, typically using one of the known techniques described previously. Thus, the initially established spatial activity metric could constitute a variance-based, gradient-based, or DCT-based metric. Following initial establishment of the metric, the estimated noise, for example film grain, typically obtained from modeling, is subtracted from the metric to remove the effect of such noise. The spatial activity metric calculated during step 150 then gets applied to the image during step 160. When applied to the image, the spatial activity metric provides a measure of texture that permits spatial masking: distortion gets introduced preferentially into textured regions, where it appears less visible, rather than into flat regions, where distortion is more visible to the human eye. The second loop (Loop(2)) ends during step 170. In other words, the steps within Loop(2) undergo re-execution until such time as the loop variable j reaches its maximum value, corresponding to the total number of regions. The first loop (Loop(1)) ends during step 180. In other words, the steps within Loop(1) undergo re-execution until such time as the loop variable i reaches its maximum value, corresponding to the total number of sets of data to be read. The entire process ends at step 190 after every region for all sets of data has undergone processing.
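The nested-loop structure described above can be summarized in a short sketch. The callables `base_metric`, `grain_estimate` and `apply_metric` are hypothetical placeholders for the initially established metric, the modeled noise term and the masking application, respectively; the step numbers in the comments refer to the steps named in the text.

```python
from statistics import pvariance


def run_process(data_sets, base_metric, grain_estimate, apply_metric):
    """Loop(1) iterates over sets of data; Loop(2) iterates over regions."""
    for i, regions in enumerate(data_sets, start=1):        # Loop(1), index i
        for j, region in enumerate(regions, start=1):       # Loop(2), index j
            # Step 150: establish the metric, then subtract the estimated noise.
            act = base_metric(region) - grain_estimate(region)
            # Step 160: apply the noise-compensated metric.
            apply_metric(i, j, act)


# Illustrative use: plain variance as the base metric and a constant grain estimate.
results = []
run_process(
    data_sets=[[[1, 1, 1, 1], [0, 4, 0, 4]]],
    base_metric=pvariance,
    grain_estimate=lambda region: 0.5,
    apply_metric=lambda i, j, act: results.append((i, j, act)),
)
print(results)
```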
Referring to
Step 230 then undergoes execution to initiate a first iterative loop, depicted as Loop(1), whose loop index value i initially equals unity. During each execution of this loop, the loop index value i increases by unity. This loop includes steps 240-280. Step 240 undergoes execution to initiate reading of an ith set of data from an image. Step 250 initiates a second iterative loop, depicted as Loop(2), whose loop index value j initially equals unity. During each execution of this loop, the loop index value j increases by unity. Loop(2) includes steps 260-270. Step 260 undergoes execution to calculate the spatial activity metric for the jth region. Repeated execution of step 260 assures calculation of the spatial activity metric for every region for this set of data.
Calculation of the spatial activity metric during step 260 begins by initially establishing a metric, typically using one of the known techniques described previously. Initially, the spatial activity metric could constitute a variance-based, gradient-based, or DCT-based metric. Following the initial establishment of the metric, the estimated noise, for example film grain, typically obtained from modeling, gets subtracted to remove the effect of such noise on the metric. The spatial activity metric calculated during step 260 then gets applied to the image during step 270. The spatial activity metric provides a measure of texture that permits spatial masking: distortion gets introduced preferentially into textured regions, where it appears less visible, rather than into flat regions, where distortion is more visible to the human eye. The second loop (Loop(2)) ends during step 280. In other words, the steps within Loop(2) undergo re-execution until such time as the loop variable j reaches its maximum value, corresponding to the total number of regions. The first loop (Loop(1)) ends during step 290. In other words, the steps within Loop(1) undergo re-execution until such time as the loop variable i reaches its maximum value, corresponding to the total number of sets of data to be read. The entire process ends at step 295 after every region in all sets of data has undergone processing.
The advantage of the process of
The process of
Calculation of the spatial activity metric during step 350 begins by initially establishing a metric, typically using one of the known techniques described previously. Initially, the spatial activity metric could constitute a variance-based, gradient-based, or DCT-based metric. Following the initial establishment of the metric, the estimated noise, for example film grain, typically obtained from modeling, gets subtracted to remove the effect of such noise on the metric. The spatial activity metric calculated during step 350 then gets applied to the image during step 360. The activity metric provides a measure of texture that permits spatial masking: distortion gets introduced preferentially into textured regions, where it appears less visible, rather than into flat regions, where distortion is more visible to the human eye.
The second loop (Loop(2)) ends during step 370. In other words, the steps within Loop(2) undergo re-execution until such time as the loop variable j reaches its maximum value, corresponding to the total number of regions. The first loop (Loop(1)) ends during step 380. In other words, the steps within Loop(1) undergo re-execution until such time as the loop variable i reaches its maximum value, corresponding to the total number of sets of data to be read. The entire process ends at step 390 after every region in all sets of data has undergone processing.
Generally, a spatial activity metric assists in exploiting the spatial masking effect. For example, to obtain homogeneous high visual quality in video compression applications, the regions associated with smaller spatial activity metric measures undergo compression at lower quantization step sizes. In contrast, regions associated with larger spatial activity metric measures undergo compression at higher quantization step sizes. Therefore, the performance of a spatial activity metric will strongly influence the visual quality of displayed pictures. It is common to judge the performance of a spatial activity metric by assessing the visual quality of the displayed pictures. Such a process involves extensive subjective evaluation.
In accordance with another aspect of the present principles, there is provided a method for assessing the performance of a spatial activity metric. As described hereinafter, the method makes such an assessment by objectively assessing the performance of the metric for both smooth and textured regions.
Preferably, an effective spatial activity metric should assign similar measures to regions with similar visual smoothness, i.e., the spatial activity metric measures should concentrate around one level for all smooth regions.
In order for a given image processing application, such as video compression, to exploit the masking effect and allow more distortion in textured regions, the spatial activity metric measure should provide a spread between the smooth regions and the textured (i.e., busy) regions.
From the above two criteria, an assessment, hereinafter referred to as the “Smooth Busy Area Spread” (SBAS), can be defined to quantify how well a spatial activity metric (1) assigns similar measures to regions with similar visual smoothness and (2) separates the smooth regions from the busy ones.
Mathematically, the Smooth Busy Area Spread can be expressed by:
$$SBAS = \frac{\left| avg_{pic} - avg_{flat} \right|}{\sigma_{flat}}, \qquad (11)$$
where $avg_{pic}$ is the average metric for the whole picture, and $avg_{flat}$ and $\sigma_{flat}$ are the mean and the standard deviation of the metric in the smooth regions, respectively. Note that the smooth regions are manually selected and serve as visual hints for this method. When a spatial activity metric assigns similar measures to regions with similar visual smoothness, $\sigma_{flat}$ will be small. On the other hand, when the spatial activity metric separates the smooth regions from the textured ones, $|avg_{pic} - avg_{flat}|$ becomes large. Therefore, the larger the value of SBAS, the more effective the spatial activity metric.
During each execution of step 440, a value of SBAS, as described with respect to equation (11), gets calculated for each spatial activity metric i. The loop ends at step 450. In other words, the steps within this loop undergo re-execution until such time as the loop variable i reaches its maximum value, corresponding to the number of spatial activity metrics undergoing evaluation. During step 460, an overall evaluation of the spatial activity metrics occurs. The spatial activity metric having the largest SBAS becomes the “best” metric.
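The evaluation loop can be sketched in a few lines. The sketch assumes that SBAS takes the ratio form suggested by the description (the absolute difference between the picture-wide mean and the smooth-region mean, divided by the smooth-region standard deviation) and that the population standard deviation is intended; both are assumptions, as are the function names and the sample measures.

```python
from statistics import mean, pstdev


def sbas(measures, flat_indices):
    """Assumed SBAS: |avg_pic - avg_flat| / sigma_flat.

    measures: one metric value per region of the picture.
    flat_indices: indices of the manually selected smooth regions.
    sigma_flat must be non-zero for the ratio to be defined.
    """
    flat = [measures[k] for k in flat_indices]
    avg_pic = mean(measures)
    avg_flat = mean(flat)
    sigma_flat = pstdev(flat)
    return abs(avg_pic - avg_flat) / sigma_flat


def best_metric(measures_by_metric, flat_indices):
    """Score every candidate metric and return (name of best, all scores)."""
    scores = {name: sbas(values, flat_indices)
              for name, values in measures_by_metric.items()}
    return max(scores, key=scores.get), scores


# Hypothetical measures for five regions; regions 0-2 were marked as smooth.
candidates = {
    "grain_compensated": [1.0, 1.2, 0.8, 10.0, 12.0],
    "uncompensated": [5.0, 9.0, 2.0, 10.0, 12.0],
}
winner, scores = best_metric(candidates, flat_indices=[0, 1, 2])
print(winner, scores)
```

The first candidate clusters its smooth-region measures tightly and keeps them far from the picture mean, so it obtains the larger score and is selected.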
The coding performed by the entropy coding block 510 depends on a motion estimation performed by motion estimation block 525 on a reference picture stored in a reference picture store 527. A motion compensation block 530 determines the amount of motion compensation needed for the motion estimation established by the motion estimation block 525. The motion estimation is applied via a switch 535 to a second input of the summing block 502 during operation of the encoder in the inter-prediction mode. A macroblock (MB) decision block 540 controls the switch 535 to select between inter-prediction and intra-prediction coding based on which mode affords the best coding for the instant macroblock.
When operating in the intra-prediction mode, the switch 535 couples to the summing block 502 the output of an intra-prediction block 545, which provides a same-picture prediction based on the sum of the output signals of an inverse transform and quantization block 550 and the macroblock decision block 540, as provided by a summing block 555. The inverse transform and quantization block 550 performs inverse quantization and inverse transform operations on the output signal produced by the transform and quantization block 505. The output of the summing block 555 connects to a deblocking filter 560 that operates on pictures for subsequent storage in the reference picture store 527.
The encoder of
As seen in
$$\Delta QP = q(ACT_{new}) \qquad (12)$$
In this way, the spatial activity metric will map to the quantization step size or QP parameter offsets. During step 650, the encoder will encode the ith macroblock ($MB_i$), typically using an existing compression standard, such as MPEG-2, MPEG-4 AVC or VC-1. The loop ends at step 660. In other words, the steps within the loop undergo re-execution until such time as the loop variable i reaches its maximum value, corresponding to the number of macroblocks. Thereafter, the process ends at step 670.
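The mapping q(·) of equation (12) is not specified further here, so the following sketch uses a purely hypothetical piecewise-constant mapping to show the intended direction: smaller activity yields a negative QP offset (finer quantization), and larger activity yields a positive one. The thresholds, the offsets and the clamping range of 0 to 51 (the QP range of 8-bit MPEG-4 AVC) are assumptions for illustration.

```python
from bisect import bisect_right

# Hypothetical activity thresholds and the QP offset assigned to each interval.
THRESHOLDS = (2.0, 8.0, 32.0, 128.0)
OFFSETS = (-2, -1, 0, 1, 2)


def delta_qp(act_new):
    """Hypothetical q(): monotone non-decreasing map from activity to QP offset."""
    return OFFSETS[bisect_right(THRESHOLDS, act_new)]


def macroblock_qp(base_qp, act_new, qp_min=0, qp_max=51):
    """Apply the offset to a base QP and clamp to the assumed valid range."""
    return min(qp_max, max(qp_min, base_qp + delta_qp(act_new)))


for act in (0.5, 10.0, 1000.0):
    print(act, delta_qp(act), macroblock_qp(26, act))
```

A smooth macroblock thus receives a lower QP than the picture-level base value, and a busy macroblock receives a higher one, which matches the masking behavior described for the encoder.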
The input pictures undergo objective quality analysis, with regard to reference pictures, by a first quality analyzer 710. A second quality analyzer 720 receives the output of the first analyzer along with the spatially masked pictures from the block 705 for comparison against the set of reference pictures. The output of the second analyzer 720 provides a quality assessment result.
The foregoing describes a technique for characterizing an image using a spatial activity metric that takes account of image noise.
This application claims priority under 35 U.S.C. 119(e) to U.S. Provisional Patent Application Ser. No. 60/848,296, filed Sep. 29, 2006, the teachings of which are incorporated herein.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US07/20227 | 9/18/2007 | WO | 00 | 3/4/2009 |
Number | Date | Country | |
---|---|---|---|
60848296 | Sep 2006 | US |