The present invention relates generally to image and video noise analysis and specifically to a method and system for estimating different types of noise in image and video signals.
Noise measurement is an essential component of many image and video processing techniques (e.g., noise reduction, compression, and object segmentation), as adapting their parameters to the existing noise level can significantly improve their accuracy. Noise is added to the images or video from different sources [References 1-3] such as CCD sensor (fixed pattern noise, dark current noise, shot noise, and amplifier noise), post-filtering (processed noise), and compression (quantization noise).
Noise is signal-dependent due to physical properties of sensors and frequency-dependent due to post-capture filtering or Bayer interpolation in digital cameras. Thus, image and video noise is classified into: additive white Gaussian noise (AWGN) that is both frequency and signal independent, Poissonian-Gaussian noise (PGN) that is frequency independent but signal-dependent, i.e., AWGN for a certain intensity, and processed Poissonian-Gaussian noise (PPN) that is both frequency and signal dependent, non-white Gaussian for a particular intensity.
Many noise estimation approaches assume the noise is Gaussian, which is not accurate in practical video applications, where video noise is signal-dependent. Techniques that estimate signal-dependent noise, on the other hand, do not handle Gaussian noise. Furthermore, noise estimation approaches rely on the assumption that high frequency components of the noise exist, which makes them fail on real-world non-white (processed) noise. This is even more problematic in approaches using small patches (e.g., 5×5 pixels) [References 4-9] because the probability of finding a small patch with a variance much less than the noise power is higher than with large patches.
Embodiments of the invention or inventions are described, by way of example only, with reference to the appended drawings wherein:
It will be appreciated that for simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the example embodiments described herein. However, it will be understood by those of ordinary skill in the art that the example embodiments described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the example embodiments described herein. Also, the description is not to be considered as limiting the scope of the example embodiments described herein.
A method and a system for the estimation of different types of noise in images and video signals, preferably using intensity-variance homogeneity classification, will be described herein.
The computing system 101 may also include a camera device 106, or may be in data communication with CCD or camera device 100. In an example embodiment, the computing system also includes, though not necessarily, a communication device 107, a user interface module 108, and a user input device 110.
Throughout this sensing pipeline, as best seen by module 104, noise is added to the image from different sources, including but not limited to a CCD sensor (creating noises such as fixed pattern noise, dark current noise, shot noise, and amplifier noise), post-filtering (processed non-white noise), and compression (quantization noise), resulting in a digital image 206. Referring to
In a non-limiting example embodiment, the computing system may be a consumer electronic device, such as a camera device. In other words, the electronic device may include a physical body to house the components. Alternatively, the computing system is a computing device that is provided with image or video feed, or both.
It will be appreciated that any module or component exemplified herein that executes instructions or operations may include or otherwise have access to computer readable media such as storage media, computer storage media, or data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data, except transitory propagating signals per se. Examples of computer storage media include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an application, module, or both. Any such computer storage media may be part of the computing system 101, or accessible or connectable thereto. Any application or module herein described may be implemented using computer readable/executable instructions or operations that may be stored or otherwise held by such computer readable media.
The proposed systems and methods are configured to perform one or more of the following functions:
These features extend beyond [Reference 10], as the proposed systems and methods additionally a) estimate both the noise variance and the NLF; b) estimate both processed and unprocessed noise; and c) broaden the solution by adding many new features such as using temporal data. As a result, the performance is significantly improved compared to [Reference 10].
1. Noise Modeling
1.1 White Noise
The input noisy video frame (or still image) I can be modeled as I=Iorg+nd+ng+nq, where Iorg represents the noise-free image, nd represents white signal-dependent noise, ng represents white signal-independent noise, and nq represents quantization and amplification noise. With modern camera technology, nq can be ignored since it is very small compared to no=nd+ng. nd and ng are assumed to be zero-mean random variables with variances σd2(I) and σg2, respectively. (For simplicity of notation, the symbol I is herein used to refer to either a whole image or to an intensity of that image; this will be clear from the context.) The NLF of the image intensity I can be assumed to be,
σ2(I)=σd2(I)+σg2 (1)
The computing system defines σo2=max(σ2(I)) as the peak of σ2(I). When a video application, e.g., motion detection, requires a single noise variance, the best descriptive value is the maximum level, since a boundary can then be effectively designated to discriminate between signal and noise. In (15), the computing system estimates σp2 as the peak of the level function of the observed video noise, which can be AWGN, PGN, or PPN. Under PGN, the peak variance is σo2, which becomes σp2 as estimated in (15); under PPN, the peak variance σo2 is estimated from σp2 using (2).
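By way of a non-limiting illustration, the following Python sketch shows how a Poissonian-Gaussian noise field obeying the level function of equation (1) can be synthesized, and how its peak is obtained as the maximum of σ2(I) over the intensity range. The parameter values a and sigma_g are arbitrary example values and are not taken from the present disclosure.

    import numpy as np

    def add_poissonian_gaussian_noise(img, a=0.01, sigma_g=0.02, rng=None):
        # Illustrative PGN model: sigma^2(I) = sigma_d^2(I) + sigma_g^2, with sigma_d^2(I) = a*I, cf. eq. (1).
        rng = np.random.default_rng() if rng is None else rng
        var_map = a * img + sigma_g ** 2            # signal-dependent plus signal-independent variance
        noisy = img + rng.normal(0.0, np.sqrt(var_map))
        peak_var = float(var_map.max())             # sigma_o^2 = max(sigma^2(I))
        return np.clip(noisy, 0.0, 1.0), peak_var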
1.2 Processed Noise
Processing technologies such as Bayer pattern interpolation, noise removal, bit-rate reduction, and resolution enlargement are being increasingly embedded in digital cameras. For example, spatial filtering is used to decrease the bit-rate. Accurate data about in-camera processing is not available. In many cameras, however, processing can be bypassed manually, which allows exploring the statistical properties of noise before and after processing. Experiments show that the low-power high frequency components of the noise (compared to the noise power) are eliminated. As a result, low frequency and impulse-shaped noise remains.
When PGN becomes processed, the resulting noisy image can be modeled as Ip=Iorg+np, with np as the PPN with peak variance σp2. The image I before in-camera processing is modeled as I=Ip+nγ, with nγ as the distortion noise with peak variance σγ2. The method thus differentiates here between the PGN no, the PPN np, and the distortion noise nγ, where no=np+nγ. Let 1≦γ≦γmax be the degree (power) of processing on σo2. The method estimates,
σo2=γ·σp2. (2)
γ=1 means the observed noise is PGN; γ=γmax means I was heavily processed, as shown in
1.3 Noise Level Function
A better adaptation of video processing applications to noise can be achieved by considering the NLF instead of a single value. It is herein recognized, however, that there is no guarantee that pure noise (signal-free) pixels are available for all intensities, and thus NLF estimation is challenging. The NLF strongly depends on camera and capture settings [Reference 11] as illustrated in
Assume the computing system divides the intensity range of the input noisy image I into M sub-intensity classes. A piecewise linear function, see

σl2(I)=αl·σrep . . . (3)

where l∈{1, . . . , M} and I∈{Ilmin, . . . , Ilmax}.
2. State-of-the-art
AWGN estimation techniques can be categorized into filter-based, transform-based, edge-based, and patch-based methods. Filter-based techniques [Reference 12], [Reference 13] first smooth the image using a spatial filter and then estimate the noise from the difference between the noisy and smoothed images. In such methods, spatial filters are designed based on parameters that represent the image noise. Transform (wavelet or DCT) based methods [References 14-20] extract the noise from the diagonal band coefficients. [Reference 19] proposed a statistical approach to analyze the DCT filtered image and suggested that the change in kurtosis values results from the input noise. They proposed a model using this effect to estimate the noise level in real-world images. It is herein recognized that although the global processing makes transform-based methods robust, their edge-noise differentiation leads to inaccuracy at low noise levels or in highly structured images.
[Reference 19] aims to solve this problem by applying a block-based transform. [Reference 20] uses self-similarity of image blocks, where similar blocks are represented in 3D form via a 3D DCT transform. The noise variance is estimated from high-frequency components assuming the image structure is concentrated in low frequencies. Edge-based methods [Reference 11, Reference 21, Reference 22] select homogeneous segments via edge-detection. In patch-based methods [References 6-9], noise estimation relies on identifying pure noise patches (usually blocks) and averaging the patch variances.
Overall, local methods that deal with subsets of images (i.e., homogeneous segments or patches) are more accurate, since they exclude image structures more efficiently. [Reference 6] utilizes local and global data to increase robustness. In [Reference 7], a threshold-adaptive Sobel edge detection selects the target patches, and the convolutions over the selected blocks are then averaged to provide an accurate estimation of the noise variance. Based on principal component analysis, [Reference 8] first finds the smallest eigenvalue of the image block covariance matrix and then estimates the noise variance. A gradient covariance matrix is used in [Reference 9] to select "weak" textured patches through an iterative process and estimate the noise variance.
It is herein recognized that patch size is critical for patch-based methods. A smaller patch is better for low noise levels, while a larger patch makes the estimation more accurate at higher noise levels. For all patch sizes, estimation is error-prone under processed noise; however, by taking more low frequency components into account, larger patches are less erroneous. By adapting the patch size in these estimators to the image resolution, it is more likely to find noisy (signal-free) patches, which consequently increases the performance. Logically, finding image subsets with lower energy under AWGN conditions leads to accurate results. However, under PGN conditions, underestimation normally occurs. Under AWGN, [References 7-9] outperform others; however, it is herein recognized that noise underestimation under PGN makes them impractical for real-world applications.
PGN estimation methods express the noise as a function of image brightness. The main focus of related work is to first simplify the variance-intensity function and second to estimate the function parameters using many candidates as fitting points. In [Reference 4], [Reference 23], the NLF is defined as a linear function σ2(I)=aI+b and the goal is to estimate the constants a and b. Wavelet domain [Reference 4] and DCT [Reference 23] analyses are used to localize the smooth regions. Based on the variance of the selected regions, each point of the curve is considered to perform the maximum likelihood fitting. [Reference 24] estimates noise variation parameters using a maximum likelihood estimator. It is herein recognized that this iterative procedure brings up initial value selection and convergence problems. The same idea is applied in [Reference 11] by using a piecewise smooth image model.
After image segmentation, the estimated variance of each segment is considered as an overestimate of the noise level. Then the lower envelope of the variance samples versus the mean of each segment is computed and, based on that, the noise level function is calculated by curve fitting. In [Reference 25], particle filters are used as a structure analyzer to detect homogeneous blocks, which are grouped to estimate noise levels for various image intensities with confidences. Then, the noise level function is estimated from the incomplete and noisy estimated samples by solving its sparse representation under a trained basis. Curve fitting using many variance-intensity pairs requires enormous computations, which is not practical for many applications, especially when the estimated curve needs to be presented as a single value. As a special case of PGN with zero dependency, AWGN cases are not examined in these NLF estimation methods. In [Reference 26], a variance stabilization transform (VST) converts the properties of the noise into AWGN. Instead of processing the Gaussianized image and inverting back to the Poisson model, a Poisson denoising method is applied to avoid an inverted VST.
PPN is not yet an active research area and few estimation methods exist. In [Reference 27], candidate patches are first selected using their gradient energy. Then, the 3D Fourier analysis of the current frame and other motion-compensated frames is used to estimate the amplitude of the noise. A wider assumption is made in [Reference 28] by considering both frequency and signal dependency. In this method, the similarity between patches and their neighborhood is the criterion to differentiate the noise and the image structure. Using an exhaustive search, candidate patches are selected and the noise is estimated in each DCT coefficient.
3. Proposed Systems and Methods
The proposed systems and methods are based on the classification of intensity-variances of signal patches (blocks) in order to find homogeneous regions that best represent the noise. It is assumed that the noise variance is linear, with limited slope, in the intensity within a class. To find homogeneous regions, the method works on the down-sampled input image and divides it into patches. Each patch is assigned to an intensity class, whereas outlier patches are rejected. Clusters of connected patches in each class are formed and weights are assigned to them. Then, the most homogeneous cluster is selected and the mean variance of the patches of this cluster is considered as the noise variance peak of the input noisy signal. To account for processed noise, an adjustment procedure is proposed based on the ratio of low to high frequency energies. To account for noise variations along video signals, a temporal stabilization of the estimated noise is proposed. The block diagram in
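By way of a non-limiting illustration only, the following simplified Python sketch follows the overall flow described above (downsampling, patch statistics, intensity classes, selection of the most homogeneous patches, and averaging of their variances). It omits the clustering, weighting, processed-noise adjustment, and temporal stabilization steps detailed in the sections that follow, and the parameter keep_frac is an assumption introduced here for illustration only.

    import numpy as np

    def estimate_peak_variance_sketch(frame, R=2, W=5, M=4, keep_frac=0.1):
        # Coarse-average downsampling by R (block mean as anti-aliasing), cf. Section 3.2.
        h, w = (frame.shape[0] // R) * R, (frame.shape[1] // R) * R
        small = frame[:h, :w].reshape(h // R, R, w // R, R).mean(axis=(1, 3))
        # Non-overlapping W x W patches of the downsampled image.
        ph, pw = small.shape[0] // W, small.shape[1] // W
        patches = small[:ph * W, :pw * W].reshape(ph, W, pw, W).swapaxes(1, 2).reshape(-1, W * W)
        means, variances = patches.mean(axis=1), patches.var(axis=1)
        # Per-intensity-class selection of the lowest-variance (most homogeneous) patches.
        edges = np.linspace(means.min(), means.max() + 1e-9, M + 1)
        class_estimates = []
        for l in range(M):
            in_class = variances[(means >= edges[l]) & (means < edges[l + 1])]
            if in_class.size:
                k = max(1, int(keep_frac * in_class.size))
                class_estimates.append(np.sort(in_class)[:k].mean())
        return max(class_estimates) if class_estimates else 0.0  # peak over classes, cf. sigma_p^2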
3.1 Homogeneity Guided Patches
Homogeneous patches are image blocks {tilde over (B)}i of a size W×W,
where Ĩ(x, y) is the down-sampled version of the input noisy image at the spatial location (x, y), mod( ) is the modulus after division, and r is the image height (number of rows). After decomposing the image into non-overlapped patches, the noise ni of each patch can be described by {tilde over (B)}i=Zi+ni, where {tilde over (B)}i is the observed patch corrupted by independent and identically-distributed (i.i.d.) zero-mean Gaussian noise ni and Zi is the original non-noisy image patch. The variance σ2({tilde over (B)}i) of a patch represents the level of homogeneity {tilde over (H)}i of {tilde over (B)}i,
A small {tilde over (H)}i expresses high patch homogeneity. Under PGN conditions, noise is i.i.d. for each intensity level. If an image is classified into classes of patches with the same intensity level, the {tilde over (H)}i homogeneity model can be applied to each class. Assuming M intensity classes, {tilde over (L)}l represents the patches of the lth intensity class,
{tilde over (L)}l={{tilde over (B)}i|Ilmin≦μ({tilde over (B)}i)≦Ilmax}, l∈{1:M}  (6)
For M=4, Ilmin={0; 0.17; 0.4; 0.82} and Ilmax={0.2; 0.45; 0.84; 1} are vectors defining the lower and upper bounds of the class intensities.
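By way of a non-limiting illustration, the assignment of a patch to the intensity classes of equation (6), using the example bounds above for M=4, may be sketched in Python as follows. Intensities are assumed to be normalized to [0, 1], and the overlapping bounds mean a patch may fall into more than one class.

    import numpy as np

    # Example class bounds for M = 4: a patch whose mean intensity lies in [I_MIN[l], I_MAX[l]]
    # belongs to class l; the ranges deliberately overlap, cf. eq. (6).
    I_MIN = np.array([0.0, 0.17, 0.40, 0.82])
    I_MAX = np.array([0.20, 0.45, 0.84, 1.00])

    def classes_of_patch(patch):
        # Return the indices l of the intensity classes containing this patch.
        mu = float(np.mean(patch))
        return [l for l in range(len(I_MIN)) if I_MIN[l] <= mu <= I_MAX[l]]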
3.2 Adaptive Patch Classification
Images contain statistically more low frequencies than high frequencies, but small image patches show more high frequencies than low frequencies. Thus, small patches have the advantage of better signal-noise differentiation. Large image patches, on the other hand, are less likely to fall into local minima, especially when the noise is processed. To benefit from both, the computing system uses image downscaling with rate R, with coarse averaging as the anti-aliasing filter,
where I and Ĩ are the observed and down-sampled images. This gives small patches in Ĩ and large patches in I. Furthermore, the processed noise converges to white in the downscaled image. Other desirable effects of downscaling are: 1) noise estimation parameters can be fixed for the lowest possible resolution of the images (note that R varies depending on the input image resolution) and 2) since the down-scaled image contains more low frequencies, the signal to noise ratio is higher. Assuming {tilde over (L)} represents the set of patches in Ĩ, the computing system binary classifies the patches of the lth intensity class in Ĩ into {tilde over (L)}l={{tilde over (L)}l0, {tilde over (L)}l1}, where {tilde over (L)}l1 are the target patches as in,
{tilde over (L)}l1={{tilde over (B)}i|{tilde over (H)}i≦{tilde over (H)}th(l), {tilde over (B)}i∈{tilde over (L)}l}  (8)
It uses the homogeneity values {tilde over (H)}i and a threshold value {tilde over (H)}th(l) to binary classify {tilde over (L)}l. Assuming the maximum value of the slopes αl of the NLF in (3) is αmax, {tilde over (H)}th(l) is defined as,
{tilde over (H)}th(l)=αmax{tilde over (H)}med(l)+β  (9)
where β=1 and αmax=3. To calculate {tilde over (H)}med(l), the computing system first divides {tilde over (L)}l into three sub-classes, then finds the minimum {tilde over (H)}i in each sub-class and finally takes the median of the three values. When class l contains overexposed or underexposed patches, {tilde over (H)}med(l) becomes very small. Therefore, the offset β is added so as to still include noisy patches.
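By way of a non-limiting illustration, the threshold of equation (9) and the binary classification of equation (8) may be sketched in Python as follows. The description does not specify how a class is split into the three sub-classes, so the split over sorted homogeneity values below is an assumption; αmax=3 and β=1 are the values given above.

    import numpy as np

    def homogeneity_threshold(class_variances, alpha_max=3.0, beta=1.0):
        # H_th(l) = alpha_max * H_med(l) + beta, cf. eq. (9).
        v = np.sort(np.asarray(class_variances, dtype=float))
        if v.size == 0:
            return beta
        sub = np.array_split(v, 3)                   # three sub-classes of the class (assumed split)
        mins = [s.min() for s in sub if s.size]      # minimum homogeneity value per sub-class
        h_med = float(np.median(mins))               # median of the sub-class minima
        return alpha_max * h_med + beta              # offset beta keeps noisy patches when h_med ~ 0

    def select_target_patches(class_variances, alpha_max=3.0, beta=1.0):
        # Binary classification of eq. (8): keep patches with H_i <= H_th(l).
        th = homogeneity_threshold(class_variances, alpha_max, beta)
        return [i for i, h in enumerate(class_variances) if h <= th]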
3.3 Cluster Selection and Peak Variance Estimation
Due to the complexity of noise and image structure, the variance-based classification (8) by itself does not describe the noise in the image. In addition to the statistical analysis, the computing system uses a spatial analysis to extract a more reliable noise descriptor. The computing system uses connectivity of patches in both horizontal and vertical directions to form clusters of similar patches. Next, for each cluster of connected patches in the down-sampled image Ĩ, the computing system first finds the corresponding connected patches Bi (with size R·W×R·W) from the cluster {umlaut over (Φ)}(l, k) in the input noisy image I and then eliminates the outliers of the cluster based on their mean and variance. Finally, the computing system assesses each cluster (after outlier removal) based on the intra- and inter-frame weights ω1 to ω11. {umlaut over (Φ)}(l, k) represents the kth cluster of connected patches in the class l before outlier removal.
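By way of a non-limiting illustration, the grouping of the selected patches into clusters of horizontally and vertically connected patches may be sketched as follows, using 4-connectivity labeling on the patch grid. The use of scipy.ndimage.label is an implementation choice for the sketch, not part of the disclosure.

    import numpy as np
    from scipy import ndimage

    def connected_clusters(selected_mask):
        # Group selected patches that touch horizontally or vertically into clusters
        # (4-connectivity on the patch grid); returns a list of patch-index arrays.
        structure = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
        labels, n = ndimage.label(selected_mask, structure=structure)
        return [np.argwhere(labels == k) for k in range(1, n + 1)]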
3.3.1 Outlier Removal
The removal of outliers in each cluster is based on the Euclidean distance of both the mean and the variance. For each cluster, the patch with the highest probability of homogeneity is defined as the reference patch, and patches beyond a certain Euclidean distance are removed. Assuming {umlaut over (Φ)}(l, k) represents the kth cluster of connected patches in the class l before outlier removal, the computing system defines the reference value of the variance and mean of each cluster as,
where Bref (l, k) is the patch with the minimum variance in {umlaut over (Φ)} (l, k) and its variance σref2 (l, k) and mean μref (l, k) are considered references. By defining two intervals using two thresholds, the cluster after outlier removal is,
Φ(l, k)={Bi∥σBi2−σref2(l, k)|≦tσ(l, k), |μ(Bi)−μref(l, k)|≦tμ(l, k)}  (12)
where tσ(l, k) and tμ(l, k) are the variance and the mean thresholds that are directly proportional to σref2(l, k),
where Cσ=3 and Cμ=4.
To avoid including image structure in the clusters, the similarity of the patches is considered and, in (12), σref2(l, k) is replaced with σsim2(l, k) defined as,
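By way of a non-limiting illustration, the outlier removal of equation (12) may be sketched in Python as follows. The exact expressions for the thresholds tσ(l, k) and tμ(l, k) are not reproduced above, so the forms below (proportional to the reference variance, with Cσ=3 and Cμ=4) are assumptions, and the σsim2(l, k) refinement is omitted.

    import numpy as np

    def remove_outliers(patch_vars, patch_means, C_sigma=3.0, C_mu=4.0):
        # The minimum-variance patch is the reference B_ref(l, k); patches whose variance or
        # mean deviate beyond the thresholds are dropped, cf. eq. (12).
        patch_vars = np.asarray(patch_vars, dtype=float)
        patch_means = np.asarray(patch_means, dtype=float)
        ref = int(np.argmin(patch_vars))
        sigma_ref, mu_ref = patch_vars[ref], patch_means[ref]
        t_sigma = C_sigma * sigma_ref                  # assumed form of t_sigma(l, k)
        t_mu = C_mu * np.sqrt(sigma_ref)               # assumed form of t_mu(l, k)
        keep = (np.abs(patch_vars - sigma_ref) <= t_sigma) & (np.abs(patch_means - mu_ref) <= t_mu)
        return np.flatnonzero(keep)                    # indices of the patches kept in Phi(l, k)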
3.3.2 Cluster Ranking
For each outlier-reduced connected cluster Φ(l, k), the computing system first computes the weights ωj(l, k) and then selects the final homogeneous cluster {circumflex over (Φ)} as in,
Then the computing system defines the peak noise level σp2 in the input image as the average of the patch variances in {circumflex over (Φ)}, the cluster ranked highest, i.e., the cluster that best represents random noise,
where N{{circumflex over (Φ)}} is the number of patches in the cluster {circumflex over (Φ)}. The value σp2 is considered as the peak variance because the computing system gives higher weights to clusters with higher variances. Estimates of {0≦ωj(l, k)≦1} are proposed below; they consider noise in both low and high frequencies, the size of the cluster, the patch variances, the intensity and variance margins, the maximum noise level, clipping factors, the temporal error, and previous estimates.
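By way of a non-limiting illustration, the ranking of equation (14) and the peak variance of equation (15) may be sketched as follows. How the individual weights ωj(l, k) are combined into a single rank is not reproduced above, so the multiplicative combination used below is an assumption.

    import numpy as np

    def peak_variance(clusters, weight_fns):
        # 'clusters' is a list of 1-D arrays of patch variances (after outlier removal);
        # 'weight_fns' is a list of callables returning weights in [0, 1] for a cluster.
        scores = [np.prod([w(c) for w in weight_fns]) for c in clusters]   # assumed combination
        best = clusters[int(np.argmax(scores))]        # the selected homogeneous cluster, cf. eq. (14)
        return float(np.mean(best))                    # sigma_p^2: mean patch variance, cf. eq. (15)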
3.4 Processed Noise Estimation
It is herein recognized that the assumption that the noise is frequency-independent in each homogeneous cluster is incorrect in processed images. In such situations, the variance of the selected cluster σp2 (15) does not represent the true level of the noise in the unprocessed noisy image because some frequency components of the noise have been removed. In many applications, such as enhancement, the level of the unprocessed (original) noise is required. To estimate this original noise, the relation between low and high frequency components is needed to trace the deviation from whiteness, because the computing system assumes that the degrees of noise removal in the high and low frequencies are different. Let E(Lf) represent the variance of the low-pass filtered pixels of Φ(l, k) and let E(Hf) represent the median of the power of the high-pass filtered pixels of Φ(l, k). The computing system estimates their relation as follows,
where * is convolution, hlp is a 3×3 moving average filter, and hhp=I−hlp is the high-pass filter, with I a 3×3 kernel whose elements are zero except for the center, which is one. With the given low-pass filter, Ce=3.7. The ratio Er increases when spatial filtering occurs. The computing system selects E(Hf) as the median energy because high-frequency noise after filtering has an impulse shape and is divided into high and low levels. In many cameras, the filtering process is optional, allowing for study of the effect of this filtering on processed noise.
To approximate the processing degree γ of (2), the effect of applying anisotropic diffusion [Reference 29] and bilateral filters [Reference 30] on synthetic AWGN is considered.
γ=1.4Er (17)
The computing system temporally stabilizes γ using the procedure discussed in section 3.6. As can be seen in
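By way of a non-limiting illustration, the low/high frequency analysis of equation (16) and the approximation of the processing degree γ in equation (17) may be sketched as follows, for a cluster given as a 2-D array of pixels. The exact expression combining E(Lf), E(Hf), and Ce into the ratio is not reproduced above, so the ratio and the clipping value γmax=10 used below are assumptions; the 3×3 moving-average low-pass filter, the median high-frequency power, Ce=3.7, and the factor 1.4 are taken from the description.

    import numpy as np
    from scipy import ndimage

    def processing_degree(cluster_pixels, C_e=3.7, gamma_max=10.0):
        x = np.asarray(cluster_pixels, dtype=float)
        low = ndimage.uniform_filter(x, size=3)        # h_lp: 3x3 moving average
        high = x - low                                 # h_hp = identity - h_lp
        E_lf = low.var()                               # variance of low-pass filtered pixels, E(Lf)
        E_hf = np.median(high ** 2)                    # median power of high-pass filtered pixels, E(Hf)
        Er = E_lf / (C_e * E_hf + 1e-12)               # assumed form of the low/high ratio
        return min(max(1.4 * Er, 1.0), gamma_max)      # gamma of eq. (17), clipped to [1, gamma_max]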
3.5 Noise Level Function Approximation
The computing system estimates the NLF based on the peak noise variance σp2 of the selected cluster {circumflex over (Φ)} defined in (15) and employs the other outlier-removed clusters Φ(l, k) to approximate the NLF. First, the computing system sets the whole initial NLF curve {circumflex over (Ω)}(.) to σp2, which means the noise level is identical in all intensities (Gaussian). Then, the computing system updates {circumflex over (Ω)}(.) based on N{Φ(l, k)}, the size (i.e., number of patches), and on σ2(l, k), the average of the variances of the cluster Φ(l, k). The computing system assigns a weight (confidence) λ(l, k) to σ2(l, k): the larger N{Φ(l, k)} is, the better σ2(l, k) represents the noise at intensity μ(l, k), meaning the closer λ(l, k) should be to 1. The point-wise NLF {circumflex over (Ω)}(.) is then,
The divisor constant 5 is chosen according to the 3σ rule, considering that a cluster with 15 (or more) patches is completely reliable, i.e., λ(l, k)=1. By applying a regression analysis, e.g., curve fitting, the continuous NLF Ω(.) can be approximated from {circumflex over (Ω)}(.) as illustrated in
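By way of a non-limiting illustration, the point-wise NLF update of equation (18) may be sketched as follows. The confidence λ(l, k)=min(N{Φ(l, k)}/15, 1) follows from the statement that a cluster with 15 or more patches is completely reliable; the linear blending of each cluster's mean variance into the curve is an assumed form, and intensities are assumed to be normalized to [0, 1].

    import numpy as np

    def approximate_nlf(sigma_p2, clusters, n_bins=256):
        # 'clusters' is a list of (mean_intensity, mean_variance, num_patches) tuples.
        nlf = np.full(n_bins, float(sigma_p2))                 # initial (Gaussian) NLF at sigma_p^2
        for mu, var, n in clusters:
            lam = min(n / 15.0, 1.0)                           # confidence lambda(l, k)
            idx = min(max(int(round(mu * (n_bins - 1))), 0), n_bins - 1)
            nlf[idx] = lam * var + (1.0 - lam) * nlf[idx]      # assumed blending form
        return nlf                                             # point-wise NLF before curve fitting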
3.6 Temporal Stabilization of Estimates
In many video applications, instability of the noise level is intolerable, unless the temporal coherence between frames is very small, e.g., at a scene change. Let ζt−1,t, 0≦ζt−1,t≦1, represent the similarity between the current frame It and the previous frame It−1. ζ determines how the statistical properties of the new observation (i.e., image) are related to previous observations. Consider a process (such as the median) Oi(σt−i2, . . . , σt−12, σt2) to filter out outliers from the set of the current estimate σt2 and the previous estimates {σt−i2, . . . , σt−12}. When ζt−1,t=1, the accurate estimate should be Oi(σt−i2, . . . , σt−12, σt2); when ζt−1,t=0, the accurate estimate is σt2 itself. So the following linear stabilization is proposed,
{circumflex over (σ)}t2=Oi(σt−i2, . . . , σt−12, σt2)·ζt−1,t+(1−ζt−1,t)·σt2  (19)
where,
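By way of a non-limiting illustration, and leaving aside the exact definition of ζt−1,t, the temporal stabilization of equation (19) may be sketched as follows, with the median used as the outlier-filtering process Oi(.).

    import numpy as np

    def stabilize_estimate(history, sigma_t2, zeta):
        # 'history' holds previous peak-variance estimates; 'zeta' is the frame similarity in [0, 1].
        if not history:
            return float(sigma_t2)
        filtered = float(np.median(list(history) + [sigma_t2]))   # O_i(.): outlier-filtering process
        return zeta * filtered + (1.0 - zeta) * float(sigma_t2)   # eq. (19)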
3.7 Intra-frame Weighting
3.7.1 Noise in Low Frequencies
The image signal is more concentrated in low frequencies, whereas noise is equally distributed. Down-sampled versus input images can thus be exploited to analyze the noise in the low frequency components. The variance of finite Gaussian samples follows a scaled chi-squared distribution, but here the computing system utilizes an approximation benefiting from the normalized Euclidean distance,
where exp(.) symbolizes the exponential function, and α2 and σ2(l, k) are the averages of the variances of the input and down-sampled patches in the cluster after outlier removal Φ(l, k). The positive constant C1 (e.g., 0.4) varies depending on R and W. Low values of ω1(l, k) indicate image structure, for which the signal is concentrated in low frequencies.
3.7.2 Noise in High Frequencies
The dependency of neighboring pixels is another criterion to extract image structure. The median absolute deviation (MAD) in the horizontal, vertical and diagonal directions expresses this dependency,
τi=median{|Bi(m, n+1)−Bi(m, n)|, |Bi(m+1, n)−Bi(m, n)|, |Bi(m+1, n+1)−Bi(m, n)|}, 0≦m, n≦R·W−2 (21)
where τi is the MAD of Bi. For a block of Gaussian samples with a block size 10≦R·W≦25, σBi2=1.1τi2. The computing system profits from this property to extract the likelihood function of the neighborhood dependency. For each Φ(l, k), assume τ(l, k) is the average of the τi of the blocks in Φ(l, k). Under AWGN, the following likelihood function is defined,
where C2=0.2. Low values of ω2(l, k) mean a strong neighboring dependency, which is a hint of image structure. In the case of white noise, the computing system analyzes the MAD versus the variance to estimate whether the patch contains structure. Thus, in the final estimation step, the computing system uses 1.1τ2(l, k) instead of σ2(l, k) for patches with structure.
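By way of a non-limiting illustration, the MAD of equation (21) and a structure-likelihood weight in the spirit of equation (22) may be sketched as follows. The exponential form of the weight is an assumption, since equation (22) itself is not reproduced above; the relation σ2≈1.1τ2 and C2=0.2 are taken from the description.

    import numpy as np

    def patch_mad(block):
        # MAD of horizontal, vertical and diagonal first differences, cf. eq. (21).
        d = np.concatenate([np.abs(block[:-1, 1:] - block[:-1, :-1]).ravel(),   # horizontal
                            np.abs(block[1:, :-1] - block[:-1, :-1]).ravel(),   # vertical
                            np.abs(block[1:, 1:] - block[:-1, :-1]).ravel()])   # diagonal
        return float(np.median(d))

    def structure_likelihood(block, C2=0.2):
        # Under AWGN the block variance is close to 1.1*tau^2; a large mismatch hints at structure.
        tau = patch_mad(block)
        sigma2 = float(block.var())
        return float(np.exp(-np.abs(sigma2 - 1.1 * tau ** 2) / (C2 * sigma2 + 1e-12)))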
3.7.3 Size of the Cluster
The target patches are more concentrated in homogeneous regions, and the size of the homogeneous region should be large enough to precisely represent the noise statistics. Therefore, a larger cluster has a higher probability of representing the homogeneous regions. However, a linear relationship between the cluster size and the corresponding weight is not advantageous, since once a cluster is past a certain size, sufficient noise information can be obtained. The following is proposed for the weight with respect to the size of the cluster,
where C3=80, and N{Φ(l, k)} and N{I} are the number of patches in Φ(l, k) and in the input image, respectively.
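By way of a non-limiting illustration, a saturating cluster-size weight consistent with the description above may be sketched as follows; the exponential saturating form is an assumption, with only C3=80 and the ratio of cluster size to image size taken from the text.

    import numpy as np

    def size_weight(n_cluster, n_image, C3=80.0):
        # Grows with the relative cluster size but saturates once the cluster is large enough.
        return float(1.0 - np.exp(-C3 * n_cluster / max(n_image, 1)))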
3.7.4 Variance of Means and Variance of Variances
In a homogeneous cluster with a relatively large number of pixels in each patch, the normalized values of the variance of the variances ν(l, k) and the variance of the means ε(l, k) of {Bi∈Φ(l, k)} should be small. It is therefore proposed that,
where
In equations (24) and (25), ω4(l, k) and ω5(l, k) are directly proportional to ω3(l, k). Without this, high values could be assigned to ω4(l, k) and ω5(l, k) when the cluster has a small number of patches, even though it is not homogeneous. Uniformity of the mean and variance describes cluster homogeneity and leads to high values of ω4(l, k) and ω5(l, k).
3.7.5 Intensity Margins
Excluding the intensity extremes from the estimation procedure can be problematic when the signal margins are informative. For instance, the elimination of dark intensities in an underexposed image leads to the removal of the majority of the data and, consequently, to inaccurate estimation. It is therefore herein proposed to assign negative weights to the margins,
where IH=0.9 and IL=0.06.
3.7.6 Variance Margins
There are cases where underexposed or overexposed image parts with very low variances are not observed in the intensity margins. On the other hand, extremely high variances signify image structure. For consumer-electronics-related applications, the PSNR usually is not below a certain value (e.g., 22 dB). Thus, similar to the intensity margins, the variance margins also affect the homogeneity characterization. It is therefore proposed to use the following weight,
where δ(l, k)=max(σ2(l, k)−σmax2, 0), and σmin2=5 and σmax2=200 are the variance margins.
3.7.7 Maximum Noise Level
Under PGN, the maximum noise level distinguishes the signal and noise boundary. Hence, the maximum noise level and the corresponding intensity can be used to estimate the NLF. As a result, the Φ(l, k) with the maximum level of the noise should be ranked higher. However, some consideration should be taken into account in order to exclude clusters containing image structures from this weighting procedure. The basic assumption that the noise variance slope is limited helps to restrict the maximum level of noise in each intensity class. So,
σpeak2(l)=min{αmax·median[σ2(l, k)], max[σ2(l, k)]}  (28)
where σpeak2(l) is the expected peak of the noise in the class l. Assuming η(l, k)=σpeak2(l)−σ2(l, k), by outlining a valid noise variance interval, the weight can be defined as follows (C8=1),
3.7.8 Clipping Factor
Due to bit-depth limitations, the intensity values of the input images are clipped in the low and high margins. It is proposed to use a weight according to the 3σ bound,
where 1 and 0 are the maximum and minimum intensities and C9=0.5. If all pixels are within the 3σ bound, μclip=0.
3.8 Inter-frame Weighting
Utilizing only spatial data in video signals may lead to estimation uncertainty, especially in processed noise, where the relation between low and high frequency components deviates from AWGN, which in turn makes structure and noise differentiation more challenging. Another issue to consider in video is robust estimation over time especially in joint video noise estimation and enhancement applications.
3.8.1 Temporal Error Weighting
Assume B(i,t) is the ith patch in the noisy frame It at time t, and B(i,t+p) is the corresponding patch in the adjacent noisy frame at time t+p, where p=±1. Based on which adjacent frame (previous or following) has less temporal error for the whole frame, p is set to −1 or +1. Assuming the noise level does not change through time, the matching (or temporal consistency) factor can be defined as,
where C10=1 and B(i,t)∈Φt(l, k), the kth connected cluster of class l in It. Since the homogeneity detection is applied on the input noisy image, there is no guarantee that the temporal counterpart B(i,t+p) is also homogeneous. Therefore, a high temporal error of a few patches should not significantly affect ω10(l, k). For this, the computing system analyzes each patch error and aggregates all matching degrees. This is more reliable than assessing the aggregated variances.
3.8.2 Previous Estimates Weighting
In video applications, noise estimation should be stable through time, and coarse noise level jumps are only acceptable when there is a scene (or lighting) change. Therefore, the cluster with the variance closest to the previous observation is more likely to be the target cluster. Assuming σt−12 is the estimated noise σp2 for the previous frame, the following is defined to add temporal robustness,
where C11=1 and 0≦ζt−1,t≦1 measures the scene change estimated at the patch level. Assuming the temporally matched patches have a mean error less than 2σmax2/W2, the ratio of the temporally matched patches to all patches defines ζt−1,t. Note that (32) guides the estimator to find the most similar homogeneous region in It−1.
3.9 Camera Settings Adaptation
For a specific digital camera, the type and level of the noise can be desirably modeled using camera parameters such as ISO, shutter speed, aperture, and flash on/off. However, creating a model for each camera requires excessive data processing. Also, such meta-data can be lost, for example, due to format conversion and image transfer. Thus, the computing system cannot rely only on the camera or capturing properties to estimate the noise; these properties, however, if available, can support the selection of homogeneous regions and thereby increase estimation robustness. It is assumed that the camera settings give a probable range of the noise level. The patch selection threshold {tilde over (H)}th(l) in (9) can be modified according to this range. The computing system can also use the variance margin weights in (27) to reject out-of-range values.
3.10 User Input Adaptation
In some video applications such as post-production, users require manual intervention to adjust the noise level for their specific needs. Assuming user knowledge about the noise level can define the valid noise range, the variance margin used in (27) can be used to reject the out of range clusters.
4. Experimental Results
The down-sampling rate R is a function of the image resolution. For example, R=2 for low resolutions (less than 720p) and R=3 for higher resolutions. As a result, noise estimation parameters become resolution independent. In an example embodiment, the down-sampled patch size W is set to 5. The number of classes was set to M=4, because too high a number M causes the classes to be too small and their statistics invalid. All constant parameters used in the proposed weights are given and explained directly after their respective equations. The same set of values was used in all the results described herein.
The proposed homogeneous cluster selection can be performed either on one channel of a color space or on each channel separately. Normally the Y channel is less manipulated in the capturing process and therefore the noise property assumptions in it are more realistic. Observation confirms that adapting the estimation to the Y channel leads to better video denoising. Therefore, the estimated target cluster in Y is used as a guide to select the corresponding patches in the chroma channels. Utilizing these patches, the computing system calculates the properties of the chroma noise, i.e., σp2 and γ according to (15) and (17). Due to space constraints, simulation results here are given for the Y channel.
Target patches in (8) can be recalculated in a second iteration by adapting {tilde over (H)}th(l) to σp2 (estimated in the first iteration). A finer estimation can be performed by tightening the bound, i.e., using a smaller value for αmax. The rest of the method is the same as in the first iteration. The complexity of a second iteration is very minor and much less than that of the first one, since the patch statistics are already computed. However, tests show that a second iteration improves the estimation results only slightly, which does not justify iterative estimation.
Next, the performance of the proposed estimation of the NLF, AWGN, PGN, and PPN has been evaluated separately.
4.1 Additive White Gaussian Noise (AWGN)
Six state-of-the-art approaches [References 5-9], [Reference 19] are selected and their performance is evaluated on 14 test images as in
The proposed method was also tested on video signals and
4.2 Poissonian-Gaussian Noise (PGN)
To evaluate the performance of the proposed estimation of PGN, six state-of-the-art approaches [References 5-9], [Reference 19] were tested on seven real-world test images. See
The proposed PGN estimator described herein is also evaluated to denoise video signals using BM3D.
4.3 Processed Poissonian-Gaussian Noise (PPN)
If the observed noise is PPN, downscaling has the effect of converging it to white. This in turn leads to better patch selection under processed noise. Moreover, since the proposed method uses a large patch size, it includes more low frequencies, leading to a more realistic estimation.
4.4 Noise Level Function
The proposed NLF estimation was applied on images with synthetic and real PGN. The ground-truth for the real PGN images has been extracted manually (i.e., subjectively extracted homogeneous regions). Two state-of-the-art methods, [Reference 11] and [Reference 4], are selected for comparison.
4.5 Adaptation to Camera Settings and to User Input
The more image information is provided, the more reliable the estimation that can be performed. Capturing properties, if available as meta-data, can be useful for guiding the cluster selection procedure. To test this, 10 highly-textured images taken by a mobile camera (Samsung S5) in burst mode without motion were selected. First, the ground-truth peak of the noise was manually identified by analyzing the homogeneous patches and the temporal difference of the burst-mode captured images. Second, the proposed noise estimator was applied using only the intra-frame weights, and the estimated PSNR, when compared to the ground truth, showed an average estimation error of 1.2 dB. In the last step, both the patch selection threshold {tilde over (H)}th(l) in (9) and the variance margin weight ω7(l, k) in (27) were adapted to the meta-data brightness value and ISO. This led to a more reliable estimation with an average error of 0.34 dB in PSNR.
Performance of image and video processing methods improves if expertise of their users can be integrated. The proposed method easily allows for such integration. For example, if the user of an offline application can define possible noise range, the proposed variance margin (27) can be used to reject the out of range clusters.
5. Conclusion
Noise estimation methods assume visual noise is either white Gaussian or white signal-dependent. The proposed systems and methods bridge the gap between the relatively well studied white Gaussian noise and the more complicated signal-dependent and processed non-white noises. In one aspect of the systems and methods, a noise estimation method is provided that widens these assumptions using a vector of weights, which are designed based on the statistical properties of noise and of homogeneous regions in the images. Based on the selected homogeneous regions in the different intensity classes, the noise level function and the processing degree are approximated. It was shown that this visual noise estimation method robustly handles different types of visual noise: white Gaussian, white Poissonian-Gaussian, and processed (non-white), which are visible in real-world video signals. The simulation results showed better performance of the proposed method in both accuracy and speed.
6. References
The details of the references mentioned above, and shown in square brackets, are listed below. It is appreciated that these references are hereby incorporated by reference.
[Reference 1] R. Szeliski, Computer vision: algorithms and applications, Springer, 2010.
[Reference 2] Y. Tsin, V. Ramesh, and T. Kanade, “Statistical calibration of CCD imaging process,” in Computer Vision ICCV, IEEE Int. Conf. on. IEEE, 2001, vol. 1, pp. 480-487.
[Reference 3] G. E. Healey and R. Kondepudy, “Radiometric CCD camera calibration and noise estimation,” Pattern Analysis and Machine Intelligence, IEEE Trans. on, vol. 16, no. 3, pp. 267-276, March 1994.
[Reference 4] A. Foi, M. Trimeche, V. Katkovnik, and K. Egiazarian, “Practical Poissonian-Gaussian noise modeling and fitting for single-image raw data,” Image Processing, IEEE Trans. on, vol. 17, no. 10, pp. 1737-1754, 2008.
[Reference 5] M. Ghazal and A. Amer, “Homogeneity localization using particle filters with application to noise estimation,” Image Processing, IEEE Trans. on, vol. 20, no. 7, pp. 1788-1796, 2011.
[Reference 6] J. Tian and Li Chen, “Image noise estimation using a variation-adaptive evolutionary approach,” Signal Processing Letters, IEEE, vol. 19, no. 7, pp. 395-398, 2012.
[Reference 7] Sh.-M. Yang and Sh.-Ch. Tai, “Fast and reliable image-noise estimation using a hybrid approach,” Journal of Electronic Imaging, vol. 19, no. 3, pp. 033007-033007, 2010.
[Reference 8] S. Pyatykh, J. Hesser, and Lei Zheng, “Image noise level estimation by principal component analysis,” Image Processing, IEEE Trans. on, vol. 22, no. 2, pp. 687-699, 2013.
[Reference 9] X. Liu, M. Tanaka, and M. Okutomi, “Noise level estimation using weak textured patches of a single noisy image,” in Image Processing (ICIP), IEEE Int. Conf. on, 2012, pp. 665-668.
[Reference 10] M. Rakhshanfar and A. Amer, “Homogeneity classification for signal dependent noise estimation in images,” in Image Processing (ICIP), IEEE Int. Conf. on, October 2014, pp. 4271-4275.
[Reference 11] Ce Liu, R. Szeliski, S. B. Kang, C. L. Zitnick, and W. T. Freeman, “Automatic estimation and removal of noise from a single image,” Pattern Analysis and Machine Intelligence, IEEE Trans. on, vol. 30, no. 2, pp. 299-314, 2008.
[Reference 12] T.-A. Nguyen and M.-Ch. Hong, “Filtering-based noise estimation for denoising the image degraded by Gaussian noise,” in Advances in Image and Video Technology, pp. 157-167, Springer, 2012.
[Reference 13] D.-H. Shin, R.-H. Park, S. Yang, and J.-H. Jung, “Block-based noise estimation using adaptive Gaussian filtering,” Consumer Electronics, IEEE Trans. on, vol. 51, no. 1, pp. 218-226, 2005.
[Reference 14] D. L. Donoho and J. M. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, vol. 81, no. 3, pp. 425-455, 1994.
[Reference 15] E. J. Balster, Y. F. Zheng, and R. L., Ewing, “Combined spatial and temporal domain wavelet shrinkage algorithm for video denoising,” Circuits and Systems for Video Technology, IEEE Trans. on, vol. 16, no. 2, pp. 220-230, 2006.
[Reference 16] Yang, Y. Wang, W. Xu, and Q. Dai, “Image and video denoising using adaptive dual-tree discrete wavelet packets,” Circuits and Systems for Video Technology, IEEE Trans. on, vol. 19, no. 5, pp. 642-655, 2009.
[Reference 17] M. Hashemi and S. Beheshti, “Adaptive noise variance estimation in Bayes-Shrink,” Signal Processing Letters, IEEE, vol. 17, no. 1, pp. 12-15, 2010.
[Reference 18] H. H. Khalil, R. O. K. Rahmat, and W. A. Mahmoud, “Chapter 15: Estimation of noise in gray-scale and colored images using median absolute deviation (MAD),” in Geometric Modeling and Imaging GMAI, 3rd Int. Conf. on, July 2008, pp. 92-97.
[Reference 19] D. Zoran and Y. Weiss, “Scale invariance and noise in natural images,” in Computer Vision, IEEE 12th Int. Conf. on, September 2009, pp. 2209-2216.
[Reference 20] A. Danielyan and A. Foi, “Noise variance estimation in nonlocal transform domain,” in Local and Non-Local Approximation in Image Processing LNLA, Int. Workshop on, IEEE, 2009, pp. 41-45.
[Reference 21] Sh.-Ch. Tai and Sh.-M. Yang, “A fast method for image noise estimation using Laplacian operator and adaptive edge detection,” in Communications, Control and Signal Processing ISCCSP, 3rd Int. Symposium on, 2008, pp. 1077-1081.
[Reference 22] P. Fu, Q. Sun, Z. Ji, and Q. Chen, “A new method for noise estimation in single-band remote sensing images,” in Fuzzy Systems and Knowledge Discovery (FSKD), 9th Int. Conf. on, 29-31 May 2012, pp. 1664-1668.
[Reference 23] A. Foi, “Practical denoising of clipped or overexposed noisy images,” in EUSIPCO, 16th European Signal Processing Conf., 2008, pp. 1-5.
[Reference 24] A. Jezierska, C. Chaux, J.-C. Pesquet, H. Talbot, and G. Engler, “An EM approach for time-variant Poisson-Gaussian model parameter estimation,” Signal Processing, IEEE Trans. on, vol. 62, no. 1, pp. 17-30, January 2014.
[Reference 25] J. Yang, Zh. Wu, and Ch. Hou, “Estimation of signal-dependent sensor noise via sparse representation of noise level functions,” in Image Processing (ICIP), 19th IEEE Int. Conf. on, September 2012, pp. 673-676.
[Reference 26] X. Jin, Zh. Xu, and K. Hirakawa, “Noise parameter estimation for Poisson-corrupted images using variance stabilization transforms,” Image Processing, IEEE Trans. on, vol. 23, no. 3, pp. 1329-1339, March 2014.
[Reference 27] A. Kokaram, D. Kelly, H. Denman, and A. Crawford, “Measuring noise correlation for improved video denoising,” in Image Processing (ICIP), 19th IEEE Int. Conf. on, September 2012, pp. 1201-1204.
[Reference 28] M. Colom, M. Lebrun, A. Buades, and J. M. Morel, “A non-parametric approach for the estimation of intensity-frequency dependent noise,” in Image Processing (ICIP), 21st IEEE Int. Conf. on, October 2014.
[Reference 29] P. Perona and J. Malik, “Scale-space and edge detection using anisotropic diffusion,” Pattern Analysis and Machine Intelligence, IEEE Trans. on, vol. 12, no. 7, pp. 629-639, 1990.
[Reference 30] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Computer Vision, Sixth Int. Conf. on, January 1998, pp. 839-846.
[Reference 31] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” Image Processing, IEEE Trans. on, vol. 16, no. 8, pp. 2080-2095, 2007.
[Reference 32] X. Zhu and P. Milanfar, “Automatic parameter selection for denoising algorithms using a no-reference measure of image content,” Image Processing, IEEE Trans. on, vol. 19, no. 12, pp. 3116-3132, 2010.
It will be appreciated that the features of the systems and methods for estimating different types of image and video noise and its level function are described herein with respect to example embodiments. However, these features may be combined with different features and different embodiments of these systems and methods, although these combinations are not explicitly stated.
While the basic principles of these inventions have been described and illustrated herein it will be appreciated by those skilled in the art that variations in the disclosed arrangements, both as to their features and details and the organization of such features and details, may be made without departing from the spirit and scope thereof. Accordingly, the embodiments described and illustrated should be considered only as illustrative of the principles of the inventions, and not construed in a limiting sense.
This application claims priority to U.S. Patent Application No. 61/993,469, filed May 15, 2014, titled “Method and System for the Estimation of Different Types of Noise in Image and Video Signals”, the entire contents of which are hereby incorporated by reference.
Filing Document: PCT/CA2015/000322; Filing Date: May 15, 2015; Country: WO; Kind: 00.
Related U.S. Provisional Application: 61/993,469, filed May 2014, US.