This application is the U.S. national phase of the International Patent Application No. PCT/EP2019/064297, filed Jun. 3, 2019, which claims the benefit of French Patent Application No. 18 55955, filed Jun. 29, 2018, the entire content of which is incorporated herein by reference.
The disclosure relates to noise reduction processing for a video sequence of pixel-based digital images, in particular for correcting “salt-and-pepper” type image noise (hereinafter also called “impulse noise”).
However, this noise can be considered not to affect the “image background”, which motivates image processing that estimates the background in real time, with possible applications in:
In what follows, the focus is on a grayscale image (a single value per pixel); the extension to a color image (three channels: red, green, blue) is immediate, by applying the same calculation to each of the three channels.
A single pixel of the image is then considered, denoted y(n) in the nth image of the sequence; its temporal evolution is tracked, so its spatial coordinates need not be specified in what follows.
While keeping the other previously mentioned applications in mind, the case of impulse noise due to radiation can be processed based on a model of degradation of a single pixel y(n) at a given moment n. This model deviates from the conventional model of centered additive noise and can for example be written:
The probability density fγ(y) of the values taken by y(n) is written in this example:
This simple model is not the only possibility. In particular, the contaminated values are not necessarily constant, and radiation is not the only application. We are therefore more generally interested in pixel values having a probability density of the form:
fγ(y)=(1−p)fU(y−x)+p·fS(y)
where fS(y) is the distribution of “contaminated” pixels (or those corresponding to a moving object) and fU(y−x) is the distribution of “uncontaminated” pixels (or those corresponding to a fixed background).
By analogy with the case of radiation noise, the terms “contaminated” and “uncontaminated” are retained below (even though images with radiation are not the only possible application). In the most general case of salt-and-pepper impulse noise, the noise values can be either very high (white pixels, saturation) or very low (black pixels), which gives fS(y) a bimodal shape, for example for noisy pixels oscillating between 0 and the maximum pixel value Imax:
Techniques exist for filtering this type of noise. The two main techniques are given in detail below.
The simplest method consists of applying linear temporal filtering to the values of each pixel. Sliding averages or linear temporal filters such as exponential forgetting are typically used. Exponential forgetting is defined by:
z(n)=(1−α)y(n)+αz(n−1)
where α governs the forgetting factor of the filter (typically between 0.8 and 0.99).
This filter converges to the average yavg of the values taken by y(n) over time. If s is the average value of the contaminated pixels, then:
yavg=x+p(s−x)
This value can be substantially different from the exact value x if the radiation level p is high and/or if the average of the uncontaminated values x is far from the “contaminated” value s. For example, for p=20%, a normally black pixel x=0 and a contaminated value at saturation s=255 gray levels, yavg=51 gray levels, i.e., an error of about 50 gray levels relative to the exact value.
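As a quick numerical check of the bias yavg=x+p(s−x), the sketch below simulates one pixel under the constant-contamination model (value s with probability p, otherwise x+u(n)) and runs the exponential forgetting recursion given above. The Gaussian form of u(n), the seed, and the helper names `contaminated_pixel` and `exponential_forgetting` are illustrative assumptions, not taken from the text.

```python
import random

def contaminated_pixel(x, p, s, sigma_u, rng):
    """One draw of y(n): s with probability p, else x + u(n) (Gaussian u is an assumption)."""
    if rng.random() < p:
        return s
    return x + rng.gauss(0.0, sigma_u)

def exponential_forgetting(ys, alpha, z0=0.0):
    """z(n) = (1 - alpha) * y(n) + alpha * z(n-1); returns the whole output sequence."""
    z, out = z0, []
    for y in ys:
        z = (1.0 - alpha) * y + alpha * z
        out.append(z)
    return out

rng = random.Random(1)
x, p, s = 0.0, 0.20, 255.0            # black pixel, 20 % contamination at saturation
ys = [contaminated_pixel(x, p, s, 0.0, rng) for _ in range(30000)]
zs = exponential_forgetting(ys, alpha=0.99)
tail = zs[5000:]                      # discard the start-up transient
z_avg = sum(tail) / len(tail)         # time average of the filter output
print(f"filter settles near {z_avg:.1f}; x + p(s - x) = {x + p * (s - x):.1f}")
```

The filter output fluctuates around the contaminated mean rather than around x, which is the bias the text describes.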
In the case of a camera that is moving or whose zoom factor can change, the previous output of the filter is reset, i.e., realigned onto the current image, in order to follow the movements or zoom changes of the camera. The corresponding expression is written:
where:
y(q,n) is the value of the nth raw image, for the pixel at position q
z(q,n) is the value of the nth filtered image, for the pixel at position q
zrestored(q,n) is the value of the nth restored image, for the pixel at position q
Tn(.) is the estimated transformation between the raw images n−1 and n.
α is the forgetting factor of the filter (typically between 0.95 and 0.98)
N(q,n) is the output from the temporal filter when the input is set to a constant image equal to 1 for all pixels.
Less formally, this filtering equation is written:
This filtering serves to eliminate the effect of the radiation (“snow” effect in the image).
However, it can be shown that the image so restored does not exactly converge to the image that would have been obtained without the radiation.
Another known technique is the median filter.
It is known to be advantageous to choose processing that converges not towards the average but towards the median of the values taken by y(n). Indeed, if the impulse noise level is less than 50%, the median of the values taken by y(n) lies within the support of the distribution of x+u(n).
This median is independent of the deviation between the contaminated value s and the exact value x.
In the case of “salt-and-pepper” noise, this median is equal to x. In the case where the “contaminated” pixels are always greater than the median, it can be shown that this median is given by:
which is a small deviation, ymed−x<0.32σu, for the contamination levels observed in practice (for example p<20%).
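Under the additional assumption that u is Gaussian with standard deviation σu, the median in the white-only case solves (1−p)Φ((ymed−x)/σu)=1/2, since every contaminated value lies above the median. The sketch below evaluates that offset as one way to check the 0.32σu figure; the function name is hypothetical and the result only holds under the Gaussian assumption.

```python
from statistics import NormalDist

def median_offset_white_only(p, sigma_u):
    """ymed - x when all contaminated values lie above the median.

    Assumes u ~ N(0, sigma_u^2); the median m then solves
    (1 - p) * Phi((m - x) / sigma_u) = 1/2.
    """
    if not 0.0 <= p < 0.5:
        raise ValueError("the median stays in the support of x + u only for p < 50 %")
    return sigma_u * NormalDist().inv_cdf(0.5 / (1.0 - p))

for p in (0.05, 0.10, 0.20):
    print(p, round(median_offset_white_only(p, 1.0), 3))   # offset in units of sigma_u
```

For p=20 % the offset is just under 0.32σu, consistent with the bound stated above.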
The difference between the average and the median is illustrated in
In order to calculate this median over time, the usual processing consists, for each pixel, of computing the median over sliding time windows. Over each window, the median can be calculated, for example, by sorting (the M values taken by the pixel in the time window are sorted and the middle value is taken) or from the histogram of the values taken by the pixel in the window.
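The sorting-based sliding-window median described above can be sketched for a single pixel as follows; the window length M is the `window` argument and the function name is hypothetical. It makes visible that every pixel must keep its last M values in memory.

```python
from collections import deque
from statistics import median

def sliding_median(values, window):
    """Median of the last `window` values of one pixel, emitted once the window is full.

    statistics.median sorts the buffered values at every step, so each pixel
    needs a buffer of M past values plus a sort per new image.
    """
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        if len(buf) == window:
            out.append(median(buf))
    return out

print(sliding_median([10, 10, 255, 10, 0, 10, 10], 3))
```

The isolated white (255) and black (0) impulses are rejected, at the cost of storing and sorting the window for every pixel.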
This approach has three disadvantages, however:
The disclosed embodiments aim to improve this situation.
For this purpose, a method is proposed for processing data in a video sequence comprising noise, for example impulse noise, the video sequence being formed of a succession of images, and the method comprises, for filtering the noise, an application of a recursive filter hereafter called the “sign filter” and given by:
z(n)=z(n−1)+Δ if y(n)>z(n−1)
z(n)=z(n−1)−Δ if y(n)<z(n−1)
and z(n)=z(n−1) if y(n)=z(n−1)
where:
As the examples of results presented below show, such a filter offers the same advantage over a conventional temporal filter as a median filter: its output is not biased toward the average of the processed images.
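A minimal scalar sketch of the sign filter for one pixel, implementing the three cases of the recursion literally (Δ is `delta`, z(0) is `z0`). The names and the default z(0)=0 are assumptions for illustration.

```python
def sign(v):
    """-1, 0 or +1, matching the three cases of the recursion."""
    return (v > 0) - (v < 0)

def sign_filter(ys, delta, z0=0.0):
    """z(n) = z(n-1) + delta * sign(y(n) - z(n-1)) for one pixel."""
    z, out = z0, []
    for y in ys:
        z = z + delta * sign(y - z)   # move one step of size delta toward y(n)
        out.append(z)
    return out

# Constant input: the output climbs by delta per image, then stays put.
print(sign_filter([100.0] * 30, 5.0)[18:22])
# Repeating pattern with one white and one black impulse per 5 images.
print(sign_filter([100, 100, 100, 255, 0] * 50, 1.0)[-5:])
```

Only the sign of y(n)−z(n−1) is used, so an impulse of any amplitude moves the estimate by at most Δ; this is why the output tracks the median rather than the mean.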
In one implementation, the elements y(n), z(n−1) and z(n) are image pixels, having the same position, and the images from the aforementioned succession are processed pixel by pixel.
Alternatively, processing by blocks of adjacent pixels can be applied to make the processing converge even faster depending on the case.
The aforementioned noise can be impulse noise, for example “salt-and-pepper” or “snow” type.
The impulse noise can for example result from a radioactive radiation received by a sensor of a camera filming the aforementioned video sequence.
In this case, the impulse noise is typically of the “snow” type.
With or without “salt-and-pepper” or “snow” type noise, the processing according to an embodiment can be applied to video sequences in which objects move in front of a background of interest. In this case, these moving objects (dust or other) can be treated as noise by the method according to an embodiment.
In the case of a moving camera, or of a camera whose zoom factor may change, the apparent movement can be provided as an input to the sign filter, for example by a module that estimates the movement between successive images.
Thus, in more generic terms, in a method where the succession of images comprises an apparent movement of an image background in the succession of images, the method may further comprise:
Such an “apparent movement” can be caused by at least one of the following situations:
With the above notations given to the pixels, the application of the sign filter in the case of such apparent movement can then be given by:
z(q,n)=z(Tn(q),n−1)+Δ if y(q,n)>z(Tn(q),n−1)
z(q,n)=z(Tn(q),n−1)−Δ if y(q,n)<z(Tn(q),n−1)
z(q,n)=z(Tn(q),n−1) if y(q,n)=z(Tn(q),n−1)
with z(q,n) being the values taken by the nth image at the pixel with vector coordinates q and Tn being the estimate of the transformation between the preceding image of rank n−1 and the current image of rank n in the succession.
For the initial images of the succession, up to the n0th image, it can be advantageous to apply a forgetting-factor temporal filter without sign filtering, this forgetting-factor temporal filter being given by:
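For the moving-camera recursion, the NumPy sketch below assumes the simplest possible estimated transformation, an integer translation with Tn(q)=q−(dy,dx); a real system would use the output of a movement estimation module instead. Pixels whose source position Tn(q) falls outside the previous frame are, as an assumption, initialized with the current raw value y(q,n). All function names are hypothetical.

```python
import numpy as np

def align_previous(prev, shift, fill):
    """Return prev sampled at T(q) = q - shift, for an integer translation shift = (dy, dx).

    Pixels whose source position lies outside the previous frame take `fill`
    (a scalar or an array with the same shape as prev).
    """
    dy, dx = shift
    h, w = prev.shape
    out = np.empty((h, w), dtype=float)
    out[...] = fill
    if abs(dy) >= h or abs(dx) >= w:
        return out                               # no overlap between the two frames
    r0, r1 = max(dy, 0), min(h + dy, h)
    c0, c1 = max(dx, 0), min(w + dx, w)
    out[r0:r1, c0:c1] = prev[r0 - dy:r1 - dy, c0 - dx:c1 - dx]
    return out

def sign_filter_step(z_prev, y, shift, delta):
    """z(q,n) = z(T(q),n-1) + delta * sign(y(q,n) - z(T(q),n-1)), for all pixels at once."""
    y = np.asarray(y, dtype=float)
    aligned = align_previous(z_prev, shift, fill=y)   # assumption: uncovered pixels start at y
    return aligned + delta * np.sign(y - aligned)

z_prev = np.arange(9.0).reshape(3, 3)
y = np.full((3, 3), 100.0)
z_next = sign_filter_step(z_prev, y, shift=(0, 1), delta=1.0)   # scene moved one column right
print(z_next)
```

The per-pixel update is unchanged; only the read position in the previous output moves with the estimated transformation.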
where ztemp(q,n)=(1−α)·y(q,n)+αztemp(Tn(q),n−1)
and N(q,n)=(1−α)+α·N(Tn(q),n−1);
ztemp(q,n) being an intermediate variable and α being a forgetting factor between 0 and 1;
z(q,n) then being the values taken by the nth image at the pixel with vector coordinates q and Tn being the estimate of a transformation by possible movement between the preceding image of rank n−1 and the current image of rank n in the succession, with n less than n0.
In the absence of movement between successive images up to image n0, this forgetting-factor temporal filter can be simply given by:
with ztemp(n)=(1−α)y(n)+αztemp(n−1)
ztemp(n) being an intermediate variable and α being a forgetting factor between 0 and 1.
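Assuming that the initialization filter outputs the normalized value z(n)=ztemp(n)/N(n), which is what the definition of N as the filter's response to a constant input equal to 1 suggests, the static case can be sketched as below. With ztemp(0)=0 and N(0)=0, the normalization removes the start-up bias: for a constant input, ztemp(1) alone would only be (1−α)·y(1). The function name is hypothetical.

```python
def normalized_forgetting(ys, alpha):
    """Initialization filter for one pixel: returns z(n) = ztemp(n) / N(n) for n = 1, 2, ..."""
    ztemp, norm, out = 0.0, 0.0, []
    for y in ys:
        ztemp = (1.0 - alpha) * y + alpha * ztemp   # unnormalized exponential forgetting
        norm = (1.0 - alpha) + alpha * norm         # same filter applied to a constant 1
        out.append(ztemp / norm)
    return out

print(normalized_forgetting([50.0] * 5, 0.9))
```

Without the division by N, the first output for this constant input would be 5 instead of 50.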
Next, at least for images of the succession which follow an n0th image, the combination of the sign filter with a forgetting-factor temporal filter can be applied, the result of this combination being given by:
wtemp(n) being a temporal variable given by:
wtemp(n)=(1−β)z(n)+βwtemp(n−1), where z(n) is the result of the application of the sign filter, and where β is a forgetting factor between 0 and 1.
An important parameter to choose is the value of the coefficient Δ. This choice is preferably a function of the maximum value, Imax, of the color level taken by the image elements. Tests show that the coefficient Δ can be chosen less than 20·Imax/255.
Preferably, the coefficient Δ is between 0 and 5·Imax/255 (i.e., between 0 and 5 when Imax=255, as is most often the case).
“Color level” is understood here to mean a gray level or a red, green, or blue (RGB) level, typically quantized over 256 values.
The disclosed embodiments also concern a computer program comprising instructions for implementation of the above method, when these instructions are executed by a processor. It also concerns an information medium (memory, for example USB or other type) storing instructions for such a program.
The disclosed embodiments also concern a device comprising a processing unit for implementation of the above method (described later with reference to
Other advantages and features will become apparent upon reading the detailed description of the disclosed embodiments, presented as illustrative examples, and upon examination of the attached drawings in which:
The disclosed embodiments propose a recursive approach as follows: by considering the values taken by a single pixel, denoted y(n) for image n, this approach consists of calculating the series z(n) defined by:
z(n)=z(n−1)+Δ×sign(y(n)−z(n−1))
In the following, this approach is called “processing by sign filter” or “fast temporal median.”
Δ is a parameter of the algorithm whose adjustment expresses a compromise between the convergence speed and the residual standard deviation after convergence. Its adjustment is discussed below.
There are multiple advantages to this sign filter.
The required memory remains very small: only two images are used for calculating the output at moment n: the previous output image z(n−1) and the current raw image y(n). The real-time implementation is therefore immediate.
The complexity is very low: for each new image and each pixel, only one addition, one multiplication, and one comparison (for the sign calculation) are required.
This approach is adapted advantageously and without difficulty to the specific case of a moving camera, for which the recurrence equation becomes:
z(q,n)=z(T(q),n−1)+Δ×sign(y(q,n)−z(T(q),n−1))
with z(q,n) being the values taken by the image at moment n at the pixel with vector coordinates q, and T being the estimate of the transformation between the preceding image and the current image. Methods for estimating this transformation are described in the literature for a translation and for the combination of a translation, a rotation, and a change of zoom factor.
A preliminary step to the sign-filter processing may therefore consist of determining whether or not the camera is moving:
For example, a user can be given a binary button for specifying whether the sequence was filmed with a fixed or a moving camera (step S0); the latter case launches the movement estimation algorithm (step S1), whose result is taken as input to each iteration of step S2, iterating over n up to Nimax (loop of steps S3 and S4, until step S5 for n=Nimax).
In the preceding general equation, the coefficient Δ sets the weight given to the sign of the difference between the current image y(n) and the previous output z(n−1), i.e., image n−1 as recursively processed by the method according to an embodiment.
This coefficient Δ can be adjusted as detailed below. The adjustment results from a compromise between the convergence time of the recursive sign filter and the final quality of the filter output, which can for example be estimated by the variance after convergence.
“Convergence time” is understood to mean both:
If one wants to give preference to a fast adaptation/convergence time, a high value is chosen for the coefficient Δ.
If one wants to give preference to a small residual variance, a small value is chosen for the coefficient Δ.
Typically, as output from the sign filter, the residual variance after convergence is given by:
The convergence time for a median change of amplitude A is given by:
Here, the convergence time is given for a “salt-and-pepper” impulse noise comparable to black or white. In the case of pixels contaminated with white only, one would have:
The values taken by the pixels of the image are typically between 0 and 255 (8-bit coded values). In this case, it appears empirically that a good choice for the coefficient Δ is between 0 and 20, and preferably between 0 and 5. More generally, if the values taken by the pixels are between 0 and Imax, the coefficient Δ can be chosen between 0 and 20·Imax/255, preferably between 0 and 5·Imax/255.
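The compromise on Δ can be illustrated with a small Monte Carlo sketch: one pixel with x=100, σu=10 and 20 % symmetric salt-and-pepper contamination (all assumed values, with Gaussian u), filtered by the sign filter from z(0)=0. It reports the first image at which |z−x|≤5 (a hypothetical convergence criterion) and the RMS deviation from x after a burn-in period. Function names are illustrative.

```python
import random

def salt_and_pepper(x, p, sigma_u, rng, imax=255.0):
    """x + u with probability 1 - p, otherwise 0 or imax with equal probability."""
    if rng.random() < p:
        return imax if rng.random() < 0.5 else 0.0
    return x + rng.gauss(0.0, sigma_u)

def tradeoff(delta, x=100.0, p=0.2, sigma_u=10.0, steps=20000, burn_in=2000, seed=0):
    """Return (first n with |z - x| <= 5, RMS of z - x after burn_in) for the sign filter."""
    rng = random.Random(seed)
    z, t_conv, sq, count = 0.0, None, 0.0, 0
    for n in range(1, steps + 1):
        y = salt_and_pepper(x, p, sigma_u, rng)
        z += delta * ((y > z) - (y < z))          # sign filter update
        if t_conv is None and abs(z - x) <= 5.0:
            t_conv = n
        if n > burn_in:
            sq += (z - x) ** 2
            count += 1
    return t_conv, (sq / count) ** 0.5

for delta in (1.0, 5.0):
    t, rms = tradeoff(delta)
    print(f"delta={delta}: convergence after {t} images, residual RMS {rms:.2f}")
```

The larger step reaches the background in fewer images but fluctuates more around it, which is the compromise the Δ ranges above are meant to balance.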
The coefficient Δ may be set as input to the processing by sign filter in step S2, during an initialization step S6 shown in
The performance of the filter according to an embodiment can be compared with the sliding-window median filter.
The standard deviation of the output from the processing by application of the sign filter can in fact be compared to the standard deviation of the conventional calculation of the median over a sliding window of size N. This can be approximated by the following asymptotic result:
Which, in this context, gives:
For data values of the noise σu (standard deviation for “conventional” additive noise, not impulse noise) and contamination level p, the minimum size of the sliding windows needed to obtain the same performance as the sign algorithm proposed here can be deduced:
With for example σu=10 gray levels and p=20%, it is found that the size of the sliding window necessary to obtain the same residual standard deviation is:
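The asymptotic result referred to above is presumably the classical large-sample formula for the sample median, σmed≈1/(2f(m)√N), where f(m) is the density of y at its median m. The sketch below evaluates it assuming Gaussian u and symmetric salt-and-pepper values far from x, so that f(x)=(1−p)/(σu√(2π)), and compares it with a Monte Carlo estimate. Function names and the simulation setup are assumptions.

```python
import math
import random
from statistics import median

def sliding_median_std(sigma_u, p, window):
    """Large-sample std of the median of `window` samples: 1 / (2 f(x) sqrt(window)).

    Assumes Gaussian u and symmetric salt-and-pepper values far from x,
    so the density at the median is f(x) = (1 - p) / (sigma_u * sqrt(2 pi)).
    """
    f_at_median = (1.0 - p) / (sigma_u * math.sqrt(2.0 * math.pi))
    return 1.0 / (2.0 * f_at_median * math.sqrt(window))

def draw(x, p, sigma_u, rng):
    """One contaminated sample: 0 or 255 with probability p, otherwise x + u."""
    if rng.random() < p:
        return 255.0 if rng.random() < 0.5 else 0.0
    return x + rng.gauss(0.0, sigma_u)

def monte_carlo_std(sigma_u, p, window, trials=2000, x=100.0, seed=0):
    """Empirical std of the window median over independent windows."""
    rng = random.Random(seed)
    meds = [median(draw(x, p, sigma_u, rng) for _ in range(window)) for _ in range(trials)]
    mean = sum(meds) / trials
    return math.sqrt(sum((m - mean) ** 2 for m in meds) / trials)

print(sliding_median_std(10.0, 0.2, 101), monte_carlo_std(10.0, 0.2, 101))
```

Inverting this formula for N gives the window size needed to match a target residual standard deviation.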
A drawback of the processing according to an embodiment is thus the compromise that must be made between the convergence time and the residual variance after convergence.
However, it is possible to make use of hybrid processing in order to obtain a low residual variance without impacting the convergence time.
A first solution consists of starting the processing with a linear filter, typically a normalized exponential forgetting, and then, after an initialization period of n0 images, switching to processing by application of the sign filter. This improvement can be useful in the case of a fixed camera, where a low residual variance is sought without penalizing the initial convergence time. On the other hand, this approach is less effective when the processing must adapt quickly to a change of scene (in particular a change of lighting or zoom).
In order to reduce the residual variance, a second solution consists of applying another linear temporal filter to z(n), typically with exponential forgetting. If β is the coefficient of this second filter (between 0 and 1), the residual variance is multiplied by (1−β)/(1+β)
(if it is assumed that the outputs from the sign filter are uncorrelated in time).
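The variance factor for the second filter follows from writing wtemp as a weighted sum of past sign-filter outputs, wtemp(n)=(1−β)Σk β^k z(n−k): for uncorrelated z with variance σ², var(wtemp)=(1−β)²σ²/(1−β²)=(1−β)/(1+β)·σ². The sketch below checks this numerically, with i.i.d. unit-variance Gaussian inputs standing in for the sign-filter outputs (an assumption); the function name is hypothetical.

```python
import random

def variance_ratio(beta, n=200000, burn_in=1000, seed=0):
    """Empirical var(w) / var(z) for w(n) = (1 - beta) z(n) + beta w(n-1), with i.i.d. z of variance 1."""
    rng = random.Random(seed)
    w, ws = 0.0, []
    for k in range(n):
        w = (1.0 - beta) * rng.gauss(0.0, 1.0) + beta * w
        if k >= burn_in:
            ws.append(w)
    mean = sum(ws) / len(ws)
    return sum((v - mean) ** 2 for v in ws) / len(ws)

beta = 0.9
print(variance_ratio(beta), (1 - beta) / (1 + beta))
```

With β=0.9 the residual variance is divided by about 19; in practice the gain is smaller if successive sign-filter outputs are correlated.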
This improvement can be particularly useful in the case of a scene which could shift, in order to:
The two improvements can be combined within a single processing.
The hybrid processing with the two improvements is summarized below, with reference to
For n=0, application of an initialization step S20:
y(0)=0,ztemp(0)=0,wtemp(n0)=0
For n=1 to n0 (loop of steps S22 and S23), application in step S21 of the first forgetting-factor temporal filter, without sign filter:
For n>n0 (and until convergence in step S26), application in steps S24 and S25 of the sign filter and respectively of a second forgetting-factor temporal filter to the result of the sign filter:
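The hybrid steps S20 to S25 can be sketched for a single pixel as follows. The formula combining the two stages is not reproduced in this text, so the sketch makes an explicit assumption: for n>n0 the output is wtemp(n) divided by a normalizer M(n)=(1−β)+βM(n−1) with M(n0)=0 (the same device used for N in the initialization filter), which compensates for wtemp(n0)=0. The sign filter is seeded with the initialization output z(n0). Names and parameter values are illustrative.

```python
import random

def hybrid_filter(ys, n0, delta, alpha, beta):
    """Hybrid processing for one pixel over images n = 1, 2, ...

    n <= n0: normalized exponential forgetting ztemp / N, no sign filter.
    n  > n0: sign filter seeded with z(n0), then exponential forgetting of its output.
    The normalizer m applied to wtemp is an assumption of this sketch.
    """
    ztemp = norm = 0.0
    z = wtemp = m = 0.0
    out = []
    for n, y in enumerate(ys, start=1):
        if n <= n0:
            ztemp = (1.0 - alpha) * y + alpha * ztemp
            norm = (1.0 - alpha) + alpha * norm
            z = ztemp / norm
            out.append(z)
        else:
            z += delta * ((y > z) - (y < z))       # sign filter
            wtemp = (1.0 - beta) * z + beta * wtemp
            m = (1.0 - beta) + beta * m             # assumed normalization of wtemp
            out.append(wtemp / m)
    return out

rng = random.Random(0)
noisy = []
for _ in range(3000):                               # x = 100, sigma_u = 10, p = 20 %
    if rng.random() < 0.2:
        noisy.append(255.0 if rng.random() < 0.5 else 0.0)
    else:
        noisy.append(100.0 + rng.gauss(0.0, 10.0))
result = hybrid_filter(noisy, n0=20, delta=1.0, alpha=0.9, beta=0.9)
print(round(result[-1], 1))
```

The initialization stage gives a fast first estimate, the sign filter removes the bias introduced by the impulses, and the last stage smooths the residual fluctuations.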
The performance of the processing by application of the sign filter and of the hybrid processing with improvements (much better performance in this second case) are illustrated by
Use of the hybrid processing and adjustment of its parameters (initialization time and exponential forgetting factors) depend on the type of application. For a fixed or slowly moving camera, a small value of the coefficient Δ is sufficient (0.5 or 1 gray level), provided that the processing is initialized with normalized exponential forgetting (α=0.9 for example). The second filter (wtemp) is not necessarily useful.
For a moving camera, or a variable scene (a change of lighting, for example), the processing must adapt constantly. A high coefficient, up to Δ=5 gray levels, can be chosen, followed by exponential forgetting with β=0.9. The initialization filter (ztemp) is not necessarily useful.
For an optimum adjustment of the parameters, calculations comparing the residual variance and the convergence time during a change of median of amplitude A (typically due to a variation of the lighting of the filmed scene) can be helpful.
In this case, for salt-and-pepper noise of the “black or white pixel” type (not only white, “snow” type noise), the convergence time for a change of median of amplitude A (time to reach μA) is given by:
The residual variance after convergence can be approximated by:
Now referring to
The processing is performed pixel by pixel and provides, for each image n, a restored image computed from the pixel values of image n.
The output of the processing may present several options:
The restoration then aims to carry out the following operation:
imagerestored=imagerestored,previous,reset+Δ×sign{imageraw−imagerestored,previous,reset}
Two conventional filters (exponential forgetting) can be used in addition to this treatment:
We therefore count four possible combinations of filters:
The second and fourth combinations are detailed below.
Here, the estimate of the geometric transformation between raw image n−1 and raw image n, denoted Tn(q), is used. Continuing to denote the input images y(q,n) and the output images zrestored(q,n), the following steps may be applied:
Initialization: z(q,0)=0; zrestored(q,0)=0; T1(q)=q (identity transformation);
N(q,0)=0 (normalization image)
and
The following can be chosen as input values:
This value of Δ corresponds to pixel values varying between 0 and 255. For the case of pixels varying between 0 and 1, it is necessary to multiply by 1/255. For the case of pixels varying between 0 and MAX, it is necessary to multiply by MAX/255.
This algorithm makes use of the values:
z(Tn(q),n−1),N(Tn(q),n−1) and zrestored(Tn(q),n−1),
which are sometimes not available when the estimated transformation Tn(q) removes a pixel from the image (because of movement of the image).
In this case, one could then choose the following values:
z(Tn(q),n−1)=0
N(Tn(q),n−1)=0
zrestored(Tn(q),n−1)=y(q,n) or 0
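The effect of these fallback values can be seen on the normalized exponential forgetting part of the processing. Assuming (as an interpretation) that z and N here play the roles of the unnormalized forgetting state and its normalization image, setting both to 0 for uncovered pixels makes the normalized output of such a pixel equal to its raw value y(q,n) on the first image where it appears. The sketch below again assumes an integer-translation transformation Tn(q)=q−(dy,dx) and uses hypothetical names.

```python
import numpy as np

def align_previous(prev, shift, fill=0.0):
    """Return prev sampled at T(q) = q - shift (integer translation); uncovered pixels get `fill`."""
    dy, dx = shift
    h, w = prev.shape
    out = np.full((h, w), fill, dtype=float)
    if abs(dy) >= h or abs(dx) >= w:
        return out
    r0, r1 = max(dy, 0), min(h + dy, h)
    c0, c1 = max(dx, 0), min(w + dx, w)
    out[r0:r1, c0:c1] = prev[r0 - dy:r1 - dy, c0 - dx:c1 - dx]
    return out

def normalized_forgetting_step(ztemp_prev, n_prev, y, shift, alpha):
    """One image of motion-compensated normalized exponential forgetting.

    Uncovered pixels read 0 for both the state and the normalization image,
    so their output ztemp / N reduces to the raw value y(q,n).
    """
    y = np.asarray(y, dtype=float)
    ztemp = (1.0 - alpha) * y + alpha * align_previous(ztemp_prev, shift)
    norm = (1.0 - alpha) + alpha * align_previous(n_prev, shift)   # always >= 1 - alpha > 0
    return ztemp, norm, ztemp / norm

zeros = np.zeros((2, 3))
zt, nm, z1 = normalized_forgetting_step(zeros, zeros, np.full((2, 3), 10.0), (0, 0), 0.5)
zt, nm, z2 = normalized_forgetting_step(zt, nm, np.full((2, 3), 20.0), (0, 1), 0.5)
print(z1)
print(z2)   # first column is newly uncovered after the one-column shift
```

Because N(q,n)≥1−α everywhere, the division is always defined, and no special case is needed for pixels entering the field of view.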
We will now describe the fourth combination, corresponding therefore to the representation in
The processing can be presented as follows:
By default, we can take α=0.95 and β=0.9
Here again, this processing involves the values:
z(Tn(q),n−1), N(Tn(q),n−1) and zrestored(Tn(q),n−1), which may not be available when the estimated transformation Tn(q) removes a pixel from the image (because of movement of the image).
In this case, one could then choose the following values:
z(Tn(q),n−1)=0
N(Tn(q),n−1)=0
zrestored(Tn(q),n−1)=y(q,n) or 0
It is thus shown that the recursive real-time estimation of the background of a video sequence allows restoring films highly degraded by impulse noise (“salt-and-pepper” or “snow” or actual dust (loose paper, particles, etc.) hiding a useful background and thus similar to impulse noise), without denaturing the original image as occurs with a linear filter applying an undesirable form of averaging to the succession of pixels.
The advantages of the processing proposed here are multiple: the complexity and required memory are very low because the update of one pixel for image n is done using only the previously processed value (output n−1) and that of the current pixel (image n). The real-time implementation is therefore immediate, unlike an implementation based on conventional median filters. Furthermore, the processing is directly applicable to the case of a moving camera.
Number | Date | Country | Kind |
---|---|---|---|
18 55955 | Jun 2018 | FR | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2019/064297 | 6/3/2019 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2020/001922 | 1/2/2020 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
4058836 | Drewery et al. | Nov 1977 | A |
5543858 | Wischermann | Aug 1996 | A |
7769089 | Chou | Aug 2010 | B1 |
20040125236 | Kempf | Jul 2004 | A1 |
20060056724 | Le Dinh | Mar 2006 | A1 |
20060109903 | Bergen | May 2006 | A1 |
20080056366 | Bhaskaran | Mar 2008 | A1 |
20120019727 | Zhai | Jan 2012 | A1 |
20120169936 | Persson | Jul 2012 | A1 |
20130089247 | Mercuriev | Apr 2013 | A1 |
20130208129 | Stenman | Aug 2013 | A1 |
20150288856 | Vanam | Oct 2015 | A1 |
20160180504 | Kounavis | Jun 2016 | A1 |
20160232641 | Kamath | Aug 2016 | A1 |
20170154413 | Yu | Jun 2017 | A1 |
20180315172 | Smirnov | Nov 2018 | A1 |
Number | Date | Country |
---|---|---|
1 550 981 | Jul 2005 | EP |
2 309 095 | Nov 1976 | FR |
Entry |
---|
Reza, “Adaptive Noise Filtering of Image Sequences in Real Time,” WSEAS Transactions on Systems, Apr. 30, 2013, vol. 12(4), pp. 189-201. |
PCT Search Report issued in related application PCT/EP2019/064297, dated Aug. 12, 2019, with English language translation, 7 pages. |
Number | Date | Country | |
---|---|---|---|
20210152713 A1 | May 2021 | US |