Optimal denoising for video coding

Description

BACKGROUND AND SUMMARY OF THE INVENTIONS

The present application relates generally to digital signal processing, and more particularly to a video compression, coding, filtering and representation system utilizing temporal redundancy and having both device and method aspects. It further relates to a computer program product, such as a recording medium, carrying program instructions readable by a computing device to cause the computing device to carry out methods according to the inventions.

BACKGROUND
Noise in Video Coding

The importance of video technology is constantly growing with the ever increasing use of television and video systems in consumer, commercial, medical and scientific applications.

Because raw digital video data (image sequences) are very large, they must be compressed before they can be transmitted and stored. The hybrid video coding scheme is the most common video coding scheme. It uses motion estimation (ME), a discrete cosine transform (DCT)-based transform, and entropy coding to exploit temporal, spatial, and data redundancy, respectively. Most existing video coding standards conform to this hybrid scheme, including the ISO/IEC MPEG-1, MPEG-2, and MPEG-4 standards, the ITU-T H.261, H.263, and H.264 standards, and AVS and related video coding standards.

Video coding performance is typically evaluated in terms of two measures: coding distortion and compression capability. Because of its internal quantization step, the hybrid coding scheme is a lossy coding process, which means that some information is lost during coding. The decoded (reconstructed) video therefore has some distortion compared to the original (uncoded) video. The best measure of coding distortion is the subjective visual quality of the reconstructed video, but subjective quality evaluation is time consuming and requires many observers (both trained and untrained). An alternative is to use an objective visual quality measure. A common one is the peak signal-to-noise ratio (PSNR), defined as:

$PSNR = 20 \log_{10} \frac{255}{{[\frac{1}{MNT} \sum_{t = 1}^{T} \sum_{i = 1}^{M} \sum_{j = 1}^{N} {[f (i, j, t) - \hat{f} (i, j, t)]}^{2}]}^{\frac{1}{2}}}$

where f(i, j, t) is the pixel at location (i, j) in frame t of the original video sequence and $\hat{f}(i, j, t)$ is the co-located pixel in the decoded video sequence (at location (i, j) in frame t). M and N are the frame width and height (in pixels), respectively, and T is the total number of frames in the video sequence. Typically, the higher the PSNR, the higher the subjective visual quality. Compression capability is usually measured in terms of bit rate, which is the number of bits per second used to encode the video bit stream.
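As an illustration only, the PSNR definition above can be evaluated with a few lines of Python. This is a minimal sketch: the function name `psnr` and the nested-list layout `[frame][row][column]` are assumptions made for the example and are not part of the application.

```python
import math


def psnr(original, decoded, peak=255.0):
    """PSNR in dB between two videos stored as nested lists [frame][row][col].

    Hypothetical helper for illustration; 'peak' is the maximum pixel value
    (255 for 8-bit video, as in the formula above).
    """
    squared_error = 0.0
    count = 0
    for frame_o, frame_d in zip(original, decoded):
        for row_o, row_d in zip(frame_o, frame_d):
            for f, f_hat in zip(row_o, row_d):
                squared_error += (f - f_hat) ** 2
                count += 1
    mse = squared_error / count  # mean over all M*N*T pixels
    if mse == 0:
        return math.inf  # identical sequences: distortion is zero
    return 20.0 * math.log10(peak / math.sqrt(mse))
```

For additive noise of variance 169 (standard deviation 13) and no other distortion, the formula gives 20 log10(255/13), about 25.85 dB, which is close to the unfiltered PSNR values reported later in Table I.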

Unfortunately, digital video sequences are almost always corrupted by noise introduced during video acquisition, recording, processing, and transmission. A main source is the capture device (e.g., a CCD or CMOS sensor), especially when the scene is dark and the signal-to-noise ratio is therefore low. Such noise is undesirable: it degrades the subjective quality of the video and it reduces the accuracy of motion estimation. In video coding systems such as MPEG-1/2/4 and H.261/3/4, temporal redundancy is exploited by motion estimation (ME) and motion compensation (MC) to achieve high coding efficiency. However, the noise is inherently independent among frames. Its presence therefore reduces the temporal redundancy among consecutive frames, which can decrease the accuracy of the motion vectors obtained in motion estimation. Moreover, even if the motion vectors are accurate, the noise makes the residue frame noisy. Because the noise typically exhibits little spatial redundancy, the energy of the noisy residue frame cannot be compacted by the DCT, and the noisy DCT coefficients require significantly more bits to compress. As a result, coding performance can drop significantly, with a large part of the bandwidth or bit rate wasted on representing the undesirable noise. It is thus highly desirable to remove the noise before encoding (purely pre-processing) or during encoding (encoder-embedded denoising) while preserving the original video content.

Many denoising methods have been proposed previously, such as the 2-D Kalman filter, the Spatial Varying Filter (SVF), the Spatial Temporal Varying Filter (STVF), wavelet shrinkage, neural networks, the adaptive wavelet filter, and the motion-compensated Kalman filter. These filters are designed as purely pre-processing schemes that are independent of the video coding process and are cascaded with the encoder. They therefore require extra computation on top of the video encoder.

Optimal Denoising for Video Coding

The present application proposes a new purely pre-processing denoising method named the Multi-Hypothesis Motion Compensated Filter (MHMCF). This method is based on the temporal linear minimum mean square error (LMMSE) estimator and requires far fewer input pixels than most existing denoising filters (in our simulations, 3 pixels are enough) to achieve the same or better performance. Building on the MHMCF, the application further proposes an improved denoising method named the Embedded Optimal Denoising Filter (EODF). Unlike the purely pre-processing approach, this filter can be seamlessly embedded into the motion compensation process of video encoders, introducing almost no extra computation.

The advantages of the proposed approach are highlighted as follows:

- This approach is simple to implement because it requires far fewer pixels as input and only some linear operations;
- This approach achieves better performance because the filter coefficients are determined by the linear minimum mean squared error estimator;
- This approach is flexible because there is no limit on the number of references involved in the filtering, and the more references are used, the better the performance. Further, it allows the use of both past and future frames, and of both previously denoised and undenoised frames;
- This approach can be seamlessly embedded into the motion compensation process of video encoders. Therefore, unlike the standalone purely pre-processing approach, it introduces almost no extra computation;
- For P frames, a video encoder with this approach embedded is compatible with the MPEG-1, MPEG-2, MPEG-4, H.261, H.263, H.264 and AVS standards. The standards can be modified easily to accommodate B frames; and
- The approach produces encoded frames that are more visually pleasing. This is because traditional methods preserve the large-amplitude high frequency (HF) coefficients caused by noise. After inverse quantization and inverse transform, these HF coefficients cause “mosaic” phenomena, which are visually very annoying.

Note that many of these advantages are obtained by the cooperation among the various elements of the invention, which work synergistically with each other.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosed inventions will be described with reference to the accompanying drawings, which show important sample embodiments of the invention and which are incorporated in the specification hereof by reference, wherein:

FIG. 1 shows a video sequence composed of consecutive frames.

FIG. 2 illustrates a video encoder with EODF embedded (1 reference frame).

FIG. 3 illustrates the rate-distortion (RD) performance of video encoded by JM with EODF embedded. FIG. 3(a) is the RD curve for sample one, “Akiyo”; FIG. 3(b) is the RD curve for sample two, “Foreman”; and FIG. 3(c) is the RD curve for sample three, “Stefan”.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The numerous innovative teachings of the present application will be described with particular reference to the presently preferred embodiment (by way of example, and not of limitation).

The Multi-hypothesis Motion Compensated Filter (MHMCF)

As shown in FIG. 1, a video sequence is composed of many consecutive frames. In the video sequence, assume there is a point object which can be tracked in all the frames. Suppose the original intensity value of this point object in frame k is X_k, and the observed value is

Y_k=X_k+N_k. (1)

where N_k is the undesirable noise. N_k is assumed to be zero mean, with variance σ_n². It is also assumed that N_k is independent over space and time and independent of X_k.

Ideally, X_k should be the same for all k. However, for many reasons, including inaccurate motion estimation, changing light conditions, and deforming objects, X_k changes from one frame to another. This can be described as

X_k=X_i+Z_i,k (2)

where Z_i,k represents the innovation in X_k compared to X_i. Z_i,k is a random variable with mean $\overline{Z_{i,k}}$ and variance σ_z(i,k)². Z_i,k is assumed to be independent of N_k and X_i.

A preferred embodiment uses some or all of {Y₁, Y₂, . . . } to estimate X_k. The pixels taken from {Y₁, Y₂, . . . } are selected as temporal predictions of the current pixel, and these temporal predictions are defined as the hypotheses of the current pixel. How to select these temporal predictions is outside the scope of this application.

Suppose a class of preferred embodiments chooses {Y_{ref 1}, Y_{ref 2}, . . . , Y_{ref m}, Y_k} to estimate X_k. The proposed non-homogeneous linear estimator can be expressed as

$\begin{matrix} {\overline{X}}_{k} = \sum_{j = 1}^{m} a_{j} Y_{refj} + a_{c} Y_{k} + c & (3) \end{matrix}$

Based on the LMMSE estimator, the coefficients are determined as follows:

$\begin{matrix} a_{i} = \frac{σ_{z (refi, k)}^{- 2}}{\sum_{j = 1}^{m} σ_{z (refj, k)}^{- 2} + σ_{n}^{- 2}}, \quad i = 1 \dots m & (4) \\ a_{c} = \frac{σ_{n}^{- 2}}{\sum_{j = 1}^{m} σ_{z (refj, k)}^{- 2} + σ_{n}^{- 2}} & (5) \\ c = \sum_{j = 1}^{m} a_{j} \overline{Z_{refj, k}} & (6) \end{matrix}$

and the estimation error variance is

$\begin{matrix} E [{({\overline{X}}_{k} - X_{k})}^{2}] = \frac{1}{\sum_{j = 1}^{m} σ_{z (refj, k)}^{- 2} + σ_{n}^{- 2}} & (7) \end{matrix}$
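Equations (4) to (7) can be sketched in Python as follows. This is an illustrative sketch, not the applicants' implementation: the function name `lmmse_coefficients` and its argument names are hypothetical, and the code assumes that every variance passed in is strictly positive (the limiting case of a zero innovation variance is not handled here).

```python
def lmmse_coefficients(innovation_vars, innovation_means, noise_var):
    """Return (a, a_c, c, error_var) following equations (4)-(7).

    innovation_vars[j]  : sigma_z(ref j, k)^2 for reference j (assumed > 0)
    innovation_means[j] : mean of Z_(ref j, k)
    noise_var           : sigma_n^2 of the current observation (assumed > 0)
    """
    inv_z = [1.0 / v for v in innovation_vars]   # sigma_z^-2 terms
    inv_n = 1.0 / noise_var                      # sigma_n^-2
    denom = sum(inv_z) + inv_n                   # shared denominator
    a = [w / denom for w in inv_z]               # equation (4)
    a_c = inv_n / denom                          # equation (5)
    c = sum(a_j * z_mean for a_j, z_mean in zip(a, innovation_means))  # (6)
    error_var = 1.0 / denom                      # equation (7)
    return a, a_c, c, error_var
```

Because all weights share one denominator, the reference weights and a_c always sum to 1, and adding a reference can only enlarge the denominator, which is why equation (7) predicts a smaller error variance with more hypotheses.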

Details of MHMCF

Suppose the current frame is Y_k. Let ref₁, ref₂, . . . ref_m be previously denoised frames that are used as references (hypotheses of the current frame), and let σ_n² denote the variance of the undesirable noise in Y_k. A preferred embodiment applies MHMCF to Y_k as follows:

1. Divide the current frame Y_k into rectangular blocks.
2. Process the blocks in raster-scan order; steps 3-10 are performed for each block. Suppose the current block is B(i, j), where (i, j) are the coordinates of the pixel at its upper left corner. Initialize p=1 (the index of ref₁).
3. Set ref_p as the reference frame and perform motion estimation (using an appropriate distortion measure and search strategy). Denote by R_p(i_p, j_p) the block chosen by motion estimation in ref_p.
4. Denote the residue block of B(i, j) with respect to ref_p as Res_p(i, j), derived as follows: Res_p(i, j)=B(i, j)−R_p(i_p, j_p)
5. Calculate the mean and the variance of Res_p(i, j). Denote the mean as c_p and the variance as σ_{p_n}². Define σ_z(refp,k)²=max(0,σ_{p_n}²−σ_n²).
6. Repeat steps 3-5 for p=2, . . . , m. After this step, c₁ . . . c_m, σ_{z(ref 1,k)}² . . . σ_{z(ref m,k)}² and R₁(i₁, j₁) . . . R_m(i_m, j_m) are available.
7. Compute the filter coefficients a_i, i=1 . . . m:

$a_{i} = \frac{σ_{z (refi, k)}^{- 2}}{\sum_{j = 1}^{m} σ_{z (refj, k)}^{- 2} + σ_{n}^{- 2}}$

8. Compute filter coefficient a_c:

$a_{c} = \frac{σ_{n}^{- 2}}{\sum_{j = 1}^{m} σ_{z (refj, k)}^{- 2} + σ_{n}^{- 2}}$

9. Compute mean compensation c as follows:

$c = \sum_{j = 1}^{m} a_{j} c_{j}$

10. Denote the denoised output of the current block B(i, j) as B′(i, j), which is calculated as follows:

$B' (i, j) = a_{c} B (i, j) + \sum_{p = 1}^{m} a_{p} R_{p} (i_{p}, j_{p}) + c$

11. If all the blocks in Y_k have been denoised, go to step 12. Otherwise, go to step 3 (with p reset to 1) to process the next block.

12. Add the current denoised frame into reference buffer to facilitate denoising for the next frame.
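Steps 4 to 10 above, for a single block, might be sketched as follows. The sketch assumes that motion estimation (step 3) has already produced one matched reference block per reference frame. The names `block_mean_var`, `mhmcf_denoise_block` and the `min_var` floor are assumptions introduced for the example: the floor replaces the 0 in max(0, ·) purely to avoid dividing by zero and is not part of the described method.

```python
def block_mean_var(block):
    """Mean and (population) variance of a 2-D block given as a list of rows."""
    values = [v for row in block for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def mhmcf_denoise_block(current, references, noise_var, min_var=1e-6):
    """Denoise one block given its motion-matched reference blocks.

    current    : block B(i, j) as a list of rows
    references : list of matched blocks R_p(i_p, j_p), same size as current
    noise_var  : sigma_n^2
    min_var    : hypothetical floor on sigma_z^2 (assumption, see lead-in)
    """
    inv_n = 1.0 / noise_var
    inv_z, means = [], []
    for ref in references:
        # Step 4: residue of the current block against this reference.
        residue = [[b - r for b, r in zip(row_b, row_r)]
                   for row_b, row_r in zip(current, ref)]
        # Step 5: residue statistics and innovation variance estimate.
        c_p, var_p = block_mean_var(residue)
        sigma_z2 = max(min_var, var_p - noise_var)
        inv_z.append(1.0 / sigma_z2)
        means.append(c_p)
    # Steps 7-9: filter coefficients and mean compensation.
    denom = sum(inv_z) + inv_n
    a = [w / denom for w in inv_z]
    a_c = inv_n / denom
    c = sum(a_p * c_p for a_p, c_p in zip(a, means))
    # Step 10: weighted combination of the current and reference blocks.
    rows, cols = len(current), len(current[0])
    return [[a_c * current[y][x]
             + sum(a_p * ref[y][x] for a_p, ref in zip(a, references))
             + c
             for x in range(cols)]
            for y in range(rows)]
```

When a reference block differs from the current block only by a constant offset, the estimated innovation variance collapses to the floor, nearly all weight moves to the reference, and the mean compensation c restores the offset.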

MHMCF Simulation Results

Experiments have been conducted to evaluate the performance of MHMCF. Four test video sequences in CIF format, named “Akiyo”, “Foreman”, “Children” and “News”, are used. Gaussian noise with variance 169 is added to the luminance components. In the tests, MHMCF with 1 reference (1HMCF) and MHMCF with 2 references (2HMCF) are based on our embodiments, and STVF is not. 1HMCF, 2HMCF and STVF are used to process the first 100 frames of each sequence.

Table I shows the denoising performance of the different filters in terms of PSNR. Both MHMCF variants, 1HMCF and 2HMCF, have better denoising performance than STVF, and 2HMCF has the best performance (for sample “Akiyo”, the PSNR gain is 8.85 dB). This is consistent with equation (7), which indicates that the more references (hypotheses) are used, the better the denoising performance. Using 3 or more reference frames would be expected to improve performance further.

TABLE I

Denoising performance, PSNR (dB)

Sample Name	Unfiltered	STVF	1HMCF	2HMCF
Akiyo	25.86	30.61	33.37	34.71
Foreman	25.89	30.12	31.24	32.01
Children	25.9	29.53	29.90	30.58
News	25.89	30.02	31.45	32.80
Average Gain	0	4.19	5.61	6.64

To evaluate the subjective quality of the denoised video, some frames of samples “Akiyo” and “Foreman” processed by the different denoising methods are shown in Appendix A. The test results show that MHMCF can dramatically increase the subjective quality. After filtering, most of the noise is removed, but fine details are still well preserved (e.g., the hair of the person in sample “Akiyo”). In contrast, STVF cannot remove large-amplitude noise and the picture is not visually pleasing.

To evaluate the improvement in coding efficiency, the MHMCF-filtered video sequences are coded using the H.264 software JM 8.3. The first frame is an I frame and the rest are P frames. Only the 16×16 block size and 1 reference frame are used in motion estimation. The coding performance for samples “Akiyo” and “Foreman” is given in Tables II and III, respectively. The coding efficiency is dramatically improved for the MHMCF-filtered video sequences, especially when the bit rate budget is adequate: at QP 19, for example, the 2HMCF-filtered “Akiyo” sequence is coded at 1481.06 kbps with a PSNR of 34.80 dB, against 12590 kbps and 26.54 dB for the unfiltered sequence.

TABLE II

Sample “Akiyo” Test Results

QP	Unfiltered PSNR (dB)	Unfiltered Bitrate (kbps)	1HMCF PSNR (dB)	1HMCF Bitrate (kbps)	2HMCF PSNR (dB)	2HMCF Bitrate (kbps)
19	26.54	12590	33.73	3471.86	34.80	1481.06
25	27.03	8631	34.3	883.78	34.49	431.65
31	27.66	4063	33.82	177.87	33.23	142.82
37	33.55	147	31.53	84.68	31.20	82.01

TABLE III

Sample “Foreman” Test Results

QP	Unfiltered PSNR (dB)	Unfiltered Bitrate (kbps)	1HMCF PSNR (dB)	1HMCF Bitrate (kbps)	2HMCF PSNR (dB)	2HMCF Bitrate (kbps)
19	26.56	12426	31.45	5246.34	32.01	3219.29
25	27.02	8467	31.59	2061.73	31.72	1117.72
31	27.51	4175	31.36	503.58	30.76	371.76
37	30.95	293	29.55	173.27	29.06	160.19

Motion Compensation in Hybrid Video Coding Scheme

Motion compensation is a critical part of the hybrid video coding scheme. The inputs of motion compensation are the residue and the reconstructed reference frames. The reference frames are combined by linear averaging to generate a predicted frame. The residue is the difference between the current frame and the predicted frame; it is encoded and transmitted to the decoder for the reconstruction of the current frame. In the motion compensation process, the current frame is reconstructed by adding the residue frame back to the predicted frame.

Taking bi-directional prediction into consideration, the motion compensation of the hybrid coding scheme can be generalized as follows (the effect of quantization on the residue is ignored):

Y_k(i,j)=b₁Y_ref1(i,j)+b₂Y_ref2(i,j)+res(i,j) (8)

where Y_k(i, j) is the pixel with coordinates (i, j) in the current frame, which is to be reconstructed; Y_{ref 1}(i, j) and Y_{ref 2}(i, j) are the hypotheses (temporal predictions) of Y_k(i, j) found by ME in two reference frames, respectively. The linear combination of these two hypotheses, i.e. b₁Y_{ref 1}(i, j)+b₂Y_{ref 2}(i, j), is used as the prediction of the current pixel Y_k(i, j). As stated before, the residue res(i, j) is the difference between Y_k(i, j) and its prediction b₁Y_{ref 1}(i, j)+b₂Y_{ref 2}(i, j). res(i, j) can be expressed as follows:

res(i,j)=Y_k(i,j)−(b₁Y_ref1(i,j)+b₂Y_ref2(i,j)) (9)

For P frames, only one reference frame, say Y_{ref 1}, is used, which means that b₂=0. For B frames, both Y_{ref 1} and Y_{ref 2} are used and b₁+b₂=1.

The Encoder-embedded Optimal Denoising Filter (EODF)

The proposed embodiment of the EODF is based on MHMCF. In the embodiment, an MHMCF filter with 2 references is expressed as follows:

$\begin{matrix} {\tilde{X}}_{k} (i, j) = a_{1} Y_{ref 1} (i, j) + a_{2} Y_{ref 2} (i, j) + a_{c} Y_{k} (i, j) + c & (10) \end{matrix}$

Let

$b_{1} = \frac{a_{1}}{1 - a_{c}} = \frac{σ_{z (ref 1, k)}^{- 2}}{\sum_{j = 1}^{2} σ_{z (refj, k)}^{- 2}}, \quad b_{2} = \frac{a_{2}}{1 - a_{c}} = \frac{σ_{z (ref 2, k)}^{- 2}}{\sum_{j = 1}^{2} σ_{z (refj, k)}^{- 2}}$

As defined before,

res(i,j)=Y_k(i,j)−(b₁Y_ref1(i,j)+b₂Y_ref2(i,j))

Combining equations (9) and (10), the following equation can be obtained:

$\begin{matrix} {\tilde{X}}_{k} (i, j) = b_{1} Y_{ref 1} (i, j) + b_{2} Y_{ref 2} (i, j) + a_{c} res (i, j) + c & (11) \end{matrix}$

Comparing equation (11) with equation (8), one of ordinary skill in the art can see that the only difference is that in the former res(i, j) is scaled by a_c and an extra constant c is added. Therefore, this filter can be seamlessly incorporated into bi-directional motion compensation in a video encoder.
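For readers who wish to verify equation (11), the substitution can be spelled out as follows. The pixel coordinates (i, j) are omitted for brevity, and the derivation uses only equations (9) and (10) together with the definitions of b₁ and b₂, from which a₁ = b₁(1 − a_c) and a₂ = b₂(1 − a_c):

$\begin{aligned} {\tilde{X}}_{k} &= a_{1} Y_{ref 1} + a_{2} Y_{ref 2} + a_{c} Y_{k} + c \\ &= a_{1} Y_{ref 1} + a_{2} Y_{ref 2} + a_{c} (b_{1} Y_{ref 1} + b_{2} Y_{ref 2} + res) + c \\ &= (a_{1} + a_{c} b_{1}) Y_{ref 1} + (a_{2} + a_{c} b_{2}) Y_{ref 2} + a_{c} res + c \\ &= b_{1} Y_{ref 1} + b_{2} Y_{ref 2} + a_{c} res + c \end{aligned}$

The second line replaces Y_k using equation (9), and the last line follows because a₁ + a_c b₁ = b₁(1 − a_c) + a_c b₁ = b₁, and likewise for b₂.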

In the embodiment, for 1 reference case, the MHMCF has the following form:

$\tilde{X}_{k} (i, j) = a_{1} Y_{ref 1} (i, j) + a_{c} Y_{k} (i, j) + c$

With m=1, a₁ and a_c in equations (4) and (5) share the same denominator and their numerators sum to it, so a₁+a_c=1, which means that b₁=1. The following equation can then be obtained:

$\begin{matrix} {\tilde{X}}_{k} (i, j) = Y_{ref 1} (i, j) + a_{c} res (i, j) + c & (12) \end{matrix}$

This filter only needs to scale down the residue and add an extra constant c, and it can be easily incorporated into the one-directional motion compensation process. Such a video encoder is illustrated in FIG. 2.

To summarize, the preferred embodiment presents the EODF for motion compensation with one and two reference frames in equations (12) and (11), respectively. At most two reference frames are used in current video coding standards. However, the proposed EODF can be easily extended to be embedded into motion compensation with more reference frames.

DCT Domain Implementation of Encoder-embedded Optimal Denoising Filter (EODF)

The previous section discussed an embodiment in which the EODF is implemented in the residue domain, operating on residue values. Another embodiment implements the EODF in the Discrete Cosine Transform (“DCT”) domain.

Let Res(i, j) be the residue block of size N×M (i=1, . . . , N, j=1, . . . , M). Define TRes(i, j) to be the DCT transform of residue block Res(i, j). Obviously, TRes(i, j) is also of size N×M.

The DCT-domain EODF modifies the DCT coefficients as follows:

TRes′(i,j)=TRes(i,j), i=1 and j=1
TRes′(i,j)=a_c*TRes(i,j), i≠1 or j≠1

where TRes′(i, j) is the modified DCT coefficient and a_c is the scale factor defined in the previous section.
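The coefficient rule above can be sketched in a few lines of Python. The function name `scale_ac_coefficients` is hypothetical, the DCT itself is assumed to have been computed elsewhere, and the code uses 0-based indices, so the text's coefficient (1, 1) is element [0][0].

```python
def scale_ac_coefficients(t_res, a_c):
    """Return TRes': the DC coefficient is kept, all others are scaled by a_c.

    t_res : DCT coefficients of one residue block, as a list of rows
    a_c   : scale factor from the residue-domain filter
    """
    return [[coef if (i == 0 and j == 0) else coef * a_c
             for j, coef in enumerate(row)]
            for i, row in enumerate(t_res)]
```

For example, with a_c = 0.5 the block [[100.0, 8.0], [4.0, -2.0]] becomes [[100.0, 4.0], [2.0, -1.0]].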

EODF Details

The preferred embodiment here only describes the detailed steps of EODF with 1 reference frame. The detailed steps of EODF with 2 reference frames are similar and should be obvious to one of ordinary skill in the art.

Let Y_k be the current frame to be processed and ref₁ be the reference frame. Suppose the noise variance in the current frame is σ_n².

1. Denote the current residue block to be denoised as Res(i, j). Steps 2-6 are performed on it:
2. Calculate the mean and the variance of Res(i, j). Denote the mean as c₁ and the variance as σ_{1_n}². Define σ_{z(ref 1,k)}²=max(0,σ_{1_n}²−σ_n²).
3. Compute scaling factor a_c:

$a_{c} = \frac{σ_{n}^{- 2}}{σ_{z (ref 1, k)}^{- 2} + σ_{n}^{- 2}}$

4. Compute mean compensation c:

c=(1−a_c)c₁

5. Compute the denoised residue block Res′(i, j) as follows:

Res′(i,j)=a_c*Res(i,j)+c

6. Apply transform and quantization to the denoised residue block Res′(i, j). Then write it into bit stream.

7. Use the denoised residue to reconstruct the denoised current block.

8. If all the residue blocks of the current frame are processed, go to step 9; otherwise, go to step 1 to process the next residue block.

9. Add the current denoised frame into reference buffer to facilitate the denoising of the next frame.
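Steps 2 to 5 of the one-reference EODF might look as follows in Python. This is a sketch under stated assumptions: the function name `eodf_denoise_residue` and the `min_var` floor (used instead of 0 so that the inverse variance stays finite) are introduced for illustration and are not part of the described method. Transform, quantization and reconstruction (steps 6 and 7) are left to the surrounding encoder.

```python
def eodf_denoise_residue(residue, noise_var, min_var=1e-6):
    """Return (denoised_residue, a_c, c) for one residue block.

    residue   : residue block Res(i, j) as a list of rows
    noise_var : sigma_n^2 of the current frame
    min_var   : hypothetical floor on sigma_z^2 (assumption, see lead-in)
    """
    values = [v for row in residue for v in row]
    # Step 2: mean, variance, and innovation variance estimate.
    c1 = sum(values) / len(values)
    var1 = sum((v - c1) ** 2 for v in values) / len(values)
    sigma_z2 = max(min_var, var1 - noise_var)
    # Step 3: scaling factor a_c.
    inv_z = 1.0 / sigma_z2
    inv_n = 1.0 / noise_var
    a_c = inv_n / (inv_z + inv_n)
    # Step 4: mean compensation.
    c = (1.0 - a_c) * c1
    # Step 5: scale the residue and add the constant.
    denoised = [[a_c * v + c for v in row] for row in residue]
    return denoised, a_c, c
```

Note that the mean of the denoised residue equals the mean of the input residue, since a_c·c₁ + (1 − a_c)·c₁ = c₁; only the fluctuation around the mean is shrunk by a_c.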

EODF Simulation Results

In the preferred embodiment, the proposed EODF is embedded into the H.264 reference software, JM 8.3, and simulated using various QP values, bit rates, and video sequences. Another encoder-embedded denoising filter, the transform domain temporal filter (TDTF), is also implemented for comparison. Three test sequences in CIF format, named “Akiyo”, “Foreman” and “Stefan”, are used. Gaussian noise with variance 169 is added to the luminance components of these test sequences. The first 100 frames are encoded by the original JM 8.3 (JMO), JM 8.3 with TDTF (JMT), and JM 8.3 with the proposed EODF (JMP). The first frame is an I frame and the rest are P frames.

Tables IV-VI compare rate-distortion (RD) performance with QP ranging from 19 to 37.

TABLE IV

Sample “Akiyo” Test Results

QP	JMO PSNR (dB)	JMO Bitrate (kbps)	JMT PSNR (dB)	JMT Bitrate (kbps)	JMP PSNR (dB)	JMP Bitrate (kbps)
19	26.54	12590	32.78	6409	35.91	1544
25	27.03	8631	33.65	1907	35.59	470
31	27.66	4063	33.2	213	33.22	218
37	33.55	147	30.72	101	31.29	103

TABLE V

Sample “Foreman” Test Results

QP	JMO PSNR (dB)	JMO Bitrate (kbps)	JMT PSNR (dB)	JMT Bitrate (kbps)	JMP PSNR (dB)	JMP Bitrate (kbps)
19	26.56	12426	31.52	6910	32.77	3074
25	27.02	8467	31.46	2500	32.33	1038
31	27.51	4175	30.68	337	31.1	356
37	30.95	293	28.1	150	29.16	166

TABLE VI

Sample “Stefan” Test Results

QP	JMO PSNR (dB)	JMO Bitrate (kbps)	JMT PSNR (dB)	JMT Bitrate (kbps)	JMP PSNR (dB)	JMP Bitrate (kbps)
19	26.54	13172	29.7	7875	31.23	4986
25	26.97	8892	29.41	3196	30.5	2234
31	27.28	4499	27.64	656	28.73	832
37	28.37	601	23.74	261	26.05	327

FIG. 3 shows the comparison of RD performance with fixed bit rate. It can be seen that JMP (EODF) has much better noise suppression performance than JMT and JMO. It can dramatically increase PSNR and reduce bit rate for noisy video coding, especially at low QP (high bit rate). Some reconstructed frames are shown in Appendix B to evaluate the subjective quality. The test results show that the frame encoded by JMP is clearly less noisy and more visually pleasing.

When QP is large (bit rate is low), the gaps between the three curves become smaller. This is because some small-amplitude high frequency (HF) coefficients caused by noise are quantized to 0 when QP is large. In some extreme cases, JMO even outperforms JMP in terms of PSNR. However, with JMP, the encoded frame is more visually pleasing. The reason is that, for JMO, although HF coefficients with small amplitude are removed, those with large amplitude can survive quantization. After inverse quantization and inverse transform, these HF coefficients cause “mosaic” phenomena, which are visually very annoying. Tests are also performed on a frame encoded at low bit rate by JMO and JMP, respectively, and Appendix C shows the results. Although its PSNR is 0.47 dB lower, the frame encoded by JMP looks better.

Modifications and Variations

As will be recognized by those of ordinary skill in the art, the innovative concepts described in the present application can be modified and varied over a tremendous range of applications, and accordingly the scope of patented subject matter is not limited by any of the specific exemplary teachings given.

MHMCF Modifications and Variations

For example, in the above embodiment, step 7 uses m previously denoised frames as the references (hypotheses). However, the reference frames are not necessarily limited to denoised ones. Another class of embodiments can use all frames in the sequence, denoised or not. In another class of embodiments, the number of references, m, can be any natural number. Moreover, m can be different for different k.

As another example, note that in step 1 of the current embodiment, different blocks can have the same or different sizes. In a different class of embodiments, the blocks may have different, non-rectangular shapes (e.g. triangular, hexagonal, irregular, etc.). Further, the blocks can be disjoint or overlapping. In the extreme case, a block can contain only 1 pixel.

As yet another example, there are various ways to implement the motion estimation in step 3. For instance, it can be block-based motion estimation such as full search, 3-step search, or another search. Further, such block-based motion estimation may use pixels both inside and outside the block. For color video, the motion estimation may be based on only one color component, but it can also use more than one component. Apart from block-based motion estimation, in other classes of embodiments the motion can also be obtained using other methods such as optical flow, or even by manual input.
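As one concrete instance of the block-based options mentioned above, a full search can be sketched as follows. The application leaves the distortion measure open; the sum of absolute differences (SAD) used here, the function names `sad` and `full_search`, and the `search_range` parameter are assumptions chosen for illustration.

```python
def sad(frame, top, left, block):
    """Sum of absolute differences between block and a window of frame."""
    return sum(abs(block[y][x] - frame[top + y][left + x])
               for y in range(len(block))
               for x in range(len(block[0])))


def full_search(block, ref_frame, top, left, search_range):
    """Return (best_top, best_left) of the window in ref_frame minimising SAD.

    block        : current block as a list of rows
    ref_frame    : reference frame as a list of rows
    top, left    : position of the block in the current frame (assumed to be
                   a valid window position inside ref_frame as well)
    search_range : maximum displacement tested in each direction
    """
    height, width = len(block), len(block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            t, l = top + dy, left + dx
            # Skip candidate windows that fall outside the reference frame.
            if (t < 0 or l < 0 or t + height > len(ref_frame)
                    or l + width > len(ref_frame[0])):
                continue
            cost = sad(ref_frame, t, l, block)
            if best is None or cost < best[0]:
                best = (cost, t, l)
    return best[1], best[2]
```

The exhaustive loop visits every displacement in the window, which is simple but costly; faster strategies such as the 3-step search test only a subset of these candidates.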

Note that in step 2 of the current embodiment, the blocks are processed in raster-scan order. In other embodiments, the blocks can be processed in another order.

Note that in step 5, σ_z(refi,k)² is defined as σ_z(refi,k)²=max(0,σ_{i_n}²−σ_n²). In other classes of embodiments, this can be generalized to σ_z(refi,k)²=d*max(0,σ_{i_n}²−σ_n²) to achieve more accurate parameter estimation.

In video sequences with scene changes, yet another class of embodiments might suggest that frames in different scenes should not be used to denoise each other even if some frames happen to be similar.

Finally, MHMCF can be implemented using recursive or non-recursive means.

EODF Modifications and Variations

In the preferred embodiment, the EODF is embedded into a video encoder with one or two reference frames. However, other embodiments can also embed the EODF into a video encoder with more than 2 reference frames.

There are other classes of EODF embodiments. For example, in step 1, the residue block to be denoised can be of any size and any shape.

As another example, in step 2, σ_{z(ref 1,k)}² is calculated as σ_{z(ref 1,k)}²=max(0,σ_{1_n}²−σ_n²). Again, in other classes of embodiments this can be generalized to σ_{z(ref 1,k)}²=d*max(0,σ_{1_n}²−σ_n²) to achieve more accurate parameter estimation in practice.

None of the description in the present application should be read as implying that any particular element, step, or function is an essential element which must be included in the claim scope: THE SCOPE OF PATENTED SUBJECT MATTER IS DEFINED ONLY BY THE ALLOWED CLAIMS. Moreover, none of these claims are intended to invoke paragraph six of 35 USC section 112 unless the exact words “means for” are followed by a participle.

The claims as filed are intended to be as comprehensive as possible, and NO subject matter is intentionally relinquished, dedicated, or abandoned.

Claims

1. A method implemented by a video encoder of denoising in video coding, comprising: dividing a current frame into blocks; selecting at least one other video frame; selecting a current block from said current frame and a reference block corresponding to the current block from said at least one other video frame; generating a residue block as a function of a difference between said current block and said reference block; and performing a linear combination based at least in part on a mean compensation, wherein a weight of the mean compensation is determined adaptively and at least partially dependent on a variance and a mean of the residue block, and the linear combination is performed on pixels contained in said current block and corresponding pixels contained in said reference block to output denoised values.
2. The method of claim 1, wherein said pixels are sets of sub-pixels at sub-pixel locations.
3. The method of claim 1, wherein said at least one other video frame is not sequentially adjacent to said current frame.
4. The method of claim 1, further comprising: obtaining said reference block including performing a search within a predefined search area related to said current block as a function of a difference between said current block and said reference block.
5. The method of claim 4, wherein said performing said search includes implementing a full search algorithm.
6. The method of claim 1, wherein said blocks are each a 3×3 square used to perform motion estimation and obtain the reference block.
7. The method of claim 1, wherein said blocks are rectangular.
8. The method of claim 1, wherein said blocks are each a hexagon.
9. The method of claim 1, wherein said blocks each contain one pixel.
10. The method of claim 1, wherein the mean compensation is constructed as a function of said variance and said mean of said residue block.
11. The method of claim 1, wherein said video frames are denoised frames and non-denoised frames.
12. The method of claim 1, wherein said video frames are past frames and future frames.
13. The method of claim 1, wherein said video frames are never-encoded frames and previously-encoded frames.
14. A method implemented by a video encoder of denoising in video coding, comprising: selecting a current frame and at least one other reference frame; obtaining a predicted frame by performing motion estimation on said at least one other reference frame; dividing said current frame and said predicted frame into current blocks in said current frame and co-located blocks in said predicted frame; performing motion compensation to obtain a residue frame comprising residue blocks constructed as a function of a difference between said current blocks and said co-located blocks; generating denoised values by performing linear combination, with weight determined adaptively and at least partially dependent on a variance and a mean of said residue blocks, in dependence of said predicted frame and said residue frame.
15. The method of claim 14, wherein said performing said motion estimation is adaptive to estimated noise variance of said co-located blocks.
16. The method of claim 15, wherein said performing said motion estimation includes a linear combination of said co-located blocks with the weight determined adaptively in dependence of the estimated noise variance of said co-located blocks.
17. The method of claim 14, wherein said generating said denoised values by performing said linear combination is integrated with said performing said motion compensation.
18. A video coding system, comprising: a filter; and a video coding unit; wherein said filter operates in conjunction with said video coding unit to perform: selection of a current frame and at least one other reference frame; construction of a predicted frame by performance of motion estimation on said at least one other reference frame; determination of current blocks and co-located blocks by division of said current frame and said at least one other reference frame; construction of a residue frame containing residue blocks obtained by performance of motion compensation as a function of a difference between said current blocks and said co-located blocks; and determination of new values by performance of a linear combination on said predicted frame and said residue frame with weights determined adaptively and at least partially dependent on a variance and a mean of said residue blocks; and wherein said video coding unit operates on said new values to generate a denoised frame.
19. The system of claim 18, wherein said filter operates in conjunction with said video coding unit to integrate said determination of said new values with said determination of said residue frame.

CROSS-REFERENCE TO OTHER APPLICATION

The present application claims priority from U.S. provisional application No. 60/801,375 filed May 19, 2006, which is hereby incorporated by reference.

20070177817	Szeliski et al.	Aug 2007	A1
20070195199	Chen et al.	Aug 2007	A1
20070257988	Ong et al.	Nov 2007	A1
20080151101	Tian et al.	Jun 2008	A1
20080292005	Xu et al.	Nov 2008	A1
20100014591	Suzuki	Jan 2010	A1
20100220939	Tourapis et al.	Sep 2010	A1
20110149122	Subbotin	Jun 2011	A1

Foreign Referenced Citations (5)

Number	Date	Country
0270405	Jun 1988	EP
0797353	Sep 1997	EP
1040666	Oct 2000	EP
1239679	Sep 2002	EP
1711019	Oct 2006	EP

Non-Patent Literature Citations (11)

Entry
Song et al, Motion-Compensated Temporal Filtering for Denoising in Video Encoder, Jun. 24, 2004, vol. 40, No. 13, pp. 1-2.
Cheong et al, Adaptive Spatio-Temporal Filtering for Video De-Noising, 2004, IEEE, pp. 965-968.
Kwon et al, A Motion-Adaptive De-Interlacing Method, Jun. 5, 1992, IEE, pp. 145-150.
Girod, Why B-Pictures Work, 1998, IEE, pp. 213-217.
Braspenning et al, 1997, True-Motion Estimation using Feature Correspondences, Philips Research Laboratories.
L.W. Guo, O.C. Au, M.Y. Ma and Z.Q. Liang, “A multihypothesis motion-compensated temporal filter for video denoising,” Proc. ICIP, 2006.
M. Lindenbaum, M. Fischer, and A. Bruckstein, “On Gabor's contribution to image enhancement,” Pattern Recognition, 1994.
T.W. Chan, O.C. Au, T.K. Chong, W.S. Chau, “A novel content-adaptive video denoising filter,” Proc. ICASSP, 2005.
R. Dugad and N. Ahuja, “Video denoising by combining Kalman and Wiener estimates,” in Proc. ICIP, 1999.
Song, B.C., and Chun, K.W., “Motion compensated temporal filtering denoising in video encoder,” Electronics Letters, vol. 40, Issue 13, Jun. 24, 2004.
OA dated Mar. 1, 2012 for U.S. Appl. No. 12/122,163, 25 pages.

Related Publications (1)

	Number	Date	Country
	20070291842 A1	Dec 2007	US

Provisional Applications (1)

	Number	Date	Country
	60801375	May 2006	US

Abstract