The invention relates generally to the field of digital video and image sequence processing, and in particular to video filtering and noise reduction in a video image sequence.
With the advance of digital technologies, especially the widespread use and availability of digital camcorders, digital video is getting easier and more efficient to use in a wide variety of applications, such as entertainment, education, medicine, security, and military. Accordingly, there is an increasing demand for video processing techniques, such as noise reduction.
There is always a certain level of noise captured in a video sequence. The sources are numerous, including electronic noise, photon noise, film grain noise, and quantization noise. The noise adversely affects video representation, storage, display, and transmission. It degrades visual quality, decreases coding efficiency (through increased entropy), increases transmission bandwidth, and makes content description less discriminative and effective. Therefore, it is desirable to reduce the noise while preserving video content.
After years of effort, video filtering remains a challenging task. Most of the time, the only information available is the noisy input video. Neither the noise-free video nor the error characteristics are available. To effectively reduce the random noise, motion estimation is necessary to enhance temporal correlation by establishing point correspondence between video frames. However, motion estimation itself is an under-constrained and ill-posed problem, especially when noise is involved. Perfect motion estimation is almost impossible or impractical. Meanwhile, spatiotemporal filtering is also necessary to actually reduce the random noise. The filter design depends heavily on knowledge of the noise characteristics (which are usually not available). Furthermore, video processing requires tremendous computational power because of the amount of data involved.
Research on noise estimation and reduction in a video sequence has been going on for decades. “Noise reduction in image sequence using motion-compensated temporal filtering” by E. Dubois and M. Sabri, IEEE Trans. on Communications, 32(7):826-831, 1984, presented one of the earliest schemes using motion for noise reduction. A comprehensive review of various methods is available in “Noise reduction filters for dynamic image sequence: a review” by J. C. Brailean, et al., Proceedings of the IEEE, 83(9):1272-1292, September 1995. A robust motion estimation algorithm is presented in “The robust estimation of multiple motions: parametric and piecewise smooth flow fields” by M. Black and P. Anandan, Computer Vision and Image Understanding, 63:75-104, January 1996.
In addition, the following patent publications bear some relevance to this area, each of which is incorporated herein by reference. Commonly-assigned U.S. Published Patent Application No. 20020109788, “Method and system for motion image digital processing” by R. Morton et al., discloses a method to reduce film grain noise in digital motion signals by using a frame averaging technique. A configuration of successive motion estimation and noise removal is employed. U.S. Pat. No. 6,535,254, “Method and device for noise reduction” to K. Olsson et al., discloses a method of reducing noise in a video signal. U.S. Pat. No. 6,281,942, “Spatial and temporal filtering mechanism for digital motion video signals” to A. Wang, discloses a digital motion video processing mechanism of adaptive spatial filtering followed by temporal filtering of video frames. U.S. Pat. No. 5,909,515, “Method for the temporal filtering of the noise in an image of a sequence of digital images, and device for carrying out the method” to S. Makram-Ebeid, discloses a method for temporal filtering of a digital image sequence. Separate motion and filtering steps were taken in a batch mode to reduce noise. U.S. Pat. No. 5,764,307, “Method and apparatus for spatially adaptive filtering for video encoding” to T. Ozcelik et al., discloses a method and an apparatus for spatially adaptive filtering of a displaced frame difference and reducing the amount of information that must be encoded by a video encoder without substantially degrading the decoded video sequence. The filtering is carried out in the spatial domain on the displaced frames (the motion compensated frames). The goal is to facilitate video coding, so that the compressed video has reduced noise (and smoothed video content as well). U.S. Pat. No. 5,600,731, “Method for temporally adaptive filtering of frames of a noisy image sequence using motion estimation” to M. I. Sezan et al., discloses a temporally adaptive filtering method to reduce noise in an image sequence. Commonly-assigned U.S. Pat. No. 5,384,865, “Adaptive, hybrid median filter for temporal noise suppression” to J. Loveridge, discloses a temporal noise suppression scheme utilizing median filtering upon a time-varying sequence of images.
In addition, International Publication No. WO94/09592, “Three dimensional median and recursive filtering for video image enhancement” to S. Takemoto et al., discloses methods for video image enhancement by spatiotemporal filtering with or without motion estimation. International Publication No. WO01/97509, “Noise filtering an image sequence” to W. Bruls et al., discloses a method to filter an image sequence with the use of estimated noise characteristics. Published European Patent Application EP0840514, “Method and apparatus for prefiltering of video images” to M. Van Ackere et al., discloses a method for generating an updated video stream with reduced noise for video encoding applications. European Patent Specification EP0614312, “Noise reduction system using multi-frame motion estimation, outlier rejection and trajectory correction” to S.-L. Iu, discloses a noise reduction system.
One of the common features of the previously disclosed schemes is the use of independent and separate steps of motion estimation and spatiotemporal filtering. Motion estimation is taken as a preprocessing step in a separate module before filtering, and there is no interaction between the two modules. If the motion estimation fails, filtering is carried out on a collection of uncorrelated samples, and there is no way to recover from such a failure. Also, there is no attempt to explicitly estimate the noise levels, leading to a high chance of mismatch between the noise in the video and the algorithms and parameters used for noise reduction. Furthermore, a robust method has not been used in video filtering, and the performance suffers when the underlying model and assumptions are occasionally violated, which happens when the data is corrupted by noise.
It is an objective of this invention to provide a robust video filtering method to reduce random noise in a video sequence.
It is another objective of this invention to make the computational method robust to occasional model violations and outliers.
It is yet another objective of this invention to successively improve the performance of motion estimation, spatiotemporal filtering and noise estimation through iterations.
The present invention is directed to overcoming one or more of the problems set forth above. Briefly summarized, according to one aspect of the present invention, the invention resides in a method for video filtering of an input video sequence by utilizing joint motion and noise estimation, where the filtering is based on determining the noise level, as characterized by the standard deviation, of the input video sequence as corrupted by unknown noise. The method comprises the steps of: (a) generating a motion-compensated video sequence from the input video sequence and a plurality of estimated motion fields; (b) spatiotemporally filtering the motion compensated video sequence, thereby producing a filtered, motion-compensated video sequence; (c) estimating a standard deviation from the difference between the input video sequence and the filtered, motion-compensated video sequence, thereby producing an estimated standard deviation; (d) estimating a scale factor from the difference between the input video sequence and the motion compensated video sequence; and (e) iterating through steps (a) to (d) using the scale factor previously obtained from step (d) to generate the motion-compensated video sequence in step (a) and using the estimated standard deviation previously obtained from step (c) to perform the filtering in step (b) until the value of the noise level approaches the unknown noise of the input video sequence, whereby the noise level is then characterized by a finally determined scale factor and standard deviation.
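The closed loop of steps (a) through (e) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the callables `estimate_motion`, `compensate`, and `st_filter` are hypothetical placeholders supplied by the caller, the array layouts are chosen for convenience, and the stopping rule (a small change in the estimated standard deviation between iterations) is one plausible reading of "until the value of the noise level approaches the unknown noise".

```python
import numpy as np

def mad_scale(residuals):
    """Robust scale estimate: 1.4826 times the median absolute deviation."""
    r = np.asarray(residuals, dtype=float).ravel()
    return 1.4826 * np.median(np.abs(r - np.median(r)))

def joint_filter(noisy, estimate_motion, compensate, st_filter,
                 sigma_n=1.0, sigma_d=1.0, max_iter=10, tol=1e-3):
    """Iterate steps (a)-(d) until the noise estimate stops changing.

    noisy: observed video, shape (K, M, N).
    estimate_motion(video, sigma_d) -> motion fields (hypothetical callable).
    compensate(noisy, motion) -> aligned frames, shape (K, T, M, N).
    st_filter(compensated, noisy, sigma_n) -> filtered video, shape (K, M, N).
    """
    filtered = noisy  # first pass: no filtered video exists yet
    for _ in range(max_iter):
        # Motion is estimated with the scale factor from the previous pass.
        motion = estimate_motion(filtered, sigma_d)
        compensated = compensate(noisy, motion)                    # step (a)
        filtered = st_filter(compensated, noisy, sigma_n)          # step (b)
        new_sigma_n = float(np.std(noisy - filtered))              # step (c)
        sigma_d = float(mad_scale(compensated - noisy[:, None]))   # step (d)
        converged = abs(new_sigma_n - sigma_n) < tol
        sigma_n = new_sigma_n
        if converged:                                              # step (e)
            break
    return filtered, sigma_n, sigma_d
```

Passing the three stages in as callables keeps the sketch independent of any particular motion estimator or filter; only the feedback of the two scale estimates into the next pass is fixed.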
The advantages of the invention include: (a) automatically reducing the random noise in a video sequence without the availability of noise-free reference video and without knowledge of the noise characteristics; (b) using joint motion and noise estimation to improve filtering performance through iterations in a closed loop; and (c) employing a robust method to reduce the sensitivity to occasional model violations and outliers in motion estimation, filter design and noise estimation.
These and other aspects, objects, features and advantages of the present invention will be more clearly understood and appreciated from a review of the following detailed description of the preferred embodiments and appended claims, and by reference to the accompanying drawings.
In the following description, a preferred embodiment of the present invention will be described in terms that would ordinarily be implemented as a software program. Those skilled in the art will readily recognize that the equivalent of such software may also be constructed in hardware. Because image manipulation algorithms and systems are well known, the present description will be directed in particular to algorithms and systems forming part of, or cooperating more directly with, the system and method in accordance with the present invention. Other aspects of such algorithms and systems, and hardware and/or software for producing and otherwise processing the image signals involved therewith, not specifically shown or described herein, may be selected from such systems, algorithms, components and elements known in the art. Given the system as described according to the invention in the following materials, software not specifically shown, suggested or described herein that is useful for implementation of the invention is conventional and within the ordinary skill in such arts.
Still further, as used herein, the computer program may be stored in a computer readable storage medium, which may comprise, for example: magnetic storage media such as a magnetic disk (such as a hard drive or a floppy disk) or magnetic tape; optical storage media such as an optical disc, optical tape, or machine readable bar code; solid state electronic storage devices such as random access memory (RAM), or read only memory (ROM); or any other physical device or medium employed to store a computer program.
Before describing the present invention, it facilitates understanding to note that the present invention is preferably utilized on any well-known computer system, such as a personal computer. For instance, referring to
A compact disk-read only memory (CD-ROM) 124, which typically includes software programs, is inserted into the microprocessor-based unit for providing a means of inputting the software programs and other information to the microprocessor-based unit 112. In addition, a floppy disk 126 may also include a software program, and is inserted into the microprocessor-based unit 112 for inputting the software program. The compact disk-read only memory (CD-ROM) 124 or the floppy disk 126 may alternatively be inserted into externally located disk drive unit 122 which is connected to the microprocessor-based unit 112. Still further, the microprocessor-based unit 112 may be programmed, as is well known in the art, for storing the software program internally. The microprocessor-based unit 112 may also have a network connection 127, such as a telephone line, to an external network, such as a local area network or the Internet. A printer 128 may also be connected to the microprocessor-based unit 112 for printing a hardcopy of the output from the computer system 110.
Images and videos may also be displayed on the display 114 via a personal computer card (PC card) 130, such as, as it was formerly known, a PCMCIA card (based on the specifications of the Personal Computer Memory Card International Association) which contains digitized images electronically embodied in the card 130. The PC card 130 is ultimately inserted into the microprocessor-based unit 112 for permitting visual display of the image on the display 114. Alternatively, the PC card 130 can be inserted into an externally located PC card reader 132 connected to the microprocessor-based unit 112. Images may also be input via the compact disk 124, the floppy disk 126, or the network connection 127. Any images and videos stored in the PC card 130, the floppy disk 126 or the compact disk 124, or input through the network connection 127, may have been obtained from a variety of sources, such as a digital image or video capture device 134 (e.g., a digital camera) or a scanner (not shown). Images or video sequences may also be input directly from a digital image or video capture device 134 via a camera or camcorder docking port 136 connected to the microprocessor-based unit 112 or directly from the digital camera 134 via a cable connection 138 to the microprocessor-based unit 112 or via a wireless connection 140 to the microprocessor-based unit 112.
Referring now to
As mentioned above, the observed input video sequence V̈ 210 is corrupted by additive random noise, V̈=V+ε, with ε following a Gaussian distribution N(0,σn). Given the additive degradation model
Ï(i,j,k)=I(i,j,k)+ε(i,j,k)
with ε(i,j,k) as the independent noise term, the noise level 270, measured by the standard deviation, can be estimated from the noisy input video sequence V̈ and the noise-free video V, as follows:
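One expression consistent with the additive model above is the root-mean-square difference between the observed and noise-free intensities. The form below is offered as an assumption rather than as the exact expression of the disclosure; M×N is taken as the frame size and K as the number of frames, matching the index ranges used later for the residual set.

```latex
\sigma_n = \sqrt{ \frac{1}{MNK} \sum_{k=1}^{K} \sum_{i=1}^{M} \sum_{j=1}^{N}
           \big( \ddot{I}(i,j,k) - I(i,j,k) \big)^{2} }
```

Because ε has zero mean, this root-mean-square value coincides with the standard deviation of the noise.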
As the ground truth V is not available, we estimate the noise level σn 270 from the difference between the observed input video sequence V̈ and the filtered video sequence V̄ 220. The spatiotemporal filtering module 240 reduces the random noise in the motion compensated video V̂ 230 and generates the filtered video V̄. Noise estimation module 250 takes both V̈ and V̄ as input and estimates the noise level, as characterized by the standard deviation σn 270. The process is iterated in a closed-loop fashion as shown in
Noise estimation module 250 also takes both V̈ and V̂ as input and estimates the scale factor σd 280. (Generally speaking, as noise in V̈ increases, the scale factor function assigns bigger weights to more samples.) The process is iterated in a closed-loop fashion as shown in
The disclosed video filtering scheme is different from the previous video noise reduction schemes (shown in
The disclosed video filtering scheme can be summarized in a flow chart as presented in
Referring to the motion estimation module 260 in
As motion vectors are imperfect, the chain rule could accumulate motion errors and break the temporal correlation needed for the following filtering.
The recovery of motion vectors (u,v) from a pair of images solely based on image intensity I(i,j) is under-constrained and ill-posed. Even worse, the observed image frames are corrupted by unknown noise. A perfect motion and noise model is almost impossible or impractical. Therefore, a robust method plays an essential role in reducing the sensitivity to violations of the underlying assumptions.
We use the robust motion estimation method by Black and Anandan to recover the motion field, which is done by minimizing the energy function
where ρd and ρs are robust functions with scale parameters σd and σs, u and v are the horizontal and vertical motion components, and S is the 4-neighbor or 8-neighbor spatial support of pixel (i,j). The first term in the above equation enforces the constant brightness constraint, i.e., the points have the same appearance in both frames. The constraint can be approximated by the optical flow equation Ixu+Iyv+It=0 following a Taylor expansion. The second term enforces the smoothness constraint such that the motion vectors vary smoothly. Coefficients λd and λs control the relative weights of the two constraints.
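A form of the energy consistent with the two terms described above, following the general structure of the Black and Anandan formulation, may be written as below. The placement of the weights λd and λs and the use of separate smoothness terms for u and v are assumptions made for illustration.

```latex
E(u,v) = \sum_{i,j} \Big[ \lambda_d\, \rho_d\big( I_x u_{ij} + I_y v_{ij} + I_t,\ \sigma_d \big)
       + \lambda_s \sum_{(p,q)\in S} \big( \rho_s( u_{ij} - u_{pq},\ \sigma_s )
       + \rho_s( v_{ij} - v_{pq},\ \sigma_s ) \big) \Big]
```

Here the first term penalizes the residual of the optical flow equation at pixel (i,j), and the inner sum penalizes differences between the motion vector at (i,j) and those of its neighbors (p,q) in S.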
In a real dataset, especially corrupted by noise, the constraints may not be strictly satisfied at every point, due to scene changes, illumination changes, occlusions, and shadows. The occasional violations of the constant brightness and smoothness constraints can be alleviated by using a robust method and outlier rejection. Two robust functions for M-estimate are the Lorentzian function
and the Geman-McClure function
as shown in
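The two robust functions named above are commonly written as ρ(x,σ)=log(1+½(x/σ)²) for the Lorentzian and ρ(x,σ)=x²/(σ²+x²) for the Geman-McClure function. The following Python sketch evaluates both together with their derivatives (influence functions). The normalizations are assumptions taken from the general robust statistics literature; some references place σ rather than σ² in the Geman-McClure denominator, and the function names are chosen here for illustration.

```python
import numpy as np

def lorentzian(x, sigma):
    """Lorentzian robust function: log(1 + 0.5 * (x / sigma)^2)."""
    x = np.asarray(x, dtype=float)
    return np.log1p(0.5 * (x / sigma) ** 2)

def lorentzian_influence(x, sigma):
    """Derivative of the Lorentzian with respect to x."""
    x = np.asarray(x, dtype=float)
    return 2.0 * x / (2.0 * sigma ** 2 + x ** 2)

def geman_mcclure(x, sigma):
    """Geman-McClure robust function: x^2 / (sigma^2 + x^2), bounded by 1."""
    x = np.asarray(x, dtype=float)
    return x ** 2 / (sigma ** 2 + x ** 2)

def geman_mcclure_influence(x, sigma):
    """Derivative of the Geman-McClure function with respect to x."""
    x = np.asarray(x, dtype=float)
    return 2.0 * x * sigma ** 2 / (sigma ** 2 + x ** 2) ** 2
```

Both influence functions fall toward zero for large residuals, which is the mechanism by which outliers are rejected rather than allowed to dominate the estimate.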
As the noise-free video is not available, we use the filtered video V̄, instead of the observed noisy video V̈, for motion estimation. Compared to V̈, V̄ has reduced noise and a smoother intensity surface, which helps the computation of the gradients Īx, Īy, and Īt, yielding smoother and more consistent motion fields.
Referring to the spatiotemporal filtering module 240 in
Îr(i,j,k)=Ī(i+uij(k,k+r), j+vij(k,k+r), k+r)
Because the displaced positions generally do not fall on the integer grid, bilinear interpolation of the samples on the integer grid is carried out, which has a low-pass filtering effect.
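The motion compensation of a single frame with bilinear interpolation can be sketched in Python as follows. This is an illustrative sketch under assumptions: the function name `compensate_frame` is hypothetical, the first array axis is indexed by i and displaced by u, the second by j and displaced by v, and positions falling outside the frame are clamped to the border.

```python
import numpy as np

def compensate_frame(frame, u, v):
    """Sample frame at (i + u, j + v) with bilinear interpolation.

    frame: 2-D array of shape (M, N).
    u, v: displacements along the first and second axes (scalars or (M, N)).
    """
    frame = np.asarray(frame, dtype=float)
    M, N = frame.shape
    ii, jj = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
    # Displaced sampling positions, clamped to the frame border.
    x = np.clip(ii + u, 0, M - 1)
    y = np.clip(jj + v, 0, N - 1)
    # Integer corners surrounding each sampling position.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, M - 1)
    y1 = np.minimum(y0 + 1, N - 1)
    # Fractional offsets act as the interpolation weights.
    ax = x - x0
    ay = y - y0
    return ((1 - ax) * (1 - ay) * frame[x0, y0]
            + ax * (1 - ay) * frame[x1, y0]
            + (1 - ax) * ay * frame[x0, y1]
            + ax * ay * frame[x1, y1])
```

Each output sample is a convex combination of up to four neighboring pixels, which is the source of the low-pass effect noted above.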
The 2R+1 frames are then filtered by adaptive weighted average
where z(i,j,k)=Σ(p,q,r)∈S wijk(p,q,r) is a normalization factor, and S defines a 3-D spatiotemporal neighborhood. As V̂ has enhanced temporal correlation, the weighted average can reduce the random noise, which is independent of the signal.
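Written out with the weights wijk(p,q,r) and the normalization factor z(i,j,k) named above, the adaptive weighted average may take the following form. The indexing of the compensated samples mirrors the notation Îpqr(i,j,k) used in the filter design and is an assumption.

```latex
\bar{I}(i,j,k) = \frac{1}{z(i,j,k)} \sum_{(p,q,r)\in S} w_{ijk}(p,q,r)\, \hat{I}_{pqr}(i,j,k),
\qquad
z(i,j,k) = \sum_{(p,q,r)\in S} w_{ijk}(p,q,r)
```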
The filter is designed as
wijk(p,q,r)=1−ρG(Îpqr(i,j,k)−Ï(i,j,k),τ),
where
is the Geman-McClure robust function shown in
Two parameters are involved in the filter design, namely, the spatiotemporal filtering support S and the scale factor τ. The support S is usually chosen as a 1×1 or 3×3 spatial neighborhood and 7 or 9 temporally adjacent frames (R=3 or 4). Increasing the size of S helps reduce noise, but tends to blur the images at the same time, so a balance is needed, especially when the motion is not perfect. The scale factor τ is chosen as τ=σn√σn, where σn is the noise level estimated from module 250. As noise in V̄ increases, the robust function assigns bigger weights to more samples.
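The filter design can be illustrated with a short Python sketch for the simplest support, a 1×1 spatial neighborhood across the 2R+1 compensated frames. The sketch rests on assumptions: the Geman-McClure function is taken as ρG(x,τ)=x²/(τ²+x²), so that the weight 1−ρG equals τ²/(τ²+x²); the function name `robust_temporal_average` is hypothetical; and τ is passed in by the caller rather than derived from σn.

```python
import numpy as np

def robust_temporal_average(compensated, reference, tau):
    """Adaptive weighted average over motion-compensated frames.

    compensated: array of shape (T, M, N), the T = 2R + 1 aligned frames.
    reference:   array of shape (M, N), the frame the weights are measured against.
    tau:         positive scale factor of the Geman-McClure weight.
    """
    c = np.asarray(compensated, dtype=float)
    d = c - np.asarray(reference, dtype=float)
    # w = 1 - rho_G(d, tau); close to 1 for small differences, near 0 for outliers.
    w = tau ** 2 / (tau ** 2 + d ** 2)
    z = w.sum(axis=0)  # normalization factor, strictly positive
    return (w * c).sum(axis=0) / z
```

With a larger τ the weights flatten toward a plain average, which matches the remark above that more samples receive bigger weights as the noise level grows.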
Referring to noise estimation 250 in
εd(k,k+r)={Ī(i+uij(k,k+r), j+vij(k,k+r), k+r)−Ī(i,j,k) | i=1 . . . M, j=1 . . . N, k=1 . . . K, r=−R . . . R}.
A robust estimate of the scale factor is available as
σd(k,k+r)=1.4826 median{|εd(k,k+r)−median{εd(k,k+r)}|}
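The robust scale estimate above is the median absolute deviation scaled by 1.4826, the factor that makes it agree with the standard deviation for Gaussian data. A minimal Python sketch follows; the function name `robust_scale` is chosen here for illustration, and the residual set is assumed to be supplied as an array of any shape.

```python
import numpy as np

def robust_scale(residuals):
    """Return 1.4826 * median(|e - median(e)|) over all residuals e."""
    e = np.asarray(residuals, dtype=float).ravel()
    return 1.4826 * np.median(np.abs(e - np.median(e)))
```

Unlike the sample standard deviation, this estimate is barely affected by a small fraction of large residuals, such as those produced by occlusions or failed motion vectors.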
The robust video filtering scheme has been tested on video sequences degraded to various noise levels, and significant performance improvement has been achieved. A few factors have contributed to the performance improvement: (a) a robust method is employed in both motion estimation and spatiotemporal filtering to accommodate occasional model violations; (b) a joint motion and noise estimation process is iterated in a closed loop for the best possible performance; and (c) explicit noise estimation is carried out for temporal correlation enhancement and noise reduction.
The method disclosed according to the invention may have a number of distinct applications. For example, the video filtering may be used to improve video coding and compression efficiency, due to the reduced entropy. The video filtering may also be used to minimize the storage space for a video clip or to minimize the transmission bandwidth of a video sequence. Furthermore, the video filtering may be used to enhance the video presentation quality, in print or in display. Additionally, the video filtering may be used to extract more distinctive and unique descriptions for efficient video management, organization and indexing. In each case, the usage of the aforementioned robust filter designs further enhances the value of these applications.
The invention has been described in detail with particular reference to a presently preferred embodiment, but it will be understood that variations and modifications can be effected within the spirit and scope of the invention.