The invention relates generally to the field of digital video and image sequence processing, and in particular to noise estimation from a noisy video sequence.
In recent years, as video capture, storage, transmission, display, manipulation, and management have become easier and cheaper, video has come into widespread use in communication, entertainment, education, security, surveillance, medicine, and military applications. However, a video sequence always captures a certain level of noise, such as electronic noise, photon noise, film grain noise, and quantization noise. The noise degrades visual quality and makes the content less useful. For example, noise makes it difficult to analyze the crime scene in a surveillance video. Noise also increases entropy and decreases coding efficiency, so more storage space and wider transmission bandwidth are needed to record and communicate video. It also makes content description less discriminative and content management less effective. Therefore, it is desirable to estimate and reduce the noise while preserving video content. To reduce noise effectively, good knowledge of the noise characteristics is needed, so that appropriate algorithms and parameters can be chosen for the specific dataset.
After years of effort, noise estimation from video sequences still remains a challenging task. Most of the time, the degraded video is the only observation available. Inter-frame intensity differences observed in the degraded video are partly due to scene/object motion and partly due to noise. Estimation of the noise requires tremendous computational power because of the amount of data involved in a video sequence. Furthermore, noise estimation is used in conjunction with noise reduction, and the estimation becomes more reliable if the filtered video is closer to the noise-free groundtruth.
Research on noise estimation and reduction in video sequences has been going on for decades. “Noise reduction in image sequences using motion-compensated temporal filtering” by E. Dubois and M. Sabri, IEEE Trans. on Communications, 32(7):826-831, 1984, presented one of the earliest schemes using motion for noise reduction. A comprehensive review of various methods is available in “Noise reduction filters for dynamic image sequences: a review” by J. C. Brailean, et al., Proceedings of the IEEE, 83(9):1272-1292, September 1995.
Commonly-assigned, copending U.S. patent application Ser. No. 10/602,427 filed 24 Jun. 2003, entitled “System and method for estimating, synthesizing, and matching noise in digital images and image sequences” by G. Fielding, discloses methods to synthesize noise, match noise in two images, and automatically compute noise statistics in an image sequence. Commonly-assigned U.S. Pat. No. 5,923,775, “Apparatus and method for signal dependent noise estimation and reduction in digital images” to P. Snyder et al., discloses a method to estimate signal (code value) dependent noise in an image and subsequently to reduce that noise. The estimation is carried out on a single image. U.S. Pat. No. 5,764,307, “Method and apparatus for spatially adaptive filtering for video encoding” to T. Ozcelik et al., discloses a noise estimation method based on a displaced frame difference to facilitate video coding and compression. The estimated noise level is the difference between a video frame and a motion compensated frame after block-matching motion estimation. Noise estimation is carried out on a single frame. Published European Patent Application EP0957367, “Method for estimating the noise level in a video sequence” to F. Le Clerc, discloses a method for noise estimation by combining the analysis of displaced field or frame differences (DFD) and the values of the field or frame differences (FD) over static picture areas. Published European Patent Application EP 1126729, “A process for estimating the noise level in sequences of images and a device therefore” to A. Borneo et al., discloses a process to estimate noise level in an image sequence.
The previously disclosed approaches estimate noise on a 2-D spatial domain or on a 3-D spatiotemporal domain in an open-loop fashion. The computations are carried out in a batch mode without iterations. Moreover, the estimated noise level is not used to improve motion estimation and spatiotemporal filtering, which depend heavily on knowledge of the error characteristics in video. Furthermore, these approaches do not use robust methods for noise estimation. Robust methods become crucial when noise is present, as they reduce the sensitivity to occasional model violations.
What is needed is a robust noise estimation method for a noise-corrupted video sequence, with decreased sensitivity to model violations and outliers.
The object of the invention is to provide a robust noise estimation method for a noisy video sequence.
The present invention is directed to overcoming one or more of the problems set forth above. Briefly summarized, according to one aspect of the present invention, the invention resides in a method for determining the noise level, as characterized by the standard deviation, of an input video sequence corrupted by unknown noise, comprising the steps of: (a) spatiotemporally filtering the input video sequence, thereby producing a filtered video sequence; (b) estimating a standard deviation from the difference between the input video sequence and the filtered video sequence, thereby producing an estimated standard deviation; and (c) iterating through steps (a) and (b) using the estimated standard deviation previously obtained from step (b) to perform the filtering in step (a) until the value of the noise level approaches the unknown noise, whereby the noise level is then characterized by a finally determined standard deviation.
The advantages of the disclosed method include: (a) estimating the noise level from the noisy video and the filtered video, without the availability of the noise-free video; (b) carrying out the estimation process in a closed loop, so that noise estimation and spatiotemporal filtering improve each other over successive iterations; (c) employing a robust method to reduce the sensitivity to occasional model violations and outliers; and (d) using a fast median sorting scheme for efficient computation.
These and other aspects, objects, features and advantages of the present invention will be more clearly understood and appreciated from a review of the following detailed description of the preferred embodiments and appended claims, and by reference to the accompanying drawings.
In the following description, a preferred embodiment of the present invention will be described in terms that would ordinarily be implemented as a software program. Those skilled in the art will readily recognize that the equivalent of such software may also be constructed in hardware. Because image manipulation algorithms and systems are well known, the present description will be directed in particular to algorithms and systems forming part of, or cooperating more directly with, the system and method in accordance with the present invention. Other aspects of such algorithms and systems, and hardware and/or software for producing and otherwise processing the image signals involved therewith, not specifically shown or described herein, may be selected from such systems, algorithms, components and elements known in the art. Given the system as described according to the invention in the following materials, software not specifically shown, suggested or described herein that is useful for implementation of the invention is conventional and within the ordinary skill in such arts.
Still further, as used herein, the computer program may be stored in a computer readable storage medium, which may comprise, for example: magnetic storage media such as a magnetic disk (such as a hard drive or a floppy disk) or magnetic tape; optical storage media such as an optical disc, optical tape, or machine readable bar code; solid state electronic storage devices such as random access memory (RAM) or read only memory (ROM); or any other physical device or medium employed to store a computer program.
Before describing the present invention, it facilitates understanding to note that the present invention is preferably utilized on any well-known computer system, such as a personal computer. For instance, referring to
A compact disk-read only memory (CD-ROM) 124, which typically includes software programs, is inserted into the microprocessor-based unit 112 for providing a means of inputting the software programs and other information to the microprocessor-based unit 112. In addition, a floppy disk 126 may also include a software program, and is inserted into the microprocessor-based unit 112 for inputting the software program. The compact disk-read only memory (CD-ROM) 124 or the floppy disk 126 may alternatively be inserted into an externally located disk drive unit 122, which is connected to the microprocessor-based unit 112. Still further, the microprocessor-based unit 112 may be programmed, as is well known in the art, for storing the software program internally. The microprocessor-based unit 112 may also have a network connection 127, such as a telephone line, to an external network, such as a local area network or the Internet. A printer 128 may also be connected to the microprocessor-based unit 112 for printing a hardcopy of the output from the computer system 110.
Images and videos may also be displayed on the display 114 via a personal computer card (PC card) 130, such as a PCMCIA card, as it was formerly known (based on the specifications of the Personal Computer Memory Card International Association), which contains digitized images electronically embodied in the card 130. The PC card 130 is ultimately inserted into the microprocessor-based unit 112 for permitting visual display of the image on the display 114. Alternatively, the PC card 130 can be inserted into an externally located PC card reader 132 connected to the microprocessor-based unit 112. Images may also be input via the compact disk 124, the floppy disk 126, or the network connection 127. Any images and videos stored in the PC card 130, the floppy disk 126 or the compact disk 124, or input through the network connection 127, may have been obtained from a variety of sources, such as a digital image or video capture device 134 or a scanner (not shown). Images or video sequences may also be input directly from a digital image or video capture device 134 via a camera or camcorder docking port 136 connected to the microprocessor-based unit 112, via a cable connection 138 to the microprocessor-based unit 112, or via a wireless connection 140 to the microprocessor-based unit 112.
Referring now to
$$\bar{I}(i, j, k) = I(i, j, k) + \varepsilon(i, j, k)$$
with $\varepsilon(i, j, k)$ as the independent noise term, the noise level 270, measured by the standard deviation, can be estimated from the noisy input video sequence $\bar{V}$ and the noise-free video $V$, as follows:
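When a noise-free reference is available, for example in a synthetic test, such an estimate reduces to the standard deviation of the difference between the two sequences. The following is a minimal Python sketch, assuming the plain population standard deviation is intended; the function name `oracle_noise_std` and the flat-list pixel layout are illustrative choices, not part of the disclosure.

```python
import math
import random


def oracle_noise_std(noisy, clean):
    """Standard deviation of (noisy - clean) over all pixels.

    Both arguments are flat sequences of intensities of equal length.
    This is only computable when the noise-free video is known.
    """
    diffs = [n - c for n, c in zip(noisy, clean)]
    mean = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))


# Synthetic check: a flat gray "video" plus Gaussian noise with sigma = 5.
rng = random.Random(0)
clean = [128.0] * 20000
noisy = [c + rng.gauss(0.0, 5.0) for c in clean]
sigma = oracle_noise_std(noisy, clean)
print(f"estimated sigma: {sigma:.2f}")
```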
As the groundtruth $V$ is not available, we estimate the noise level $\sigma_n$ 270 from the difference between the observed input video sequence $\bar{V}$ and the filtered video sequence $\hat{V}$ 220. A spatiotemporal filtering module 240 reduces the random noise in $\bar{V}$ and generates the filtered video $\hat{V}$. Noise estimation module 250 takes both the observed video $\bar{V}$ and the filtered video $\hat{V}$ as input and estimates the noise level, as characterized by the standard deviation $\sigma_n$ 270. The process is iterated in a closed-loop fashion as shown in
The procedure can be summarized in a flow chart in
In the following, we present more details for the noise estimation module 250 and the specific procedure 330. The structure of $\bar{V} - \hat{V}$, the difference between the observed and the filtered video, is complicated, owing to random noise, incorrect motion trajectories, and imperfect spatiotemporal filtering. Thus a robust method is used to estimate $\sigma_n$ and to reduce the sensitivity to occasional violations of the underlying model and assumptions. Model violations may be caused by scene changes, illumination changes, occlusions, and shadows, yielding incorrect motion vectors and imperfect noise filtering. Let the residue $\bar{V} - \hat{V}$ be denoted as
$$\varepsilon_n = \{\bar{I}(i, j, k) - \hat{I}(i, j, k) \mid i = 1 \ldots M,\ j = 1 \ldots N,\ k = 1 \ldots K\}$$
It is mainly due to the random noise, with occasional changes in the video structure appearing as outliers. A robust estimate of the noise level is the median absolute deviation, scaled by 1.4826 so that it agrees with the standard deviation for Gaussian noise:
$$\sigma_n = 1.4826\ \operatorname{median}\{\,|\varepsilon_n - \operatorname{median}\{\varepsilon_n\}|\,\}$$
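The robust estimate above can be sketched in a few lines of Python. The function name `robust_noise_std`, the flat-list layout of the residue, and the synthetic outliers standing in for model violations are assumptions made for illustration only.

```python
import random
import statistics


def robust_noise_std(residue):
    """Scaled median absolute deviation of a flat sequence of residue values."""
    center = statistics.median(residue)
    mad = statistics.median(abs(r - center) for r in residue)
    return 1.4826 * mad


# Synthetic residue: Gaussian noise (sigma = 4) plus about 5% gross outliers
# that stand in for occasional model violations such as occlusions.
rng = random.Random(1)
residue = [rng.gauss(0.0, 4.0) for _ in range(10000)]
residue += [rng.uniform(-100.0, 100.0) for _ in range(500)]

robust = robust_noise_std(residue)    # stays close to the Gaussian sigma
plain = statistics.pstdev(residue)    # inflated by the outliers
print(f"robust: {robust:.2f}, plain std: {plain:.2f}")
```

The comparison with the ordinary standard deviation illustrates why a median-based estimate is preferred when the residue contains outliers.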
A fast (approximate) median sorting algorithm is applied to a sampled subset of $\varepsilon_n$ for efficient computation, because $\varepsilon_n$ holds one value per pixel per frame and is therefore large. The details of the median estimation algorithm are shown in
An example of the noise estimation is shown in
The estimated noise level can be used to reduce the random noise in a video sequence by spatiotemporal filtering. Numerous motion estimation algorithms, such as gradient-based, region-based, energy-based, and transform-based approaches, can be used to enhance the temporal correlation. There are also a number of filters available for spatiotemporal filtering, including Wiener filter, Sigma filter, median filter, and adaptive weighted average (AWA) filter.
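As one illustration of how the estimated noise level can drive the filtering, the following Python sketch applies a sigma filter along the temporal axis of a single pixel trajectory. Only samples within k·σ of the current sample are averaged, so a scene change is not blended into the result. The function name, the window radius, and the factor k = 2 are assumptions, and motion compensation is omitted for brevity.

```python
def temporal_sigma_filter(samples, sigma, radius=3, k=2.0):
    """Sigma-filter one pixel's intensity along time.

    Each output is the mean of the samples inside the temporal window whose
    intensity differs from the current sample by at most k * sigma.
    """
    threshold = k * sigma
    filtered = []
    for t, center in enumerate(samples):
        window = samples[max(0, t - radius): t + radius + 1]
        close = [s for s in window if abs(s - center) <= threshold]
        # `close` always contains `center`, so it is never empty.
        filtered.append(sum(close) / len(close))
    return filtered


# One pixel over eight frames, with a scene change after the fifth frame.
trajectory = [10, 12, 8, 11, 9, 200, 202, 198]
smoothed = temporal_sigma_filter(trajectory, sigma=5.0)
print(smoothed)
```

An underestimated σ would exclude valid neighbors and leave noise in place, while an overestimated σ would average across the scene change, which is why the filter benefits from a reliable noise estimate.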
Testing of this robust estimation method has been carried out for a video sequence degraded to various noise levels. After a few iterations, the estimated standard deviation $\sigma_n$ comes very close to the groundtruth.
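The closed-loop procedure of filtering, estimating, and iterating can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the stopping rule (change in σ below a tolerance, or a maximum iteration count), the initial value, and all function names are hypothetical. The stand-in filter is a plain temporal mean for a static scene that ignores the σ it is given, so the sketch shows only the control flow.

```python
import random
import statistics


def robust_noise_std(residue):
    """Scaled median absolute deviation (1.4826 * MAD)."""
    center = statistics.median(residue)
    return 1.4826 * statistics.median(abs(r - center) for r in residue)


def temporal_mean_filter(frames, sigma):
    """Stand-in filter (assumption): per-pixel mean over all frames.

    `sigma` is accepted to match the loop's interface but is unused here.
    """
    count = len(frames)
    means = [sum(pixels) / count for pixels in zip(*frames)]
    return [means[:] for _ in frames]


def estimate_noise_closed_loop(frames, filter_fn, sigma0=1.0,
                               tol=1e-3, max_iter=10):
    """Alternate filtering and robust estimation until sigma stabilizes."""
    sigma = sigma0
    for _ in range(max_iter):
        filtered = filter_fn(frames, sigma)            # filter with current sigma
        residue = [n - f
                   for noisy_frame, filt_frame in zip(frames, filtered)
                   for n, f in zip(noisy_frame, filt_frame)]
        new_sigma = robust_noise_std(residue)          # re-estimate from residue
        converged = abs(new_sigma - sigma) < tol
        sigma = new_sigma
        if converged:
            break
    return sigma


# Synthetic static scene: 16 frames of 1024 pixels, Gaussian noise sigma = 6.
rng = random.Random(2)
scene = [rng.uniform(0.0, 255.0) for _ in range(1024)]
frames = [[p + rng.gauss(0.0, 6.0) for p in scene] for _ in range(16)]
sigma_hat = estimate_noise_closed_loop(frames, temporal_mean_filter)
print(f"estimated sigma: {sigma_hat:.2f}")
```

Because a 16-frame temporal mean still carries part of the noise, the estimate from this stand-in is expected to fall slightly below the true value. In practice a filter that actually uses σ, such as a motion-compensated sigma or Wiener filter, would be passed as `filter_fn`.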
The invention has been described in detail with particular reference to a presently preferred embodiment, but it will be understood that variations and modifications can be effected within the spirit and scope of the invention.