The subject disclosure relates to video denoising and more particularly, to a maximum a posteriori (MAP) based optimization for denoising video.
Video denoising is used to remove noise from a video signal. Video denoising methods have generally been divided into spatial and temporal video denoising. Spatial denoising methods analyze one frame at a time for noise suppression and are similar to image noise reduction techniques. Temporal video denoising methods use temporal information embedded in the sequencing of the images, and can be further subdivided into motion adaptive methods and motion compensative methods. Motion adaptive methods detect pixel motion and average a pixel with its counterparts in previous frames where no motion is detected, while motion compensative methods use motion estimation to predict pixel values from specific positions in previous frame(s). As the name implies, spatial-temporal video denoising methods use a combination of spatial and temporal denoising.
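As a brief illustration of the motion adaptive approach described above, the following sketch (not from the disclosure; the function name and threshold value are illustrative assumptions) averages a pixel with its value in the previous frame only where no motion is detected:

```python
import numpy as np

def motion_adaptive_denoise(curr, prev, motion_thresh=10.0):
    """Average each pixel with the previous frame where no motion is detected.

    curr, prev: 2-D float arrays (grayscale frames).
    motion_thresh: illustrative per-pixel difference threshold.
    """
    diff = np.abs(curr - prev)
    static = diff < motion_thresh          # pixels with no detected motion
    out = curr.copy()
    out[static] = 0.5 * (curr[static] + prev[static])  # temporal average
    return out
```

Pixels whose frame-to-frame difference exceeds the threshold are left untouched, so moving content is not blurred by the temporal average.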
Video noise can include analog noise and/or digital noise. A few examples of the various types of analog noise that can corrupt a video signal include radio channel artifacts (high frequency interference, e.g., dots or short horizontal color lines; brightness and color channel interference, e.g., antenna problems; and video reduplication, i.e., false contouring), VHS tape artifacts (color-specific degradation; brightness and color channel interference; chaotic shift of lines at the end of a frame, e.g., line resync signal misalignment; and wide horizontal noise strips), film artifacts (dust, dirt, spray, scratches on the medium, curling, and fingerprints), and a host of other analog noise types. A few examples of the various types of digital noise that can corrupt a video signal include blocking from low bitrate, ringing, and block errors or damage in the case of losses in a digital transmission channel or disk injury, e.g., scratches on physical disks, as well as a host of other digital noise types.
Conventional video denoising methods have been designed for specific types of noise, e.g., noise with particular characteristics, and different suppression methods have been proposed to remove noise from video.
For instance, one conventional denoising system proposes the use of motion compensation (MC) with an approximated 3D Wiener filter. Another conventional denoising system proposes using a spatio-temporal Kalman filter. Such conventional methods require enormous amounts of computation and storage, however. While some systems have been proposed to reduce the computation and storage, their applicability is narrow. Moreover, standard H.264 encoders fix certain variables that are inherently not optimized for the characteristics of the noise, such as Gaussian noise, to which denoising is to be applied.
Accordingly, it would be desirable to provide a better solution for video denoising. The above-described deficiencies of current designs for video denoising are merely intended to provide an overview of some of the problems of today's designs, and are not intended to be exhaustive. Other problems with the state of the art may become further apparent upon review of the description of various non-limiting embodiments below.
A simplified summary is provided herein to help enable a basic or general understanding of various aspects of exemplary, non-limiting embodiments that follow in the more detailed description and the accompanying drawings. This summary is not intended, however, as an extensive or exhaustive overview. The sole purpose of this summary is to present some concepts related to the various exemplary non-limiting embodiments in a simplified form as a prelude to the more detailed description that follows.
Video denoising techniques are provided that denoise frames of a noisy video sequence based on maximum a posteriori (MAP) estimates of previous frames of the sequence. The noise is assumed Gaussian in nature and an a priori conditional density model is measured as a function of bit rate. A MAP estimate of a denoised current frame can thus be expressed as a rate distortion optimization problem. A constrained minimization problem based on the rate distortion optimization problem can be used to optimally set a variable Lagrangian parameter that optimizes the denoising process. The Lagrangian parameter can be determined as a function of the distortion of the noise and a quantization level associated with an encoding of the noisy video.
The MAP-based optimization techniques for video denoising are further described with reference to the accompanying drawings in which:
As discussed in the background, at a high level, video denoising addresses the situation in which an ideal video becomes distorted during the process of being digitized or transmitted, which can happen for a variety of reasons, e.g., due to motion of objects, a lack of focus or deficiencies of an optical system involved with capture of the video, etc. After capture and storage, video can become further distorted during transmission over noisy channels. The resulting noisy or distorted video is visually unpleasant and makes some tasks, such as segmentation, recognition and compression, more difficult to perform. It is thus desirable to reconstruct an accurate estimate of the ideal video from the “corrupted” observations in an optimal manner, to improve visual appearance, reduce video storage requirements and facilitate additional operations performed on the video.
In consideration of these issues, various embodiments optimize video denoising processing by restoring video degraded by additive Gaussian noise having values following a Gaussian or normal amplitude distribution.
In=I+n Eqn. 1
where I=[I1, I2 . . . Ik, Ik+1 . . . Im]T is the original, or ideal, video, with Ik being the k-th frame; n=[n1, n2 . . . nk, nk+1 . . . nm]T is the additive Gaussian noise; and In=[I1n, I2n . . . Ikn, Ik+1n . . . Imn]T is the noisy observation of the video. Ik, nk and Ikn represent length-N vectors, where N is the number of pixels in each frame. In various non-limiting embodiments described below, an estimate Î of the original, or ideal, video is provided based on an analysis of the noisy video In.
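The additive model of Eqn. 1 can be simulated directly. The sketch below (illustrative; names and shapes are assumptions) corrupts a clean sequence of m frames, each with N pixels, with zero-mean Gaussian noise:

```python
import numpy as np

def add_gaussian_noise(frames, sigma, seed=0):
    """Return the noisy observation In = I + n of Eqn. 1.

    frames: array of shape (m, N) -- m frames of N pixels each.
    sigma: standard deviation of the additive Gaussian noise n.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=sigma, size=frames.shape)
    return frames + noise
```

This mirrors the evaluation procedure described later, in which clean sequences are manually distorted by adding Gaussian noise of a chosen variance.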
The above-described concepts are illustrated in the block diagram of
For instance, some embodiments can operate according to the flow diagram of
In this respect, maximum a posteriori (MAP) estimation techniques are used to perform video denoising, which are now described in more detail in connection with the flow diagram of
Various embodiments and further underlying concepts of the denoising processing are described in more detail below.
According to Bayesian principles, a MAP-based video denoising technique is provided that determines a MAP estimate from two terms: a noise conditional density model and an a priori conditional density model. As mentioned, based on the above-noted assumptions that the noise satisfies a Gaussian distribution and that the a priori model is measured by the bit rate, the MAP estimate can be expressed as a rate distortion optimization problem.
In order to find a suitable Lagrangian parameter for the rate distortion optimization problem, the rate distortion problem is transformed into a constrained minimization problem by setting the rate as the objective function and the distortion as a constraint. In this way, the Lagrangian parameter can be determined by the distortion constraint. Fixing the distortion constraint, the optimal Lagrangian parameter is obtained, which in turn leads to an optimal denoising result.
In further non-limiting embodiments described below, additional details are provided regarding the MAP-based video denoising techniques and some results from exemplary implementations are set forth that demonstrate the effectiveness and efficiency of the various embodiments.
In accordance with an embodiment, since the input noisy videos are denoised frame by frame, when denoising frame Ikn, the estimated original versions of the previous frames Î1, . . . Îk−1 have already been reconstructed; these are the MAP estimates of the previous frames. While in an exemplary implementation one previous reference frame is used when denoising a current frame, it can be appreciated that the techniques can be extended to any number of previous reference frames. Given Îk−1 (reference frame) and Ikn (current noisy frame), the maximum a posteriori (MAP) estimate of the current frame Ik is set forth as:
By using Bayes rule, Equation 2 can be expressed as:
Ignoring all the functions that are not related to Ik, the estimates of Equation 3 can be written as:
Since Ikn=Ik+nk, Pr(Ikn|Ik, Îk−1) is equal to Pr(Ikn|Ik). Therefore, the estimates of Equation 4 can be further simplified as:
Taking the “minus log” function of Equation 5 above results in:
According to Equation 6, the MAP estimate Îk is based on the noise conditional density Pr(Ikn|Ik) and the a priori conditional density Pr(Ik|Îk−1).
Given an original frame Ik, the noise conditional density Pr(Ikn|Ik) is determined by the noise's distribution in Equation 1 above. Generally, the noise satisfies or approximates a Gaussian distribution. The density of a Gaussian distribution with mean μn and variance σn2 is defined as
According to Equations 1 and 7, the minus log of the conditional density −log [Pr(Ikn|Ik)], which is the first term in Equation 6, can be expressed as:
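Since the equation bodies are not reproduced here, the following is a hedged LaTeX reconstruction of what Eqns. 7 and 8 presumably state, inferred from the surrounding text (a Gaussian density with mean μn and variance σn², and Ikn = Ik + nk from Eqn. 1); the exact constants may be absorbed differently in the original:

```latex
% Eqn. 7 (reconstruction): Gaussian density of the noise n_k
\Pr(n_k) = \frac{1}{\sqrt{2\pi}\,\sigma_n}
           \exp\!\left(-\frac{(n_k-\mu_n)^2}{2\sigma_n^2}\right)

% Eqn. 8 (reconstruction): minus log of the noise conditional density,
% substituting n_k = I_k^n - I_k
-\log\Pr(I_k^n \mid I_k)
   = \frac{(I_k^n - I_k - \mu_n)^2}{2\sigma_n^2}
     + \log\!\left(\sqrt{2\pi}\,\sigma_n\right)
```

With μn = 0 and the constant term dropped, this term reduces to the squared difference (Ikn − Ik)² identified as the distortion Dk below, with the factor 1/(2σn²) absorbed into the regularization parameter.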
With video denoising to remove noise in a current frame, the current frame can be viewed as a “corrupted” version of the previous frame:
Ik=AÎk−1+r Eqn. 9
where A can be seen as a motion estimation matrix and r is the residue after motion compensation.
Assuming r satisfies the following density function:
Pr(r)=κ exp(−λΦ(r)) Eqn. 10
Then, the second term in Equation 6 (a priori conditional density) can be written as:
−log [Pr(Ik|Îk−1)]=λΦ(Ik−AÎk−1)−log(κ) Eqn. 11
Combining Equations 8 and 11, assuming μn=0, and ignoring the constant term (since the minimization is over Ik and the constant term is independent of Ik, ignoring the constant term has no effect on the optimization and denoising processes of interest), Equation 6 reduces to:
The first term (Ikn−Ik)2 in Equation 12 can be seen as the distortion Dk between the noisy data and the estimate of the original data. By defining the second term Φ(Ik−AÎk−1) as the bit rate Rk of the residue Ik−AÎk−1, Equation 12 can be re-written as follows:
Here, the energy function Φ( ) is measured by the bit rate R of the motion compensated residue. This is reasonable since, for natural video, the bit rate R of the residue is usually quite small, whereas for noisy video the bit rate may become large. Therefore, finding reconstructed frames with a small residue bit rate equates to reducing the noise.
In accordance with an embodiment, from Equation 13, the minimization is observed to be over two objective functions Dk and Rk based on the regularization parameter α. Given α, an optimal solution for Equation 13 can be determined. However, determining a suitable α in the form of Equation 13 can be challenging. Thus, in one embodiment, Equation 13 is solved as a constrained minimization problem, as follows:
where Dk0 is a threshold determined by the noise's variance and the quantization parameter. By fixing Dk0, the optimal Lagrangian parameter α can be found for Equation 13. In this respect, the MAP estimate Îk is a compressed version of the noisy data Ikn. Therefore, the system operates to simultaneously compress the video signal and remove the noise.
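To make the constraint formulation concrete, the sketch below assumes, as one common convex rate model (not necessarily the form of Equation 15, whose body is not reproduced here), R(D) = β·log2(σ²/D). Under the KKT stationarity condition for minimizing D + αR, the optimal multiplier is α = −1/R′(Dk0), so fixing the distortion threshold Dk0 fixes α. All names and the specific R(D) form are illustrative assumptions:

```python
import math

def optimal_alpha(sigma2, beta, d0=None):
    """Lagrangian parameter for min D + alpha*R under the constraint D <= d0,
    assuming the illustrative rate model R(D) = beta * log2(sigma2 / D).

    KKT stationarity: 1 + alpha * R'(D) = 0 at D = d0, and
    R'(D) = -beta / (D * ln 2), giving alpha = d0 * ln(2) / beta.
    """
    if d0 is None:
        d0 = sigma2 / 2.0   # threshold used in the exemplary implementation below
    return d0 * math.log(2) / beta
```

With the exemplary settings Dk0 = σ²/2 and β = 0.392 noted below, σ² = 100 gives α ≈ 88.4 under this assumed rate model.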
To determine an output of Equation 14, generally, the bit rate R is assumed to be a function of the distortion D as follows:
Since the R(D) function in Equation 15 is convex in terms of D, the optimization problem in Equation 14 is convex and the optimal solution can be achieved by solving the following Karush-Kuhn-Tucker (KKT) conditions:
obtaining the following result:
According to Equation 13, α is the Lagrangian parameter in the rate distortion optimization problem. Therefore, for instance, the Lagrangian parameter in the H.264 coding standard, a commonly used video compression standard, should be:
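For reference, the H.264 reference software commonly ties its mode-decision Lagrange multiplier to the quantization parameter QP as λ = 0.85·2^((QP−12)/3). The sketch below computes this commonly cited relation; note it reflects the reference-software convention and is not necessarily the exact expression of the embodiment's equation:

```python
def h264_lambda_mode(qp):
    """Mode-decision Lagrange multiplier commonly used in H.264 reference software.

    qp: quantization parameter (integer, typically 0..51).
    """
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)
```

The multiplier roughly doubles every three QP steps, coupling the rate-distortion trade-off to the quantization level, which is the dependence exploited by the denoising optimization described here.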
The video denoising algorithm described for various embodiments herein was also evaluated based on an exemplary non-limiting H.264 implementation. To simulate noisy video, clean video sequences were first manually distorted by adding Gaussian noise. Then, the noisy videos were denoised using the above-described techniques. As shown in
Operation was observed to perform well over a variety of different selected noise variances. In one non-limiting implementation, the parameters Dk0 and β were set to be Dk0=σ2/2 and β=0.392, respectively. The peak signal to noise ratio (PSNR) can be computed by comparing with the original video sequence, and can be used to quantify what is shown in
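The PSNR measure used in the evaluation can be computed as follows; this is the standard definition, with 255 assumed as the peak value for 8-bit video:

```python
import numpy as np

def psnr(original, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between an original frame and its estimate."""
    mse = np.mean((original.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```

Averaging this value over all frames of a sequence, for the noisy and the denoised versions respectively, yields the per-sequence comparisons reported below.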
Thus,
In Table I below, the average PSNR performance comparison for test sequences is shown for noise variances of 49, 100, and 169, respectively. In one non-limiting implementation, it was observed that the PSNR performance is about 4 to 10 dB (e.g., 3.823˜10.186 dB) higher than that of the noisy video, a significant improvement.
At 1520, the variance of noise and quantization level associated with current frame encoding is determined. Determining the quantization level can include determining a quantization parameter of the H.264 encoding standard. At 1530, denoising is performed based on the variance of noise and quantization level associated with current frame encoding. Denoising can include maximum a posteriori (MAP) based denoising based on the prior frame, compressing the current frame and/or optimizing rate distortion of the noise. At 1540, original video for the current frame is estimated based on the denoising. For example, estimating can be based on a noise conditional density determined from the statistical distribution of the noise and/or based on an a priori conditional density model determined based on the prior frame. At 1550, steps 1500-1540 are iteratively performed for each subsequent frame of noisy video to denoise a designated sequence of the video.
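The iteration of steps 1500-1540 can be sketched as a frame-by-frame loop in which each MAP estimate is fed forward as the reference for the next frame. Here denoise_frame is a hypothetical callable standing in for the MAP-based estimation itself; only the loop structure is the point of the example:

```python
def denoise_sequence(noisy_frames, denoise_frame):
    """Denoise a sequence frame by frame, feeding each estimate forward.

    noisy_frames: iterable of noisy frames Ikn.
    denoise_frame: callable (noisy_frame, prev_estimate) -> estimate;
                   stands in for the MAP-based denoising of the disclosure
                   (prev_estimate is None for the first frame).
    """
    estimates = []
    prev_estimate = None
    for noisy in noisy_frames:
        estimate = denoise_frame(noisy, prev_estimate)
        estimates.append(estimate)
        prev_estimate = estimate  # reconstructed frame becomes the reference
    return estimates
```

Because each reconstructed frame, rather than the noisy input, serves as the next reference, the a priori model is always conditioned on an already-denoised frame, matching the formulation of Equation 2.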
In one embodiment, the estimate of original video data for the one or more prior frames is at least one MAP-based estimate determined by the denoising component. The denoising component can include a H.264 encoder for encoding the output of the MAP based denoising performed by the denoising component according to the H.264 format. In one embodiment, the denoising component further determines a level of quantization associated with an encoding of the current frame.
In other embodiments, the denoising component optimally determines the estimate of the original image data of the current frame by optimally setting a variable Lagrangian parameter. In this regard, the denoising component optimally sets a variable Lagrangian parameter associated with a rate distortion function based on a distortion between the noisy image data and the estimate and a bit rate associated with a residue after motion compensation. As a result, the denoising component achieves an increase in peak signal to noise ratio (PSNR) of the estimate of the original data over the PSNR of the current frame including the noisy image data substantially in the range of about 4 to 10 decibels.
One of ordinary skill in the art can appreciate that the various embodiments of MAP-based video denoising described herein can be implemented in connection with any computer or other client or server device, which can be deployed as part of a computer network or in a distributed computing environment, and can be connected to any kind of data store. In this regard, the various embodiments described herein can be implemented in any computer system or environment having any number of memory or storage units, and any number of applications and processes occurring across any number of storage units. This includes, but is not limited to, an environment with server computers and client computers deployed in a network environment or a distributed computing environment, having remote or local storage.
Distributed computing provides sharing of computer resources and services by communicative exchange among computing devices and systems. These resources and services include the exchange of information, cache storage and disk storage for objects, such as files. These resources and services also include the sharing of processing power across multiple processing units for load balancing, expansion of resources, specialization of processing, and the like. Distributed computing takes advantage of network connectivity, allowing clients to leverage their collective power to benefit the entire enterprise. In this regard, a variety of devices may have applications, objects or resources that may implement one or more aspects of MAP-based video denoising as described for various embodiments of the subject disclosure.
Each object 1810, 1812, etc. and computing objects or devices 1820, 1822, 1824, 1826, 1828, etc. can communicate with one or more other objects 1810, 1812, etc. and computing objects or devices 1820, 1822, 1824, 1826, 1828, etc. by way of the communications network 1840, either directly or indirectly. Even though illustrated as a single element in
There are a variety of systems, components, and network configurations that support distributed computing environments. For example, computing systems can be connected together by wired or wireless systems, by local networks or widely distributed networks. Currently, many networks are coupled to the Internet, which provides an infrastructure for widely distributed computing and encompasses many different networks, though any network infrastructure can be used for exemplary communications made incident to the MAP-based video denoising as described in various embodiments.
Thus, a host of network topologies and network infrastructures, such as client/server, peer-to-peer, or hybrid architectures, can be utilized. The “client” is a member of a class or group that uses the services of another class or group to which it is not related. A client can be a process, i.e., roughly a set of instructions or tasks, that requests a service provided by another program or process. The client process utilizes the requested service without having to “know” any working details about the other program or the service itself.
In a client/server architecture, particularly a networked system, a client is usually a computer that accesses shared network resources provided by another computer, e.g., a server. In the illustration of
A server is typically a remote computer system accessible over a remote or local network, such as the Internet or wireless network infrastructures. The client process may be active in a first computer system, and the server process may be active in a second computer system, communicating with one another over a communications medium, thus providing distributed functionality and allowing multiple clients to take advantage of the information-gathering capabilities of the server. Any software objects utilized pursuant to the techniques for performing MAP-based video denoising can be provided standalone, or distributed across multiple computing devices or objects.
In a network environment in which the communications network/bus 1840 is the Internet, for example, the servers 1810, 1812, etc. can be Web servers with which the clients 1820, 1822, 1824, 1826, 1828, etc. communicate via any of a number of known protocols, such as the hypertext transfer protocol (HTTP). Servers 1810, 1812, etc. may also serve as clients 1820, 1822, 1824, 1826, 1828, etc., as may be characteristic of a distributed computing environment.
As mentioned, advantageously, the techniques described herein can be applied to any device where it is desirable to denoise video data. It should be understood, therefore, that handheld, portable and other computing devices and computing objects of all kinds are contemplated for use in connection with the various embodiments, i.e., anywhere that a device may process or render video data. Accordingly, the general purpose remote computer described below in
Although not required, embodiments can partly be implemented via an operating system, for use by a developer of services for a device or object, and/or included within application software that operates to perform one or more functional aspects of the various embodiments described herein. Software may be described in the general context of computer-executable instructions, such as program modules, being executed by one or more computers, such as client workstations, servers or other devices. Those skilled in the art will appreciate that computer systems have a variety of configurations and protocols that can be used to communicate data, and thus, no particular configuration or protocol should be considered limiting.
With reference to
Computer 1910 typically includes a variety of computer readable media and can be any available media that can be accessed by computer 1910. The system memory 1930 may include computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) and/or random access memory (RAM). By way of example, and not limitation, memory 1930 may also include an operating system, application programs, other program modules, and program data.
A user can enter commands and information into the computer 1910 through input devices 1940. A monitor or other type of display device is also connected to the system bus 1922 via an interface, such as output interface 1950. In addition to a monitor, computers can also include other peripheral output devices such as speakers and a printer, which may be connected through output interface 1950.
The computer 1910 may operate in a networked or distributed environment using logical connections to one or more other remote computers, such as remote computer 1970. The remote computer 1970 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, or any other remote media consumption or transmission device, and may include any or all of the elements described above relative to the computer 1910. The logical connections depicted in
The word “exemplary” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, for the avoidance of doubt, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements.
Various implementations and embodiments described herein may have aspects that are wholly in hardware, partly in hardware and partly in software, as well as in software. As used herein, the terms “component,” “system” and the like are likewise intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on computer and the computer can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
Thus, the methods and apparatus of the embodiments described herein, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the techniques. In the case of program code execution on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Furthermore, the disclosed subject matter may be implemented as a system, method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer or processor based device to implement aspects detailed herein. The terms “article of manufacture”, “computer program product” or similar terms, where used herein, are intended to encompass a computer program accessible from any computer-readable device, carrier, or media. For example, computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick). Additionally, it is known that a carrier wave can be employed to carry computer-readable electronic data such as those used in transmitting and receiving electronic mail or in accessing a network such as the Internet or a local area network (LAN).
The aforementioned systems have been described with respect to interaction between several components. It can be appreciated that such systems and components can include those components or specified sub-components, some of the specified components or sub-components, and/or additional components, and according to various permutations and combinations of the foregoing. Sub-components can also be implemented as components communicatively coupled to other components rather than included within parent components, e.g., according to a hierarchical arrangement. Additionally, it should be noted that one or more components may be combined into a single component providing aggregate functionality or divided into several separate sub-components, and any one or more middle layers, such as a management layer, may be provided to communicatively couple to such sub-components in order to provide integrated functionality. Any components described herein may also interact with one or more other components not specifically described herein but generally known by those of skill in the art.
In view of the exemplary systems described supra, methodologies that may be implemented in accordance with the disclosed subject matter will be better appreciated with reference to the various flowcharts. While for purposes of simplicity of explanation, the methodologies are shown and described as a series of blocks, it is to be understood and appreciated that the claimed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Where non-sequential, or branched, flow is illustrated via flowchart, it can be appreciated that various other branches, flow paths, and orders of the blocks, may be implemented which achieve the same or a similar result. Moreover, not all illustrated blocks may be required to implement the methodologies described hereinafter.
Furthermore, as will be appreciated various portions of the disclosed systems above and methods below may include or consist of artificial intelligence or knowledge or rule based components, sub-components, processes, means, methodologies, or mechanisms (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines, classifiers . . . ). Such components, inter alia, can automate certain mechanisms or processes performed thereby to make portions of the systems and methods more adaptive as well as efficient and intelligent.
While the embodiments have been described in connection with the embodiments of the various figures, it is to be understood that other similar embodiments may be used or modifications and additions may be made to the described embodiment for performing the same function without deviating therefrom.
While exemplary embodiments may be presented in the context of particular programming language constructs, specifications or standards, such embodiments are not so limited, but rather may be implemented in any language to perform the optimization algorithms and processes. Still further, embodiments can be implemented in or across a plurality of processing chips or devices, and storage may similarly be effected across a plurality of devices. Therefore, the invention should not be limited to any single embodiment, but rather should be construed in breadth and scope in accordance with the appended claims.
This application claims priority to U.S. Provisional Application Ser. No. 60/945,995, filed on Jun. 25, 2007, entitled “RATE DISTORTION OPTIMIZATION FOR VIDEO DENOISING”, the entirety of which is incorporated herein by reference.