The present invention is related to the field of signal processing, and, more particularly, to statistical-based signal detection and estimation.
A filter, or estimator, typically refers to a system that is designed to extract information from a signal affected by or otherwise corrupted with noise. Accordingly, a filter is intended to extract information of interest from noisy data. Filter, or estimation, theory has been applied in a wide variety of fields, including communications, radar, sonar, navigation, seismology, finance, and biomedical engineering.
The Wiener filter, which remains one of the outstanding achievements of 20th Century optimal system design, optimally filters a signal. The filtering or estimation effected with the Wiener filter is optimal in the statistical sense of minimizing the average squared error between the desired and the actual output of a system. The Wiener filter extends the well-known solution of regression to linear functional spaces; that is, the space of functions of time, or Hilbert Space.
The manner in which Wiener filters are typically applied in digital systems and computers is in an L-dimensional linear vector space ($\mathbb{R}^L$). This is due to the fact that the filter topology normally utilized in this context is a finite-duration impulse response (FIR) filter. Given an input signal x(n), considered to be a stationary random process, and a desired response d(n), also a stationary random process, the best linear filter of order L for approximating the desired response d(n) in the mean square error sense is an FIR filter. The FIR filter is a weight vector $w = R^{-1}p$, where R is the autocorrelation matrix of the input signal and p is the cross-correlation vector between the input signal x(n) and the desired response d(n).
Due to the properties of the autocorrelation function of real or complex stationary random processes, the weight vector w can be computed with an algorithmic complexity of $O(L^2)$. Alternatively, search procedures based on the least mean square (LMS) algorithm can find the optimal weight vector in $O(L)$ time. Due to the power of the solution and its relatively straightforward implementation, Wiener filters have been extensively utilized in most, if not all, areas of electrical engineering.
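The linear solution described above can be illustrated with a minimal sketch. It assumes NumPy, estimates R and p by time averages over the available samples, and uses a general linear solver rather than a structured $O(L^2)$ recursion; the function name `wiener_fir` and the three-tap plant `h_true` are hypothetical and chosen only for illustration.

```python
import numpy as np

def wiener_fir(x, d, L):
    """Estimate w = R^{-1} p for an L-tap FIR filter from sample data."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    N = len(x)
    # Rows are tap-input vectors [x(n), x(n-1), ..., x(n-L+1)] for n = L-1 .. N-1.
    X = np.column_stack([x[L - 1 - k : N - k] for k in range(L)])
    R = X.T @ X / X.shape[0]            # sample autocorrelation matrix (L x L)
    p = X.T @ d[L - 1 :] / X.shape[0]   # sample cross-correlation vector
    return np.linalg.solve(R, p)        # general solver, not a Toeplitz recursion

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)               # white input signal
h_true = np.array([0.5, -0.3, 0.2])         # hypothetical unknown plant
d = np.convolve(x, h_true)[: len(x)]        # noiseless desired response
w = wiener_fir(x, d, L=3)
print(w)
```

Because the desired response here is a noiseless FIR function of the input, the estimated weights recover the plant coefficients up to numerical precision; with noisy data they would only approximate them.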
There are three basic types of estimation problems: (1) filtering, which involves the extraction of information in real time (i.e., using data until time n); (2) smoothing, according to which the extraction is done at time n1<n, where n represents the present time; and (3) prediction, according to which the extraction of information is done at a time or sample n2>n. The Wiener filter is the optimal linear estimator for each one of these estimation problems.
There are four general classes of applications for Wiener filters: (1) identification, in which the input and desired response for the Wiener filter come from the input and output of an unknown plant (a man-made, physical, or biological system); (2) inverse modeling, in which the input and desired response of the Wiener filter come respectively from the output of the plant and from its input (possibly with a delay included); (3) prediction, in which the input and desired response of the Wiener filter are given respectively by the delayed version of the time series and the current sample; and (4) interference cancellation, in which the input and desired response for the Wiener filter come respectively from the reference signal (noise alone) and the primary input (signal+noise).
Wiener filters have also been applied in the context of multiple-input, single-output (MISO) systems and devices, such as beamformers, whereby several antennas are used to capture parts of the signal, and the objective is to optimally combine them. Additionally, Wiener filters have been applied in the context of multiple-input, multiple-output (MIMO) systems and devices, whereby the goal is to optimally estimate the best projection of the input to achieve simultaneous multiple desired responses. The engineering areas where Wiener filters have been applied include communication systems (e.g., channel estimation and equalization, and beamforming), optimal controls (e.g., system identification and state estimation), and signal processing (e.g., model-based spectral analysis, and speech and image processing). Not surprisingly, Wiener filters are one of the central pillars of optimal signal processing theory and applications.
Despite their widespread use, Wiener filters are solutions limited to linear vector spaces. Numerous attempts have been made to create nonlinear solutions to the Wiener filter, based in the main on Volterra series approximation. Unfortunately, though, these nonlinear solutions are typically complex and usually involve numerous coefficients. There are also two types of nonlinear models that have been commonly used: the Hammerstein and the Wiener models. The Hammerstein and Wiener models are each composed of a static nonlinearity and a linear system, where the linear system is adapted using the Wiener solution. However, the choice of the nonlinearity is critical to achieving adequate performance, because it is a linear solution that is obtained in the transformed space according to these conventional techniques.
Recent advances in nonlinear signal processing have used nonlinear filters, commonly known as dynamic neural networks or fuzzy systems. Dynamic neural networks have been extensively used in the same basic applications as Wiener filters when the system under study is nonlinear. However, there typically are no analytical solutions to obtain the parameters of neural networks. They are normally trained using the backpropagation algorithm or its modifications (backpropagation through time (BPTT) or real-time recurrent learning (RTRL)), as well as global search methods such as genetic algorithms or simulated annealing.
In some other cases, a nonlinear transformation of the input is first implemented and a regression is computed at the output. A good example of this is the radial basis function (RBF) network and more recently the kernel methods. The disadvantage of these alternate techniques of projection is the tremendous amount of computation required, which makes them impractical for most real world cases. For instance, to implement kernel regression on a 1,000-point sample, a 1,000×1,000 signal matrix has to be solved. By comparison, if a linear Wiener filter of order 10 is to be computed, only a 10×10 matrix is necessary.
Accordingly, there is a need to extend the solutions for the Wiener filter beyond solutions in linear vector spaces. In particular, there is a need for a computationally efficient and effective mechanism for creating nonlinear solutions to the Wiener filter.
The present invention provides a nonlinear correntropy filter that can extend filter solutions, such as those for the Wiener filter, beyond solutions in linear vector spaces. Indeed, the present invention can provide an optimal nonlinear correntropy filter.
Moreover, the invention can provide iterative solutions to a correntropy Wiener filter, which can be obtained using a least mean square and/or recursive least square algorithm using correntropy. The various procedures can provide optimum nonlinear filter solutions, which can be applied online.
One embodiment of the invention is a signal processing method. The method can include receiving a signal input and filtering the signal input using a nonlinear correntropy filter. The method further can include generating an output based upon the filtering of the signal input. More particularly, the nonlinear correntropy filter can comprise a nonlinear Wiener filter, a correntropy least mean square (LMS) filter, or a correntropy Newton/LMS filter.
Another embodiment of the invention is a nonlinear filter. The nonlinear filter can include an input that receives a signal input from an external signal source. Additionally, the nonlinear filter can include a processing unit that generates a filtered signal output by filtering the signal input using a nonlinear Wiener filter, a correntropy least mean square (LMS) filter, or a correntropy Newton/LMS filter.
Still another embodiment of the invention is a method of constructing a nonlinear correntropy filter. The method can include generating a correntropy statistic based on a kernel function that obeys predetermined Mercer conditions. The method further can include determining a plurality of filter weights based upon the correntropy statistic computed. The plurality of filter weights, moreover, can be computed based on an inverse correntropy matrix, a correntropy least mean square (LMS) algorithm, or a correntropy Newton/LMS algorithm.
There are shown in the drawings, embodiments which are presently preferred, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.
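The correntropy of the random process x(t) at instances t1 and t2 is defined as

$$V(t_1,t_2)=E[k(x_{t_1},x_{t_2})], \qquad (1)$$

where E[·] is the expected value operator, and k is a kernel function that obeys the Mercer conditions. The kernel function, k, can be, for example, the Gaussian function of kernel size σ:

$$k(x_{t_1},x_{t_2})=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x_{t_1}-x_{t_2})^2}{2\sigma^2}\right). \qquad (2)$$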
It will be apparent from the discussion herein that other functions can be used in lieu of the Gaussian function of equation (2). Indeed, the correntropy defined in equation (1) can be based on any other kernel function obeying the Mercer conditions as well. As will be readily appreciated by one of ordinary skill in the art, in accordance with the Mercer conditions k is both symmetric and positive definite.
The correntropy is a positive function that defines a unique reproducing kernel Hilbert space that is especially appropriate for statistical signal processing. According to one aspect of the invention, the samples $x_i$ of an input time series are mapped to a nonlinear space by $\varphi(x_i)$, where $k(x_i,x_j)=\langle\varphi(x_i),\varphi(x_j)\rangle$, the brackets denoting the inner product operation. When the Gaussian kernel is utilized, the input signal x(t) is transformed to the surface of a sphere of radius $\sqrt{k(x_i,x_i)}$, which for a normalized Gaussian kernel of size σ is $(\sqrt{2\pi}\,\sigma)^{-1/2}$, in kernel space. Therefore, correntropy estimates the average cosine of the angle between two points separated by a lag on the sphere.
Correntropy for discrete, strictly stationary and ergodic random processes can be estimated as

$$\hat{V}(m)=\frac{1}{N-m+1}\sum_{n=m}^{N}k(x(n)-x(n-m)). \qquad (3)$$
The relationship with information theoretic learning is apparent from the following. The mean of the correntropy estimate of a random process xk over the lag is
which equals the entropy of the random variable x estimated with Parzen windows
where, as will be readily understood by one of ordinary skill in the art, entropy provides a measure of randomness, and the Parzen windows correspond to methods of estimating the probability density function of a random variable.
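The sample correntropy of a stationary, ergodic sequence can be sketched as a time average of kernel evaluations over sample pairs separated by a lag m. The sketch below assumes a normalized Gaussian kernel of size `sigma` and zero-based sample indexing; the function names and the sinusoidal test sequence are hypothetical.

```python
import math

def gaussian_kernel(u, sigma):
    """Normalized Gaussian kernel evaluated at the difference u."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def correntropy(x, m, sigma):
    """Time-average estimate of the correntropy at lag m >= 0."""
    n_terms = len(x) - m
    if n_terms <= 0:
        raise ValueError("lag must be smaller than the number of samples")
    total = sum(gaussian_kernel(x[n] - x[n - m], sigma) for n in range(m, len(x)))
    return total / n_terms

x = [math.sin(0.3 * n) for n in range(200)]   # illustrative test sequence
v0 = correntropy(x, 0, sigma=0.5)             # lag 0: every difference is zero
v5 = correntropy(x, 5, sigma=0.5)
print(v0, v5)
```

At lag zero every kernel argument is zero, so the estimate equals the kernel peak $1/(\sqrt{2\pi}\,\sigma)$; at other lags it is smaller, since the Gaussian attains its maximum at zero.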
Wiener Filter Based On Correntropy
Another aspect of the invention is a nonlinear Wiener filter based on the correntropy function already described. According to this aspect of the invention, for an input φ(x(n)) to a Wiener structure, (L+1) being the order of the filter and φ being a function defined such that $E[k(x_i-x_j)]=E[\varphi(x_i)\varphi(x_j)]$, the following composite vector is generated using L lags of φ(x(n)):

$$\Phi(n)=[\varphi(x(n)),\ \varphi(x(n-1)),\ \ldots,\ \varphi(x(n-L))]^T.$$

The (L+1) filter weights are given by the following vector:

$$\Omega=[\omega_0,\ \omega_1,\ \ldots,\ \omega_L]^T.$$

According to this formulation, the output is

$$y(n)=\Omega^T\Phi(n).$$
The configuration of the filter, according to this aspect of the invention, follows from the following formulation of the optimization problem: Minimize the mean square error, $E\{y(n)-d(n)\}^2$, with respect to Ω. Initially,

$$J(\Omega)=E\{\Omega^T\Phi(n)-d(n)\}^2.$$

The optimization solution is determined as follows:

$$\Omega=V^{-1}E\{d(n)\Phi(n)\},$$

where V is the correntropy matrix whose ijth element, for i,j=1,2, . . . , L+1, is E{K(x(n−i+1), x(n−j+1))}.

Moreover, assuming ergodicity, the expected value E{·} can be approximated by the time average. Accordingly,

$$\Omega\approx V^{-1}\,\frac{1}{N}\sum_{k=1}^{N}d(k)\Phi(k),$$

where $V^{-1}$ represents the inverse of the correntropy matrix and N is the number of samples in the window of calculation. The output, therefore, is

$$y(n)=\Phi^T(n)\,\Omega\approx\frac{1}{N}\sum_{k=1}^{N}d(k)\sum_{i=0}^{L}\sum_{j=0}^{L}a_{ij}\,\varphi(n-i)\varphi(k-j)\approx\frac{1}{N}\sum_{k=1}^{N}d(k)\sum_{i=0}^{L}\sum_{j=0}^{L}a_{ij}\,K(x(n-i),x(k-j)), \qquad (10)$$

where $a_{ij}$ is the ijth element of $V^{-1}$, indexed here from 0 to L to match the lags. The final expression is obtained by approximating φ(n−i)φ(k−j) by K(x(n−i),x(k−j)), which holds in an average sense. Equation (10) shows the calculation that needs to be done to compute the Wiener filter based on correntropy.
This solution effectively produces a nonlinear filter in the original space due to the mapping to the surface of an infinite-dimensional sphere, although the solution can still be analytically computed in the tangent bundle of the sphere. This aspect of the invention provides a significant advance over the conventional Wiener filter.
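The correntropy-based Wiener output can be sketched numerically as a data average of desired samples, weighted by kernel evaluations between lagged inputs and by the entries of the inverse correntropy matrix. The sketch below assumes NumPy, a Gaussian kernel of size `sigma`, and zero-based lag indices 0..L; the small `ridge` term is an added numerical safeguard and not part of the description above, and the function names and the `tanh` plant are hypothetical.

```python
import numpy as np

def gauss(u, sigma):
    """Normalized Gaussian kernel evaluated on an array of differences u."""
    return np.exp(-u**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def lagged(x, L):
    """Rows are [x(n), x(n-1), ..., x(n-L)] for n = L .. len(x)-1."""
    N = len(x)
    return np.column_stack([x[L - i : N - i] for i in range(L + 1)])

def correntropy_wiener(x_train, d_train, x_test, L, sigma, ridge=1e-8):
    Xtr = lagged(np.asarray(x_train, dtype=float), L)
    dtr = np.asarray(d_train, dtype=float)[L:]        # desired samples aligned with rows
    Xte = lagged(np.asarray(x_test, dtype=float), L)
    # V[i, j]: time average of K(x(n-i), x(n-j)) over the training window.
    V = gauss(Xtr[:, :, None] - Xtr[:, None, :], sigma).mean(axis=0)
    # A[i, j] plays the role of a_ij; the ridge is an assumption for stability.
    A = np.linalg.inv(V + ridge * np.eye(L + 1))
    # G[n, k, i, j] = K(x_test(n-i), x_train(k-j)).
    G = gauss(Xte[:, None, :, None] - Xtr[None, :, None, :], sigma)
    # y(n) = (1/N) * sum_k d(k) * sum_ij a_ij * K(x(n-i), x(k-j)).
    return np.einsum("k,ij,nkij->n", dtr, A, G) / len(dtr)

rng = np.random.default_rng(1)
x = rng.standard_normal(300)
d = np.tanh(2.0 * x)                                  # hypothetical nonlinear plant
y = correntropy_wiener(x[:200], d[:200], x[200:], L=2, sigma=1.0)
print(y.shape)
```

The four-dimensional kernel array grows with the product of the training and test lengths and with (L+1) squared, so this direct form is suited only to short windows.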
The least mean squares filter can be derived by using the stochastic version of the cost function (obtained by dropping the expected value operator in $J(\Omega)=E\{\Omega^T\Phi(n)-d(n)\}^2$ above), resulting in $\hat{J}(\Omega)=\{\Omega^T\Phi(n)-d(n)\}^2$ and the gradient given by

$$\nabla\hat{J}(\Omega)=-e(n)\Phi(n), \qquad (11)$$

where $e(n)=d(n)-\Omega^T\Phi(n)$ is the instantaneous error at time n, and the constant factor of 2 is absorbed into the step size.

Since the cost function is being minimized, the method of gradient descent is applied using the stochastic gradient (11). Thus the updated weight at each instant n is given by

$$\Omega_n=\Omega_{n-1}+\eta\,e(n)\Phi(n), \qquad (12)$$

where η is the step size.
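From (12) it follows that $\Omega_N$ is related to the initialization $\Omega_0$ such that

$$\Omega_N=\Omega_0+\eta\sum_{i=1}^{N}e(i)\Phi(i). \qquad (13)$$

With $\Omega_0=0$, the output at n is given by

$$y(n)=\Omega_{n-1}^T\Phi(n)=\eta\sum_{i=1}^{n-1}e(i)\,\Phi^T(i)\Phi(n), \qquad (14)$$

where

$$\Phi^T(i)\Phi(n)=\sum_{k=0}^{L}\varphi(x(i-k))\,\varphi(x(n-k))$$

is approximated by

$$\sum_{k=0}^{L}K(x(i-k),x(n-k)).$$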
It further follows from linear filtering theory that (14), which gives the correntropy least mean square (CLMS) filter, converges to minimize the original mean square error. According to the invention, the step size is chosen according to the trade-off between the speed of convergence and the final excess mean square error, or misadjustment. The solution (14) does not require regularization as employed in conventional methods that use kernels. Accordingly, the procedure provides a solution that is automatically regularized by the process of using the previous errors to estimate the next output.
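The CLMS recursion can be sketched as follows: each output is a step-size-weighted sum of all previous errors, each multiplied by the sum of kernel evaluations between the corresponding lagged inputs. The sketch assumes a Gaussian kernel, a zero initial weight vector, and zero-based indexing starting at sample L; the function names, step size, and the squared-sine desired response are hypothetical.

```python
import math

def gaussian_kernel(u, sigma):
    """Normalized Gaussian kernel evaluated at the difference u."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def clms(x, d, L, eta, sigma):
    """Run the CLMS recursion; returns (outputs, errors) for n = L .. len(x)-1."""
    outputs, errors = [], []
    for n in range(L, len(x)):
        y = 0.0
        for idx, e_i in enumerate(errors):   # every previous time step i < n
            i = idx + L
            # Kernel stand-in for the inner product of the lagged feature vectors.
            s = sum(gaussian_kernel(x[i - k] - x[n - k], sigma) for k in range(L + 1))
            y += eta * e_i * s
        outputs.append(y)
        errors.append(d[n] - y)              # instantaneous error e(n)
    return outputs, errors

x = [math.sin(0.2 * n) for n in range(120)]
d = [v * v for v in x]                       # hypothetical nonlinear desired response
outs, errs = clms(x, d, L=2, eta=0.2, sigma=0.5)
print(len(outs), errs[-1])
```

The cost of each output grows with the number of past samples, since no explicit weight vector is stored; in practice a finite window of past errors would bound this, which is a design choice not specified in the text above.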
A better trade-off of misadjustment versus speed of convergence can be obtained at the expense of extra computation by incorporating a covariance matrix in the formulation (14). This results in
where $a_{ij}$ is the ijth element of $V^{-1}$. The solution (15) is termed the correntropy Newton/LMS (CN/LMS) filter.
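As a hedged sketch of how such a Newton-style recursion might read, assume the weight update of (12) is premultiplied by the inverse correntropy matrix, the initial weight vector is zero, and lags are indexed from 0 to L. Under those assumptions, which are illustrative and not taken from the text, the update and the resulting output would be

$$\Omega_n=\Omega_{n-1}+\eta\,V^{-1}e(n)\Phi(n),$$

$$y(n)=\Omega_{n-1}^T\Phi(n)=\eta\sum_{i=1}^{n-1}e(i)\,\Phi^T(i)V^{-1}\Phi(n)\approx\eta\sum_{i=1}^{n-1}e(i)\sum_{j=0}^{L}\sum_{k=0}^{L}a_{jk}\,K(x(i-j),x(n-k)).$$

The second step uses the symmetry of $V^{-1}$ and the same kernel approximation of products of φ that is applied in the Wiener and CLMS cases.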
It is to be noted at this juncture that the above-described techniques introduce an extra user-determined free parameter: the size of the Gaussian kernel that is used in the transformation to the sphere. It effectively controls the curvature of the infinite-dimensional sphere, and it affects the performance. There are three ways to set this free parameter: (1) given knowledge of the signal statistics, the user can apply Silverman's rule, known to those of ordinary skill in the art, to set the kernel size as in density estimation; (2) the user can employ maximum likelihood estimation in the joint space; or (3) the user can adaptively determine the free parameter using the LMS algorithm. Another degree of freedom is the choice of the kernel function. Although this invention does not specify the mechanism for choosing it, Mercer's theorem extends the invention to any such kernel.
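The first option, Silverman's rule, can be sketched with the Python standard library alone. The sketch assumes the common one-dimensional rule-of-thumb form, 0.9 times the smaller of the sample standard deviation and the interquartile range divided by 1.34, scaled by the sample count to the power −1/5; the function name and test data are hypothetical.

```python
import math
import statistics

def silverman_kernel_size(samples):
    """Rule-of-thumb Gaussian kernel size for one-dimensional data."""
    n = len(samples)
    std = statistics.stdev(samples)
    q = statistics.quantiles(samples, n=4)      # three quartile cut points
    iqr = q[2] - q[0]
    # Fall back to the standard deviation if the interquartile range is zero.
    spread = min(std, iqr / 1.34) if iqr > 0 else std
    return 0.9 * spread * n ** (-0.2)

data = [math.sin(0.7 * n) + 0.01 * n for n in range(100)]
sigma = silverman_kernel_size(data)
print(sigma)
```

The resulting value scales linearly with the amplitude of the data, so the kernel size should be recomputed if the input signal is rescaled.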
According to a particular embodiment, the correntropy filter output $y(n)=\Phi^T(n)\Omega$ is computed by averaging over the data set the product of the desired signal samples with the Gaussian kernel of the input at the defined lags, weighted by the corresponding entries of the inverse of the correntropy matrix, as explained in equation 10.
Still referring to
A particular application of the nonlinear correntropy-based filter is identification of a model representing an unknown plant. A system 200 for determining an identification is schematically illustrated in
Another application of the nonlinear correntropy-based filter is that of inverse modeling of an unknown “noisy” plant, as will be readily understood by one of ordinary skill in the art. A system 300 for providing an inverse model is schematically illustrated in
The filtered output is supplied to a summer 308 along with the system input, the latter being delayed by the delay 304 interposed between the system input and the summer. The summer 308 generates an error based on the difference between the filtered output and the delayed system input. The nonlinear correntropy-based filter adaptively responds to the resulting error term, through the illustrated feedback. The supply of system input and corresponding adaptation repeat until the error meets a predefined criterion.
Yet another application of the nonlinear correntropy-based filter is prediction.
If the system is used as a predictor, then the output of the system (system output 1) is the output of the nonlinear correntropy-based filter 404. If the system is used as a prediction-error filter, then the output of the system (system output 2) is the difference between the random signal and the output of the nonlinear correntropy-based filter 404, both of which are supplied to the summer 406.
Still another application of the nonlinear correntropy-based filter is interference cancellation. A system 500 using a nonlinear correntropy-based filter 502 is schematically illustrated in
According to one embodiment of the method 600, the step of generating an output 606 comprises generating a prediction of a random signal, the prediction being a best prediction based upon a predetermined criterion. Moreover, the prediction can comprise an estimation of an error, whereby the error is based on a difference between an output generated by a system in response to the signal input and a predefined desired system output.
According to another embodiment of the method, the step of generating an output 606 comprises generating an identification of a nonlinear system. Alternatively, the step of generating an output 606 can comprise generating an inverse model representing a best fit to a noisy plant.
Yet another method aspect of the invention is a method of generating a nonlinear function. The method, more particularly, comprises generating a correntropy function as already described and computing an expected value of the correntropy function. The method further includes generating a nonlinear function for which the expected value of the pairwise product of data evaluations is equal to the expected value of the correntropy function.
As noted herein, the invention can be realized in hardware, software, or a combination of hardware and software. The invention, moreover, can be realized in a centralized fashion in one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. As also noted herein, a typical combination of hardware and software can be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
The invention also can be embedded in a machine-readable storage medium or other computer-program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
This invention can be embodied in other forms without departing from the spirit or essential attributes thereof. Accordingly, reference should be made to the following claims, rather than to the foregoing specification, as indicating the scope of the invention.
| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/US06/60397 | 10/31/2006 | WO | 00 | 4/30/2008 |

| Number | Date | Country |
|---|---|---|
| 60731747 | Oct 2005 | US |