The present invention relates generally to statistical inference of real-world data and more specifically to estimating parameters of an evolution model using neural networks.
In order to obtain insights into real-world sequential data, the task is often to prescribe a descriptive parameterized evolution model and estimate the parameters-of-interest (PoI) using the data. Uncertainty quantification (UQ), in turn, provides insight into the accuracy of the parameter estimate. Traditional models for the uncertainty of parameter estimates are often crafted using the prescribed evolution model.
In some embodiments of the invention, a method for estimating parameters of an evolution model includes identifying an evolution model; obtaining a set of training data values, where each value in the set is associated with a parameter-of-interest (PoI) associated with the evolution model; obtaining a noise model representing noise affecting the output of the evolution model; obtaining a prior model that represents prior information on characteristics of the parameter-of-interest; constructing a loss function for a neural network, where the loss function incorporates the set of training data values, the evolution model, the noise model, and the prior model; and training the neural network with the loss function to obtain updated weights.
In additional embodiments of the invention, the loss function is
In further embodiments of the invention, the neural network is a recurrent neural network.
In some embodiments of the invention, the method includes inputting an observed sequence of data into the trained neural network to obtain statistics of a posterior model.
In several embodiments of the invention, the statistics of a posterior model include the mean and covariance of a Gaussian distribution.
In still further embodiments of the invention, the identified evolution model utilizes a log-spectral amplitude expressed as:
In yet more embodiments of the invention, the identified evolution model utilizes an optimally modified log-spectral amplitude expressed as:
Still more embodiments of the invention include inputting a noisy signal short-time spectral amplitude (STSA) to the neural network; and obtaining as output an estimate of the minimum mean squared estimate (MMSE) spectral gain.
In still further embodiments, the noise model is expressed as $\tilde{y}_t \sim \mathcal{N}(\eta x_t, \eta x_t)$.
The advent of machine learning has brought about a paradigm shift in which the information contained in datasets is employed to augment traditional evolution models for the uncertainty of parameter estimates. Sequential neural networks in accordance with embodiments of the invention can take in a set of sequential data of any length and an evolution model as input, then yield estimates of parameters-of-interest (PoI) and an accompanying uncertainty quantification (UQ) of the estimates. Both the PoI estimates and the UQ may be informed by the dataset as well as the evolution model and a model of prior information. This can be achieved during the training process of a neural network by incorporating the dataset and evolution model into the loss function so that the information they contain can be embedded into the neural network weights. In some embodiments of the invention, the sequential data is a time-dependent series.
Training a sequential neural network in many embodiments of the invention utilizes a loss function that facilitates the training of variable-length sequential neural networks using datasets that include variable-length sequential data. This loss function is motivated by the Bayesian paradigm of uncertainty quantification (UQ). A characteristic of the Bayesian framework is that it allows prior information to be leveraged when performing UQ. Therefore, along with the dataset and prescribed evolution model, the inclusion of a prior model in the loss function ensures that the sequential neural network is also exposed to prior information. A schematic of a loss function in accordance with embodiments of the invention is conceptually illustrated in
Some embodiments provide a method of training a neural network that can determine an estimate of a parameter of interest (PoI) and an indication of uncertainty quantification (UQ) for the estimate of the PoI. In several embodiments the neural network is provided a training dataset of sequential data, with data of the dataset paired with PoI in accordance with an evolution model. In some embodiments a loss function represents a distance between the PoI in accordance with the evolution model and an output of the neural network. In some embodiments the loss function includes components a) indicative of signal noise in input of the neural network, b) indicative of a difference between a noisy signal and a result of application of a parameter of interest to the evolution model, and/or c) indicative of distance between outputs of the neural network and prior knowledge of the outputs quantified as a statistical distribution. In several embodiments results of evaluation of the loss function are used in modifying weights of the neural network.
In some embodiments the method includes inputting an observed dataset of sequential data to a trained neural network, with data of the dataset paired with PoI in accordance with an evolution model, and with outputs of the neural network including estimates of the PoI and UQ for the estimates of the PoI. In several embodiments results of evaluation of a loss function, such as described above, are used in modifying weights of the neural network to train it.
In many embodiments, the neural network is housed on or provided by a computer architecture. In some embodiments the computer architecture is a server-based architecture, while in other embodiments the computer architecture is a microcontroller. In some embodiments the process is performed by one or more processors configured by program instructions. In some embodiments the process is performed by circuitry of an application-specific integrated circuit (ASIC). In some embodiments the neural network is used in conjunction with or as part of a control system, with the control system modulating control based, in part, on UQ estimated or determined by the neural network.
The discussions here utilize the following notation:
Let $\mathcal{F}: \mathbb{R}^k \times \mathbb{R}^h \rightarrow \mathbb{R}^n$ denote a prescribed evolution model that maps from the parameter space $\mathbb{R}^k$ to the space of sequential data $\mathbb{R}^n$. Additionally, $\mathbb{R}^h$ refers to the space of indexing quantities.
Let $y^n = [y_1, y_2, \ldots, y_n]^T \in \mathbb{R}^n$ denote a sequence of length $n$, where $y_t \in \mathbb{R}$ denotes a point in the sequence.
Let $u^k = [u_1, u_2, \ldots, u_k]^T \in \mathbb{R}^k$ denote a PoI of the prescribed evolution model, where $u_t \in \mathbb{R}$ denotes a real-valued parameter-of-interest.
Let $\{y^{n,(m)}, u^{k,(m)}\}_{m=0}^{M}$ denote a dataset of sequential data paired with their associated PoI. Note that $k, n \in \mathbb{N}$ are not necessarily fixed; the dataset can contain sequences of varying lengths, with each sequence paired with a PoI of a varying number of parameters. In processes such as those discussed further below, this may be used as a training dataset.
Let Ψ denote a sequential neural network and W its weights.
Let $\mu$ and $\Gamma$ denote the mean and covariance, respectively, of a Gaussian distribution $\mathcal{N}(\mu, \Gamma)$.
Let $\mu_{post}^k$ and $\Gamma_{post}^k$ denote the posterior mean and posterior covariance, which are the outputs of the sequential neural network. In the field of UQ, the posterior mean is often considered to be the PoI estimate and the posterior covariance the quantified uncertainty of the estimate.
Let $\mathcal{N}(\mu_{\nu}^{n,(m)}, (\Gamma_{\nu}^{n,(m)})^{-1})$ and $\mathcal{N}(\mu_{pr}^{k,(m)}, (\Gamma_{pr}^{k,(m)})^{-1})$ denote the respective noise and prior models associated with the pair $(y^{n,(m)}, u^{k,(m)})$ of the dataset.
Let α denote a tunable parameter.
The loss function for training a sequential neural network in accordance with several embodiments of the invention is as follows:
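The expression itself is not reproduced in this text. As a hedged sketch only, assembling the components described above (a data misfit weighted by the noise model, plus a distance to the prior model scaled by the tunable parameter $\alpha$), the training problem may take a form along the lines of:

$$W^* = \arg\min_W \sum_{m=0}^{M} \left[ \left\| y^{n,(m)} - \mathcal{F}(\mu_{post}^{k}) - \mu_{\nu}^{n,(m)} \right\|^2_{(\Gamma_{\nu}^{n,(m)})^{-1}} + \alpha\, D_{KL}\!\left( \mathcal{N}(\mu_{post}^{k}, \Gamma_{post}^{k}) \,\middle\|\, \mathcal{N}(\mu_{pr}^{k,(m)}, \Gamma_{pr}^{k,(m)}) \right) \right]$$

where $(\mu_{post}^{k}, \Gamma_{post}^{k}) = \Psi(y^{n,(m)}; W)$ and $\|z\|^2_A = z^T A z$. This is offered only as one plausible instantiation consistent with the surrounding description, not the exact form of the loss.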
Now suppose some sequential data $y_{obs}^n$ is observed. The posterior model $p(u^k \mid y_{obs}^n) = \mathcal{N}(\mu_{post}^k, \Gamma_{post}^k)$ of the PoI of the prescribed evolution model can be formed using the trained sequential neural network. Specifically, this involves modeling the statistics $\mu_{post}^k$, $\Gamma_{post}^k$ of the posterior model which, in turn, can be considered as forming an estimate $\mu_{post}^k$ of the PoI and quantifying the uncertainty with $\Gamma_{post}^k$. In some embodiments, the posterior model can be considered to be a truncated Gaussian distribution for cases where the space of possible values for the PoI is bounded.
Many embodiments of the invention include training a neural network that can determine an estimate of a parameter of interest (PoI) and an indication of uncertainty quantification (UQ) for the estimate of the PoI, which may utilize noise and prior models and a loss function as discussed above. In several embodiments, the training can be thought of as an offline phase (i.e., which may be performed in advance and/or where the result is not expected in real-time). A process for training a neural network in accordance with embodiments of the invention is illustrated in
The process 200 includes selecting or creating an evolution model and obtaining (202) a set of data paired with PoI associated with the evolution model. In several embodiments of the invention, this includes collecting $M$ instances of sequential data $y^{n,(m)}$ paired with PoI $u^{k,(m)}$ associated via the prescribed evolution model $\mathcal{F}$. Together, these form the training dataset $\{y^{n,(m)}, u^{k,(m)}\}_{m=0}^{M}$ to be used for training a sequential network. Any of a number of methods may be used to construct this dataset. For example, experiments or simulation studies can be performed that represent the action of the prescribed evolution model $\mathcal{F}$ on selected PoI values $u^{k,(m)}$ in order to obtain corresponding sequential data $y^{n,(m)}$.
The process includes constructing (204) a noise model $\mathcal{N}(\mu_{\nu}^{n,(m)}, \Gamma_{\nu}^{n,(m)})$ representing noise afflicting the output of the prescribed evolution model to yield observed sequential data.
The process includes constructing (206) a prior model $\mathcal{N}(\mu_{pr}^{k,(m)}, \Gamma_{pr}^{k,(m)})$ that quantitatively represents any prior information on the characteristics of the PoI.
The process includes selecting a sequential neural network architecture $\Psi$. This could be, for example, a recurrent neural network (RNN), a long short-term memory (LSTM) network, or a transformer; a sketch of one such architecture follows.
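As an illustration only, a minimal PyTorch sketch of such an architecture is given below: an LSTM encoder whose final hidden state is mapped to a posterior mean $\mu_{post}$ and a diagonal posterior covariance $\Gamma_{post}$. The class and layer choices are assumptions for illustration, not the architecture prescribed by any embodiment.

```python
import torch
import torch.nn as nn

class SequentialUQNet(nn.Module):
    """Maps a variable-length sequence y to posterior statistics (mu_post, Gamma_post)."""

    def __init__(self, k: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.mean_head = nn.Linear(hidden, k)     # mu_post in R^k
        self.logvar_head = nn.Linear(hidden, k)   # log of the diagonal of Gamma_post

    def forward(self, y: torch.Tensor):
        # y: (batch, n, 1); any sequence length n is accepted.
        _, (h, _) = self.lstm(y)                  # final hidden state: (1, batch, hidden)
        h = h.squeeze(0)
        mu_post = self.mean_head(h)
        gamma_post_diag = self.logvar_head(h).exp()  # exponentiate for positivity
        return mu_post, gamma_post_diag

net = SequentialUQNet(k=2)
mu, gamma = net(torch.randn(8, 100, 1))           # batch of 8 sequences of length 100
```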
The process includes constructing (208) a loss function including the set of training data values, the evolution model, the noise model, and/or the prior model, and optimizing the neural network weights W by training the neural network using the loss function. In several embodiments of the invention, the loss function includes the components (1), (2), and (3) discussed further above.
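Continuing the sketch, a hedged implementation of a loss with the ingredients described above (a noise-weighted data misfit plus a prior-matching term scaled by $\alpha$) might look as follows. Diagonal covariances are assumed for simplicity, and all names and the toy evolution model are illustrative, not the patent's exact formulation.

```python
import torch

def aiuq_loss(mu_post, gamma_post, y, F, mu_nu, gamma_nu, mu_pr, gamma_pr, alpha=1.0):
    # Data misfit between the noisy sequence and the evolution model applied
    # to the posterior mean, weighted by the (diagonal) noise covariance.
    residual = y - F(mu_post) - mu_nu
    misfit = (residual ** 2 / gamma_nu).sum(dim=-1)

    # Distance to the prior: KL divergence between two diagonal Gaussians.
    kl = 0.5 * ((gamma_post / gamma_pr).sum(dim=-1)
                + ((mu_pr - mu_post) ** 2 / gamma_pr).sum(dim=-1)
                - mu_post.shape[-1]
                + (gamma_pr.log() - gamma_post.log()).sum(dim=-1))

    return (misfit + alpha * kl).mean()

# Toy usage with an exponential-decay evolution model F(u) = u_1 * exp(-u_2 * t):
t = torch.linspace(0.0, 1.0, 100)
F = lambda u: u[:, :1] * torch.exp(-u[:, 1:2] * t)
loss = aiuq_loss(torch.rand(8, 2), torch.rand(8, 2) + 0.1,
                 y=torch.rand(8, 100), F=F,
                 mu_nu=torch.zeros(100), gamma_nu=torch.ones(100),
                 mu_pr=torch.zeros(2), gamma_pr=torch.ones(2), alpha=0.5)
```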
Given a trained neural network, uncertainty quantification (UQ) can be performed by inputting an observed sequence of data into the trained sequential neural network to obtain the statistics of the posterior model. That is:

$$(\mu_{post}^k, \Gamma_{post}^k) = \Psi(y_{obs}^n; W^*)$$
with W* the trained weights resulting from the optimization problem (3). Although a specific process is discussed above, one skilled in the art will recognize that variations may be utilized in accordance with embodiments of the invention as appropriate to a particular application. Descriptions of techniques for forming noise models and prior models that may be utilized in accordance with embodiments of the invention can be found in “Solving Bayesian Inverse Problems via Variational Autoencoders” by Goh et al.,
Consider the following example where:
The prescribed evolution model is $\mathcal{F}: \mathbb{R}^2 \times \mathbb{R}^n \rightarrow \mathbb{R}^n$ such that $\mathcal{F}(\alpha, \beta, t) = \alpha e^{-\beta t}$ where $t = [t_1, t_2, \ldots, t_n]$.
The PoI is of fixed length $k=2$ where $u^2 = [\alpha, \beta]^T$. The superscript $k=2$ will henceforth be omitted for brevity.
The dataset consists of $M = 50000$ samples $\{u^{(m)}\}_{m=1}^{50000}$, the distributions of which are illustrated in the histograms shown in
The corresponding paired values $\{y^{n,(m)}, u^{(m)}\}_{m=1}^{50000}$ are obtained by mapping $\{u^{(m)}\}_{m=1}^{50000}$ through the prescribed evolution model $\mathcal{F}$.
The noise model $\mathcal{N}(\mu_{\nu}^{n,(m)}, \Gamma_{\nu}^{n,(m)})$ arises from a Gaussian approximation to the Poisson distribution which, in turn, represents a multiplicative noise assumption.
The prior model $\mathcal{N}(\mu_{pr}^{k,(m)}, \Gamma_{pr}^{k,(m)})$ is constructed from fixed values that are selected to represent the dataset distribution displayed in
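As an illustration of constructing such a dataset, the sketch below samples PoI values, pushes them through $\mathcal{F}(\alpha, \beta, t) = \alpha e^{-\beta t}$, and corrupts the results with Poisson counting noise (whose Gaussian approximation yields the multiplicative noise model above). The sampling ranges are assumptions for illustration; the actual dataset distributions are those shown in the referenced histograms.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n = 50_000, 100
t = np.linspace(0.0, 1.0, n)

alpha = rng.uniform(10_000, 30_000, size=M)          # assumed range for alpha
beta = rng.uniform(1.0, 10.0, size=M)                # assumed range for beta
u = np.stack([alpha, beta], axis=1)                  # PoI samples, shape (M, 2)

clean = alpha[:, None] * np.exp(-beta[:, None] * t)  # F(u) applied row-wise, (M, n)
y = rng.poisson(clean).astype(float)                 # Poisson counts; approximately
                                                     # Gaussian for large counts
```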
A neural network is trained by optimizing (3) with the above components. UQ is performed on three examples of sequential data. In each example, the uncertainty estimates are displayed using
with $j \in \{1, 2\}$, where $j=1$ corresponds to $\alpha$ and $j=2$ corresponds to $\beta$. Here, $\sigma_{rel,j}$ refers to the relative standard deviation, i.e., the posterior standard deviation scaled by the magnitude of the posterior mean estimate:

$$\sigma_{rel,j} = \frac{\sqrt{(\Gamma_{post})_{jj}}}{\left| (\mu_{post})_j \right|}$$
The first sequential data represents an expected case where $y_{obs}(t) = 20000e^{-6t}$. Here, both $\alpha = 20000$ and $\beta = 6$ lie within the dataset distribution displayed in
The second sequential data represents an outlier case where $y_{obs}(t) = 20000e^{-50t}$. Here, $\beta = 50$ lies outside the dataset distribution displayed in
The third sequential data represents a case where the prescribed model is 'incorrect' in that it is not the true underlying model of the observed sequential data. That is, $y_{obs}(t) = 10000t$, which is a linear function and not an exponential. The results are displayed in
Signal denoising is a prevalent challenge in various industrial applications. As such, there are many approaches to removing noise. One approach is to frame denoising as an optimization problem that seeks filter coefficients which can be used to construct enhanced signals from observed noisy signals. Additionally, this optimization problem can be tailored to target specific components of observed noisy signals in order to obtain filter coefficients that can recover the components used to construct the enhanced signal.
One such component is the short-time spectral amplitude (STSA) of the short-time Fourier transform (STFT) of a time-dependent signal. The key departure of spectral amplitude estimation from other Fourier-based signal processing techniques is the assumption that the phase distortion can be ignored. Therefore, after performing the Fourier transform on a noisy signal, the complex values are decoupled into their amplitude and phase. From here, the amplitudes of the clean signal are estimated from the amplitudes of the noisy signal and the phase is later re-introduced before an inverse Fourier transform is applied to reconstruct the signal.
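As a brief sketch of this pipeline (using SciPy's STFT and a placeholder gain in place of an actual amplitude estimator):

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16_000
noisy = np.random.randn(fs)                     # stand-in for one second of noisy audio

f, frames, Y = stft(noisy, fs=fs, nperseg=512)  # complex STFT coefficients
amplitude, phase = np.abs(Y), np.angle(Y)       # decouple amplitude and phase

gain = np.ones_like(amplitude)                  # placeholder; an estimator such as the
enhanced_amp = gain * amplitude                 # LSA/OMLSA gains below would go here

Z = enhanced_amp * np.exp(1j * phase)           # re-introduce the (noisy) phase
_, enhanced = istft(Z, fs=fs, nperseg=512)      # inverse transform to reconstruct
```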
A main shortcoming of traditional spectral estimation methods is the non-linear processing distortion caused by random variations of the noise spectrum; specifically, the non-linear mapping of negative or small-valued spectral estimates. Bayesian spectral amplitude estimation methods in accordance with embodiments of the invention offer substantial performance improvements over spectral subtraction by utilizing the probability density functions of the signal and noise processes.
Bayesian spectral amplitude estimation may be translated into a loss function that is used to train an artificially intelligent noise filter in accordance with several embodiments of the invention.
Let $\hat{y}(k)$, $\hat{s}(k)$, $\hat{v}(k)$ denote the random variable representations of the $k$th spectral coefficient of the noisy signal, clean signal, and noise respectively. Apply the short-time Fourier transform (STFT) to obtain $\hat{y}_l(k)$, $\hat{s}_l(k)$, $\hat{v}_l(k)$, where $k$ denotes the frequency index of the discrete Fourier transform (DFT) and $l$ denotes the frame index. With this, $\hat{y}_l(k)$ is referred to as the $k$th STFT coefficient of the signal for the $l$th frame. Assuming the additive noise model, in the STFT domain this becomes:

$$\hat{y}_l(k) = \hat{s}_l(k) + \hat{v}_l(k)$$
However, as most of this theory can be discussed within the context of a specific frame, the subscript $l$ may be omitted. Also, since the DFT produces a finite vector of frequencies, the $k$th frequency can be denoted with the subscript $k$, representing the entry of a vector, instead of with the argument $(k)$.
In the Bayesian estimation approach for signal denoising, the value of the clean signal spectrum $\hat{s}$ is estimated as a function of the noisy signal spectrum $\hat{y}$. To this end, define a distance metric or loss function $J$ between the clean signal $\hat{s}$ and its estimator $\hat{f}(\hat{y})$ and endeavour to minimize the expectation of this distance:

$$\min_{\hat{f}} \mathbb{E}\left[ J(\hat{s}, \hat{f}(\hat{y})) \right] = \min_{\hat{f}} \iint J(\hat{s}, \hat{f}(\hat{y}))\, p_{\hat{s}\hat{y}}(\hat{s}, \hat{y})\, d\hat{s}\, d\hat{y}$$

where $p_{\hat{s}\hat{y}}$ denotes the joint probability density function of the clean and noisy spectral coefficients.
In some embodiments, it may be more useful to instead consider the logarithmic distance between the noisy and clean signals. For example, when the signal is an audio signal and the application involves human speech recognition, the human ear perceives volume in a logarithmic manner, so consider the metric function

$$J(\hat{s}, \hat{f}) = \left( \log|\hat{s}| - \log|\hat{f}(\hat{y})| \right)^2$$
Now, in order to obtain a closed form, Gaussian assumptions can be employed. Specifically, assume that all random vectors are circularly symmetric complex Gaussian vectors. Denote the spectral amplitudes of the noisy signal, clean signal, filtered signal, and noise, respectively, so that:
The conditional mean of this is the log-spectral amplitude (LSA):
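The expression for the LSA is elided in this text. For reference, the classical Ephraim–Malah log-spectral amplitude gain, which expression (17) may correspond to, is:

$$G_{LSA,k} = \frac{\xi_k}{1+\xi_k} \exp\!\left( \frac{1}{2} \int_{v_k}^{\infty} \frac{e^{-\tau}}{\tau}\, d\tau \right), \qquad v_k = \frac{\xi_k}{1+\xi_k}\,\gamma_k,$$

where $\xi_k$ denotes the a priori SNR and $\gamma_k$ the a posteriori SNR.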
An extension of this is the optimally modified log-spectral amplitude (OMLSA) which involves the following modification:
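The modification itself is likewise elided; for reference, the standard OMLSA gain, which may be what expression (21) denotes, takes the form:

$$G_{OMLSA,k} = \left( G_{LSA,k} \right)^{p_k} \, g_{min}^{\,1-p_k}$$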
where $p_k \in [0,1]$ is the speech presence probability and the value $g_{min} \in (0, \infty)$ represents the lowest boundary threshold.
A term underlying (17) and (21) is the minimum mean squared estimate spectral gain:
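The expression is elided here. For reference, the classical Ephraim–Malah MMSE short-time spectral amplitude gain, which $g_{MMSE}$ may denote, is:

$$g_{MMSE,k} = \frac{\sqrt{\pi}}{2} \frac{\sqrt{v_k}}{\gamma_k} \exp\!\left( -\frac{v_k}{2} \right) \left[ (1+v_k)\, I_0\!\left( \frac{v_k}{2} \right) + v_k\, I_1\!\left( \frac{v_k}{2} \right) \right],$$

where $I_0$ and $I_1$ denote the modified Bessel functions of order zero and one.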
Indeed, it can be shown that $\gamma_k$, $v_k$ and $p_k$ can be approximated with respect to $g_{MMSE}$:
With this in mind, estimates of the $g_{MMSE}$ may be formed from the noisy signal STSA, which can, in turn, be used to construct enhanced signals. Most traditional estimation methods are entirely model driven. However, utilizing machine learning, the information contained in datasets can be employed in estimation techniques. The advantage of this is that more 'personalized' filters can be crafted that target specific applications. Accordingly, a neural network can be trained that takes in the noisy signal STSA as input to yield an estimate of the corresponding $g_{MMSE,k}$.
Applying the neural network training discussed further above, the STSA of audio signals can be considered the sequential data. The evolution model could be the LSA (17) or the OMLSA (21). The parameter-of-interest is $u = g_{MMSE}$. The noise model can be the sample statistics of the STSA of audio signals representing background noise, and the prior model can be the sample statistics of the parameter-of-interest.
In additional embodiments of the invention, further modifications may be applied. For example, in some embodiments, a truncated Gaussian distribution is used in place of (5) to accommodate the parameter-of-interest being bounded in $(0,1)$.
Aerosol physics can be incorporated into the artificially intelligent uncertainty quantification (AIUQ) loss function in order to better perform uncertainty quantification for aerosol-related use-cases. Specifically, aerosol measurement systems that employ particle counts naturally follow a Poisson distribution. In turn, Poisson distributions can be approximated by Gaussian distributions in cases where particle counts reach sufficiently large values. Specifically, denoting aerosol particle counts by $y$ and the true particle quantities by $x$, observations $y_t \in \mathbb{R}$ can be computed by

$$y_t = \frac{\tilde{y}_t}{\eta}$$

where $\tilde{y}_t$ is drawn by

$$\tilde{y}_t \sim \mathrm{Pois}(\eta x_t).$$

Here, the hyperparameter $\eta > 0$ serves to simulate the noise in the data; specifically, higher values of $\eta$ represent lower levels of noise. With this, the Gaussian approximation to the Poisson distribution is given by:

$$\tilde{y}_t \sim \mathcal{N}(\eta x_t, \eta x_t)$$
which can be used as the noise model of the AIUQ loss function in some embodiments of the invention. Thus, an AIUQ trained neural network can be made aware of the governing aerosol physics. The remaining components, such as the evolution model and the parameter-of-interest, may depend on the application at hand.
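A small illustrative sketch of this noise model follows; the true particle trajectory $x_t$ and the value of $\eta$ are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 100.0                                   # larger eta -> lower relative noise
t = np.linspace(0.0, 1.0, 200)
x = 50.0 * np.exp(-3.0 * t)                   # assumed true particle quantities x_t

y_tilde = rng.poisson(eta * x)                # counts drawn from Pois(eta * x_t)
y = y_tilde / eta                             # rescaled observations

# Gaussian approximation N(eta*x_t, eta*x_t): mean eta*x_t, variance eta*x_t.
y_tilde_gauss = rng.normal(eta * x, np.sqrt(eta * x))
```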
At the most basic level, control systems involve feedback loops which are tasked with perpetually forming estimates of the quantity of interest and then performing some action based on the estimate. The appeal of control systems lies in the automation of reactive tasks. The issue here is that, by design, control systems are required to act on estimates even when the estimates are formed from outlier events. It would therefore be beneficial to employ an outlier detection strategy to ensure that the reactive task is only performed, or only performed to its fullest extent, in response to ideal situations. Otherwise, feedback loops can be irrecoverably disturbed from a desired equilibrium state of operation. A common quantity used to inform outlier detection mechanisms is UQ. The UQ method discussed herein can perform UQ using information from both data and models.
In most cases, UQ is a computationally demanding task. Broadly speaking, there are two main categories of UQ methods: sampling and optimization. Sampling methods are often inaccurate for large dimensional problems and so often only have efficacy for small problems with a restricted number of PoIs. In contrast, although optimization methods preserve accuracy for scenarios with a large number of PoIs, they generally require gradient computation, which can be expensive. Additionally, both methods are usually iterative and therefore potentially time-consuming and computationally expensive.
For the UQ approach here, although the optimization of the loss function (3) may be time-consuming, this can represent an offline phase. The goal is to provide components to (3) that are descriptive enough to train a neural network capable of performing desirably in all reasonable variations of the scenario it is to operate in. Once this is achieved, estimation and UQ (which represent the online phase) generally require only the propagation of the observed sequential data through the trained neural network. Since this is essentially a series of linear algebraic operations, propagation through a neural network is largely unrivalled in speed and computational cost. Thus, with these methods, estimation and UQ may be performed on devices as small as microcontrollers in some embodiments, thereby introducing UQ to internet-of-things related applications. Computational efficiency can be furthered by compressing neural networks via quantization of the weights. Quantization of weights refers to a process of reducing the precision of the weights in a neural network (e.g., from 32-bit floating-point to 8-bit integers) to decrease memory usage and computation while maintaining acceptable model accuracy.
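As a generic illustration of this idea (an affine int8 scheme sketched with NumPy; not a description of any particular deployment toolchain):

```python
import numpy as np

w = np.random.randn(256, 256).astype(np.float32)      # 32-bit weight matrix

scale = (w.max() - w.min()) / 255.0                   # spread the range over 256 levels
zero_point = np.round(-w.min() / scale)
q = np.clip(np.round(w / scale + zero_point) - 128, -128, 127).astype(np.int8)

w_dequant = (q.astype(np.float32) + 128 - zero_point) * scale  # approximate recovery
print("max abs error:", np.abs(w - w_dequant).max())  # roughly bounded by scale / 2
```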
Although the description above contains many specificities, these should not be construed as limiting the scope of the invention but as merely providing illustrations of some of the presently preferred embodiments of the invention. Various other embodiments are possible within its scope. Accordingly, the scope of the invention should be determined not by the embodiments illustrated, but by the appended claims and their equivalents.
The current application claims priority to U.S. Provisional Application No. 63/582,473 entitled “UNCERTAINTY AWARE TRAINING METHOD FOR ARTIFICIALLY INTELLIGENT NOISE FILTER,” filed Sep. 13, 2023, and to U.S. Provisional Application No. 63/582,471 entitled “ARTIFICIALLY INTELLIGENT UNCERTAINTY QUANTIFICATION FOR ESTIMATES OF EVOLUTION MODEL PARAMETERS,” filed Sep. 13, 2023, the disclosures of which are incorporated herein by reference.