Computational science and engineering have enabled researchers to model complex physical processes in many disciplines, climate projection, subsurface flow and reactive transport, seismic wave propagation, and power grid planning. However, inherent uncertainty in these complex physical processes makes the underlying problems essentially stochastic in nature. As a result, the current techniques for generating models result in models that are often not sufficiently accurate to take actions based on these models
The accuracy of models depends in large part on the accuracy of the key parameters for the models. The consequences of inaccurate value for key parameters can lead to dire consequences. For example, in the area of power generation, accurate modeling of generator dynamic is of paramount importance to power system dynamic security analysis. Indeed, inaccurate parameter values may lead to inappropriate control actions, which in turn may result in power system failures. In 1996, a widespread blackout in the Western Electricity Coordinating Council (WECC) system was caused by severe model discrepancy. Because of adverse consequences of inaccurate models, the North American Electric Reliability Corporation (NERC) has issued reports that require the validation and the calibration of the models of large generators and their controllers in North America every five years, aiming to have the dynamic model responses match the recorded measurements reasonably well.
The traditional offline stage-testing based methods that are used to generated key parameter values have problems in that they are very costly, time-consuming, and labor intensive. To overcome these problems, online low-cost phasor-measurement-unit (PMU)-based methods have been developed. Some of these PMU-based methods estimate the generator moment of inertia, while others estimate other generator parameters using the Kalman filtering approach. Unfortunately, the former methods calculate only point estimated values of the inertia parameters without providing their confidence intervals, while the latter methods suffer from the weaknesses of Kalman filters such as the Gaussian assumption for the process and measurement noise and the slow convergence rate. Unlike the above methods, Bayesian-inference based methods provide maximum-a-posteriori (MAP) parameter estimated values along with the corresponding posterior probability distributions. Furthermore, they are not limited to Gaussian assumptions. However, the posterior distribution in the Bayesian inference is typically obtained by Markov Chain Monte Carlo (MCMC)-based methods, which require prohibitive computing times. To reduce computing times, some methods keep the approximated Gaussian assumption while taking the local optimization approach. Unfortunately, with these methods, the local optima may be highly biased when the initial guess is far from the true value or the posterior distribution is non-Gaussian.
Other approaches for improving the accuracy of key parameter values to improve model predictability is to apply uncertainty quantification requires solving an inverse problem by combining prior knowledge, simulations, and experimental observations. One example approach for performing the inversion is the “variational data assimilation” method, which is analogous to the deterministic nonlinear least square methods. However, this method by its deterministic nature cannot produce solutions with complete description of their probabilistic density functions (PDFs), which are critical for predicting the system performance. Unlike deterministic inversion, stochastic inversion aims to provide more complete description of the PDFs. Bayesian inference, in particular, provides a systematic framework for integrating prior knowledge of the stochastic sources and measurement uncertainties to compute detailed posteriors. However, Bayesian inference of high-dimensional and nonlinearly correlated stochastic source is computationally intractable. Moreover, unreasonable choices of source priors may have major effects on posterior inferences. Other conventional approaches are limited in their applications because they often assume Gaussian priors and posteriors, linearly correlated source fields, and/or no parametric and model form uncertainties in the simulation models. These deficiencies may result in unreliable solutions that may result in taking of inappropriate actions.
Techniques, systems, and devices are disclosed for providing a computational frame for estimating high-dimensional stochastic behaviors. In particular, the disclosed techniques can be implemented in various embodiments to perform efficient and robust statistical estimation for high-dimensional and nonlinearly correlated sources and with non-standard probability distributions.
In one exemplary aspect, a method for performing numerical estimation is disclosed. The method includes receiving a set of measurements of a stochastic behavior. The set of correlated measurements follows a non-standard probability distribution and is non-linearly correlated. Also, a non-linear relationship exists between a set of system variables that describes the stochastic behavior and a corresponding set of measurements. The method includes determining, based on the set of measurements, a numerical model of the stochastic behavior. The numerical model comprises a feature space comprising non-correlated features corresponding to the stochastic behavior. The non-correlated features have a dimensionality of M and the set of measurements has a dimensionality of N, M being smaller than N. The method includes generating a set of approximated system variables corresponding to the set of measurements based on the numerical model.
In another exemplary aspect, a system for performing numerical estimation comprises a plurality of sensors configured to collect a set of measurements of a stochastic behavior. The set of correlated measurements follows a non-standard probability distribution and is non-linearly correlated. Also, a non-linear relationship exists between a set of system variables that describes the stochastic behavior and a corresponding set of measurements. The system includes a processor coupled to the plurality of sensors and a memory including processor executable code. The processor executable code upon execution by the processor configures the processor to determine, based on the set of measurements, a numerical model of the stochastic behavior. The numerical model comprises a feature space comprising non-correlated features corresponding to the stochastic behavior. The non-correlated features have a dimensionality of M and the set of measurements has a dimensionality of N, M being smaller than N. The processor is also configured to generate a set of approximated system variables according to the set of measurements based on the numerical model.
In yet another exemplary aspect, a non-transitory storage medium having code stored thereon is disclosed. The code, upon execution by a processor, causes the processor to implement a numerical estimation method that comprises determining, based on the set of measurements, a numerical model of the stochastic behavior. The numerical model comprises a feature space comprising non-correlated features corresponding to the stochastic behavior. The non-correlated features have a dimensionality of M and the set of measurements has a dimensionality of N, M being smaller than N. the method includes adjusting the feature space such that the non-correlated features follow a standard probability distribution, expanding the feature space having the dimensionality of M to the dimensionality of N, and sampling the expanded feature space based on a statistical inference approach to determine a set of approximated system variables corresponding to the set of measurements.
The above and other aspects and features of the disclosed technology are described in greater detail in the drawings, the description and the claims.
Techniques are provided to generate models of physical processes such as model for power generation. The techniques can be implemented in various embodiments to perform efficient stochastic inversion for problems with high-dimensional and nonlinearly correlated sources and with non-standard probability distributions. The techniques can also be used to manage uncertainties in simulation models and measurement data. More specifically, the techniques can perform (1) nonlinear dimension reduction of high-dimensional, non-standard, and/or nonlinearly correlated stochastic sources, (2) fast forward uncertainty propagation (e.g., through ensemble model simulations), (3) backward uncertainty propagation (e.g., via fast adjoint-based Bayesian inversion algorithms), and (4) high-dimensional source recovery via nonlinear optimization.
The relationship between measurements z and state variables x of many of the systems to be modeled can be formulated as:
z=h(x)+e Eq. (1)
where z is the M-dimensional vector of measurements, x is the N-dimensional vector of state variables, h is the non-linear function that relates x and z, and e is the vector of measurement errors. Given a limited number of measurements and non-linearities in the measurement model h, solving the inverse problem is mathematically ill-posed. Furthermore, measurements errors typically follow non-Gaussian distributions as opposed to Gaussian distributions as assumed by many conventional techniques, thereby leading to non-Gaussian posterior states.
Using Bayesian inference, a framework for integrating prior knowledge and measurement uncertainties can be defined as:
πpost(x):=π(x|z)∝πprior(x)πlik(z|x) Eq.(2)
where 90prior(x) is a prior state model based on expert knowledge, πlik (x) is the likelihood function, a conditional probability of the measurements given the states x, and πpost(x) is posterior state model.
In some embodiments, the feature space is determined based on a statistical transformation that converts a first set of correlated values into a second set of uncorrelated values having a lower dimensionality. In some embodiments, the statistical transformation may employ a kernel principal component analysis, a Karhunen-Loeve expansion, or a manifold-learning based Isomap construction.
In some embodiments, the non-correlated features in the feature space follow a non-standard probability distribution. The techniques may, prior to generating the set of approximated system variables, map the feature space to a set of variables representing the numerical model. The set of variables follows a standard probability distribution.
In some embodiments, the techniques determine set of variables that follows the standard probability distribution based on a response surface approach that may employ a polynomial chaos expansion.
In some embodiments, the techniques generate the set of approximated system variables by sampling the numerical model based on a statistical inference approach. The statistical inference approach may be a Metropolis-Hasting Markov Chain Monte Carlo (MCMC) approach and a Langevin MCMC approach. The system inference approach may alternatively be a combination of a Langevin MCMC approach and an adaptive MCMC that calibrates a covariance of a distribution function based on a history of a Markov Chain.
In some embodiments, the techniques determine a second numerical model of the stochastic behavior. The second numerical model has a space of the same dimensionality of N. The stochastic behavior is associated with one or more target domains. The set of approximated system variables is generated by sampling the numerical model using a first statistical inference approach and reevaluating a subset of samples in the one or more target domains based on the second numerical using a second statistical inference approach. The one or more target domains may correspond to one or more failure events with corresponding probabilities of occurrences.
In some embodiments, the techniques determine a set of weighted measurements based on evaluating weights for the set of measurements. The techniques determine the numerical model of the stochastic behavior based on the set of weighted measurements. The weights may be evaluated iteratively for the set of measurements.
In some embodiments, the techniques generate the set of approximated system variables by expanding the non-correlated features having the dimensionality of M to the set of approximated system variables having the dimensionality of N. The standard probably distribution may be at least a Gaussian distribution or a Beta distribution. The techniques may update the numerical model based on the set of approximated system variables for a subsequent estimation.
In some embodiments, the stochastic behavior is the behavior of a power distribution system that includes a renewable energy system.
Details of the disclosed techniques are further described in the following example embodiments. It is noted that section headings are used in the present document only to improve readability and do not limit scope of the disclosed embodiments and techniques in each section to only that section.
An electric power grid is an interconnected network for delivering electricity from generators to loads by transmission and distribution systems.
In sensor and measurement fields, the synchrophasor is one of the most important smart grid technologies. A synchrophasor system 310, such as shown in FIG. 3, includes phasor measurement units (PMUs), phasor data concentrators (PDCs), and communication networks. The PMUs are used to produce synchrophasor measurements from current and voltage signals (e.g., the ones from current and voltage transducers) and a standard time signal (e.g., the one from a global positioning system (GPS)). Then the PDCs are used to transfer synchrophasor data from PMUs/PDCs to a control center and/or various applications.
In practice, the synchrophasor inevitably involves measurement errors, which may affect or even disable certain synchrophasor applications. It is a demanding task to analyze and model the synchrophasor measurement error. Traditionally, the synchrophasor is designed for transmission systems and the synchrophasor measurement error is assumed as a Gaussian noise in most synchrophasor applications. However, real PMU measurement errors do not follow a Gaussian distribution.
Furthermore, compared to transmission systems, distribution systems are invested with fewer sensors and measurements, and distribution networks are often plagued by large measurement uncertainties due to their highly distributed and diverse infrastructure. The current distribution system algorithms mainly derive from the transmission systems and inherit the Gaussian/linearity assumptions. However, these assumptions are violated especially in the uncertain and diverse distribution network. Furthermore, the existing techniques view the power grid as an over determined system, which ignore the measurement possibilities and model uncertainties and only work out a group of deterministic estimates. From a statistical point of view, the deterministic state estimates ignore some valuable information for the distribution systems with dynamic and diverse nature. The disclosed techniques can be used to implement an advanced state estimation system that can overcome such limitations.
Referring back to the Bayesian inference framework in Eq. (2), the measurement error can be assumed to be a Gaussian mixture (GM) model having a PDF as weighted sum of the Gaussians:
p(e|γ)=Σi=1N
where Nc is the number of Gaussian mixtures and ωi is a positive quantity with Σωi=1, and γ is the set of parameters such as mean μi, covariance Σi, and ωi that describe each component of the GM. p(e|μi, ρi) is M-dimensional Gaussian density function of each mixture component given by:
The set of parameters of the GM, γ, is computed from the measurements via expectation-maximization (EM) algorithm. Given the nonlinear mapping between the states and measurements and GM model for the error, the posterior states follow non-standard distributions and Markov chain Monte Carlo (MCMC) methods can be relevant techniques for posterior sampling.
MCMC methods comprise a class of statistical inference algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample of the desired distribution by recording states from the chain. Through recent developments, such as delayed rejection, adaptive Metropolis, etc., the MCMC methods have become sophisticated and computationally expensive, especially for problems that have a high-dimensional parameter space. However, with the increasing sophistication, the MCMC methods can also be computationally intensive given the expensive measurement model and high-dimensionality of the states. One way to reduce the computational complexity is to construct a low-fidelity model. The disclosed techniques can be used to construct a surrogate model (that is, a model having a low dimensionality or low fidelity) of the stochastic behavior by reducing the dimensionality of the states and mapping corresponding feature variables to Gaussian random variables to enable efficient subsequent processing.
As discussed above, the high dimensionality of the discretized parameter space can lead to intractability of the stochastic inversion problem. To efficiently solve the inverse problems that involve high-dimensional input data that does not follow any standardized distribution, a mapping from the original states x to a standardized distribution space (e.g., Gaussian space) with lower dimensionality is needed.
In some embodiments, the mapping can be performed using a kernel principal component analysis (KPCA) method. Let Y(x, ω):=ln(μ(x, ω)) be a random field. The covariance function can be defined as CY(x,y)=<{tilde over (Y)}(x, ω){tilde over (Y)}(y, ω)>ω, where {tilde over (Y)}(x, ω):=Y(x, ω)−<Y(x, ω)>w and <. >ω is an expectation factor. Assume CY is bounded, symmetric and positive definite, it can be represented as
CY(x,y)=Σi=1∞γiei(x)ei(y) Eq. (5)
where γ1≥γ2≥ . . . >0 are eigenvalues, and ei(x) and ei(y) are deterministic and mutually orthogonal functions. The random process
where {ξi(ω)} are zero-mean and uncorrelated variables. Given CY,
Σi=iMωiCY(siy)e(xi)=γe(y) Eq. (7)
where M is the number of sample points. Assuming there are enough samples points and equal weights
Eq. ( 7) can be solved by simple eigen-decomposition of CY(xi, y), for which principal component analysis (PCA) can be used to reduce the dimension. In general, PCA can only obtain efficient embeddings for linearly correlated data points, the KPCA method can deal with complex models that have nonlinearly correlated data. The nonlinear mapping can be seen as a kernel map, thus allowing the handling of the high dimensionality by using a technique called “kernel trick.” A kernel trick introduces a virtual mapping that results in smaller dimensional equivalent system. The kernel trick allows the KPCA to be performed in the high dimensional feature space, with similar computational expense as PCA. An eigen-decomposition can be performed on a relatively small space, which is independent of the selection of the nonlinear mapping and the feature space.
Referring back to
In general, ξd (the discrete observations of ξ) are non-Gaussian, uncorrelated and dependent random variables, which may complicate the Bayesian inversion procedures (e.g., by requiring more frequent sampling from their distributions). Determination of a unique map from the dependent ξd to a standard independent random variable space η can be achieved in various ways. In some embodiments, a response surface-based method, such as a polynomial chaos expansion (PCE), can be constructed to perform the mapping. Each component of {ξld}l=1M can be obtained using PCE as
ξld=Σn=0∞cn,lΨn(ηl(ω)), l=1, 2, Eq. (8)
where ηl are the independent standard Gaussian random variables. Ψn(ηl(ω)) are Hermite polynomials, and cn,1 are real valued deterministic coefficients. The coefficients in the equation above can be computed using Bayesian inference or using a non-intrusive projection method to find a continuous parameterized representation based on the discrete ξd. Let {ηl} be a standard Gaussian random variable, then by matching the cumulative density function (CDF) of ξld and ηl, each component of ξl can be expressed in terms of random variables ηl by following non-linear mapping:
ξld=Fξ
where Fξ
With the mapping of the original states x(ω) to the Gaussian space η(ω), the Bayesian inversion can be performed in the η space and the corresponding posterior can be obtained by:
πpostiror(η):=π(η|z)∝πprior(η)πlik(z|η) Eq. (10)
Unlike deterministic inversions, Eq. (10) provides a probabilistic characterization of the solution to the stochastic inverse problem. In this context, the likelihood function πlik(z|η) is a conditional probability of the model outputs with given model parameters η. Also, the prior probability density function πprior(η) allows the framework to inject prior knowledge to the model.
The prior density function prior can be expressed as a multivariate Gaussian in a form of:
The simplification is due to the independence of the η vector. Specifically, the covariance mastrix Γprior is an identify matrix and
In some embodiments, the likelihood function can be expressed as:
The corresponding posterior density can be derived as
πprior(η)∝ exp(J(η)) Eq. (14)
where J(η) is given by:
where
It is nontrivial to obtain the functional derivative of J(η). In some embodiments, an adjoint model and automatic differentiation to compute the gradients. The relationship between the variables η, ξ, y, μ, λ, u can be summarized as:
where the pre-imaging problem is solved to project snapshots from the feature space back to original space. One method to perform pre-imaging involves solving an optimization problem:
where y and Y are points in the feature space and the original space respectively, and ∥⋅∥ is the Euclidean norm. The above minimization problem can be solved as an iterative fixed-point problem.
The objective function J can be expressed in terms of η by:
η→½(f(η)−uobs, Γnoise−1(f(η)−uobs))+½(η−
The expression for the gradient of J2(η) can be obtained as:
∇ηJ2(η)=Γprior−1(η−
The gradient information of J1(η) can be expressed as
∇ηJ1(η)=(∇ηξ)T(∇ξy)T(∇yλ)T(∇λ, πu)TΓnoise−1(f(η)−uobs) Eq. (18)
The linear operator ∇λ, πu represents the tangent linear model of the forward problem and its adjoint operator is (∇λ, πu)T. The linear operation (∇ηξ)T represents the adjoint model of the PCE, and (∇ξy)T represents the adjoint model of the pre-image iteration mapping.
Due to the non-linear relation between the parameters η and the measurements, direct sampling from the posterior is not possible. MCMC methods provide a systematic way to sample from the corresponding posteriors (also shown as operation 130 in
To accelerate the MCMC method, the forward solver can be replaced with a PCE-based response surface. The generalized polynomial chaos expansion has been shown to be a cost-effective tool in modeling response surfaces. The stochastic outputs are represented as a weighted sum of a given set of orthogonal polynomial chaos basis functions constructed from the probability distribution of the input random variables. Let ξ be a vector of random variables following a standard probability distribution (e.g., the Gaussian or the beta distribution). A unique orthogonal polynomial is associated with ξ. Let Φ denote the corresponding polynomial chaos basis and let a denote the ith polynomial chaos coefficient, we have:
z=Σi=0N,pαiΦi(ξ) Eq. (20)
where Np=(N+P)!/(N!P!)−1, N is the total number of the random variables, and P is the maximum order of the polynomial chaos basis function. From the polynomial chaos coefficients, the mean μ and the various σ2 of the output z can be determined as follows:
μ=a0, σ2=Σi=1Npαi2E[Φi2] Eq. (21)
where E[.] is the expectation operation.
A set of one-dimensional polynomial chaos basis functions with respect to some real positive measure satisfy
where λ is a probability measure defined as cumulative distribution function (CDF) of ξ. For every CDF, the associated orthogonal polynomials are unique. Similarly, any set of multi-dimensional polynomial chaos basis functions is orthogonal to each other with respect to their joint probability measure.
A set of multi-dimensional polynomial chaos basis functions can be constructed as the tensor product of the one-dimensional polynomial chaos basis associated with each input random variable.
Φ(ξ)=Φ(ξ1⊗Φ(ξ2) . . . ⊗Φ(ξN) Eq. (22)
where Φ(ξi) denotes the one-dimensional polynomial chaos basis for the ith random variable.
Collocation points can be regarded as a finite sample of ξ=[ξ1, ξ2, . . . , ξN] that are chosen to approximate the polynomial chaos coefficients. The elements of the collocation points are generated by using the union of the zeros and the roots of one higher-order, one-dimensional polynomial for every random variable.
In the Bayesian inference, the parameters x are viewed as random variables and hence are given prior PDFs. By mapping the parameters x into ξ, a PCE as the response surface of the dynamic power system model can be constructed as follows (also shown as operation 140 in
1. Map the ith random parameter xi to a given random variable as ξi as Pi=Fi−1(T(ξi)), wherein Fi−1 is the inverse cumulative probability distribution function of xi and T is the cumulative probability distribution function of ξi.
2. Construct the polynomial chaos basis, then express the output z in the expansion form.
3. Construction M combinations of collocation points and put them into the polynomial chaos basis (M×(Np+1)) matrix Hpc.
4. Compute the power system dynamic model output for the selected collocation points to get the (M×1) output Z matrix given by
Z=(z(t, ξ1) z(t, ξ2) . . . z(t, ξM))T Eq. (24)
5. Estimate the unknown coefficients A based on the collocation points that are selected and the model output from Z=HpcA. A is the (Np×1) coefficient vector expressed as:
A=(α0(t) α1(t) . . . αNp(t))T Eq. (25)
6. Let  denote the estimated coefficient vector and let us define a residual vector r as r=Z−HpcÂ,  can be estimated by minimizing the 2-norm of the residual vector
which yields Â=(HpcTHpc)−1HpcTZ.
With the coefficients estimated and the bases selected, the PCE can be built for the target output. The system response surface can now be represented as polynomial form.
Tests have been conducted applying the disclosed techniques to power systems under the disturbances caused by one or more faults. Results have demonstrated that the techniques described herein can provide accurate estimation results under a three-phase fault, but with a much smaller computing time compared with conventional techniques. An example algorithm of a response-surface-based Bayesian Inference using the MHMCMC framework is illustrated by the following pseudo code.
indicates data missing or illegible when filed
In some embodiments, a gradient-based Langevin Markov Chain Monte Carlo (LMCMC) can be adopted (e.g., in operation 130 shown in
LMCMC considers the following overdamped Langevin-Ito diffusion process:
dX=∇logπposterior(X)dt+√{square root over (2)}dW Eq. (26)
The probability distribution of ρ(t) of x(t) approaches a stationary distribution, which is invariant under diffusion, and ρ(t) approaches the true posterior (ρ∞=πposterior) asymptotically. Approximate sample paths of the Langevin diffusion can be generated by many discrete-time methods. Using a fixed time step t>0, the above equation can be written as:
X
k+1
=X
k
+t∇logπ(Xk)+√{square root over (2t)}ξk Eq. (27)
Here, each ξk is an independent draw from a multivariate normal distribution with mean 0 and identity covariance matrix. This proposal is accepted or rejected similar to the Metropolis-Hasting algorithm using α:
An example posterior sampling algorithm using the LMCMC framework is illustrated by the following pseudo code:
Introducing the LMCMC under the PCE framework, the original complicated dynamic solver cab be simplified into the polynomial forms at ξ space. This can greatly simplify the derivation of the gradients of the posterior likelihood. Eq. (20) can be re-organized as:
Because of the fast convergence rate of the LMCMC, tests have shown that the PCE-based LMCMC can converge using 5×104 samples with a solution time of 15 seconds. The disclosed techniques herein can achieve a three-orders-of-magnitude speedup factor compared to the traditional Bayesian inference methods. An example algorithm of a polynomial-chaos-expansion-based Bayesian Inference using the LMCMC framework is illustrated by the following pseudo code:
indicates data missing or illegible when filed
To further reduce the large sample size required in conventional MCMC approaches, a two-stage hybrid MCMC algorithm can be used (e.g., in operation 130 shown in
As discussed above, the Langevin dynamics enables the LMCMC to have a better convergence rate than the MHMCMC. This benefit tends to increase when the dimension increases. This is especially the case for the decentralized dynamic parameter estimation problem, where several generator dynamic parameters can influence the dynamic performance of power systems. However, analyzing the gradient information in LMCMC can take extra computing time. Also, the selection of the step size t can influence the search path for the global optimum. Once the LMCMC sampler is approaching the local optima, it may not jump out of the local optimum when t is too small. In contrast, when t is too large, the acceptance rate becomes low. To obtain unbiased results, the sample size must be increased significantly.
Adaptive MCMC (AM) strategy (also referred to as hybrid MCMC) is to create a Gaussian proposal distribution function with a covariance matrix calibrated using the previous samples in the MCMC chain. The key point of the AM adaptation is how the covariance of the proposal distribution function depends on the history of the chain. After an initial nonadaptive period, the Gaussian proposal function is centered at the current position of the Markov Chain, mk, and resets its covariance Ck. This can be expressed as:
where, C0 denotes initial covariance chosen according to a prior knowledge (which may be quite poor), k0 denotes a time index that defines this initial nonadaptive period, sN denotes the scaling parameter, ∈ is a small constant to ensure the non-singularity of Ck, and IN denotes N-dimensional identity matrix. The larger k0 is, the longer it takes for the adaptation to take place. This adaptation mechanism enables the AM algorithm to overcome local optima caused by the strong correlation between the parameters and to choose the appropriate variance values for the proposal function. In some embodiments, the adaptation can be done at given intervals. Some simulation results of the HMS method have been compared with the MC method with 1,000,000 samples of power-flow cases. Compared to the MC method execution that requires approximately 1.5 h to complete all the tests, the HMS method execution requires only about 1 min. The HMS method can also provide equally accurate simulation results even for risk events with a very low probability, e.g., 10−4 or 10−5. An example algorithm that uses the adaptive MCMC framework for parameter estimation is illustrated by the following pseudo code:
indicates data missing or illegible when filed
Although the techniques described above obtain an accurate PCE-based surrogate model, the topological uncertainties in the probabilistic power-flow (PPF) analysis are not fully accounted for. Thus, the PCE-based surrogate model can be inadequate for problems involving model discontinuities, because the construction of the surrogate model is based on the smoothness assumption of the system response. However, when the topology of a power system changes due to random branch outages, system responses tend to have abrupt changes that violate this assumption. This is especially true for the rare events involving high-order contingencies. To address these challenges, the sampling of the surrogate and original system models (e.g., operation 130 in
Every system topology can be deemed as a single element. Thus, multiple elements can be associated with different system topologies. Within every single element, a corresponding surrogate model can be constructed that only considers the nodal power-injection uncertainties. However, the number of possible topologies caused by random branch outages is directly proportional to the number of system branches. Even for a very-small-scale power system, modeling all possible combinations of these topologies can be computationally infeasible. Thus, there is a need to consider the balance between the construction of the surrogate model and the usage of the direct MC method.
Following the power system planning tradition, different system topologies can be classified N-1, N-2, N-3, etc. criteria. Apart from the original system model without branch outages, the contingencies of the N-1 criterion occur more frequently than those of the higher-order criteria, e.g., the N-2 and the N-3 criteria, as shown in Table 1.
For the original system model without any branch outages and for the N-1 criterion, the idea of “multi-element” is maintained and Nbranch+1 surrogate models are constructed separately. For the topologies corresponding to N-2 and higher-order contingencies, the traditional MC method without constructing any surrogate model.
Constructing the PCE-based surrogate model depends on the realizations at a small number of collocation points and accounts for nearly all the computational cost associated with using the surrogate model. Once the surrogate model is constructed, the realizations of a large number of samples through this surrogate model can be computationally negligible. However, if the number of samples that need to be propagated through the surrogate model is very small, e.g., 2, 1, or even 0, then the construction of the surrogate model itself already becomes computationally more expensive than directly using the original system model. In these cases, using direct MC simulations is more cost-effective.
In some embodiments, importance sampling and adaptive importance sampling can be incorporated in the PCE-based Bayesian Inference frame (e.g., in operation 120 as shown in
The Importance Sampling (IS) method includes drawing K independent samples {x(k)}k=1K from the proposal PDF, q(x) which has heavier tails than the target function ir(x). Each sample has an associated importance weight given by (k=1, 2, . . . , K):
The PDF q (x) comes from the Bayesian prior pdf πprior(x), and the target function comes from the Bayesian posterior pdf πpost(x|z). This enables a sample set from πprior(x) to represent parameter uncertainties and evaluate the weight at each parameter
value through
The normalized with can be calculated as
and the full probability distributions of πpost(x|z) can be recovered as:
πpost(x|z)=Σk=1K
Here, δ represents Dirac-delta function. The estimated model parameters via a maximum-a-posteriori (MAP) estimator can be expressed as:
An example algorithm of importance-sampling-based Bayesian Inference is illustrated by the following pseudo code:
An example algorithm of PCE-based Bayesian Inference via Importance Sampling is illustrated by the following pseudo code:
The Adaptive IS (AIS) method is based on an iterative process for gradual evaluation of the proposal functions to accurately approximate the posterior functions. The AIS method includes three steps: (i) generate samples from proposal functions; (ii) calculate weights for samples; and (iii) update the scale and location parameters that define the proposals to obtain the new proposal for further iterations.
More specifically, for parameter estimation problem with N unknown parameters, the AIS algorithm is initialized with a set of N proposals. Each proposal is parameterized by a vector that comes from Bayesian prior pdfs πprior(x). After drawing a set of samples {x(k)}k=1K, the normalized weights can be determined. These weights enable a determination of a discrete probability distribution that approximates the target Bayesian posteriors πpost(x). Then, the parameters of the nth proposal are updated based on πpost(x). This process is repeated until an iterative stopping criterion is satisfied. This updating enables the system to find better Bayesian priors that not only propose more samples in the sample space with high likelihood, but also fine-tune the sample space to construct new surrogate models, thereby improving the fidelity. An example algorithm of multifidelity-surrogate-based Bayesian Inference via AIS is illustrated by the following pseudo code:
The techniques disclosed herein can also be applied to stochastic economic dispatch (SED) for the increasing penetration of renewable energy resources. A SED method based on Gaussian process emulator (GPE) can be implemented for power system. The GPE can evaluate, with a negligible computational cost, the SED solver at sampled values through a nonparametric reduced order representation.
The GPE is known to be a powerful Bayesian-learning-based method based on a nonlinear regression problem. The SED model can be denoted as f(⋅) and its corresponding vector-valued input of p dimensions by x. Due to the randomness of x, n samples can be observed as a finite collection of the input as {x1, . . . , xn}. Accordingly, its model output f(x) also becomes random and has its corresponding n realizations, denoted by {f(x1), . . . , f(xn)}.
Assuming that the model output is a realization of Gaussian process, then the finite collection of the random variables follows a joint multivariate model probability distribution:
where, m(⋅)is the mean function, and k(⋅,⋅) is a kernel function that represents the covariance function. Eq. (34) can be simplified into f (X)|X˜N(m (X), k (X, X)) where X=[x1, x2, . . . , xn]T. Adding the observation noise into f (X) yields Y=f (X)+e, e˜N(0, σ2In) where ln and σ2 are an n-dimensional identify matrix and variance respectively. A similar Bayesian inference can be applied to {Y, X}. A joint distribution of Y and y(x) can be formulated as:
where, K11=k(X, X)+σ2In, K12=k(X, x), K21=k(x, X), and K22=k(x,x)+σ2In. The Bayesian posterior distribution of the system output y(x) conditioned upon the observations (Y, X) follows a Gaussian distribution is given by:
y(x)|x, Y, X˜N(μ(x),Σ(x)) Eq. (36)
μ(x)=m(x)+K21K11−1(Y−m(X)) Eq. (37)
Σ(x)=K22−K21K11−1K12 Eq. (38)
The form of GPE (e.g., operation 120-140 as shown in
The high-dimensional samples can be projected into low-dimensional latent variables using the KLE. Each random variable x(t) is indexed by t. Assuming that x(t) for any time has a finite variance and the covariance function C is defined as C(t,s)=Cov(X(t), X(s)). Because C is positive definite, the spectral decomposition is:
C(t,x)=Σi=1∞λiui(t)ui(s) Eq. (39)
where, λi denotes the ith eigenvalue and ui denotes the ith orthonormal eigenfunction of C. X(t) can be input into the KLE framework as follows:
X(t)=Σi=1∞√{square root over (λi)}ui(t)ξi Eq. (40)
where, ξi are mutually correlated univariate random variables with zero mean and unit variance.
The dimension reduction can also be presented from the variance point of view. Consider the total variance of X(t) over a bounded domain D . Using the orthonormality property of ui (t), we have:
∫Var[X(t)]dt=Σi=1∞λi Eq. (41)
The first p largest eigenvalues can be chosen as {λ1, . . . , λp}, with the corresponding eigenfunctions as {u1, . . . u} such that the obtained ξ contain over 95% of the total variance. Since the calculated p is smaller than the original dimension of the raw data sequence, the KLE has mapped the high-dimensional correlated X to the low-dimensional uncorrelated ξ. Simulation results have shown that an example GPE surrogate, trained with 100 Latin hypercube simulations, can successfully calculate the SED values for the tested system under 8,000 scenarios, which is significantly more computationally efficient than the traditional Monte Carlo method while achieving the desired simulation accuracy. An example algorithm for the GPE-based SED method is illustrated by the following pseudo code:
In some embodiments, to reduce the dimension for the GPE-based surrogate model described above, a manifold-learning based Isomap method is used (e.g., in operation 110 as shown in
Starting from creating a neighborhood network using k-nearest neighbors based on local Euclidean distances for the high-dimensional raw data, the internal nonlinear manifold structure of the data set is revealed by the geodesic distance between all pairs of points approximated from the local Euclidean distances in the neighborhood network. By applying eigenvalue decomposition on the approximated geodesic distance matrix, the low-dimensional embedding of the data set can be preserved efficiently for a nonlinear manifold.
Denote the raw data before dimensionality reduction as an n×praw matrix expressed as W, where praw>p. The goal is to use Isomap to transform W into a matrix X that can be effectively adopted to construct the GPE as described above. The Isomap can be determined based on three main steps: creating neighborhood graph, obtaining geodesic of ISomap, and eigen-decomposition of Isomap.
(1) Neighborhood Graph
A weighted graph G can be expressed as G (V, E, L) where V is the set of vertices and E is the set of edges with corresponding lengths L. To construct the graph G from the draw data W, n observations in W can be used to represent n vertices. Between any two of vertices, wi and wj, their pairwise Euclidean distance is denoted as δi,j. A n×n matrix can be formed to represent the Euclidean distance information between all vertices. The edges can be assigned accordingly based on the vertices. Using these vertices and edges, the topology of a graph can be generated. In general, a properly small k is helpful in classifying different groups of individuals. A large K, on the other hand, provides more concentrated embedding.
(2) Geodesic Calculation
Given the graphs and the Euclidean distances, the geodesic distance (shortest path) between ith and jth vertices di,j can be found. For example, the Dijkstra's algorithm can be applied to find the shortest path.
(3) Eigen-Decomposition
The Isomap objection function can be defined as follows:
Here, τ(D) stands for the shortest-path inner product matrix. DX is symmetric, hollow matrix of the Euclidean distance. τ(DX) denotes the Euclidean inner product matrix and ∥⋅∥ is the Frobenius matrix norm. From Eq. (37), the global minimum can be obtained by the largest p eigenvectors of τ(D). τ(D) is then reformed as τ(D)=½JD(2)J. The embedding X is obtained as
denotes the first p eigenvalues of τ(D) and {νs}s=1p=are the corresponding eigenfunctions. Simulation results have shown that the GPE surrogates with Isomap offers comparable results as the GPE surrogates with KLE. The GPE-Isomap method can embed input into a lower dimensionality, e.g., 9 latent variables. With the ability to explore much-lower-dimensional representations, the techniques disclosed herein can be applied to handle larger-scale systems.
While the embodiments focus on the application of power systems, the techniques described herein can also be applied to study other stochastic behaviors.
The processor(s) 805 may include central processing units (CPUs), graphics processing units (GPUs), or other types of processing units (such as tensor processing units) to control the overall operation of, for example, the host computer. In certain embodiments, the processor(s) 805 accomplish this by executing software or firmware stored in memory 810. The processor(s) 805 may be, or may include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.
The memory 810 can be or include the main memory of the computer system. The memory 810 represents any suitable form of random access memory (RAM), read-only memory (ROM), flash memory, or the like, or a combination of such devices. In use, the memory 810 may contain, among other things, a set of machine instructions which, when executed by processor 805, causes the processor 805 to perform operations to implement embodiments of the presently disclosed technology.
Also connected to the processor(s) 805 through the interconnect 825 is a (optional) network adapter 815. The network adapter 815 provides the computer system 800 with the ability to communicate with remote devices, such as the storage clients, and/or other storage servers, and may be, for example, an Ethernet adapter or Fiber Channel adapter.
In one example aspect, a system for performing numerical estimation comprises a plurality of sensors configured to collect a set of measurements of a stochastic behavior. The set of correlated measurements follows a non-standard probability distribution and is non-linearly correlated. A non-linear relationship exists between a set of system variables that describes the stochastic behavior and a corresponding set of measurements. The system also includes a processor coupled to the plurality of sensors and a memory including processor executable code. The processor executable code upon execution by the processor configures the processor to determine, based on the set of measurements, a numerical model of the stochastic behavior. The numerical model comprises a feature space comprising non-correlated features corresponding to the stochastic behavior. The non-correlated features have a dimensionality of M and the set of measurements has a dimensionality of N, M being smaller than N. The processor is also configured to generate a set of approximated system variables according to the set of measurements based on the numerical model.
In some embodiments, the processor is configured to determine the feature space based on a statistical transformation that converts a first set of correlated values into a second set of uncorrelated values having a lower dimensionality. In some embodiments, statistical transformation comprises a kernel principal component analysis. In some embodiments, the statistical transformation comprises a Karhunen-Loeve expansion. In some embodiments, the statistical transformation comprises a manifold-learning based Isomap construction.
In some embodiments, the non-correlated features in the feature space follow a non-standard probability distribution. In some embodiments, the processor is configured to map the feature space to a set of variables representing the numerical model prior to generating the set of approximated system variables. The set of variables follows a standard probability distribution. In some embodiments, the processor is configured to map the feature space to the set of variables based on a response surface approach. In some embodiments, the response surface approach comprises a polynomial chaos expansion.
In some embodiments, the processor is configured to sample the numerical model based on a statistical inference approach to determine the set of approximated system variables. In some embodiments, the statistical inference approach comprises a Metropolis-Hasting Markov Chain Monte Carlo (MCMC) approach. In some embodiments, the statistical inference approach comprises a Langevin MCMC approach. In some embodiments, the statistical inference approach comprises combining a Langevin MCMC approach and an adaptive MCMC that calibrates a covariance of a distribution function based on a history of a Markov Chain.
In some embodiments, the stochastic behavior is associated with one or more target domains. The processor is configured to determine a second numerical model of the stochastic behavior. The second numerical model comprises a space having a same dimensionality of N. The processor is configured to generate the set of approximated system variables based on sampling the numerical model using a first statistical inference approach; and reevaluating a subset of samples in the one or more target domains based on the second numerical using a second statistical inference approach. In some embodiments, the one or more target domains corresponds to one or more failure events with corresponding probabilities of occurrences.
In some embodiments, the processor is configured to determine a set of weighted measurements based on evaluating weights for the set of measurements. The numerical model of the stochastic behavior is determined based on the set of weighted measurements. In some embodiments, the weights are evaluated iteratively for the set of measurements.
In some embodiments, the processor is configured to generate the set of approximated system variables based on expanding the non-correlated features having the dimensionality of M to the set of approximated system variables having the dimensionality of N. In some embodiments, the standard probably distribution comprises at least a Gaussian distribution or a Beta distribution. In some embodiments, the processor is configured to update the numerical model based on the set of approximated system variables for a subsequent estimation.
In some embodiments, the stochastic behavior comprises a behavior of a power distribution system. In some embodiments, the power distribution system comprises a renewable energy system.
In yet another example aspect, a non-transitory storage medium has code stored thereon. The code, upon execution by a processor, causes the processor to implement a numerical estimation method that comprises determining, based on the set of measurements, a numerical model of the stochastic behavior. The numerical model comprises a feature space comprising non-correlated features corresponding to the stochastic behavior. The non-correlated features have a dimensionality of M and the set of measurements has a dimensionality of N, M being smaller than N. The method also includes adjusting the feature space such that the non-correlated features follow a standard probability distribution, expanding the feature space having the dimensionality of M to the dimensionality of N, and sampling the expanded feature space based on a statistical inference approach to determine a set of approximated system variables corresponding to the set of measurements. In some embodiments, the numerical estimation method further comprises updating the numerical model based on the set of approximated system variables.
Implementations of the subject matter and the functional operations described in this patent document can be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing unit” or “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.
This application claims priority to and benefits of U.S. Provisional Application 62/782,231, titled “COMPUTATIONAL FRAMEWORK FOR DATA ASSIMILATION AND UNCERTAINTY MANAGEMENT OF LARGE-DIMENSIONAL DYNAMICS MODELS,” filed on Dec. 19, 2018. The entire disclosure of the aforementioned application is incorporated by reference as part of the disclosure of this application.
The United States Government has rights in this invention pursuant to Contract No. DE-AC52-07NA27344 between the U.S. Department of Energy and Lawrence Livermore National Security, LLC, for the operation of Lawrence Livermore National Laboratory.
Number | Date | Country | |
---|---|---|---|
62782231 | Dec 2018 | US |