METHOD AND THE DEVICE FOR OPERATING A TECHNICAL SYSTEM

Information

  • Patent Application 20250217702
  • Publication Number 20250217702
  • Date Filed: May 09, 2023
  • Date Published: July 03, 2025
Abstract
A device and computer-implemented method for machine learning with time-series data representing observations related to a technical system. The method includes: providing the time-series data, model parameters of a distribution over the time-series data and over a first latent variable and over a second latent variable, and variational parameters of an approximate distribution over the second latent variable; sampling a value of the second latent variable from the approximate distribution over the second latent variable; finding a value of the first latent variable depending on a density of the distribution over the time-series data and over the first latent variable and over the value of the second latent variable; and determining a Hessian depending on a second order Taylor approximation of the distribution over the time-series data and the first latent variable and the value of the second latent variable evaluated at the value of the first latent variable.
Description
FIELD

The present invention relates to a method and device for operating a technical system.


BACKGROUND INFORMATION

Gaussian process state-space models use Gaussian processes as the transition function in a state-space model to describe time series data in a fully probabilistic manner. These models have two types of latent variables, the temporal states required for modelling noisy sequential observations, and the so-called inducing outputs which are needed to treat the Gaussian process part of the model in an efficient manner.


Ialongo, Alessandro Davide, Mark Van Der Wilk, James Hensman, and Carl Edward Rasmussen. “Overcoming Mean-Field Approximations in Recurrent Gaussian Process Models.” International Conference on Machine Learning. 2019 describe a use of Variational Inference to approximate a true posterior of a Gaussian process model. A conditional dependence of temporal states on the inducing outputs is taken into account and a Markov Gaussian model over the temporal states is assumed. The Markov Gaussian model is parametric and allows for non-linear transitions.


Skaug, Hans Julius and David A. Fournier. “Automatic approximation of the marginal likelihood in non-Gaussian hierarchical models.” Comput. Stat. Data Anal. 51, pp. 699-709. 2006 describe an application of the Laplace approximation to generic, i.e. not Gaussian process, state-space models in an efficient manner. This is possible by using the Implicit Function Theorem and exploiting the sparsity and structure of the Hessian, i.e. a matrix that is needed for applying the Laplace approximation.


SUMMARY

A computer-implemented method and a device according to the present invention provide a model and a combination of these inference methods by treating the two different types of latent variables in a Gaussian process state-space model distinctly, applying variational inference to the Gaussian process part of the model and the Laplace approximation to the temporal states of the model. The distinction of the two types of latent variables makes it possible to process the model efficiently. The method does not require sequential sampling of the temporal states during inference and instead performs the Laplace approximation, which involves a joint optimization over those temporal states. This helps in optimizing the model. The approximate posterior that is used in the model further assumes that the dynamics can be locally linearly approximated. The improvements in the optimization that are provided by this model also lead to better calibrated uncertainties for different time-series prediction tasks.


According to an example embodiment of the present invention, the computer-implemented method for machine learning with time-series data representing observations related to a technical system comprises: providing the time-series data, model parameters of a distribution over the time-series data and over a first latent variable and over a second latent variable, and variational parameters of an approximate distribution over the second latent variable; sampling a value of the second latent variable from the approximate distribution over the second latent variable; finding a value of the first latent variable depending on a density of the distribution over the time-series data and over the first latent variable and over the value of the second latent variable, in particular a value that maximizes this density; determining a Hessian depending on a second order Taylor approximation of the distribution over the time-series data and the first latent variable and the value of the second latent variable evaluated at the value of the first latent variable; determining a determinant of the Hessian; determining a Laplace approximation of a distribution over the time-series data conditioned with the value of the second latent variable depending on the determinant of the Hessian; determining an inverse of the Hessian; determining a Jacobian of the distribution over the time-series data and the first latent variable and the value of the second latent variable; evaluating an approximate lower bound that depends on the Laplace approximations that are determined for a plurality of values of the second latent variable; determining gradients of the Laplace approximations depending on the inverse Hessians and the Jacobians; and updating the model parameters and the variational parameters depending on the gradients. This method uses the distinction between the two types of latent variables, uses an approximate posterior, and assumes that the dynamics can be locally linearly approximated. It provides an improved way of doing inference in Gaussian process state-space models: it does not require sequential sampling of the temporal states during inference and instead performs the Laplace approximation, which involves a joint optimization over those temporal states.


According to an example embodiment of the present invention, preferably, providing the time-series data comprises receiving the time-series data or receiving a sensor signal comprising information about the technical system and determining the time-series data depending on the sensor signal.


According to an example embodiment of the present invention, the method preferably comprises determining an instruction for actuating the technical system depending on the time-series data, the model parameters and the variational parameters, and outputting the instruction to cause the technical system to act.


Preferably, the technical system is a computer-controlled machine, like a robot, in particular a vehicle, a domestic appliance, a power tool, a manufacturing machine, a personal assistant or an access control system.


The technical system may comprise an engine or a part thereof, wherein the time-series data comprises as input to the technical system a speed and/or a load, and as output of the technical system an emission, a temperature of the engine, or an oxygen content in the engine.


The technical system may comprise a fuel cell stack or a part thereof, wherein the time-series data comprises as input to the technical system a current in the fuel cell stack, a hydrogen concentration in the fuel cell stack, a stoichiometry of an anode or a cathode of the fuel cell stack, a volume stream of a coolant for the fuel cell stack, an anode pressure for an anode of the fuel cell stack, a cathode pressure for a cathode of the fuel cell stack, an inlet temperature of a coolant for the fuel cell stack, an outlet temperature of a coolant for the fuel cell stack, an anode dew point temperature of an anode of the fuel cell stack, a cathode dew point temperature of a cathode of the fuel cell stack, and as output of the technical system (102) an average of the cell tensions across cells of the fuel cell stack, an anode pressure drop at an anode of the fuel cell stack, a cathode pressure drop at a cathode of the fuel cell stack, a coolant pressure drop between an inlet and an outlet for the coolant of the fuel cell stack, or a coolant temperature rise between an inlet and an outlet for the coolant of the fuel cell stack.


The instruction preferably comprises a target operating mode for the technical system.


According to an example embodiment of the present invention, the method may comprise determining the determinant of the Hessian depending on a factorization comprising a strictly upper triangular part of a part of the Hessian, a strictly lower triangular part of the part of the Hessian, and a block diagonal matrix of recursively defined blocks of a matrix. This is a very computing resource efficient way of determining the determinant of the Hessian.


The method may comprise determining the inverse of the Hessian depending on a factorization comprising a strictly upper triangular part of a part of the Hessian, a strictly lower triangular part of the part of the Hessian, and a block diagonal matrix of recursively defined blocks of a matrix. This is a very computing resource efficient way of determining the inverse of the Hessian.


Evaluating the approximate lower bound may comprise sampling with samples of the second latent variable that are drawn from the approximate distribution over the second latent variable.


According to an example embodiment of the present invention, the device for machine learning with time-series data representing observations related to a technical system comprises at least one processor and at least one memory, wherein the at least one processor is adapted to execute instructions that when executed by the at least one processor cause the device to perform steps in a method for operating the technical system according to the present invention. This device provides advantages that correspond to the advantages the method of the present invention provides.


The device may comprise an interface that is adapted to receive information about the technical system and/or that is adapted to output an instruction that causes the technical system to act. This device is capable of interacting with the technical system.


A computer program may comprise computer readable instructions that when executed by a computer cause the computer to perform the steps of the method of the present invention.


Further advantageous embodiments are derived from the following description and the figures.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 schematically depicts a device for operating a technical system, according to an example embodiment of the present invention.



FIG. 2 schematically depicts steps in a method for operating the technical system, according to an example embodiment of the present invention.





DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS


FIG. 1 depicts a device 100 for operating a technical system 102 schematically.


The device 100 comprises at least one processor 104 and at least one memory 106. The at least one processor 104 is adapted to execute instructions that when executed by the at least one processor 104 cause the device 100 to perform steps in a method for operating the technical system 102.


The device 100 in the example comprises an interface 108. The interface 108 is for example adapted to receive information about the technical system 102. The interface 108 is for example adapted to output an instruction that causes the technical system 102 to act. The technical system 102 may comprise an actuator 110. The actuator 110 may be connected at least temporarily with the interface 108 via a signal line 112.



FIG. 2 depicts steps of the method. The method comprises analyzing data, e.g. given time-series data $Y_T=\{y_t\}_{t=1}^{T}$ with $y_t \in \mathbb{R}^{d_y}$ of a given dimension $d_y$, and then operating the technical system 102 accordingly.


The time series data YT comprises for example noisy observations from the technical system 102.


In addition to the time series data $Y_T$ the method may consider additional $d_u$-dimensional time series data $U_T$ with $u_t \in \mathbb{R}^{d_u}$.


The technical system 102 may comprise an engine or a part thereof. The time-series data may comprise as input to the technical system 102 a speed and/or a load, and as output of the technical system 102 an emission, a temperature of the engine, or an oxygen content in the engine.


The technical system 102 may comprise a fuel cell stack or a part thereof. The time-series data may comprise as input to the technical system 102 a current in the fuel cell stack, a hydrogen concentration in the fuel cell stack, a stoichiometry of an anode or a cathode of the fuel cell stack, a volume stream of a coolant for the fuel cell stack, an anode pressure for an anode of the fuel cell stack, a cathode pressure for a cathode of the fuel cell stack, an inlet temperature of a coolant for the fuel cell stack, an outlet temperature of a coolant for the fuel cell stack, an anode dew point temperature of an anode of the fuel cell stack, a cathode dew point temperature of a cathode of the fuel cell stack, and as output of the technical system 102 an average of the cell tensions across cells of the fuel cell stack, an anode pressure drop at an anode of the fuel cell stack, a cathode pressure drop at a cathode of the fuel cell stack, a coolant pressure drop between an inlet and an outlet for the coolant of the fuel cell stack, or a coolant temperature rise between an inlet and an outlet for the coolant of the fuel cell stack.


The method operates on the given time-series data YT for a given number I of iterations i and a given number N of samples n.


The method is based on a probabilistic model and an approximate model. The approximate model is based on the fully independent training conditional, FITC, assumption. Details of this assumption are described e.g. in Edward Snelson and Zoubin Ghahramani. Sparse Gaussian Processes using Pseudo-inputs. In Advances in Neural Information Processing Systems, 2005.


The probabilistic model is based on a Gaussian process state-space model wherein a Gaussian process prior is placed on the mean of the transition model that learns the mapping from a latent state xt-1 to the next latent state xt:

$$p_\Theta(Y_T, X_{T_0}, F_T) = p_\Theta(x_0)\, p_\Theta(F_T \mid X_{T_0}) \prod_{t=1}^{T} p_\Theta(y_t \mid x_t)\, p_\Theta(x_t \mid x_{t-1}, f_{t-1})$$

wherein the initial distribution pΘ(x0) and the emission model pΘ(yt|xt) are left unspecified, and the transition model is given by

$$p_\Theta(x_t \mid x_{t-1}, f_{t-1}) = \mathcal{N}(x_t \mid x_{t-1} + f_{t-1},\, Q)$$

where $\mathcal{N}$ denotes a Gaussian distribution and Q is the covariance of the i.i.d. Gaussian transition noise, where $F_T=\{f(x_t)\}_{t=0}^{T-1}$, and wherein in the example, $f \sim \mathcal{GP}(0, k(\cdot,\cdot))$ is a zero-mean Gaussian process, i.e. a distribution over functions that is fully specified by a positive-definite, symmetric kernel $k(\cdot,\cdot): \mathbb{R}^{d_x} \times \mathbb{R}^{d_x} \to \mathbb{R}$.

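For illustration, the following is a minimal sketch of the generative structure of such a state-space model, assuming a placeholder transition mean function f (standing in for a draw from the Gaussian process) and an assumed linear-Gaussian emission model, since the emission model is left unspecified above.

```python
import numpy as np

def simulate_state_space(f, x0, T, Q, R, rng):
    # x_t = x_{t-1} + f(x_{t-1}) + transition noise with covariance Q
    # y_t = x_t + emission noise with covariance R (assumed emission model)
    dx = x0.shape[0]
    xs, ys = [x0], []
    for _ in range(T):
        x_prev = xs[-1]
        x_next = x_prev + f(x_prev) + rng.multivariate_normal(np.zeros(dx), Q)
        ys.append(x_next + rng.multivariate_normal(np.zeros(dx), R))
        xs.append(x_next)
    return np.array(xs), np.array(ys)  # states X_{T_0} (T+1 entries), observations Y_T (T entries)

# usage with a toy transition function
rng = np.random.default_rng(0)
X, Y = simulate_state_space(lambda x: -0.1 * x, np.zeros(2), T=50,
                            Q=0.01 * np.eye(2), R=0.1 * np.eye(2), rng=rng)
```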

In case the method considers the additional $d_u$-dimensional time series data $U_T$ with $u_t \in \mathbb{R}^{d_u}$, the kernel of the Gaussian process accepts input pairs from $\mathbb{R}^{d_x+d_u}$.


The probabilistic model comprises a first latent variable. The first latent variable is in the example a temporal state $X_{T_0}=\{x_t\}_{t=0}^{T}$ with $x_t \in \mathbb{R}^{d_x}$ of a given dimension $d_x$.


Gaussian Process posteriors can be summarized by sparse Gaussian processes in which the information of the posterior is contained in the pseudo-dataset (XM, FM) where XM are the inducing inputs and FM are the inducing outputs.


The inducing outputs FM and the function values FT share a joint Gaussian distribution pΘ(FT, FM). The model employs the fully independent training conditional approximation, which assumes independence of the latent GP evaluations given the inducing outputs:

$$p_\Theta(F_T \mid X_{T_0}, F_M) \approx \prod_{t=1}^{T-1} p_\Theta(f_t \mid x_t, F_M)$$

that leads to:

$$p_\Theta(Y_T, X_{T_0}, F_M) = p(F_M)\, p_\Theta(Y_T, X_{T_0} \mid F_M)$$

with

$$p_\Theta(Y_T, X_{T_0} \mid F_M) = p_\Theta(x_0) \prod_{t=1}^{T} p_\Theta(y_t \mid x_t)\, p_\Theta(x_t \mid x_{t-1}, F_M)$$

The inducing output FM is a second latent variable.


In case the method considers additional $d_u$-dimensional time series data $U_T$ with $u_t \in \mathbb{R}^{d_u}$, the inducing points $X_M$ live in that higher dimensional space $\mathbb{R}^{d_x+d_u}$.


The approximate model comprises a distribution over the time-series data YT and the first latent variable, e.g. the temporal state XT0, and the second latent variable, e.g. the inducing output FM.


Lower bounding the log marginal likelihood log pΘ(YT) by variational inference allows finding an approximation to the true posterior over the inducing outputs pΘ(FM|YT):

$$\log p_\Theta(Y_T) \geq \int q_\Psi(F_M) \log p_\Theta(Y_T \mid F_M)\, dF_M - \mathrm{KL}\big(q_\Psi(F_M)\,\|\, p_\Theta(F_M)\big),$$

where

$$p_\Theta(Y_T \mid F_M) = \int p_\Theta(Y_T, X_{T_0} \mid F_M)\, dX_{T_0}.$$

The approximate model comprises an approximate distribution over the second latent variable, e.g. a variational distribution $q_\Psi(F_M)=\mathcal{N}(F_M \mid m, S)$ over the inducing output FM with mean m and covariance S. These are, e.g., given initial variational parameters Ψ={m, S}.


For every inducing input $X_M=\{x_m\}_{m=1}^{M}$ with $x_m \in \mathbb{R}^{d_x}$, the inducing output $F_M$ is distributed according to a given approximate distribution $q_\Psi(F_M)$ over the inducing output $F_M$.


The inducing input XM and the inducing output FM are referred to as pseudo data points. The prior over the inducing outputs is given by the Gaussian process prior $p(F_M)=\mathcal{N}(F_M \mid 0, K_{MM})$, where $K_{MM}=\{k(x_m, x_{m'})\}_{m,m'=1}^{M}$.


The approximate model comprises a predictive distribution $p(x_t \mid x_{t-1}, F_M)=\mathcal{N}\big(x_t \mid x_{t-1}+\mu(x_{t-1}, F_M),\, \Sigma(x_{t-1})+Q\big)$, where the mean is given by $\mu(x_t, F_M)=K_{tM}K_{MM}^{-1}F_M$ and the covariance is given by $\Sigma(x_t)=k_{tt}-K_{tM}K_{MM}^{-1}K_{tM}^{T}$, wherein $k_{tt}=k(x_t, x_t)$ and $K_{tM}=\{k(x_t, x_m)\}_{m=1}^{M}$.

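A minimal sketch of these predictive moments, assuming a squared-exponential kernel (the patent only requires a positive-definite, symmetric kernel) and a single shared kernel for all state dimensions:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # k(a, b) = variance * exp(-0.5 * ||a - b||^2 / lengthscale^2); an assumed kernel choice
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sparse_gp_moments(x, X_M, F_M, kernel=rbf_kernel, jitter=1e-6):
    # mu(x, F_M) = K_xM K_MM^{-1} F_M and Sigma(x) = k_xx - K_xM K_MM^{-1} K_xM^T
    K_MM = kernel(X_M, X_M) + jitter * np.eye(len(X_M))
    K_xM = kernel(x[None, :], X_M)
    mu = K_xM @ np.linalg.solve(K_MM, F_M)
    Sigma = kernel(x[None, :], x[None, :]) - K_xM @ np.linalg.solve(K_MM, K_xM.T)
    return mu.ravel(), Sigma  # mean shift and GP variance at x
```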

The method comprises a step 200.


The step 200 comprises providing the time series data YT. The time series data UT may be provided and used additionally.


The method may comprise receiving the time series data YT at the interface 108.


Step 200 may comprise receiving a sensor signal comprising information about the technical system 102 and determining the time-series data YT depending on the sensor signal. The time series data UT may be received or determined from a received sensor signal additionally. The time series data ut is for example concatenated with the latent state xt and used as input to the transition model and kernel function.
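As a minimal sketch of this concatenation (the arrays x_t and u_t are placeholders):

```python
import numpy as np

x_t = np.zeros(3)                 # latent state of dimension dx = 3 (placeholder)
u_t = np.array([0.5, -0.2])       # control input of dimension du = 2 (placeholder)
z_t = np.concatenate([x_t, u_t])  # input of dimension dx + du for the transition model and kernel
```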


The step 200 comprises providing a given true distribution pΘ(YT, XT0|FM) over the time-series data YT and the latent states XT0 that is conditioned on an inducing output FM. This means providing a distribution over the time-series data YT and the first latent variable XT0 and the second latent variable FM.


The method operates with given initial model parameters Θ and given initial variational parameters Ψ={m, S}.


The method comprises an outer loop 202 and an inner loop 204.


The outer loop 202 is processed for iterations i=1, . . . , I. In the iterations, the model and variational parameters are optimized.


The inner loop 204 is processed for samples n=1, . . . , N. The samples are used to obtain a stochastic approximation to the log-likelihood.


The inner loop 204 comprises a step 204-1.


The step 204-1 comprises sampling a value of the second latent variable from the approximate distribution over the second latent variable.


In the example, an inducing output sample FM(n) is determined from the distribution qΨ(FM) over the inducing output FM:

$$F_M^{(n)} \sim q_\Psi(F_M)$$


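A minimal sketch of this sampling step, assuming a small number of inducing outputs and placeholder variational parameters m and S:

```python
import numpy as np

M = 8                # number of inducing outputs (placeholder)
m = np.zeros(M)      # variational mean (placeholder)
S = 0.1 * np.eye(M)  # variational covariance (placeholder)

rng = np.random.default_rng(0)
FM_n = rng.multivariate_normal(mean=m, cov=S)  # one sample F_M^(n) ~ q_Psi(F_M) = N(m, S)
```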
The inner loop 204 comprises a step 204-2.


The step 204-2 comprises finding a value of the first latent variable depending on the density of the distribution over the time-series data and the first latent variable and the value of the second latent variable. In the example, step 204-2 comprises finding a value of the first latent variable for which the density of the distribution over the time-series data and the first latent variable and the value of the second latent variable is maximized.


In the example, a mode $\hat{X}_{T_0}^{(n)}$ is found that is a maximizer of a logarithmic density

$$g_{GP}\big(X_{T_0}, \Theta, F_M^{(n)}\big) = \log p_\Theta\big(Y_T, X_{T_0} \mid F_M^{(n)}\big)$$

with respect to the latent states XT0,

$$\hat{X}_{T_0}^{(n)} = \arg\max_{X_{T_0}} g_{GP}\big(X_{T_0}, \Theta, F_M^{(n)}\big)$$

This means the method comprises finding a mode $\hat{X}_{T_0}^{(n)}$ that maximizes the logarithmic density.

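A minimal sketch of this joint optimization over all temporal states, assuming a callable g_gp(X, theta, FM) that returns the logarithmic density for states X of shape (T+1, dx):

```python
import numpy as np
from scipy.optimize import minimize

def find_mode(g_gp, X_init, theta, FM_n):
    # maximize g_gp over all states jointly by minimizing its negative
    shape = X_init.shape
    objective = lambda x_flat: -g_gp(x_flat.reshape(shape), theta, FM_n)
    result = minimize(objective, X_init.ravel(), method="L-BFGS-B")
    return result.x.reshape(shape)  # mode \hat{X}_{T_0}^{(n)}
```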

The inner loop 204 comprises a step 204-3.


The step 204-3 comprises determining a Hessian of the logarithmic density $g_{GP}(X_{T_0}, \Theta, F_M^{(n)})$ depending on the mode of the first latent variable $\hat{X}_{T_0}$, the model parameters Θ, and the value of the second latent variable $F_M^{(n)}$.


In the example, non-zero elements of a Hessian $H(A_t, B_t) \in \mathbb{R}^{d_x(T+1) \times d_x(T+1)}$ are obtained, wherein

$$A_t = -\left.\frac{\partial^2 g_{GP}\big(X_{T_0}, \Theta, F_M^{(n)}\big)}{\partial x_t\, \partial x_t}\right|_{X_{T_0}=\hat{X}_{T_0}} \in \mathbb{R}^{d_x \times d_x}$$

$$B_t = -\left.\frac{\partial^2 g_{GP}\big(X_{T_0}, \Theta, F_M^{(n)}\big)}{\partial x_t\, \partial x_{t-1}}\right|_{X_{T_0}=\hat{X}_{T_0}} \in \mathbb{R}^{d_x \times d_x}$$

In the example, the non-zero elements comprise the quantities $\{A_t\}_{t=0}^{T}$ and $\{B_t\}_{t=1}^{T}$. The Hessian is used to provide a second order Taylor approximation of $g_{GP}(X_{T_0}, \Theta, F_M^{(n)})$ around the mode $\hat{X}_{T_0}$. Note that $Y_T$ is constant.


In the example, the non-zero elements of the Hessian are determined with only $3d_x$ vector-Hessian products, reducing the memory and time requirements to $O(T d_x^2)$.

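The following is a naive, dense sketch of how the blocks A_t and B_t could be read off with automatic differentiation; the efficient variant described above uses only 3*d_x vector-Hessian products instead of forming the full Hessian. It assumes a differentiable g_gp(X, theta, FM) with X of shape (T+1, dx):

```python
import jax

def hessian_blocks(g_gp, X_hat, theta, FM_n):
    # full Hessian of the log-density, shape (T+1, dx, T+1, dx); naive O((T*dx)^2) memory
    H_full = jax.hessian(lambda X: g_gp(X, theta, FM_n))(X_hat)
    T1 = X_hat.shape[0]
    A = [-H_full[t, :, t, :] for t in range(T1)]          # diagonal blocks A_0..A_T
    B = [-H_full[t - 1, :, t, :] for t in range(1, T1)]   # strictly upper off-diagonal blocks B_1..B_T
    return A, B
```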

The inner loop 204 comprises a step 204-4.


The step 204-4 comprises determining a determinant of the Hessian.


The determinant of the Hessian is determined for example depending on a factorization comprising a strictly upper triangular part of a part of the Hessian, a strictly lower triangular part of the part of the Hessian, and a block diagonal matrix of recursively defined blocks of a matrix.


In the example, a determinant det H(At, Bt) of the Hessian H(At, Bt) is evaluated.


In the example, the determinant det H(At, Bt) of the Hessian H(At, Bt) is determined from a factorization

$$H(A_t, B_t) = (\Lambda + B^{T})\, \Lambda^{-1}\, (\Lambda + B)$$

wherein B is the strictly upper triangular part of the Hessian $H(A_t, B_t)$ comprising the different $B_t$ and Λ is a block diagonal matrix of recursively defined blocks:

$$\Lambda_0 = A_0, \qquad \Lambda_t = A_t - B_t^{T}\, \Lambda_{t-1}^{-1}\, B_t, \quad t = 1, \ldots, T$$

as

$$\det H(A_t, B_t) = \prod_{t=0}^{T} \det \Lambda_t$$

wherein Λ is a block diagonal matrix with blocks $\Lambda_t$, B is strictly upper triangular, and $B^{T}$ is strictly lower triangular. These operations can be performed in $O(T d_x^3)$ steps.

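A minimal sketch of the determinant computation via this recursion, using the blocks A_t and B_t from the step above:

```python
import numpy as np

def blocktridiag_logdet(A, B):
    # Lambda_0 = A_0, Lambda_t = A_t - B_t^T Lambda_{t-1}^{-1} B_t,
    # det H = prod_t det Lambda_t; returned as a log-determinant for numerical stability
    Lambdas = [A[0]]
    logdet = np.linalg.slogdet(A[0])[1]
    for t in range(1, len(A)):
        Lam_t = A[t] - B[t - 1].T @ np.linalg.solve(Lambdas[t - 1], B[t - 1])
        Lambdas.append(Lam_t)
        logdet += np.linalg.slogdet(Lam_t)[1]
    return logdet, Lambdas
```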

The inner loop 204 comprises a step 204-5.


The step 204-5 comprises determining a Laplace approximation of a distribution over the time-series data conditioned with the value of the second latent variable depending on the determinant of the Hessian.


In the example, a Laplace approximation $\tilde{p}_\Theta(Y_T \mid F_M^{(n)})$ of a conditional $p_\Theta(Y_T \mid F_M^{(n)})$ is evaluated:

$$\tilde{p}_\Theta\big(Y_T \mid F_M^{(n)}\big) \propto p_\Theta\big(Y_T, \hat{X}_{T_0}^{(n)} \mid F_M^{(n)}\big)\, \det\big(H(A_t, B_t)\big)^{-1/2}$$

The inner loop 204 comprises a step 204-6.


The step 204-6 comprises determining an inverse of the Hessian. The inverse of the Hessian is determined for example depending on the factorization comprising the strictly upper triangular part of the part of the Hessian, the strictly lower triangular part of the part of the Hessian, and the block diagonal matrix of recursively defined blocks of the matrix.


In the example, the inverse H−1 of the Hessian H is determined.


The inverse of the Hessian is determined from the factorization.

$$H^{-1} = (\Lambda + B)^{-1}\, \Lambda\, (\Lambda + B^{T})^{-1}$$


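Rather than forming the inverse explicitly, the factorization can also be used to solve linear systems H v = r in O(T d_x^3); a minimal sketch, reusing the blocks Lambda_t and B_t from above:

```python
import numpy as np

def blocktridiag_solve(Lambdas, B, r):
    # solve H v = r with H = (Lambda + B^T) Lambda^{-1} (Lambda + B)
    T1 = len(Lambdas)
    # forward substitution: (Lambda + B^T) w = r
    w = [np.linalg.solve(Lambdas[0], r[0])]
    for t in range(1, T1):
        w.append(np.linalg.solve(Lambdas[t], r[t] - B[t - 1].T @ w[t - 1]))
    # backward substitution: (Lambda + B) v = Lambda w
    v = [None] * T1
    v[-1] = w[-1]
    for t in range(T1 - 2, -1, -1):
        v[t] = w[t] - np.linalg.solve(Lambdas[t], B[t] @ v[t + 1])
    return np.stack(v)  # shape (T+1, dx)
```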
The inner loop 204 comprises a step 204-7.


The step 204-7 comprises determining a Jacobian of the distribution over the time-series data and the first latent variable and the value of the second latent variable.


In the example, a Jacobian h of the function gGP(XT0, Θ, FM(n)) is determined:

$$h\big(\hat{X}_{T_0}^{(n)}, \Theta, F_M^{(n)}\big) = -\left.\frac{\partial \log p_\Theta\big(Y_T, X_{T_0} \mid F_M^{(n)}\big)}{\partial X_{T_0}}\right|_{X_{T_0}=\hat{X}_{T_0}^{(n)}}$$


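A minimal sketch of this Jacobian (the negative gradient of the log-density with respect to the temporal states), again assuming a differentiable g_gp:

```python
import jax

def jacobian_h(g_gp, X_hat, theta, FM_n):
    # h = -d/dX log p_Theta(Y_T, X | F_M^(n)) evaluated at the mode X_hat
    return -jax.grad(g_gp, argnums=0)(X_hat, theta, FM_n)
```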
The outer loop 202 comprises a step 202-1.


The step 202-1 comprises evaluating an approximate lower bound that depends on the Laplace approximations that are determined for a plurality of values of the second latent variable.


In the example, an approximate lower bound L(Θ, Ψ)

$$L(\Theta, \Psi) = \int q_\Psi(F_M) \log \tilde{p}_\Theta(Y_T \mid F_M)\, dF_M - \mathrm{KL}\big(q_\Psi(F_M)\,\|\, p_\Theta(F_M)\big)$$

is evaluated, which comprises a Kullback-Leibler term, KL-term, for comparing the approximate distribution $q_\Psi(F_M)$ with the prior distribution $p_\Theta(F_M)$. The distribution $\tilde{p}_\Theta(Y_T \mid F_M)$ is given by the Laplace approximation around $\hat{X}_{T_0}$.


In the example, the plurality of values of the second latent variable are the samples of the second latent variable that are determined in step 204-1 when processing the inner loop repeatedly. This means, the approximate lower bound is evaluated depending on samples of the second latent variable that are drawn from the approximate distribution over the second latent variable.


In the example, in order to evaluate and optimize this optimization objective, a parametric family is chosen for the approximate distribution qΨ(FM).


In the example, qΨ(FM) is a Gaussian distribution. This allows an analytical evaluation of the KL-term. The other term of L(Θ, Ψ) is analytically intractable. In the example, the other term is approximated by sampling:

$$\int q_\Psi(F_M) \log \tilde{p}_\Theta(Y_T \mid F_M)\, dF_M \approx \sum_{n=1}^{N} \log \tilde{p}_\Theta\big(Y_T \mid F_M^{(n)}\big)$$

with samples of inducing outputs $F_M^{(n)}$ that are drawn from the approximate distribution $q_\Psi(F_M)$ over the inducing output $F_M$:

$$F_M^{(n)} \sim q_\Psi(F_M)$$


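Since qΨ(FM) = N(m, S) and the prior pΘ(FM) = N(0, KMM) are both Gaussian, the KL-term can be evaluated in closed form; a minimal sketch assuming FM is a vector of length M:

```python
import numpy as np

def kl_term(m, S, K_MM):
    # KL( N(m, S) || N(0, K_MM) ) in closed form
    M = len(m)
    trace_term = np.trace(np.linalg.solve(K_MM, S))
    maha_term = m @ np.linalg.solve(K_MM, m)
    logdet_term = np.linalg.slogdet(K_MM)[1] - np.linalg.slogdet(S)[1]
    return 0.5 * (trace_term + maha_term - M + logdet_term)
```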
The outer loop 202 comprises a step 202-2.


The step 202-2 comprises determining gradients of the Laplace approximations depending on the inverse Hessians and the Jacobians.


In the example, gradients
$\partial L(\Theta, \Psi)/\partial \Theta$ and $\partial L(\Theta, \Psi)/\partial \Psi$
of L(Θ, Ψ) are obtained using

$$\frac{\partial \hat{X}_{T_0}^{(n)}}{\partial \Theta} = H^{-1}\big(\Theta, F_M^{(n)}\big)\, \frac{\partial h\big(\hat{X}_{T_0}^{(n)}, \Theta, F_M^{(n)}\big)}{\partial \Theta}$$

and

$$\frac{\partial \hat{X}_{T_0}^{(n)}}{\partial \Psi} = H^{-1}\big(\Theta, F_M^{(n)}\big)\, \frac{\partial h\big(\hat{X}_{T_0}^{(n)}, \Theta, F_M^{(n)}\big)}{\partial \Psi}$$

wherein Ψ are the variational parameters that enter the equation implicitly via the sampled inducing outputs $F_M^{(n)}$.


This exchanges potentially costly automatic differentiation computations with a Hessian solve. It requires only the value $\hat{X}_{T_0}^{(n)}$, so that the complete computational graph of how it has been obtained is no longer required.


The outer loop 202 comprises a step 202-3.


The step 202-3 comprises updating the model parameters and variational parameters depending on the gradients.


In the example, the model parameters Θ and variational parameters Ψ={m, S} are updated.


Updating the model parameters Θ and the variational parameters Ψ={m, S} comprises determining the model parameters Θ and variational parameters Ψ={m, S} that minimize L(Θ, Ψ). This means the model parameters Θ and variational parameters Ψ={m, S} are determined such that L(Θ, Ψ) is smaller than for other model parameters Θ and variational parameters Ψ={m, S}.


The aforementioned steps of the method describe an inference method to learn the model parameters Θ and variational parameters Ψ of a Gaussian process state-space model in a training. These steps may be executed in an offline phase, e.g. for given time-series data YT and optionally given additional time series data UT.


The following steps of the method may be executed for a prediction e.g. in an online phase. These steps may be executed with a trained model, i.e. with given model parameters Θ and variational parameters Ψ. These steps may be executed independently of the training, i.e. without training, or jointly with the training after the training.


In the example, the model parameters Θ and variational parameters Ψ that are determined in a last iteration of updating the model parameters Θ and variational parameters Ψ are used for the prediction.


The method may comprise a step 206.


In the step 206, the method comprises determining an instruction for actuating the technical system 102 depending on the time-series data YT, the model parameters Θ and the variational parameters Ψ={m, S}.


Optionally the additional time series data UT may be used as well.


For example, the time-series data comprises as input to the approximate model of the technical system 102 a speed and/or a load. For example, the output of the approximate model of the technical system 102 is an emission, a temperature of the engine, or an oxygen content in the engine.


For example, the time-series data comprises as input to the approximate model of the technical system 102 a current in the fuel cell stack, a hydrogen concentration in the fuel cell stack, a stoichiometry of an anode or a cathode of the fuel cell stack, a volume stream of a coolant for the fuel cell stack, an anode pressure for an anode of the fuel cell stack, a cathode pressure for a cathode of the fuel cell stack, an inlet temperature of a coolant for the fuel cell stack, an outlet temperature of a coolant for the fuel cell stack, an anode dew point temperature of an anode of the fuel cell stack, a cathode dew point temperature of a cathode of the fuel cell stack. For example, the output of the approximate model of the technical system 102 is an average of the cell tensions across cells of the fuel cell stack, an anode pressure drop at an anode of the fuel cell stack, a cathode pressure drop at a cathode of the fuel cell stack, a coolant pressure drop between an inlet and an outlet for the coolant of the fuel cell stack, or a coolant temperature rise between an inlet and an outlet for the coolant of the fuel cell stack.


The instruction for example comprises a target operating mode for the technical system 102. The target operating mode may be determined depending on the output of the approximate model, e.g. by a controller or a characteristic curve or a map that maps the output to the target operating mode.
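The mapping from the model output to a target operating mode is application specific; the following is a purely hypothetical sketch in which a predicted engine temperature is mapped to an assumed set of operating modes via simple thresholds (the thresholds and mode names are illustrative assumptions, not part of the patent):

```python
def target_operating_mode(predicted_engine_temperature_celsius):
    # hypothetical characteristic map: predicted model output -> target operating mode
    if predicted_engine_temperature_celsius > 105.0:   # assumed derating threshold
        return "reduced_load"
    if predicted_engine_temperature_celsius < 60.0:    # assumed warm-up threshold
        return "warm_up"
    return "normal_operation"
```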


The method may comprise a step 208.


In the step 208, the method comprises outputting the instruction to cause the technical system 102 to act.


The instruction for example comprises the target operating mode for the technical system 102.


The time series data YT may be processed in the training in minibatches. A minibatch is a subsequence

$$Y_b = \{y_t\}_{t=t_0}^{t_0+T_b}$$

of the time series data YT of length Tb, starting at an arbitrary time index t0. The method may be applied to minibatches. The method may comprise drawing a minibatch for a sample from the approximate distribution qΨ(FM) and approximating the term

$$\int q_\Psi(F_M) \log \tilde{p}_\Theta(Y_T \mid F_M)\, dF_M \approx \frac{T}{T_b} \sum_{n=1}^{N} \log \tilde{p}_\Theta\big(Y_{T_b}^{(n)} \mid F_M^{(n)}\big), \qquad F_M^{(n)} \sim q_\Psi(F_M)$$


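A minimal sketch of drawing such a minibatch per sample of the inducing outputs:

```python
import numpy as np

def draw_minibatch(Y, T_b, rng):
    # draw a contiguous subsequence Y_b of length T_b starting at a random index t0
    t0 = rng.integers(0, len(Y) - T_b + 1)
    return Y[t0:t0 + T_b], t0

rng = np.random.default_rng(0)
Y = np.random.default_rng(1).normal(size=(200, 2))  # placeholder time series Y_T
Y_b, t0 = draw_minibatch(Y, T_b=32, rng=rng)
```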
The method is applied to one-dimensional or multidimensional latent states xt alike. For multi-dimensional latent states xt an independent Gaussian process may be used for each dimension of the latent state xt:

$$p_\Theta\big(x_t \mid x_{t-1}, F_M^{d_x}\big) = \prod_{d=1}^{d_x} \mathcal{N}\Big(x_t^{(d)} \,\Big|\, x_{t-1}^{(d)} + \mu^{(d)}\big(x_{t-1}, F_M^{(d)}\big),\; q_d + \Sigma^{(d)}(x_{t-1})\Big)$$

where

$$F_M^{d_x} = \big\{F_M^{(d)}\big\}_{d=1}^{d_x}$$

is a collection of all inducing outputs, $x_t^{(d)}$ is the d-th dimension of the latent state, $\mu^{(d)}$ is the mean and $\Sigma^{(d)}$ is the covariance of the Gaussian process of the d-th dimension.

Claims
  • 1-13. (canceled)
  • 14. A computer-implemented method for machine learning with time-series data representing observations related to a technical system, the method comprising the following steps: providing the time-series data, and model parameters of a distribution over the time-series data and over a first latent variable and over a second latent variable, and variational parameters of an approximate distribution over the second latent variable; sampling a value of the second latent variable from the approximate distribution over the second latent variable; finding a value of the first latent variable depending on a density of the distribution over the time-series data and over the first latent variable and over the value of the second latent variable, that maximizes the density of the distribution over the time-series data and over the first latent variable and over the value of the second latent variable; determining a Hessian depending on a second order Taylor approximation of the distribution over the time-series data and the first latent variable and the value of the second latent variable evaluated at the value of the first latent variable; determining a determinant of the Hessian; determining a Laplace approximation of a distribution over the time-series data conditioned with the value of the second latent variable depending on the determinant of the Hessian; determining an inverse of the Hessian; determining a Jacobian of the distribution over the time-series data and the first latent variable and the value of the second latent variable; evaluating an approximate lower bound that depends on the Laplace approximations that are determined for a plurality of values of the second latent variable; and determining gradients of the Laplace approximations depending on the inverse Hessian and the Jacobian; and updating the model parameters and the variational parameters depending on the gradients.
  • 15. The method according to claim 14, wherein the providing of the time-series data includes: (i) receiving the time-series data, or (ii) receiving a sensor signal including information about the technical system and determining the time-series data depending on the sensor signal.
  • 16. The method according to claim 14, further comprising: determining an instruction for actuating the technical system depending on the time-series data, the model parameters, and the variational parameters; and outputting the instruction to cause the technical system to act.
  • 17. The method according to claim 14, wherein the technical system is a computer-controlled machine, or a robot, or a vehicle, or a domestic appliance, or a power tool, or a manufacturing machine, or a personal assistant, or an access control system.
  • 18. The method according to claim 14, wherein the technical system includes an engine or a part of an engine, wherein the time-series data includes as input to the technical system a speed and/or a load, and as output of the technical system an emission, or a temperature of the engine, or an oxygen content in the engine.
  • 19. The method according to claim 14, wherein the technical system includes a fuel cell stack or a part of a fuel cell stack, wherein the time-series data includes as input to the technical system: (i) a current in the fuel cell stack, or (ii) a hydrogen concentration in the fuel cell stack, or (iii) a stoichiometry of an anode or a cathode of the fuel cell stack, or (iv) a volume stream of a coolant for the fuel cell stack, or (v) an anode pressure for an anode of the fuel cell stack, or (vi) a cathode pressure for a cathode of the fuel cell stack, or (vii) an inlet temperature of a coolant for the fuel cell stack, or (viii) an outlet temperature of a coolant for the fuel cell stack, or (ix) an anode dew point temperature of an anode of the fuel cell stack, or (x) a cathode dew point temperature of a cathode of the fuel cell stack, and as output of the technical system: (i) an average of the cell tensions across cells of the fuel cell stack, or (ii) an anode pressure drop at an anode of the fuel cell stack, or (iii) a cathode pressure drop at a cathode of the fuel cell stack, or (iv) a coolant pressure drop between an inlet and an outlet for the coolant of the fuel cell stack, or (v) a coolant temperature rise between an inlet and an outlet for the coolant of the fuel cell stack.
  • 20. The method according to claim 16, wherein the instruction includes a target operating mode for the technical system.
  • 21. The method according to claim 14, wherein the determining of the determinant of the Hessian depends on a factorization including a strictly upper triangular part of a part of the Hessian, a strictly lower triangular part of the part of the Hessian, and a block diagonal matrix of recursively defined blocks of a matrix.
  • 22. The method according to claim 14, wherein the determining of the inverse of the Hessian depends on a factorization including a strictly upper triangular part of a part of the Hessian, a strictly lower triangular part of the part of the Hessian, and a block diagonal matrix of recursively defined blocks of a matrix.
  • 23. The method according to claim 14, wherein the evaluating of the approximate lower bound includes sampling with samples of the second latent variable that are drawn from the approximate distribution over the second latent variable.
  • 24. A device for machine learning with time-series data representing observations related to a technical system, the device comprising: at least one processor; and at least one memory; wherein the at least one processor is adapted to execute instructions, the instructions, when executed by the at least one processor, cause the at least one processor to perform the following steps: providing the time-series data, and model parameters of a distribution over the time-series data and over a first latent variable and over a second latent variable, and variational parameters of an approximate distribution over the second latent variable; sampling a value of the second latent variable from the approximate distribution over the second latent variable; finding a value of the first latent variable depending on a density of the distribution over the time-series data and over the first latent variable and over the value of the second latent variable, that maximizes the density of the distribution over the time-series data and over the first latent variable and over the value of the second latent variable; determining a Hessian depending on a second order Taylor approximation of the distribution over the time-series data and the first latent variable and the value of the second latent variable evaluated at the value of the first latent variable; determining a determinant of the Hessian; determining a Laplace approximation of a distribution over the time-series data conditioned with the value of the second latent variable depending on the determinant of the Hessian; determining an inverse of the Hessian; determining a Jacobian of the distribution over the time-series data and the first latent variable and the value of the second latent variable; evaluating an approximate lower bound that depends on the Laplace approximations that are determined for a plurality of values of the second latent variable; and determining gradients of the Laplace approximations depending on the inverse Hessian and the Jacobian; updating the model parameters and the variational parameters depending on the gradients; determining an instruction for actuating the technical system depending on the time-series data, the model parameters, and the variational parameters; and outputting the instruction to cause the technical system to act.
  • 25. The device according to claim 24, wherein the device further comprises an interface that is adapted to receive information about the technical system and/or that is adapted to output the instruction that causes the technical system to act.
  • 26. A non-transitory computer-readable medium on which is stored a computer program including computer readable instructions for machine learning with time-series data representing observations related to a technical system, the instructions, when executed by a computer, causing the computer to perform the following steps: providing the time-series data, and model parameters of a distribution over the time-series data and over a first latent variable and over a second latent variable, and variational parameters of an approximate distribution over the second latent variable; sampling a value of the second latent variable from the approximate distribution over the second latent variable; finding a value of the first latent variable depending on a density of the distribution over the time-series data and over the first latent variable and over the value of the second latent variable, that maximizes the density of the distribution over the time-series data and over the first latent variable and over the value of the second latent variable; determining a Hessian depending on a second order Taylor approximation of the distribution over the time-series data and the first latent variable and the value of the second latent variable evaluated at the value of the first latent variable; determining a determinant of the Hessian; determining a Laplace approximation of a distribution over the time-series data conditioned with the value of the second latent variable depending on the determinant of the Hessian; determining an inverse of the Hessian; determining a Jacobian of the distribution over the time-series data and the first latent variable and the value of the second latent variable; evaluating an approximate lower bound that depends on the Laplace approximations that are determined for a plurality of values of the second latent variable; and determining gradients of the Laplace approximations depending on the inverse Hessian and the Jacobian; and updating the model parameters and the variational parameters depending on the gradients.
Priority Claims (1)
Number          Date       Country  Kind
22 17 3335.5    May 2022   EP       regional

PCT Information
Filing Document      Filing Date  Country  Kind
PCT/EP2023/062296    5/9/2023     WO