1. Field of the Invention
The present invention relates to methods for ascertaining a gradient of a data-based function model, in particular using a control module having a hardware unit, which is designed to calculate the data-based function model in a hard-wired way.
2. Description of the Related Art
Data-based function models may be provided for implementing function models in control units, in particular in engine control units for internal combustion engines. Data-based function models are also referred to as nonparametric models and may be prepared from training data, i.e., a set of training data points, without specific prior specifications.
Control modules having a main computing unit and a separate model calculation unit for calculating data-based function models in a control unit are known from the related art. Thus, for example, the published German patent application document DE 10 2010 028 259 A1 describes a control unit having an additional logic circuit as a model calculation unit which is designed for calculating exponential functions to assist in carrying out Bayesian regression methods, which are required in particular for calculating Gaussian process models.
The model calculation unit is designed as a whole for carrying out mathematical processes for calculating the data-based function model based on parameters and supporting points or training data. In particular, the functions of the model calculation unit are implemented solely in hardware for efficient calculation of exponential and summation functions, so that it is made possible to calculate Gaussian process models at a higher computing speed than may be carried out in the software-controlled main computing unit.
For many applications, the calculation of function values of data-based function models in control units, in particular for internal combustion engines, is sufficient. However, applications are known in which a gradient of a data-based function model is necessary, in particular to calculate an inverse data-based function model therewith.
According to a first aspect, a method is provided for calculating a gradient of a data-based function model, in particular a Gaussian process model. A model calculation unit is designed to calculate a function value of the data-based function model using an exponential function, summation functions, and multiplication functions in two nested loop operations in a hardware-based way, the model calculation unit being used for calculating the gradient of the data-based function model for a desired value of a predefined input variable.
One idea of the above method is to calculate a gradient of a data-based function model essentially using the existing algorithms, implemented in hardware, for calculating the function value of the data-based function model. This enables the gradient of the data-based function model to be calculated on a hardware-based model calculation unit in which the algorithm for calculating the data-based function model is implemented in an essentially hard-wired way, i.e., in hardware. Due to the simplified calculation of the gradient of the data-based function model, it is possible, in particular with the aid of a Newtonian iteration method, to calculate a backward model, in which a numeric inversion may be carried out locally for a given target value with respect to a fixed input dimension.
Furthermore, the data-based function model may be defined by supporting point data, hyperparameters, and a parameter vector, the parameter vector containing a number of elements which corresponds to the number of the supporting point data points. For calculating the gradient of the data-based function model for the desired value of the predefined input variable, the data-based function model may be modified by applying a weighting vector, which is dependent on the supporting point data points, to the parameter vector.
According to another specific embodiment, the gradient of the data-based function model may be calculated as a function value of the modified data-based function model for the desired value of the predefined input variable in the model calculation unit and an offset value may be added.
Furthermore, if the supporting point data points are scaled, the result of the sum of the function value of the modified data-based function model and the offset value may be multiplied by a factor, which is based on the standard deviation of the supporting point data with regard to the output data, to obtain the gradient of the data-based function model.
A weighting vector, which is dependent on supporting point data points, may be repeatedly applied to the parameter vector during a calculation of the modified data-based function model.
According to one specific embodiment, the data-based function model may be defined by supporting point data, hyperparameters, and a parameter vector, the parameter vector containing a number of elements which corresponds to the number of the supporting point data points, the data-based function model being modified for calculating the gradient of the data-based function model with regard to a predefined input variable by calculating the function value of the data-based function model in the model calculation unit for a desired value of the predefined input variable, multiplying the result with the desired value of the predefined input variable, and subsequently carrying out a renewed calculation of the data-based function model using a changed parameter vector in the model calculation unit.
According to another aspect, a method for carrying out a Newtonian iteration method for a data-based function model in a control module having a main computing unit and a model calculation unit is provided, the model calculation unit being designed to calculate in a hardware-based way function values of the data-based function model using an exponential function, summation functions, and multiplication functions in two loop operations, a gradient of the data-based function model being ascertained according to the above method and the data-based function model being calculated with the aid of the model calculation unit.
Furthermore, the gradient of the data-based function model may be calculated in a first computing core of the model calculation unit and the function value of the data-based function model may be calculated in a second computing core of the model calculation unit.
According to another aspect, a device, in particular a control module having a main computing unit and a model calculation unit is provided, the model calculation unit being designed to calculate function values of the data-based function model using an exponential function, summation functions, and multiplication functions in two loop operations in a hardware-based way, the device being designed to carry out the above method.
Model calculation unit 3 is essentially hard-wired and, unlike main computing unit 2, is accordingly not designed for executing software code. Alternatively, an approach is possible in which model calculation unit 3 provides a restricted, highly specialized command set for calculating the data-based function model. Model calculation unit 3 is designed as a specialized computing unit only for calculating predetermined computing processes. This enables a resource-optimized implementation of such a model calculation unit 3 or an area-optimized configuration in an integrated architecture.
Model calculation unit 3 has a number of computing cores; thus, for example, in the exemplary embodiment shown in
Control module 1 may include an internal memory 5 and a further DMA unit 6 (DMA=direct memory access). Internal memory 5 and further DMA unit 6 are connected to one another in a suitable way, for example, via internal communication link 4. Internal memory 5 may include a shared SRAM memory (for main computing unit 2, model calculation unit 3, and optionally further units) and a flash memory for the configuration data (parameters and supporting point data).
The use of nonparametric, data-based function models is based on a Bayesian regression method. The fundamentals of Bayesian regression are described, for example, in C. E. Rasmussen et al., “Gaussian Processes for Machine Learning,” MIT Press 2006. Bayesian regression is a data-based method which is based on a model. To prepare the model, measuring points of training data and associated output data of an output variable to be modeled are required. The preparation of the model is carried out based on the use of supporting point data, which entirely or partially correspond to the training data or are generated therefrom. Furthermore, abstract hyperparameters are determined, which parameterize the space of the model functions and effectively weight the influence of the individual measuring points of the training data on the later model prediction.
The abstract hyperparameters are determined by an optimization method. One possibility for such an optimization method is an optimization of a marginal likelihood p(Y|H, X). Marginal likelihood p(Y|H, X) describes the plausibility of the measured y values of the training data, represented as vector Y, given model parameters H and the x values of the training data. In model training, p(Y|H, X) is maximized by searching for suitable hyperparameters which result in a model function, determined by the hyperparameters and the training data, that reproduces the training data as precisely as possible. To simplify the calculation, the logarithm of p(Y|H, X) is maximized, since the logarithm is strictly monotonic and therefore does not change the location of the maximum of the plausibility function.
The calculation of the Gaussian process model takes place according to the steps which are schematically shown in
In this formula, mx corresponds to the mean value function with respect to a mean value of the input values of the supporting point data, sx corresponds to the variance of the input values of the supporting point data, and d corresponds to the index for dimension D of test point x.
The following equation is obtained as the result of the preparation of the nonparametric, data-based function model:
Model value v thus ascertained is scaled with the aid of an output scaling, specifically according to the following formula:
$$\tilde{v} = v \, s_y + m_y$$
In this formula, v corresponds to a scaled model value (output value) at a scaled test point x (input variable vector of dimension D), {tilde over (v)} corresponds to a (non-scaled) model value (output value) at a (non-scaled) test point {tilde over (x)} (input variable vector of dimension D), xi corresponds to a supporting point of the supporting point data, N corresponds to the number of the supporting points of the supporting point data, D corresponds to the dimension of the input data/training data/supporting point data space, and ld and σf correspond to the hyperparameters from the model training, namely the length scale and the amplitude factor. Vector Qy is a variable calculated from the hyperparameters and the training data. Furthermore, my corresponds to the mean value function with respect to a mean value of the output values of the supporting point data and sy corresponds to the variance of the output values of the supporting point data.
The input and output scaling is carried out, since the calculation of the Gaussian process model typically takes place in a scaled space.
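The input and output scaling can be illustrated with a minimal sketch. The output formula follows the text; the input formula is not reproduced in this text, so the centering-and-dividing form used here is an assumption based on the description of mx and sx. The function names and numeric values are hypothetical.

```python
def scale_input(x_raw, m_x, s_x):
    """Map a non-scaled test point into the scaled model space.

    Assumed form per dimension d: x_d = (x_raw_d - m_x_d) / s_x_d.
    """
    return [(xr - m) / s for xr, m, s in zip(x_raw, m_x, s_x)]


def descale_output(v, m_y, s_y):
    """Map a scaled model value back to the output space: v * s_y + m_y."""
    return v * s_y + m_y


# Hypothetical scaling parameters and values, for illustration only.
x_scaled = scale_input([3.0, 10.0], m_x=[1.0, 4.0], s_x=[2.0, 3.0])
v_tilde = descale_output(0.5, m_y=20.0, s_y=4.0)
```

With these values the scaled test point is [1.0, 2.0] and the descaled output is 22.0.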
At the start of a calculation, main computing unit 2 in particular may instruct local DMA unit 34 or further DMA unit 6 to transfer the configuration data relating to the function model to be calculated into model calculation unit 3 and to start the calculation, which is carried out with the aid of the configuration data. The configuration data include the hyperparameters of a Gaussian process model and supporting point data, which are preferably specified with the aid of an address pointer on the address area of internal memory 5 assigned to model calculation unit 3. In particular, SRAM memory 33 for model calculation unit 3, which may be situated in particular in or on model calculation unit 3, may also be used for this purpose. Internal memory 5 and SRAM memory 33 may also be used in combination.
The calculation in model calculation unit 3 is carried out in a hardware architecture of model calculation unit 3, which is implemented by the following pseudocode and which corresponds to the above calculation guideline. It is apparent from the pseudocode that calculations are carried out in an inner loop and an outer loop and the partial results thereof are accumulated. At the beginning of a model calculation, a typical value for the counter start variable is nStart = 0.
The model data required for calculating a data-based function model thus include hyperparameters and supporting point data, which are stored in a memory area in the memory unit assigned to the relevant data-based function model. According to the above pseudocode, the variables for calculating data-based function models include the scaling parameters, which are defined for each dimension, s_x (corresponds to sx), m_x (corresponds to mx), s_y (corresponds to sy), m_y (corresponds to my), parameter vector Q_y (corresponds to Qy), scaled training data X, number N of the supporting points, number D of the dimensions of the input variables, a starting value nStart of an outer loop, a loop index vInit in the event of a resumption of the calculation of the inner loop (normally=0), and length scale l for each of the dimensions of the input variables.
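Since the pseudocode itself is not reproduced in this text, the following is only a sketch of the two nested loops using the variables listed above. The kernel form (squared distance divided by the length scale, factor 1/2 in the exponent, amplitude factor sigma_f) and the input scaling are assumptions; resumption via vInit is left out.

```python
import math


def gp_model_value(x_raw, X, Q_y, l, sigma_f, m_x, s_x, m_y, s_y, n_start=0):
    """Sketch of the model calculation: scaling, two nested loops, descaling."""
    D = len(x_raw)
    # Stage 1: input scaling (assumed form).
    u = [(x_raw[d] - m_x[d]) / s_x[d] for d in range(D)]
    # Stage 2: outer loop over the supporting points, starting at n_start.
    v = 0.0
    for i in range(n_start, len(X)):
        # Inner loop: accumulate the weighted squared distance over dimensions.
        t = 0.0
        for d in range(D):
            diff = u[d] - X[i][d]
            t += diff * diff / l[d]
        # Exponential function and accumulation of the partial result.
        v += Q_y[i] * sigma_f * math.exp(-0.5 * t)
    # Stage 3: output scaling.
    return v * s_y + m_y


# Hypothetical one-dimensional model with two supporting points.
value = gp_model_value(
    x_raw=[0.0], X=[[0.0], [1.0]], Q_y=[1.0, 2.0], l=[1.0], sigma_f=1.0,
    m_x=[0.0], s_x=[1.0], m_y=0.0, s_y=1.0,
)
```

In the hardware unit these loops are hard-wired; the sketch only mirrors the order of operations (inner sum, exponential, outer accumulation).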
In integrated control modules, function values of the Gaussian process model defined by hyperparameters and supporting point data are generally calculated. Furthermore, depending on the function implemented in integrated control module 1, it may be necessary to calculate an inverted function: for a given output value ya and established input data x1, x2, . . . , xp−1, xp+1, . . . , xD, the value of xp is to be calculated so that
y(x)=y(x1,x2, . . . , xD)=ya
results.
Since function y(x) is generally not invertible, a method for zero point determination, in particular a Newtonian method, may be used for solving the inverse problem. The Newtonian method provides searching for the zero points of the function
f(x)=y(x)−ya
To find the zero points of the real-valued function, the Newtonian method provides an iteration process, n denoting the nth iteration:
$$x_p^{n+1} = x_p^{n} - \frac{f(x^{n})}{f'(x^{n})}$$
In the nth iteration, an update x_p^{n+1} is thus obtained. Function f(x) and its derivative f′(x) with respect to xp are evaluated at input point x^n = (x1, x2, . . . , x_p^n, . . . , xD). Three cases may be differentiated in the calculation of the function value of the data-based function model and the first derivative of the data-based function model at input vector x.
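The iteration can be sketched as follows. This is a plain Newton iteration on a single input dimension p with all other inputs held fixed; the function name, the tolerance, the iteration limit, and the stand-in model in the example are assumptions made for illustration, not part of the described method.

```python
def newton_invert(y, dy_dxp, x, p, y_target, tol=1e-10, max_iter=50):
    """Search x[p] such that y(x) = y_target, keeping the other inputs fixed.

    y(x) returns the model value; dy_dxp(x, p) returns its partial derivative
    with respect to x[p].
    """
    x = list(x)
    for _ in range(max_iter):
        f = y(x) - y_target          # f(x) = y(x) - ya
        if abs(f) < tol:
            break
        slope = dy_dxp(x, p)         # f'(x) equals the gradient of y in x[p]
        if slope == 0.0:
            raise ZeroDivisionError("zero gradient; Newton step undefined")
        x[p] -= f / slope            # Newton update of the pth input
    return x


# Hypothetical stand-in model y(x) = x0^2 + x1 instead of a Gaussian process.
solution = newton_invert(
    y=lambda x: x[0] ** 2 + x[1],
    dy_dxp=lambda x, p: 2.0 * x[0],
    x=[1.0, 1.0], p=0, y_target=5.0,
)
```

Because the inversion is local, the result depends on the starting value; in the example, starting at x0 = 1 leads to the positive solution x0 = 2.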
The first case relates to the situation in which the sets of supporting point data points X(k) and Y(k) are not scaled for the kth data-based partial function model in each case.
Proceeding from a specific example having a linear mean value function and two data-based partial function models (Gaussian process models), the gradient of the data-based function model is calculated. The procedure may be expanded arbitrarily to more than two partial function models. The data-based function model is described as follows:
gi(x) and hi(x) corresponding to data-based partial function models, σf(k), (Qy(k))i, ld(k) corresponding to hyperparameters or the parameters derived therefrom of the kth Gaussian process model, ya corresponding to the target value, m1(x)=a1x1+a2x2+a3x3+c corresponding to the mean value function, and x(k) corresponding to the supporting point data. First partial derivative f′(x) at xp is:
In a second case, the training data sets are scaled. One difficulty in the case of the use of scaled data for training the summation model including individual Gaussian process models is that for each partial model, the parameters for the scaling, i.e., standard deviation σX(k), σY(k) and mean value of the data
If non-scaled data are used for training the Gaussian process model, the value of f(x)=ax+c+y2(x)+y3(x)−ya is obtained. If scaled data are used for training the Gaussian process model, the function value of function f(x) is calculated by back-scaling each function value of the Gaussian process model using its corresponding scaling parameters. The linear mean value function does not use scaled data; no back-scaling is therefore necessary for it. Therefore, the following equation is obtained for function value f(x):
The difference between y2(x) and y2(x(2)) here is that the first expression means that the first Gaussian process model has a non-scaled input vector x and the model has been trained on non-scaled data, while in contrast the second expression means that input vector x(2) has been scaled using scaling parameters σx(2) and
First derivative f′(x) then reads:
The inputs of the two Gaussian process models x(2) and x(3) differ since each Gaussian process model has its own scaling. Since vector X is D-dimensional, the standard deviation of dimension p of the second partial function model is specified by (σX(2))p.
In a third case, the training data set is Box-Cox transformed with respect to the outputs using function b(y) and X is scaled. The calculation may also be carried out using an arbitrary number of data-based partial function models in the third case.
Function f(x) is specified in this case by:
$$f(x) = b^{-1}\big(b(m_1(x)) + y_2(x) + y_3(x)\big) - y_a$$
The additive Gaussian process models have been trained using scaled and Box-Cox transformed training data. Linear mean value function m1(x) uses non-scaled input vector x as an input. This results in
f(x)=b−1(m1(x))+y2(x(2))·σY(2)+
In this formula, σY(2) and
The following formula results:
This corresponds to a Box-Cox transformation using log(y). For other Box-Cox transformations, the derivation of f′(x) is similar.
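For the logarithmic Box-Cox case, the chain rule can be sketched as follows, starting from the unscaled form f(x) = b⁻¹(b(m1(x)) + y2(x) + y3(x)) − ya with b = log. Input and output scaling are deliberately left out, and the one-dimensional stand-in functions for m1, y2, and y3 are hypothetical; they only serve to make the derivative checkable.

```python
import math


def f_boxcox_log(m1, y2, y3, y_a):
    """f = exp(log(m1) + y2 + y3) - ya for the transformation b(y) = log(y)."""
    return math.exp(math.log(m1) + y2 + y3) - y_a


def df_boxcox_log(m1, dm1, y2, dy2, y3, dy3):
    """Chain rule: f' = exp(u) * u' with u = log(m1) + y2 + y3."""
    u = math.log(m1) + y2 + y3
    return math.exp(u) * (dm1 / m1 + dy2 + dy3)


# Hypothetical stand-ins: linear mean value function and two smooth terms.
def m1(x): return 2.0 * x + 3.0
def dm1(x): return 2.0
def y2(x): return 0.5 * math.exp(-0.5 * x * x)
def dy2(x): return -x * y2(x)
def y3(x): return -0.2 * math.exp(-0.5 * (x - 1.0) ** 2)
def dy3(x): return -(x - 1.0) * y3(x)


x0 = 0.4
f_value = f_boxcox_log(m1(x0), y2(x0), y3(x0), y_a=1.0)
f_slope = df_boxcox_log(m1(x0), dm1(x0), y2(x0), dy2(x0), y3(x0), dy3(x0))
```

The mean value function must be positive at the evaluation point for the logarithm to be defined.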
For the Newtonian algorithm, two essential expressions are to be calculated, namely f(x) and f′(x). For the first case, in which supporting point data X and Y are not scaled, f(x) may be calculated by way of the calculation in model calculation unit 3 of integrated control module 1. Only ya, i.e., the given input value of the inverse problem, must be subtracted. Alternatively, ya may be integrated into mean value model parameters a and c, by reducing c by ya.
The formula
corresponds to the formula for calculating the derivative of a function value, which contains a linear mean value function and two additive Gaussian process models. For each data-based partial function model (error model), the derivative may be calculated as a weighted calculation in model calculation unit 3 of the error model at test point x, the weights being dependent on x. Parameter value Qy specifies the product of the inverse of a covariance matrix of the training data, to which noise is applied on the diagonal, with the vector of the associated output values, and may be replaced, inter alia, rapidly during the calculation in model calculation unit 3. Therefore, the following formula may be used for calculating the derivative (in the case of two additive data-based partial function models):
The terms (*) and (**) may each be calculated by model calculation unit 3. Between the two calculations, only parameter vector Qy(k) of the kth data-based partial function model must be adapted, Qy(k) being provided in gi(x) or in hj(x). For this purpose, the ith entry of parameter vector Qy(k) is adapted, by multiplying it with weighting factor wi(x), where
Since wi(x) is dependent on x and the pth component of x changes over the course of the iterations, wi(x) and therefore parameter vector Qy(k) must be changed in each calculation step i. It is thus necessary that parameter vector Qy(k) may be changed rapidly during the calculation. For the calculation in model calculation unit 3, the following formula therefore results
$$\sum_{i=1}^{N} g_i(x) \cdot w_i(x)$$
the calculation being carried out on the basis of changing parameter vectors Qy(k).
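The weighted calculation can be sketched as follows. The formula for wi(x) is not reproduced in this text; the sketch assumes a squared-exponential kernel with the squared distance divided by the length scale, which gives wi(x) = (Xi,p − xp)/lp. The function names and the example data are hypothetical.

```python
import math


def gp_sum(x, X, Q, l, sigma_f):
    """Sum over supporting points of Q[i] * sigma_f * exp(-0.5 * distance)."""
    total = 0.0
    for i in range(len(X)):
        t = sum((x[d] - X[i][d]) ** 2 / l[d] for d in range(len(x)))
        total += Q[i] * sigma_f * math.exp(-0.5 * t)
    return total


def gp_gradient_weighted(x, p, X, Q_y, l, sigma_f):
    """Gradient in x[p] as a model calculation with a re-weighted vector.

    Each entry of Q_y is multiplied by the assumed weight
    w_i(x) = (X[i][p] - x[p]) / l[p]; the model sum itself is unchanged.
    """
    Q_w = [Q_y[i] * (X[i][p] - x[p]) / l[p] for i in range(len(X))]
    return gp_sum(x, X, Q_w, l, sigma_f)


# Hypothetical two-dimensional model with three supporting points.
X = [[0.0, 1.0], [1.0, -0.5], [2.0, 0.3]]
Q_y = [0.7, -1.2, 0.4]
l = [1.5, 0.8]
grad = gp_gradient_weighted([0.6, 0.2], 0, X, Q_y, l, sigma_f=2.0)
```

Since the weights depend on x[p], the re-weighted vector has to be rebuilt whenever x[p] changes, which is why a rapid update of the parameter vector is needed for this variant.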
If the (on-the-fly) updating of parameter vector Qy(k) is not possible, the calculation may be carried out by rewriting the formula
into the following expression
Two calculations are carried out as follows in model calculation unit 3, as shown in
A first calculation (step S1)
$$\sum_{i=1}^{N} g_i(x) = y(x)$$
in one of computing cores 31, 32 is followed by a subsequent software multiplication by −xp in main computing unit 2 (step S2)
$$\sum_{i=1}^{N} g_i(x) \cdot (-x_p)$$
and a subsequent calculation (step S3) in model calculation unit 3 using a changed parameter vector Qy(k), which is ascertained by the element by element multiplication of existing parameter vector Qy(k) with Xi,p(k)
$$\sum_{i=1}^{N} g_i(x) \cdot X_{i,p}^{(2)}$$
Two calculations in model calculation unit 3 are thus necessary for one calculation step. In return, it is not necessary to change the model parameters during the running calculation.
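Steps S1 to S3 can be sketched as follows. The sketch assumes the same squared-exponential kernel form as above in the text's notation, so that the two partial sums are finally divided by the length scale lp; this final factor and all names and values are assumptions for illustration.

```python
import math


def gp_sum(x, X, Q, l, sigma_f):
    """Sum over supporting points of Q[i] * sigma_f * exp(-0.5 * distance)."""
    total = 0.0
    for i in range(len(X)):
        t = sum((x[d] - X[i][d]) ** 2 / l[d] for d in range(len(x)))
        total += Q[i] * sigma_f * math.exp(-0.5 * t)
    return total


def gp_gradient_two_runs(x, p, X, Q_y, l, sigma_f):
    """Gradient in x[p] from two model calculations with fixed parameters."""
    # S1: ordinary model calculation, sum of g_i(x) = y(x).
    s1 = gp_sum(x, X, Q_y, l, sigma_f)
    # S2: multiplication by -x_p, done in software.
    s2 = s1 * (-x[p])
    # S3: second model calculation with Q_y multiplied element by element
    # with the pth coordinate of each supporting point.
    Q_x = [Q_y[i] * X[i][p] for i in range(len(X))]
    s3 = gp_sum(x, X, Q_x, l, sigma_f)
    # Assumed final factor 1 / l_p for the kernel form used here.
    return (s2 + s3) / l[p]


# Hypothetical two-dimensional model with three supporting points.
X = [[0.0, 1.0], [1.0, -0.5], [2.0, 0.3]]
Q_y = [0.7, -1.2, 0.4]
l = [1.5, 0.8]
grad = gp_gradient_two_runs([0.6, 0.2], 0, X, Q_y, l, sigma_f=2.0)
```

The changed vector Q_x in S3 depends only on the supporting points, not on x, so it can be prepared once before the iteration starts.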
During the calculation of the Newtonian method, the calculation of f(x) is carried out for each iteration. Therefore, the term
$$\sum_{i=1}^{N} g_i(x) \cdot (-x_p)$$
only requires one multiplication and no additional calculation of model calculation unit 3. Since two model calculations are possible, the calculations of f(x) and f′(x) may be carried out for each iteration in parallel in computing cores 31, 32.
For the second case, that training data X(k),Y(k) are scaled, the formula
may be calculated as explained above using
$$\sum_{i=1}^{N} g_i(x) \cdot w_i(x).$$
In this case, factor wi(x) is calculated on the scaled x value, i.e., on X(2) in the specified notation, in particular by the calculation using sy=σY(2)/(σX(2))p, i.e., the standard deviation of the outputs divided by the standard deviation of dimension p of the inputs. The descaling parameter is thus used to multiply the obtained result by the suitable factor.
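The effect of the scaling on the gradient can be sketched as follows. By the chain rule, the gradient computed in the scaled space is multiplied by the output standard deviation and divided by the input standard deviation of dimension p. The kernel form, the scaling formulas, and all names and values in the sketch are assumptions for illustration.

```python
import math


def gp_sum(u, X, Q, l, sigma_f):
    """Sum over supporting points of Q[i] * sigma_f * exp(-0.5 * distance)."""
    total = 0.0
    for i in range(len(X)):
        t = sum((u[d] - X[i][d]) ** 2 / l[d] for d in range(len(u)))
        total += Q[i] * sigma_f * math.exp(-0.5 * t)
    return total


def scaled_value(x_raw, X, Q_y, l, sigma_f, mu_X, sigma_X, mu_Y, sigma_Y):
    """Model value for non-scaled input with a model trained on scaled data."""
    u = [(x_raw[d] - mu_X[d]) / sigma_X[d] for d in range(len(x_raw))]
    return gp_sum(u, X, Q_y, l, sigma_f) * sigma_Y + mu_Y


def scaled_gradient(x_raw, p, X, Q_y, l, sigma_f, mu_X, sigma_X, sigma_Y):
    """Gradient in the non-scaled input x_raw[p]."""
    u = [(x_raw[d] - mu_X[d]) / sigma_X[d] for d in range(len(x_raw))]
    # Weights are evaluated on the scaled input u, not on x_raw.
    Q_w = [Q_y[i] * (X[i][p] - u[p]) / l[p] for i in range(len(X))]
    # Descaling factor sigma_Y / (sigma_X)_p from the chain rule.
    return gp_sum(u, X, Q_w, l, sigma_f) * sigma_Y / sigma_X[p]


# Hypothetical scaled supporting points and scaling parameters.
X = [[0.0, 1.0], [1.0, 2.5], [-0.5, 1.8]]
Q_y = [0.7, -1.2, 0.4]
l = [1.5, 0.8]
mu_X, sigma_X, mu_Y, sigma_Y = [10.0, 2.0], [4.0, 0.5], 3.0, 2.5
grad = scaled_gradient([12.0, 3.0], 0, X, Q_y, l, 2.0, mu_X, sigma_X, sigma_Y)
```

The output mean value drops out of the gradient, so only the two standard deviations enter the descaling factor.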
If an online update of parameters of the model calculation is not possible, by rewriting the above formula into the following expression:
the calculation may be carried out similarly as explained above, with the single difference of the multiplication by
or the suitable term for other Gaussian process models. The calculation is carried out for each data-based partial function model with the aid of two model calculations according to the following computing steps, which are schematically shown in
multiplication of the result by this factor in software (step S14)
For the third case, that for each data-based partial function model, outputs y of the training data are Box-Cox transformed using b(y) and the inputs of training data X are scaled, the following applies for f(x) and f′(x):
the Box-Cox transformation corresponding to b(y)=log(y). f(x) is calculated as follows:
Gradient f′(x) of the function model is calculated as follows, as schematically shown in
multiplication of the result by this factor in software (step S24)
Since in particular term A is used for the calculation of both f(x) and f′(x), only a single calculation is sufficient in model calculation unit 3.
Number | Date | Country | Kind |
---|---|---|---|
10 2013 224 694.3 | Dec 2013 | DE | national |