The invention relates to a method for automatically setting at least one parameter of an actuator control system, a computer program and a learning system for carrying out the method, a machine-readable storage medium on which the computer program is stored, and an actuator control system of which the parameters are set using said method.
From DE 10 2017 211 209, which is not a previously published document, a method is known for automatically setting at least one parameter of an actuator control system which is designed to control a control variable of an actuator to a predefinable target variable, the actuator control system being designed to generate a manipulated variable depending on the at least one parameter, the target variable and the control variable, and to actuate the actuator depending on said manipulated variable,
a new value of the at least one parameter being selected depending on a long-term cost function, said long-term cost function being determined depending on a predicted temporal evolution of a probability distribution of the control variable of the actuator, and the parameter then being set to this new value.
The method having the features of independent claim 1, by contrast, has the advantage that it enables an optimal setting of the actuator control system for an unlimited temporal horizon of the control. Advantageous developments are the subject matter of the dependent claims.
In a first aspect, the invention relates to a method for automatically setting at least one parameter of an actuator control system for controlling a control variable of an actuator to a predefinable target variable, the actuator control system being designed to generate a manipulated variable depending on the at least one parameter, the target variable and the control variable and to actuate the actuator depending on said manipulated variable, a new value of the at least one parameter being selected depending on a stationary probability distribution of the control variable, and the parameter then being set to said new value.
The stationary probability distribution is in this case the probability distribution toward which a probability distribution of the control variable converges during sustained use of a control strategy of the actuator control system that depends on the parameter. According to the invention, it has been specifically recognized that, for many systems which comprise an actuator and an actuator control system according to the invention, said stationary probability distribution exists largely independently of the initial conditions and is unique.
It therefore becomes possible to also optimise the control strategy if no limitation of a temporal horizon of the control is to be predefined.
A model is provided in an advantageous development. Said model can in particular be a Gaussian process, advantageously a sparse Gaussian process. The stationary probability distribution is then determined using this model. This causes the method to be particularly efficient.
In a development of this aspect the model can be adapted depending on the manipulated variable which is supplied to the actuator when the actuator control system controls the actuator. The model is also adapted depending on the control variable which results therefrom. “Adaptation of the model” can be understood in this case as meaning that model parameters which characterise the behaviour of the model are adapted.
After the model has been adapted, an (optimal) new value of the at least one parameter is redetermined depending on the stationary probability distribution of the control variable of the actuator (and the parameter is then reset to this new value). The redetermination of the new value of the at least one parameter is in this case carried out depending on the now adapted model.
That is to say, in this development an episodic approach is provided in which the model is first improved (by the behaviour of the real actuator being observed when the real actuator is controlled by the actuator control system). The actuator control system is then improved by optimizing the parameters which characterise the control strategy of the actuator control system, the reaction of the actuator being simulated by means of the model. This sequence of improving the model and adapting the parameters can be repeated multiple times.
This procedure has the advantage that the model and the actuator control system are successively improved, therefore resulting in a particularly good adaptation of the actuator control system.
In a further particularly advantageous aspect, the stationary probability distribution of the control variable can be determined by an approximation of an integration over possible values of the control variable, said approximation being carried out using numerical quadrature. “Numerical quadrature” in this case refers to an approximation method which approximates the integral by evaluating the integrand at support points and weighting these evaluations with support weights which are associated with the support points.
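The definition of numerical quadrature given above can be illustrated by a minimal sketch. The composite trapezoidal rule, the integration interval and the helper names `trapezoid_rule` and `quadrature` are assumptions chosen purely for illustration and are not taken from the described method.

```python
import math

def trapezoid_rule(a, b, n):
    """Support points and support weights of the composite trapezoidal
    rule with n sub-intervals on [a, b] (illustrative helper)."""
    h = (b - a) / n
    points = [a + i * h for i in range(n + 1)]
    weights = [h] * (n + 1)
    weights[0] = weights[-1] = h / 2  # end points carry half the weight
    return points, weights

def quadrature(f, points, weights):
    """Approximate the integral of f by sum_i w_i * f(xi_i)."""
    return sum(w * f(x) for x, w in zip(points, weights))

def normal_pdf(x):
    """Density of a standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# A probability density should integrate to approximately one.
xi, w = trapezoid_rule(-6.0, 6.0, 200)
approx = quadrature(normal_pdf, xi, w)
```

The integrand is evaluated exactly once per support point, which is the property the later aspects of the method rely on.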
The stationary probability distribution can in this case be determined by means of a (Gaussian) process which has one or more time steps. At each fixed state of a time step, the Gaussian process in this case models a probability distribution having an associated average value and associated variance of the subsequent state (i.e., of the state at the next time step).
The use of numerical quadrature has the advantage, in particular in conjunction with the use of Gaussian processes, that the solution is numerically particularly simple, the precision of the approximation being very good at the same time, such that the actuator control system generated in this way is particularly efficient.
In an advantageous development, a density of the support points is determined depending on a determined temporal evolution of the control variable, that in particular is determined by means of the model and/or the actuator control system, proceeding from an initial value of the control variable that was (pseudo-)randomly determined from an initial probability distribution, i.e., in this case the initial value is “sampled”, in particular from the initial probability distribution. A temporal evolution (i.e., a trajectory in the state space) of the control variable is therefore determined, at the starting point of which the control variable assumes the randomly determined initial value. The density of the support points is then selected depending on this temporal evolution. This leads to an efficient selection of the support points, since actual trajectories of the control variable influence the selection of the support points with adequate probability. In particular it can therefore be ensured that the method also functions reliably when the parameter of the actuator control system is not yet well adapted.
In one development, the density of support points can also be determined proceeding from the target value as an initial value of the control variable, depending on a determined temporal evolution of the control variable that is in particular determined by means of the model and/or the actuator control system. This has the advantage that the support points are selected particularly efficiently, since when the method converges it can be assumed that an actual trajectory of the control variable is close to a trajectory on which the control variable assumes the target value.
Specifically, the density of the support points can be selected depending on a variable which characterizes a smoothness of the model at at least one value of the control variable in the determined temporal evolution(s) of the control variable. Formulated more precisely, the expression “smoothness of the model” can be understood to mean the smoothness of the model prediction, i.e., the smoothness of a probability distribution predicted for the subsequent time step. A low level of smoothness of the model means in this case that greater differences can be expected between successive time steps in the temporal evolutions than in cases in which the smoothness of the model has a higher value.
This variable which characterizes the smoothness of the model can in particular be a variance of the Gaussian process, which variance is associated with at least one of the values which the control variable assumes in the determined temporal evolution(s). The greater said variance is, the lower the level of smoothness of the model.
In this manner it can be ensured that the support points are selected such that an error of the approximation, in particular of the numerical quadrature, becomes particularly small.
In order to carry this out in an optimum manner, the density of the support points in a region can be selected depending on a smallest value, said smallest value being the minimum of the variables which characterize a smoothness of the model at the values of the control variable which are in this region. That is to say, one or more temporal evolutions of the control variable are determined as a discrete sequence of values which the control variable assumes. Of the discrete sequence of values, only those values that are in the aforementioned region are then considered. A variable is associated with each of these values, which variable characterizes the smoothness of the model at this point. The smallest value is selected from these associated variables.
The density of support points in a region can also alternatively or additionally be selected depending on an average density of support points in said region. In particular, the density of support points can be increased when a quotient of an average density of support points and the smallest value falls below a predefinable threshold value, in particular the value 1. A method of this kind is particularly simple to implement.
An increase of the average density of support points can be achieved by reducing a volume element to which a rule for generating support points is applied, for example by dividing a present volume element into a plurality of smaller volume elements, and then generating support points for each of these smaller volume elements by means of the rule for generating support points.
In a further aspect, a result of the numerical quadrature can be determined depending on a dominant eigenvector of a matrix which is given by a product of a diagonal matrix of support weights and a transition matrix, the components of the transition matrix each characterizing a probability of a transition of the control variable from a first support point to a second support point.
This has the advantage that the dominant eigenvector can be determined particularly efficiently as the limit of an operation, namely of a repeated matrix multiplication. In this case the function which describes the probability density only has to be evaluated once at each support point. This method can be parallelized particularly well and is therefore particularly efficient to carry out on one or more GPUs.
In a further aspect of the invention, the long-term cost function can be selected depending on a local cost function, the local cost function being selected depending on a Gaussian function and/or a polynomial function which is dependent on a difference between the manipulated variable and the predefinable target value. The cost function can, for example, be selected as a linear combination of the Gaussian function and polynomial function. Selecting the cost function in this manner is particularly simple.
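A local cost function of the kind described can be sketched as follows. The saturating form of the Gaussian term, the quadratic polynomial, the weighting factors and the name `local_cost` are assumptions made for illustration; the text only states that a linear combination of a Gaussian function and a polynomial function of the difference from the target value can be used.

```python
import math

def local_cost(value, target, gauss_weight=1.0, poly_weight=0.1, width=1.0):
    """Linear combination of a Gaussian term and a polynomial term of the
    difference d = value - target (all constants are illustrative)."""
    d = value - target
    # Saturating Gaussian term: 0 at the target, approaching 1 far away.
    gaussian_term = 1.0 - math.exp(-0.5 * (d / width) ** 2)
    # Quadratic polynomial term: keeps growing for large deviations.
    polynomial_term = d ** 2
    return gauss_weight * gaussian_term + poly_weight * polynomial_term

cost_at_target = local_cost(2.0, 2.0)
cost_near = local_cost(3.0, 2.0)
cost_far = local_cost(4.0, 2.0)
```

In this sketch the Gaussian term dominates close to the target, while the polynomial term still provides a slope where the Gaussian term has saturated.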
In another further aspect, the manipulated variable can advantageously be limited by means of a limiting function to values within a predefinable manipulated variable range. This allows the manipulated variable to be limited in a particularly simple manner.
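One conceivable limiting function is a smooth saturation based on the hyperbolic tangent, sketched below. The choice of `tanh` and the name `limit` are assumptions; the text does not specify which limiting function is used.

```python
import math

def limit(u, u_min, u_max):
    """Smoothly map an unbounded manipulated variable u into the
    predefinable range [u_min, u_max] (illustrative tanh saturation)."""
    center = 0.5 * (u_max + u_min)
    half_range = 0.5 * (u_max - u_min)
    return center + half_range * math.tanh((u - center) / half_range)

limited_mid = limit(0.5, -2.0, 3.0)     # centre of the range is left unchanged
limited_high = limit(1.0e6, -2.0, 3.0)  # saturates at the upper bound
limited_low = limit(-1.0e6, -2.0, 3.0)  # saturates at the lower bound
```

A smooth limiter of this kind is differentiable everywhere, which can be convenient when the parameters of the control strategy are optimized by gradient-based methods.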
In further aspects, the invention relates to a learning system for automatically setting at least one parameter of an actuator control system which is designed to control a control variable of an actuator to a predefinable target variable, the learning system being designed to carry out one of the aforementioned methods.
As mentioned, aspects of the method can be carried out particularly efficiently on one or more GPUs. The learning system can therefore advantageously comprise one or more GPUs for carrying out the method.
Embodiments of the invention are described below in greater detail and with reference to the accompanying drawings, in which:
The actuator 10 can be, for example, a (partially) autonomous robot, for example a (partially) autonomous motor vehicle or a (partially) autonomous lawnmower. It can also be an actuation means of an actuating member of a motor vehicle, for example a throttle valve or a bypass actuator for idling control. It can also be a heating system or a part of a heating system, such as a valve actuator. The actuator 10 can in particular also be a larger system, such as a combustion engine or an (optionally hybridized) drive train of a motor vehicle, for example, or also a braking system.
The sensor 30 can be, for example, one or more video sensors and/or one or more radar sensors and/or one or more ultrasound sensors and/or one or more position sensors (for example GPS). Other sensors are also conceivable, for example a temperature sensor.
In another embodiment the actuator 10 can be a manufacturing robot, and the sensor 30 can be an optical sensor, for example, which detects the properties of manufactured articles of the manufacturing robot.
The learning system 40 receives the output signal S from the sensor in an optional receiving unit 50 which converts the output signal S into a control variable x (alternatively, the output signal S can also be directly adopted as the control variable x). The control variable x can be a portion or further processing of the output signal S, for example. The control variable x is supplied to a controller 60, in which a control strategy π is implemented.
Parameters θ which are supplied to the controller 60 are stored in a parameter memory 70. The parameters θ parameterize the control strategy π. The parameters θ can be a single parameter or a plurality of parameters.
A block 90 supplies the predefinable target variable xd to the controller 60. The block 90 can generate the predefinable target variable xd, for example depending on a sensor signal that is predefined to the block 90. It is also possible that the block 90 reads out the target variable xd from a dedicated storage region in which the variable is stored.
Depending on the control strategy π(θ) (and therefore depending on the parameters θ), on the target variable xd and on the control variable x, the controller 60 generates a manipulated variable u. This manipulated variable can for example be determined depending on a difference x-xd between the control variable x and the target variable xd.
The controller 60 transmits the manipulated variable u to an output unit 80 which determines the control signal A from said variable. It is possible, for example, that the output unit 80 first checks whether the manipulated variable u is in a predefinable value range. If this is the case, the control signal A is determined depending on the manipulated variable u, for example by an associated control signal A being read out from a characteristic diagram depending on the manipulated variable u. This is the normal case. If it is determined, however, that the manipulated variable u is not in the predefinable value range, the control signal A can be designed such that it switches the actuator 10 into a protected mode.
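The behaviour of the output unit 80 described above can be sketched as follows. The linear interpolation in the characteristic diagram, the marker `PROTECTED_MODE` and the function name `output_unit` are hypothetical and serve only as an illustration.

```python
from bisect import bisect_right

PROTECTED_MODE = "protected_mode"  # hypothetical marker for the protected mode

def output_unit(u, u_min, u_max, map_u, map_a):
    """Return the control signal A for the manipulated variable u, or
    PROTECTED_MODE if u lies outside the predefinable value range."""
    if not (u_min <= u <= u_max):
        return PROTECTED_MODE
    # Normal case: locate the interval of the characteristic diagram that
    # contains u and interpolate linearly between its two entries.
    k = min(max(bisect_right(map_u, u) - 1, 0), len(map_u) - 2)
    t = (u - map_u[k]) / (map_u[k + 1] - map_u[k])
    return map_a[k] + t * (map_a[k + 1] - map_a[k])

map_u = [0.0, 0.5, 1.0]     # sampled manipulated variables (assumed values)
map_a = [0.0, 20.0, 100.0]  # associated control signals (assumed values)
signal_ok = output_unit(0.75, 0.0, 1.0, map_u, map_a)
signal_bad = output_unit(1.4, 0.0, 1.0, map_u, map_a)
```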
The receiving unit 50 transmits the control variable x to a block 100. The controller 60 also transmits the corresponding manipulated variable u to the block 100. The block 100 stores the time series of the control variable x, which is received at a sequence of time points, and of the relevant corresponding manipulated variable u. The block 100 can then adapt model parameters Λ, σn, σf of the model g depending on said time series. The model parameters Λ, σn, σf are supplied to a block 110 which, for example, stores said parameters in a dedicated storage region. This is described below in
The learning system 40 comprises, in one embodiment, a computer 41 which has a machine-readable storage medium 42 on which a computer program is stored that, when carried out by the computer 41, prompts the computer to carry out the described functions of the learning system 40. In the embodiment, the computer 41 comprises a GPU 43.
The model g can be used to optimize the parameters θ of the control strategy π. This is illustrated schematically in
The block 120 transmits the model parameters Λ, σn, σf to a block 140 and a block 150. A block 130 determines a noise variance τ∈, and a maximum partitioning depth Lmax (for example by these values being predefined and being read out from dedicated storage regions in the memory), and transmits them to the block 140. The parameter memory 70 transmits parameters θ to the block 140, and the block 90 transmits the target value xd to the block 140.
The block 140 determines support points ξi and associated support weights wi from said values. One embodiment of the algorithm of said determination is illustrated in
The block 150 determines new parameters θ* from said points and weights. This is described in
The blocks shown in
The controller 60 then (1010) generates manipulated variables u depending on the control strategy π(θ), as described in
The block 100 receives and aggregates (1020) the time series of the manipulated variable u and control variable x, which together in each case form a pair z comprising a control variable x and a manipulated variable u, $z = (x_1, \ldots, x_D, u_1, \ldots, u_F)^T$.
In this case D is the dimensionality of the control variable x, and F is the dimensionality of the manipulated variable u, i.e., $x \in \mathbb{R}^D$, $u \in \mathbb{R}^F$.
Depending on this state trajectory, a Gaussian process g is then (1030) adapted such that between successive time points t, t+1 the following applies:
$$ x_{t+1} = x_t + g(x_t, u_t). \tag{1} $$
In this case
$$ u_t = \pi_\theta(x_t). \tag{1'} $$
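Equations (1) and (1′) together define how a temporal evolution of the control variable is generated. The following sketch rolls out such a trajectory for a scalar control variable. The deterministic stand-in for the model g, the proportional control strategy and all numerical values are assumptions for illustration; in the described method g is a Gaussian process and therefore yields a probability distribution rather than a single value.

```python
def rollout(x0, policy, g, steps):
    """Generate x_0, x_1, ..., x_steps by alternating (1') and (1)."""
    trajectory = [x0]
    x = x0
    for _ in range(steps):
        u = policy(x)    # equation (1'): manipulated variable from the strategy
        x = x + g(x, u)  # equation (1): the model predicts the increment
        trajectory.append(x)
    return trajectory

x_target = 1.0  # assumed target variable
theta = 1.0     # assumed single parameter of the control strategy

def policy(x):
    """Illustrative proportional control strategy."""
    return -theta * (x - x_target)

def g(x, u):
    """Illustrative stand-in for the mean prediction of the model."""
    return 0.1 * u

trajectory = rollout(0.0, policy, g, steps=100)
```

With these assumed values the control variable approaches the target variable geometrically, which is the kind of trajectory later used to decide where support points are needed.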
A covariance function k of the Gaussian process g is, for example, given by
$$ k(z, w) = \sigma_f^2 \exp\!\left(-\tfrac{1}{2}(z-w)^T \Lambda^{-1} (z-w)\right). \tag{2} $$
The parameter $\sigma_f^2$ is in this case a signal variance, and $\Lambda = \mathrm{diag}(l_1^2, \ldots, l_{D+F}^2)$ is a collection of squared length scales $l_1^2, \ldots, l_{D+F}^2$ for each of the D+F input dimensions.
A covariance matrix K is defined by
$$ K(Z,Z)_{i,j} = k(z_i, z_j). \tag{3} $$
The Gaussian process g is then characterized by two functions: by an average value μ and a variance Var which are given by
$$ \mu(z_*) = k(z_*, Z)\,\bigl(K(Z,Z) + \sigma_n^2 I\bigr)^{-1} y, \tag{4} $$
$$ \mathrm{Var}(z_*) = k(z_*, z_*) - k(z_*, Z)\,\bigl(K(Z,Z) + \sigma_n^2 I\bigr)^{-1} k(Z, z_*). \tag{5} $$
y is in this case given in the usual manner by $y_i = f(z_i) + \epsilon_i$, with $\epsilon_i$ being white noise.
The parameters Λ, σn, σf are then adapted to the pairs (zi, yi) in the known manner, by maximizing a logarithmic marginal likelihood function.
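Equations (2) to (5) and the logarithmic marginal likelihood can be sketched with NumPy as follows. The function names, the synthetic training data and the hyperparameter values are assumptions for illustration. The sketch only evaluates the likelihood for two candidate settings; the maximization itself would in practice be carried out by a numerical optimizer, which is not shown.

```python
import numpy as np

def kernel(A, B, lengthscales, sigma_f):
    """Covariance function of equation (2); A and B hold one input z per row."""
    A = np.asarray(A, dtype=float) / lengthscales
    B = np.asarray(B, dtype=float) / lengthscales
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return sigma_f ** 2 * np.exp(-0.5 * np.maximum(sq, 0.0))

def gp_predict(z_star, Z, y, lengthscales, sigma_f, sigma_n):
    """Mean (4) and variance (5) at the rows of z_star."""
    K_y = kernel(Z, Z, lengthscales, sigma_f) + sigma_n ** 2 * np.eye(len(Z))
    L = np.linalg.cholesky(K_y)  # K_y = L L^T, avoids an explicit inverse
    k_star = kernel(z_star, Z, lengthscales, sigma_f)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, k_star.T)
    return k_star @ alpha, sigma_f ** 2 - (v ** 2).sum(0)

def log_marginal_likelihood(Z, y, lengthscales, sigma_f, sigma_n):
    """log p(y | Z) of the Gaussian process for given model parameters."""
    K_y = kernel(Z, Z, lengthscales, sigma_f) + sigma_n ** 2 * np.eye(len(Z))
    L = np.linalg.cholesky(K_y)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * len(Z) * np.log(2.0 * np.pi))

# Synthetic pairs z = (x, u) with D = F = 1 and smooth targets y (assumed data).
rng = np.random.default_rng(0)
Z = rng.uniform(-1.0, 1.0, size=(30, 2))
y = np.sin(Z[:, 0]) + 0.5 * Z[:, 1]
ls, sf, sn = np.array([1.0, 1.0]), 1.0, 0.1

mean_train, var_train = gp_predict(Z, Z, y, ls, sf, sn)
mean_far, var_far = gp_predict(np.array([[50.0, 50.0]]), Z, y, ls, sf, sn)
lml_good = log_marginal_likelihood(Z, y, ls, sf, sn)
lml_bad = log_marginal_likelihood(Z, y, np.array([1e-3, 1e-3]), sf, sn)
```

Far away from the training data the predicted variance returns to the signal variance, and implausibly short length scales yield a lower marginal likelihood for smooth data.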
Support points ξi and associated support weights wi are then (1040) determined (as described in
A new optimal parameter θ* is then (1050) determined (as described in
The new optimal parameter θ* which is determined in this manner at least approximately solves the equation
In this case p*,θ denotes a stationary probability distribution toward which the system (illustrated in
The solution of equation (6) requires a solution to the equation
$$ p^{*,\theta}(x_{t+1}) = \int p\bigl(x_{t+1} \mid x_t, \pi_\theta(x_t)\bigr)\, p^{*,\theta}(x_t)\, dx_t. \tag{7} $$
Due to the form of the integral kernel, this equation cannot be solved in a closed form.
The solution of this equation therefore has to be obtained by numerical approximation methods. The challenge is to achieve sufficient precision without the computation becoming too intensive. As a result, the method described in
$$ p^{*,\theta}(x_{t+1}) \approx \sum_{i=1}^{N} w_i\, p\bigl(x_{t+1} \mid x_t = \xi_i, \pi_\theta(\xi_i)\bigr)\, p^{*,\theta}(\xi_i) \tag{8} $$
and surprisingly achieves this aim.
The parameter θ is then (1060) replaced with the new parameter θ*.
It is then (1070) optionally checked whether the method of determining the parameter θ has converged. If this is not the case (“n”) there is a jump back to step 1010. If this is the case (“j”), however, optimal parameters θ have been found and the method is completed (1080). The method can naturally also be completed after a single iteration.
First (1500), basis functions $\phi_i(x)$ are determined as $\phi_i = g(\xi_i, \pi_\theta(\xi_i))$ for each value of the index variable $i = 1, \ldots, N$ by means of the Gaussian process g and the support points $\xi_i$. Matrix entries $\Phi_{i,j} = \phi_j(\xi_i)$ are then determined for all index variables $i, j = 1, \ldots, N$. That is to say, the matrix entries $\Phi_{i,j}$ together form a transition matrix $\Phi$, each matrix entry $\Phi_{i,j}$ characterizing the probability, given by the Gaussian process g, that the control variable x transitions from state $x = \xi_j$ into state $x = \xi_i$.
Now (1510) the matrix
$$ M = \mathrm{diag}(w)\,\Phi \tag{8} $$
is determined, where
$$ \mathrm{diag}(w)_{i,j} = w_i\,\delta_{i,j}. $$
In this case the columns can also be normalized by replacing the entries $M_{i,j}$ of the matrix with $M_{i,j} / \sum_{k=1}^{N} w_k\,\phi_j(\xi_k)$.
Proceeding from the initial vector $\alpha_0$, the weight vectors $\alpha_1, \alpha_2, \ldots$ are then (1520) iteratively generated using
$$ \alpha_{t+1} = M\,\alpha_t, \tag{9} $$
until the weight vectors which are generated in this manner converge, i.e., meet a predefinable convergence criterion, for example $\lVert \alpha_{t+1} - \alpha_t \rVert < \epsilon$ for a fixedly predefinable value $\epsilon$. The weight vector $\alpha_{t+1}$ which was last generated is the dominant eigenvector $\alpha_\theta^\infty$ of the matrix M defined in equation (8).
It has been specifically recognized that the matrix M is positive and stochastic (“stochastic” in this context meaning that the elements of each line sum to the value one), and that, according to the Perron-Frobenius theorem, exactly one eigenvector exists for the largest possible eigenvalue λ=1, such that the described method always converges to a unique result (within numerical precision).
The dominant eigenvector $\alpha_\theta^\infty$ therefore characterizes the stationary probability distribution by means of $p^{*,\theta}(x) \approx \Phi^T \alpha_\theta^\infty$, i.e., the eigenvector characterizes the representation of the stationary probability distribution $p^{*,\theta}$ by means of the basis functions $\phi_i(x)$.
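Steps 1500 to 1520 can be sketched for a scalar control variable as follows. As an assumption for illustration, the Gaussian process is replaced by a simple linear closed loop with Gaussian noise, for which the stationary distribution is known in closed form and can serve as a check. The trapezoidal rule, the grid and all names are likewise assumptions, and the columns of M are normalized as described above.

```python
import numpy as np

def normal_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def stationary_weights(xi, w, transition_pdf, tol=1e-13, max_iter=100_000):
    """Dominant eigenvector of M = diag(w) Phi by repeated multiplication."""
    # Phi[i, j] = phi_j(xi_i): density of moving from xi_j to xi_i.
    Phi = transition_pdf(xi[:, None], xi[None, :])
    M = w[:, None] * Phi                  # M = diag(w) Phi
    M = M / M.sum(axis=0, keepdims=True)  # normalize the columns
    alpha = np.full(len(xi), 1.0 / len(xi))
    for _ in range(max_iter):
        alpha_next = M @ alpha            # equation (9)
        converged = np.linalg.norm(alpha_next - alpha) < tol
        alpha = alpha_next
        if converged:
            break
    return alpha, Phi

# Assumed closed loop: x_{t+1} = a * x_t + noise with standard deviation s.
a, s = 0.8, 0.5
xi = np.linspace(-6.0, 6.0, 241)  # support points
w = np.full(xi.shape, xi[1] - xi[0])
w[0] *= 0.5
w[-1] *= 0.5                      # trapezoidal support weights

alpha, Phi = stationary_weights(
    xi, w, lambda x_next, x: normal_pdf(x_next, a * x, s))
p_star = Phi @ alpha              # stationary density at the support points
exact = normal_pdf(xi, 0.0, s / np.sqrt(1.0 - a ** 2))
```

The transition density is evaluated only once per pair of support points; the iteration itself consists solely of matrix-vector products, which is what makes the scheme easy to parallelize.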
As the dominant eigenvector of the positive matrix M, $\alpha_\theta^\infty$ can be differentiated with respect to the parameters θ which parameterize the matrix M. A partial derivative $\partial \alpha_\theta^\infty / \partial \theta$
is therefore now (1540) estimated. This can, for example, be achieved by determining the corresponding dominant eigenvector αθ
The initial vector α0 can be set equal to the dominant eigenvector αθ∞ before step (1540), optionally in one step (1530), in order to improve convergence.
A gradient ascent method is then (1550) preferably used in order to vary the parameter θ in the direction of a maximum value of $R_\pi^\infty(\theta)$ according to formula (6), using the determined estimated value of the partial derivative $\partial \alpha_\theta^\infty / \partial \theta$ of the dominant eigenvector $\alpha_\theta^\infty$. This is preferably carried out by means of the approximate equation
Rπ∞(θ)≈Σi=1Nαi,θ∞x˜ϕ
where $\alpha_{i,\theta}^\infty$ denotes the components of the dominant eigenvector $\alpha_\theta^\infty$.
It is then (1560) checked whether the method for determining the parameter θ has converged, for example, by checking whether the change of the parameter θ in step (1550) has fallen below a predefinable threshold value. If this is the case the method (1570) is completed. Otherwise, a new iteration begins in step (1500).
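The loop of steps 1500 to 1570 can be sketched for a single parameter θ as follows. Several assumptions are made for illustration: a linear closed loop with Gaussian noise stands in for the Gaussian process, the partial derivative of the dominant eigenvector is estimated by central finite differences, the local function is a Gaussian of the deviation from the target value 0, and the long-term value is approximated by the sum of the eigenvector components weighted with the local function at the support points. None of these choices is prescribed by the text.

```python
import numpy as np

def normal_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def dominant_eigenvector(theta, xi, w, alpha0=None, b=0.5, s=0.5, tol=1e-13):
    """Steps 1500-1520 for the assumed loop x' = x + b*u + noise, u = -theta*x."""
    mean_next = xi + b * (-theta * xi)
    Phi = normal_pdf(xi[:, None], mean_next[None, :], s)
    M = w[:, None] * Phi
    M = M / M.sum(axis=0, keepdims=True)
    # Optional warm start from a previously determined eigenvector (step 1530).
    alpha = np.full(len(xi), 1.0 / len(xi)) if alpha0 is None else alpha0
    for _ in range(100_000):
        alpha_next = M @ alpha
        converged = np.linalg.norm(alpha_next - alpha) < tol
        alpha = alpha_next
        if converged:
            break
    return alpha

xi = np.linspace(-6.0, 6.0, 121)
w = np.full(xi.shape, xi[1] - xi[0])
w[0] *= 0.5
w[-1] *= 0.5
local_values = np.exp(-0.5 * (xi / 0.5) ** 2)  # assumed local function at xi

theta, step, eps = 0.4, 1.0, 1e-4
alpha = dominant_eigenvector(theta, xi, w)
value_start = float(alpha @ local_values)
for _ in range(50):
    # Step 1540: finite-difference estimate of d(alpha)/d(theta).
    a_plus = dominant_eigenvector(theta + eps, xi, w, alpha0=alpha)
    a_minus = dominant_eigenvector(theta - eps, xi, w, alpha0=alpha)
    gradient = float(((a_plus - a_minus) / (2.0 * eps)) @ local_values)
    # Step 1550: gradient ascent on the approximated long-term value.
    update = step * gradient
    theta += update
    alpha = dominant_eigenvector(theta, xi, w, alpha0=alpha)
    # Step 1560: stop once the parameter change falls below a threshold.
    if abs(update) < 1e-6:
        break
value_end = float(alpha @ local_values)
```

For the assumed loop the long-term value is largest when the closed loop removes the deviation in a single step, i.e., for θ = 1/b = 2, and the iteration moves θ toward that value.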
A partitioning of a state space X of all possible values of the control variable is first initialized. The partitioning can initially be selected as the trivial partitioning of the state space X, for example, i.e., the state space X is not divided, and the partitioning consists of the whole state space X.
A counter s is initialized to the value s=1. The support points ξi and the associated support weights wi are determined for the state space X according to a numerical quadrature rule (such as Kepler's barrel rule, the trapezoidal rule, Simpson's rule or Gaussian quadrature, for example).
It is then (2010) checked whether the counter s has reached the maximum partitioning depth Lmax. If this is the case, the method is completed in step 2100.
Otherwise, the target value xd is assumed as value τ0′ for the control variable x, and a temporal evolution τ0′, τ1′ . . . τT′ is determined (2020) using formula (1), (1′).
A further value τ0 is then optionally also randomly selected for the control variable x according to the initial probability distribution p(x0), and a further temporal evolution τ0, τ1, . . . τT is determined (2030) analogously to step 2020 using formula (1), (1′).
A further counter l is then (2040) initialized to the value l=1, and it is checked (2050) whether the further counter l has reached the value of the counter s. If this is the case, step 2060 follows, in which the counter s is incremented by one, and there is a jump back to step 2010. If this is not the case, the variable ρl(τ) is determined (2070) which characterizes whether the density of the support points ξi is adequate. The density can be determined, for example, to
In this case Xl is the l-th partial volume element of the partitioning of the state space X, Vol(Xl) is the volume thereof and Nl is the number of support points ξi therein. It is then checked (2070) whether this variable satisfies ρl(τ)<1, threshold values other than the value “1” also being possible.
If this is the case (“j”), a partial volume element Xl is split (2080) into a plurality of smaller partial volume elements, for example, by the partial volume element Xl being halved along one or along all of the dimensions thereof. The support points ξi, which are associated with the partial volume element Xl and associated support weights wi are then removed, and support points ξi and associated support weights wi are added for each of the smaller, newly generated partial volume elements. Step 2090 then follows, in which the further counter l is incremented by one. There is then a jump back to step 2050.
If the check in step 2070 reveals that the requirement has not been met (“n”), step 2090 follows immediately.
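The refinement of the partitioning can be sketched for a one-dimensional state space as follows. Because the formula for the variable ρl(τ) is not reproduced here, the sketch assumes that it is the product of the average density of support points in a partial volume element and the smallest predicted standard deviation of the model on the trajectory values lying in that element. This criterion, the two-point Gauss-Legendre rule per element, the simplified loop over a maximum depth and all names are assumptions.

```python
import math

def gauss_legendre_2(a, b):
    """Two-point Gauss-Legendre support points and weights on [a, b]."""
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    offset = half / math.sqrt(3.0)
    return [mid - offset, mid + offset], [half, half]

def refine(cells, trajectory, std_at, max_depth, points_per_cell=2):
    """Halve every cell in which the support points are too sparse."""
    for _ in range(max_depth):
        new_cells, changed = [], False
        for a, b in cells:
            inside = [x for x in trajectory if a <= x < b]
            if inside:
                min_std = min(std_at(x) for x in inside)
                rho = points_per_cell / (b - a) * min_std  # assumed criterion
                if rho < 1.0:
                    mid = 0.5 * (a + b)
                    new_cells += [(a, mid), (mid, b)]
                    changed = True
                    continue
            new_cells.append((a, b))
        cells = new_cells
        if not changed:
            break
    return cells

# Assumed trajectory values and a constant predicted standard deviation.
trajectory = [0.0, 0.05, 0.1]
cells = refine([(-4.0, 4.0)], trajectory, lambda x: 0.3, max_depth=6)

support_points, support_weights = [], []
for a, b in cells:
    pts, wts = gauss_legendre_2(a, b)
    support_points += pts
    support_weights += wts
```

Only elements that the trajectory actually visits are refined, so the support points concentrate where the control variable is expected to be found.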
Number | Date | Country | Kind |
---|---|---|---|
10 2017 218 813.8 | Oct 2017 | DE | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2018/071742 | 8/10/2018 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2019/076511 | 4/25/2019 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
20040024750 | Ulyanov | Feb 2004 | A1 |
20040198268 | Rashev | Oct 2004 | A1 |
20090125274 | Waldock | May 2009 | A1 |
20170200089 | Huang | Jul 2017 | A1 |
Number | Date | Country |
---|---|---|
102013212889 | Jan 2015 | DE |
102017211209 | Jan 2019 | DE |
3062176 | Aug 2016 | EP |
Entry |
---|
International Search Report and Written Opinion dated Nov. 7, 2018 in international patent application No. PCT/EP2018/071742, 9 pages. |
Tom Erez et al., “Infinite-Horizon Model Predictive Control for Periodic Tasks with Contacts,” Robotics: Science and Systems VIII, Jun. 27, 2011, 8 pages. DOI: 10.15607/RSS.2011.VIII.010, ISBN: 978-0-262-51770-9. XP055512502. |
Number | Date | Country |
---|---|---|
20210191347 A1 | Jun 2021 | US |