The present application claims the benefit under 35 U.S.C. § 119 of German Patent Application No. DE 10 2023 208 510.0 filed on Sep. 4, 2023, which is expressly incorporated herein by reference in its entirety.
The present invention relates to a method for configuring a technical system to be configured.
In production processes and machining processes (e.g., drilling, milling, heat treatment, etc.), process parameters such as a process temperature, a process time, a vacuum or a gas atmosphere etc., are set such that desired properties, such as hardness, strength, thermal conductivity, electrical conductivity, density, microstructure, macrostructure, chemical composition, etc., of a workpiece are achieved. The process parameters can be ascertained by model-based optimization methods, such as Bayesian optimization methods. Here, a model for the production or machining process can be ascertained based on measurement data. However, this can require large quantities of measurement data and therefore high expenditure (e.g., time expenditure and/or costs). This expenditure can be reduced by ascertaining the model based on a model already learned that describes a process related to the production or machining process in conjunction with the measurement data (also referred to as transfer learning). For example, both models can describe drilling or milling on different machines (and with comparable process parameters). The model already learned can serve as a basis for the model to be learned and therefore reduce the required quantity of measurement data. Efficient procedures are desirable for this purpose.
According to various example embodiments of the present invention, a method for configuring a technical system to be configured is provided, comprising: detecting, for each of one or more technical reference systems, reference observations of results of the reference system for different values of configuration parameters; conditioning, for each reference system, a reference system model for the relationship between the values of the configuration parameters and the results provided by the reference system on the reference observations detected for the reference system; detecting observations of results of the technical system to be configured for different values of the configuration parameters for the technical system to be configured; adjusting an a priori model for the relationship between the values of the configuration parameters and the results provided by the technical system to be configured to the observations detected for the technical system to be configured, wherein the a priori model is formed from a weighted combination of the conditioned reference system models; ascertaining an a posteriori model for the relationship between the values of the configuration parameters and the results provided by the technical system to be configured by conditioning the adjusted a priori model on the observations detected for the technical system to be configured; and configuring the technical system to be configured using the ascertained a posteriori model.
The technical reference systems can be very similar to the system to be configured, even to the extent that the system to be configured and a reference system are physically the same system that is merely operated at different times (as a result of which differences arise, e.g., due to temperature, air pressure, available bandwidth or computing power, etc.).
The detection of observations for the technical system to be configured does not necessarily have to take place after the reference system models have been conditioned, but can also take place at least partially beforehand. The conditioning of the a priori model can also include successive conditioning on the basis of successive observations (e.g., evaluations according to an acquisition function).
The method described above can be used in particular to optimize black box functions, i.e., functions whose gradients are not available, so that optimization algorithms based on gradient descent are not directly applicable. In addition, function evaluations can be distorted by noise (e.g., measurement noise). However, data from related black box functions is often available to support the optimization, e.g., data from previous optimizations for related tasks. The method described above enables efficient use of such so-called metadata in the optimization process (in terms of data efficiency, computational complexity and scalability in the number of meta-tasks). In particular, a "warm start" of a Bayesian optimization of a function on the basis of metadata is made possible.
One possible application of the method is parameter optimization in an industrial environment, e.g., the optimization of a physical (production) process. A typical example is a laser welding process, with which two workpieces are to be joined together using a laser beam by temporally and locally melting the workpieces. Here, parameters such as the laser power and laser spot diameter are exemplary process parameters that need to be optimized. For such an optimization, it is typically necessary to execute the process and analyze the results at different parameter settings in costly and time-consuming experiments. Therefore, it is desirable to select the parameters at which these experiments are performed in an informed manner and to use the available information efficiently and effectively, with the goal of achieving a sufficiently good parameter setting with as few experiments as possible. Other applications are, for example, the optimization of hyperparameters in an algorithm (e.g., the number of hidden layers in a neural network) or the efficient calibration (and thus configuration) of a physical (in particular a technical) system.
The method can also be used for active learning. Active learning typically aims to model one or more target variables over an entire subset of the parameter space. Such a model can then be used to, e.g., operate a system. For example, if the relationship between the input voltage and the rotational speed of a motor is known, the voltage can be set by model inversion, so that a desired rotational speed is achieved.
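Purely by way of illustration, such a model inversion could look as follows (the voltage-to-speed model and all numerical values here are hypothetical placeholders for a model learned as described, not part of the method itself):

```python
import numpy as np

def learned_speed_model(voltage):
    # Placeholder for a model learned by active learning
    # (hypothetical, monotone voltage -> rotational speed relationship).
    return 120.0 * np.tanh(voltage / 12.0)

def voltage_for_speed(target_speed, v_min=0.0, v_max=48.0, tol=1e-3):
    # Invert the monotone model by bisection: find the voltage at which
    # the modeled speed matches the desired speed.
    lo, hi = v_min, v_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if learned_speed_model(mid) < target_speed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(voltage_for_speed(60.0))  # voltage that yields ~60 rpm under the model
```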
Various exemplary embodiments of the present invention are specified below.
Exemplary embodiment 1 is a method for configuring a technical system to be configured, as described above.
Exemplary embodiment 2 is a method according to exemplary embodiment 1, wherein adjusting the a priori model comprises adjusting the weights of the weighted combination of the conditioned reference system models.
In this way, account can be taken of how well findings from one of the related configuration tasks (i.e., the configuration of the reference systems) can be transferred to the configuration of the technical system to be configured.
Exemplary embodiment 3 is a method according to exemplary embodiment 1 or 2, wherein the conditioned reference system models and the a priori model for the system to be configured are Gaussian processes.
This enables efficient modeling, conditioning and combination of the reference system models to form the a priori model for the system to be configured.
Exemplary embodiment 4 is a method according to exemplary embodiment 3, wherein the covariance function (also referred to herein by the usual term “kernel”) of the a priori model consists of a weighted sum of the covariances of the reference system models (i.e., the weighted combination includes the weighted sum of the covariances, i.e. the weights of the weighted combination determine the weights of the weighted sum of the covariances) and a residual covariance function, wherein adjusting the a priori model comprises adjusting the residual covariance function.
Thus, relationships for the technical system to be configured that are not included in the reference system models can be modeled.
The mean value of the a priori model is given, for example, by a weighted sum of the mean values of the reference system models (wherein the weight of the mean value of a reference system model and the weight of the covariance of the reference system model can depend on each other, e.g. the weight of the covariance of the reference system model is the square of the weight of the mean value of a reference system model).
Exemplary embodiment 5 is a method according to exemplary embodiment 3 or 4, wherein the weight of a first reference system model of the reference system models is ascertained by projecting the observations detected for the technical system to be configured onto the mean value of the first reference system model, and the weight of a second reference system model of the reference system models is ascertained by projecting a residual of the observations detected for the technical system to be configured (i.e., the observations minus the part described by the projection of the observations onto the mean value of the first reference system model) onto the mean value of the second reference system model.
This enables the weights to be ascertained with significantly less effort than a likelihood estimate, in particular for a large number of reference systems, and thus more rapid training.
For a further reference system model, the remaining part of the observation (which is not yet covered by the previous projection) can then be projected onto the mean value of the reference system model (see Algorithm 2 below).
Exemplary embodiment 6 is a method according to one of exemplary embodiments 1 to 5, comprising, for each reference system, generating the reference system model by adjusting a relevant reference a priori model to the observations detected for the reference system (e.g., by means of a likelihood approach).
This increases the quality of the reference system models. The conditioning of the reference a priori models on the respective observations then provides a posteriori models for the reference systems.
Exemplary embodiment 7 is a data processing device (in particular a control device) that is designed to perform a method according to one of exemplary embodiments 1 to 6.
Exemplary embodiment 8 is a computer program comprising commands that, when executed by a processor, cause the processor to perform a method according to one of exemplary embodiments 1 to 6.
Exemplary embodiment 9 is a computer-readable medium storing commands that, when executed by a processor, cause the processor to perform a method according to one of exemplary embodiments 1 to 6.
In the figures, similar reference signs generally refer to the same parts throughout the various views. The figures are not necessarily true to scale, with emphasis instead generally being placed on the representation of the principles of the present invention. In the following description, various aspects are described with reference to the figures.
The following detailed description relates to the figures, which show, by way of explanation, specific details and aspects of this disclosure in which the present invention can be implemented.
Other aspects may be used and structural, logical, and electrical changes may be performed without departing from the scope of the present invention. The various aspects of this disclosure are not necessarily mutually exclusive, since some aspects of this disclosure may be combined with one or more other aspects of this disclosure to form new aspects.
Various examples are described in more detail below.
The physical or chemical process can be any type of technical process, such as a manufacturing process (e.g., producing a product or intermediate product), a machining process (e.g., machining a workpiece), a control process (e.g., moving a robot arm) or a measuring process. Such a physical process typically has to be controlled, in particular configured (set), e.g. a measuring apparatus has to be calibrated, etc. For example, it may be necessary to set various control variables of an apparatus (e.g., as part of a calibration) in order to perform a physical or chemical process. For example, the physical or chemical process in a heat treatment by means of a furnace can require a calibration of the furnace temperature and/or the vacuum. The corresponding configuration of the apparatus 108 (i.e., the performing of the corresponding configuration task) is effected by a control device 106.
Two physical or chemical processes and thus the respective configuration tasks can be related to one another in different ways. For example, it can be substantially the same process twice, such as the drilling or milling of components, but executed by means of different machines. Even the same process on different machines can lead to individual results. Two processes executed on the same machine can also be related to one another. For example, one process can be drilling a metal component and another process can be drilling a ceramic component. In general, two processes or configuration tasks can be related to one another if the input variables of the relevant process or configuration task overlap at least partially and the output variables of the relevant process or configuration task overlap at least partially. Illustratively, two processes related to one another can have one or more identical input variables (e.g., a process temperature, a process time and/or a vacuum pressure in the case of a heat treatment), which are set in the relevant configuration task, and one or more identical output variables (e.g., a hardness, strength, density, microstructure, macrostructure and/or chemical composition in the case of a heat treatment). Two processes or configuration tasks are related to one another if their respective models are suitable for transfer learning.
In the following, it is assumed that the control device 106 is to execute a target (configuration) task, e.g., a configuration for a certain physical or chemical process to be performed by the apparatus 108, and that one or more processes and configuration tasks related to this target task have already been executed by the system or also by one or more other (at least similar) systems. For example, these are physical or chemical processes related to the certain physical or chemical process, and the corresponding configuration tasks are the configurations (e.g., of the apparatus 108) for performing these processes. However, a configuration task can also be, for example (instead of setting parameters for a physical or chemical process), the configuration of a relevant machine learning model (i.e., setting hyperparameters, e.g., of a neural network). Accordingly, the target task is generally referred to as a (target) configuration task for a technical system (wherein the configuration can refer to process parameters or to hyperparameters for a machine learning model, etc.). The tasks related to the target task can generally be related (configuration) tasks for other ("related") technical systems (also referred to herein as "reference" systems) or so-called "meta" or "reference" (configuration) tasks, e.g., for related processes as described above or for the earlier configuration of neural networks for similar tasks (e.g., the classification of other types of objects than in the target task).
Accordingly, the apparatus 108 can be a machine, but also a data processing apparatus or part thereof (e.g., a program part that implements a neural network), etc.
The control device 106 is configured to control the first apparatus 108 according to a relevant (provided) input parameter value 102 of at least one (i.e., exactly one or more than one) input variable (e.g., temperature, exposure time, but also hyperparameters, etc.). An input parameter value 102 is therefore also understood herein to include a vector with values for a plurality of settable variables (e.g., process parameters).
Illustratively, the control device 106 can, for example, control an interaction of the apparatus 108 with the environment according to the input parameter value 102.
The term “control device” (also referred to as “controller”) can be understood as any type of logical implementation unit that can include, for example, a circuit and/or a processor capable of executing software, firmware, or a combination thereof stored in a storage medium, and issue the instructions, e.g., to an apparatus for executing a process in the present example. The control device can be configured, for example, by means of program code (e.g., software), to control the operation and/or setting (e.g., calibration) of a system, such as a production system, a processing system, a robot, etc.
An input parameter value as used herein can be a parameter value describing an input variable such as a physical or chemical variable, an applied voltage, an opening of a valve, etc. For example, the input variable can be a process-relevant property of one or more materials, such as hardness, thermal conductivity, electrical conductivity, density, microstructure, macrostructure, chemical composition, etc. However, as described above, the input parameter value can also be a hyperparameter (e.g., a number of layers of a neural network or their size) or the like.
During or after execution of the target task or the relevant related task according to the respective input parameter values 102, a result of the relevant task is ascertained.
For this purpose, the system 100 can have one or more sensors 110, for example. The one or more first sensors 110 can be designed to detect a result of the target task, in particular of a physical or chemical process. A result of the process can, for example, be a property of a produced product or machined workpiece (e.g., a hardness, strength, density, microstructure, macrostructure, chemical composition, etc.), a success or failure of a skill (e.g., picking up an object) of a robot, a resolution of an image recorded by means of a camera, etc. The result of the process can be described by means of at least one (i.e., exactly one or more than one) output variable. The one or more first sensors 110 can be designed to detect the at least one output variable and thus ascertain a result value 112. Like the input parameter value, this can be a vector with a plurality of components; for example, a relevant value for each output variable of a plurality of output variables can be detected.
The detection of a result of a process by means of one or more sensors as described herein can take place while the process is executed (in situ) and/or after the process is executed (ex situ). For example, the model can describe the relationship between one or more input variables and at least two output variables, it being possible to detect an output value of an output variable of the at least two output variables during the process and an output value of the other output variable of the at least two output variables after the process has been executed. As an illustrative example of detecting the output value after the process has been executed, the process can be hardening a workpiece in a furnace with a temperature as an input variable. In this case, the output variable can be a hardness of the workpiece at room temperature after the hardening process. The output variable can have an application-specific quality criterion. The output variable can be a component-related parameter, such as a measure or a layer thickness, or can be a material-related parameter, such as hardness, thermal conductivity, electrical conductivity, density, chemical composition, etc.
In the case where the relevant task is the configuration of a relevant neural network, the output value 112 can also be ascertained without sensors, for example by assessing the accuracy of the neural network that has been configured according to the relevant input parameter value.
A result value can be a value describing an output variable of the process. An output variable of the process can be a property of a product, workpiece, recorded image or another result. However, an output variable of the process can also be a success or failure (e.g., of a skill of a robot). Illustratively, the result value 112 results from the input parameter value 102.
However, the relationship between the input parameter value 102 and result value 112 is typically very complex and unknown. A pair consisting of an input parameter value 102 and associated result value 112 forms an observation (or “data point”) 130.
If x designates the input parameter value 102 and y designates the result value 112, then y = f_t(x) for the target task (indexed with t for "target") and y = f_m(x) for the related tasks (indexed with m for "meta"), where both f_t and f_m are unknown.
A typical task is now to maximize f_t (wherein it is assumed in the following that the result value has only one real component), i.e., to find an x for which the result value is as good as possible (e.g., maximum). In order to increase the efficiency of such an optimization, according to various embodiments, knowledge of the related tasks is used, i.e., so-called "meta-learning" (or "transfer learning") is used, which can drastically reduce the number of experiments required for the target task. It should be noted that the goal need not necessarily be to optimize f_t; it may also be desirable to find a model for f_t, for which the approach described below can likewise be used. However, it is assumed in the following that f_t is to be optimized (specifically maximized).
Thus, in the following, a meta-learning situation is considered in which the goal is to efficiently maximize a function f_t: D → ℝ (where D is a relevant space of possible input parameter values for the configuration), where f_t is unknown (e.g., is a sample from an unknown distribution of functions). In order to maximize f_t, parameters (i.e., input parameter values) x_n can be sequentially evaluated to obtain noisy observations y_n = f_t(x_n) + ω_n, where ω_n ∼ 𝒩(0, σ_t²) represents Gaussian noise with a mean value of zero and is independently and identically distributed. It should be noted that the approach described herein is not restricted to applications in which this assumption is fulfilled. In practice, this assumption is usually violated, but the regression method with Gaussian processes often works well nonetheless. For example, this assumption is also made for linear regression by means of the least squares method.
For this purpose, N_t observations from previous executions of the target task, 𝒟_t = {(x_n, y_n) : n = 1, …, N_t}, are available, and f_t is modeled as a Gaussian process f_t ∼ 𝒢𝒫(m, k). A Gaussian process models different function values via a common Gaussian distribution, which is parameterized by a mean value vector and a covariance matrix.
The a posteriori Gaussian process (hereafter simply referred to as posterior), which is obtained by conditioning an a priori Gaussian process (hereafter simply referred to as prior) for f_t with mean value function m(·) and kernel (i.e., covariance function) k(·,·) on the data 𝒟_t, is a Gaussian process with a mean value and a covariance given by

μ_t(x) = m(x) + k(x, X_t)·(k(X_t, X_t) + σ_t²·I)⁻¹·(y_t − m(X_t)),
Σ_t(x, x′) = k(x, x′) − k(x, X_t)·(k(X_t, X_t) + σ_t²·I)⁻¹·k(X_t, x′),   (1)

where X_t = (x_1, …, x_{N_t}) and y_t = (y_1, …, y_{N_t}) is the vector of the corresponding noisy result values.
With Bayesian optimization, the posterior characterized by (1) is used to sequentially select a new parameter x_{N_t+1}, at which the result is then queried and which provides information about the optimum of f_t, by solving an auxiliary optimization problem on the basis of an acquisition function α:

x_{N_t+1} = argmax_{x∈D} α(x).   (2)
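Purely by way of illustration, one iteration of such a Bayesian optimization according to (1) and (2) could be sketched as follows (assuming an RBF kernel, a zero prior mean and an upper-confidence-bound acquisition function, none of which is prescribed by the method described herein):

```python
import numpy as np

def rbf_kernel(A, B, length=0.2):
    # Illustrative squared-exponential kernel k(x, x') for scalar inputs.
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X_t, y_t, X_query, noise_var=1e-2):
    # Posterior mean and covariance per (1), with zero prior mean m(x) = 0.
    K = rbf_kernel(X_t, X_t) + noise_var * np.eye(len(X_t))
    K_q = rbf_kernel(X_query, X_t)
    mean = K_q @ np.linalg.solve(K, y_t)
    cov = rbf_kernel(X_query, X_query) - K_q @ np.linalg.solve(K, K_q.T)
    return mean, cov

# Noisy observations of the unknown target function f_t.
X_obs = np.array([0.1, 0.4, 0.8])
y_obs = np.sin(6.0 * X_obs) + 0.1 * np.random.randn(3)

# Acquisition step per (2): maximize an upper-confidence-bound
# acquisition function over a grid of candidate parameters.
candidates = np.linspace(0.0, 1.0, 201)
mu, cov = gp_posterior(X_obs, y_obs, candidates)
ucb = mu + 2.0 * np.sqrt(np.clip(np.diag(cov), 0.0, None))
x_next = candidates[np.argmax(ucb)]  # parameter value to evaluate next
```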
The performance of this approach typically depends on the quality of the prior. According to various embodiments, in particular, an approach is provided that enables a prior of good quality to be ascertained. Observations ("metadata") from the related tasks are used for this purpose. The relationship between the input parameter value 102 and the result value 112 for the related tasks is given by functions f_m (which, e.g., originate from the same distribution of functions as f_t).
It is therefore assumed that for each related task (index m, m ∈ ℳ = {1, …, M}) there is access to a relevant data set 𝒟_m = {(x_{m,n}, y_{m,n}) : n = 1, …, N_m}, whose result values can again be subject to noise, e.g. ω_{m,n} ∼ 𝒩(0, σ_m²) (but this is not necessary). Together, these data sets form the metadata 𝒟_{1:M} = {𝒟_m : m ∈ ℳ}.
This metadata can be incorporated into Gaussian process modeling by considering a common model across the target task and related tasks. Such a multi-task GP model (MTGP) is defined by an extended kernel that additionally models similarities between the tasks,

k((x, ν), (x′, ν′)) = Σ_m [W_m]_(ν,ν′)·k_m(x, x′),   (3)

where the k_m are arbitrary kernel functions and the W_m are positive semidefinite matrices that are called coregionalization matrices, since their entries [W_m]_(ν,ν′) model the covariances between two tasks ν and ν′. By conditioning (i.e., "determining" in the sense of a conditional probability) this common model on 𝒟_{1:M} and 𝒟_t, a narrower posterior for f_t can be obtained via the GP equations of (1). However, these multi-task GP models are both computationally intensive (cubic in the number of all meta and target task points) and difficult to train in practice due to the large number of hyperparameters.
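A direct, unoptimized transcription of the joint covariance matrix according to (3) could look as follows (coregionalization matrices, base kernels and all numerical values are illustrative):

```python
import numpy as np

def mtgp_covariance(points, tasks, Ws, kernels):
    # Joint covariance matrix per (3) over (input, task) pairs:
    # K[i, j] = sum_m [W_m]_(tasks[i], tasks[j]) * k_m(points[i], points[j]).
    n = len(points)
    K = np.zeros((n, n))
    for W, k_m in zip(Ws, kernels):
        for i in range(n):
            for j in range(n):
                K[i, j] += W[tasks[i], tasks[j]] * k_m(points[i], points[j])
    return K

rbf = lambda a, b: np.exp(-0.5 * ((a - b) / 0.3) ** 2)
# Example with two tasks (0 and 1) and arbitrary positive semidefinite
# coregionalization matrices (illustrative numbers).
Ws = [np.array([[1.0, 0.6], [0.6, 1.0]]),
      np.array([[0.5, 0.1], [0.1, 1.0]])]
K = mtgp_covariance([0.1, 0.5, 0.2], [0, 0, 1], Ws, [rbf, rbf])
```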
For the sake of simplicity, the target task is referred to below as the (M+1)-th task, i.e., t = M+1. A key challenge in learning an MTGP model with (3) is that learning covariances on the basis of few data points is difficult, which generally leads to poor performance. For this reason, a joint use (sharing) of hyperparameters across tasks is often introduced in practice. However, this does not reduce the computing costs for evaluating the model.
The following describes an MTGP meta-learning model that can be both efficiently trained and evaluated. Instead of the typical parameter partitioning, two assumptions are introduced for the common GP model of (3), which restrict learning to the most important covariances and lead to a modular GP posterior model that can be evaluated efficiently.
First, correlations between the meta-tasks are neglected, since it can be difficult (or even impossible) to learn them if only a small amount of metadata is available for each meta-task. For the sake of simplicity, the covariance between the tasks used is written Cov(f_m, f_m′) = c for some c ≥ 0 instead of the more explicit Cov(f_m(x), f_m′(x′)) = c·k_m(x, x′).
Assumption 1: Cov(f_{1:M}, f_{1:M}) = I.
Note that although this limits the transfer of information between the meta-tasks, each meta-task can still influence the target task. Consequently, after conditioning on the observations 𝒟_t of the target task, the meta-tasks can still be correlated. The assumption that Cov(f_m, f_m) = 1 (for m = 1, …, M) ensures that each kernel k_m in (3) models the marginal distribution for the model of the corresponding meta-task f_m. Taken together, these two properties are crucial for being able to efficiently learn the provided model even for a large number M of meta-tasks, since they enable models to be learned independently for each meta-task, which are then combined to form a prior for the target task.
Second, the model provided is constrained in that it is assumed that f_t is additive in the meta-task functions, i.e., additive in functions that (anti-)correlate perfectly with the meta-task models. The short notation Corr(f_m, f_m′) = Corr(f_m(x), f_m′(x)) is used for the correlation coefficients.
Assumption 2: The function f_t can be written as

f_t = f̃_t + Σ_{m∈ℳ} f̃_m,

where |Corr(f̃_m, f_m)| = 1, Cov(f̃_t, f_t) = 1 and Cov(f̃_t, f_m) = 0 for all m ∈ ℳ. In particular, for the MTGP model in (3), Cov(f_m, f̃_m) = [W_m]_(m,t).
By constraining the components f̃_m of the target task model so that they correlate perfectly with the meta-task models, the intuition that parts of the meta-task functions are to be reflected in the target task is modeled directly. As a result, only the scalings of these functions remain as free parameters that can be learned.
This ensures that the model provided has a structured prior for the target task. The residual component f̃_t is independent of the meta-tasks and aims to model all parts of the target task that cannot be explained by the metadata.
Together, assumptions 1 and 2 force a structuring of the coregionalization matrices in (3). Assumption 1 forces [W_m]_(m,m′) to be zero if m ≠ m′, while assumption 2 directly forces [W_m]_(m,m) = 1, so that the variation of each meta-task is directly modeled by the corresponding kernel k_m. The assumption about the correlation additionally leads to matrices W_m that are characterized by a single, scalar and unconstrained parameter w_m ∈ ℝ for each meta-task m ∈ ℳ. Specifically, the matrix elements are all zero except [W_m]_(m,m) = 1, [W_m]_(m,t) = [W_m]_(t,m) = w_m and [W_m]_(t,t) = w_m². For one meta-task, M = 1, for example, the following applies:

W_1 = [1  w_1; w_1  w_1²],  W_t = [0  0; 0  1].   (4)

Based on this example, it is easy to verify that both matrices are positive semidefinite and that they satisfy assumptions 1 and 2.
While, with the model provided, the function f_m of a meta-task and the corresponding component f̃_m in the function f_t of the target task are restricted to a perfect correlation, the size of w_m determines the extent to which the meta-task is relevant for the target task: the prior for f_t is given by μ_1 and w_1²·k_1(·,·) + k_t(·,·), and in the limit w_1 → 0 the two tasks are modeled as independent. The same reasoning applies to a plurality of tasks, since the assumptions result in a valid kernel; specifically, it can be shown:
Assumptions 1 and 2 with w_m ∈ ℝ for m ∈ ℳ provide a valid multi-task kernel, which is given by

k((x, ν), (x′, ν′)) = Σ_{m∈ℳ} g_m(ν)·g_m(ν′)·k_m(x, x′) + δ_{ν,t}·δ_{ν′,t}·k_t(x, x′),   (5)

where g_m(ν) equals w_m if ν = t, is one if ν = m, and is zero otherwise, and δ is the Dirac delta. According to (5), a valid common kernel is thus available over the meta-tasks and the target task, which kernel is parameterized by scalar parameters (weights) w_m ∈ ℝ. Although this successfully limits the number of parameters, the resulting model generally has the same inference complexity as any other MTGP. However, the common kernel in (5) additionally leads to a specific posterior distribution that enables each meta-task to be modeled as a separate Gaussian process and evaluated efficiently.
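The structure of the kernel in (5) can be transcribed directly, for example as follows (RBF base kernels and all numerical values are illustrative):

```python
import numpy as np

def scaml_kernel(x, v, x2, v2, w, meta_kernels, k_t, t):
    # Common multi-task kernel per (5): sum over meta-tasks of
    # g_m(v) * g_m(v') * k_m(x, x'), plus k_t for target/target pairs.
    def g(m, task):
        if task == t:
            return w[m]          # weight w_m on the target task
        return 1.0 if task == m else 0.0
    val = sum(g(m, v) * g(m, v2) * k_m(x, x2)
              for m, k_m in enumerate(meta_kernels))
    if v == t and v2 == t:
        val += k_t(x, x2)        # residual target task kernel
    return val

rbf = lambda a, b: np.exp(-0.5 * ((a - b) / 0.3) ** 2)
w = [0.8, -0.3]                  # scalar weights w_m (illustrative)
t = 2                            # target task index (meta-tasks are 0 and 1)
print(scaml_kernel(0.2, t, 0.4, 0, w, [rbf, rbf], rbf, t))
```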
Specifically, the following can be shown: with a Gaussian process prior with a mean value of zero and a multi-task kernel given by (5), the posterior conditioned on the metadata is given by f_t | 𝒟_{1:M} ∼ 𝒢𝒫(m_ScaML, Σ_ScaML) with

m_ScaML(x) = Σ_{m∈ℳ} w_m·μ_m(x),
Σ_ScaML(x, x′) = Σ_{m∈ℳ} w_m²·Σ_m(x, x′) + k_t(x, x′),   (6)

where μ_m(x) and Σ_m(x, x′) are the mean values and covariances of the posterior of the relevant meta-task according to (1) (with index m instead of t, since (1) is formulated for the target task, but (6) refers to the meta-tasks), which only depend on 𝒟_m.
Thus, each meta-task m can be modeled with an individual Gaussian process on the basis of a kernel k_m, and the a posteriori mean value μ_m and the a posteriori covariance Σ_m per meta-task can be ascertained by conditioning the relevant Gaussian process on (only) the metadata of the relevant meta-task m. The resulting GP prior distribution for the target task f_t is given by the weighted sum of the meta-task posteriors according to (6). It is to be noted that the result of the expensive O(N_m³) inversion of the kernel matrix can be temporarily stored per meta-task, since it only depends on the fixed metadata 𝒟_m. Consequently, this approach is suitable for parallelization and efficient evaluation.
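A sketch of how the prior (6) can be assembled from independently conditioned meta-task posteriors could look as follows (RBF kernels, zero prior means and all numerical values are illustrative; the per-task inversion results would be cached in practice, as noted above):

```python
import numpy as np

def rbf(A, B, length=0.3):
    # Illustrative RBF kernel for scalar inputs.
    d = np.asarray(A)[:, None] - np.asarray(B)[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, Xq, noise_var=1e-2):
    # Per-task GP posterior per (1) with zero prior mean.
    K = rbf(X, X) + noise_var * np.eye(len(X))
    Kq = rbf(Xq, X)
    mean = Kq @ np.linalg.solve(K, y)
    cov = rbf(Xq, Xq) - Kq @ np.linalg.solve(K, Kq.T)
    return mean, cov

def scaml_prior(meta_data, weights, Xq):
    # Target task prior per (6): weighted sum of meta-task posterior
    # means, squared-weighted sum of their covariances, plus k_t.
    mean = np.zeros(len(Xq))
    cov = rbf(Xq, Xq)                        # residual target kernel k_t
    for (Xm, ym), w_m in zip(meta_data, weights):
        mu_m, S_m = gp_posterior(Xm, ym, Xq) # computable once and cacheable
        mean += w_m * mu_m
        cov += w_m ** 2 * S_m
    return mean, cov

meta = [(np.array([0.1, 0.5, 0.9]), np.array([0.2, 0.8, -0.1])),
        (np.array([0.2, 0.7]), np.array([0.1, 0.6]))]
prior_mean, prior_cov = scaml_prior(meta, [0.8, -0.3], np.linspace(0.0, 1.0, 50))
```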
A prior is thus obtained for the target task, which can be conditioned by (1) on 𝒟_t in order to obtain the posterior for the target task.
This enables not only a complete Bayesian treatment of the uncertainty, but also the determination of the meta-task weights w_m by a maximum likelihood method.
The individual meta-task posteriors 201 are combined in a weighted sum with the target task kernel k_t in order to obtain a target task prior 202 according to (6). This prior is conditioned according to (1) on the target task data 𝒟_t in order to obtain the posterior 203 for the target task.
In the above example, it was assumed that the kernel hyperparameters θ_m of the various task kernel functions k_m and the target task hyperparameters θ_t, which include the parameters of k_t and the weights w_m, are given. However, according to various embodiments, they are determined in practice from the data 𝒟_{1:M} and 𝒟_t by means of likelihood estimation. The direct evaluation of the likelihood of the observed data of the common task model according to (5) is computationally intensive, O((N_t + M·N_m)³), since it depends on the data of all tasks. However, any model that fulfills assumption 1 can exploit the model structure in order to scale in the number of meta-tasks M, since

log p(𝒟_t, 𝒟_{1:M} | θ_t, θ_{1:M}) = log p(𝒟_t | 𝒟_{1:M}, θ_t, θ_{1:M}) + Σ_{m∈ℳ} log p(𝒟_m | θ_m),

wherein the second term is the sum of the logarithmic probabilities of the respective metadata belonging to a meta-task under the relevant meta-task model parameterized by θ_m, which sum can be calculated with the effort O(M·N_m³), while the first term is the likelihood of the prior of the target task. In view of the already inverted meta-task kernel matrices, the calculation of the posterior meta-task covariances on the target task input values X_t is associated with the complexity O(M(N_t²·N_m + N_t·N_m²)).
Accordingly, according to various embodiments, the provided model is modularized by assuming a conditional independence between θ_m and θ_t:
Assumption 3: The following applies to all meta-tasks m ∈ ℳ: p(θ_m | 𝒟_m, 𝒟_t) = p(θ_m | 𝒟_m).
While the independence between the meta-tasks is given by assumption 1, assumption 3 goes one step further and allows the meta-task hyperparameters θ_m to be derived independently of the target task hyperparameters θ_t. Thus, the meta-task Gaussian processes can be optimized (i.e., "fitted") in parallel on the basis of (only) their individual data, according to

θ_m* = argmax_{θ_m} log p(𝒟_m | θ_m).
If a relevant meta-task Gaussian process has been fitted to the data 𝒟_m of the relevant meta-task, the posterior mean value μ_m(X_t) and the covariance matrix Σ_m(X_t, X_t) of the meta-task Gaussian process for the target task input parameters X_t are calculated and temporarily stored. Since the prior for the target task according to (6) depends on these variables, and these do not depend on θ_t due to assumption 3, the parameters θ_t of the model for the target task can then be ascertained by means of likelihood estimation:

θ_t* = argmax_{θ_t} log p(𝒟_t | 𝒟_{1:M}, θ_t).   (7)
The effort required for this is O(M·N_t² + N_t³), which can be assessed as favorable, especially in the practically relevant range of M ≈ N_t. Since the meta-task Gaussian processes are independent of the target task, they can be calculated once and reused for each new target task. Taken together, this enables scalable meta-learning with Gaussian processes.
Algorithm 1 summarizes the training of the model for f_t.

Algorithm 1: Training of the model for the target task
1: Input: metadata 𝒟_{1:M} = {𝒟_m : m ∈ ℳ}, target task data 𝒟_t
2: for each meta-task m ∈ ℳ: fit a Gaussian process model to 𝒟_m (i.e., ascertain θ_m)
3: combine the meta-task posteriors to form the target task prior according to (6)
4: optimize the target task hyperparameters θ_t according to (7)
5: condition the target task prior on 𝒟_t according to (1)
The training begins in line 2 with the training of an individual Gaussian process model for each meta-task; these models are combined in line 3 to form a target task prior. On this basis, the hyperparameters for the target task are optimized in a cost-effective manner in line 4 with the aid of (7). Finally, the posterior is ascertained in line 5 by means of conditioning on 𝒟_t according to (1).
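An end-to-end sketch of Algorithm 1 could look as follows (RBF kernels and zero prior means are illustrative; the likelihood maximization of line 4 according to (7) is indicated here only by a coarse grid search over the weights, whereas a gradient-based optimization of all of θ_t would typically be used in practice):

```python
import numpy as np
from itertools import product

def rbf(A, B, length=0.3):
    d = np.asarray(A)[:, None] - np.asarray(B)[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, Xq, noise_var=1e-2):
    # Per-task GP posterior per (1) with zero prior mean.
    K = rbf(X, X) + noise_var * np.eye(len(X))
    Kq = rbf(Xq, X)
    mean = Kq @ np.linalg.solve(K, y)
    cov = rbf(Xq, Xq) - Kq @ np.linalg.solve(K, Kq.T)
    return mean, cov

def log_likelihood(y, mean, cov, noise_var=1e-2):
    # log p(D_t | D_1:M, theta_t): Gaussian log density of the target
    # observations under the prior (6) evaluated at X_t.
    S = cov + noise_var * np.eye(len(y))
    r = y - mean
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (r @ np.linalg.solve(S, r) + logdet + len(y) * np.log(2 * np.pi))

# Line 1: metadata and target task data (illustrative numbers).
meta = [(np.array([0.1, 0.5, 0.9]), np.array([0.2, 0.8, -0.1]))]
X_t, y_t = np.array([0.3, 0.6]), np.array([0.5, 0.4])

# Line 2: fit one GP per meta-task and cache its posterior at X_t.
cached = [gp_posterior(Xm, ym, X_t) for Xm, ym in meta]

# Lines 3-4: choose the weights w_m by maximizing the target likelihood (7),
# here via a coarse grid search instead of a gradient-based optimizer.
best_w, best_ll = None, -np.inf
for w in product(np.linspace(-1.0, 1.0, 21), repeat=len(meta)):
    mean = sum(wi * mu for wi, (mu, _) in zip(w, cached))
    cov = rbf(X_t, X_t) + sum(wi ** 2 * S for wi, (_, S) in zip(w, cached))
    ll = log_likelihood(y_t, mean, cov)
    if ll > best_ll:
        best_w, best_ll = w, ll

# Line 5: condition the resulting prior on D_t per (1) to obtain the posterior.
```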
As described above, the equations (6) thus provide a scalable and structured way to transfer information from meta-tasks into a new prior for a target task. The key component for this can be seen in assumption 2, which is based on an additive model. While these models reflect many real-life situations, more flexible meta-learning models based on neural networks are in principle capable of learning more complex relationships between the meta-tasks and the target task. However, by relaxing the model assumptions, these methods also require significantly more data. The approach described above is therefore best suited when the amount of data per task is relatively small. While the overall approach scales linearly with the number of meta-tasks and enables parallel optimization, each individual model is still a standard Gaussian process that scales cubically with the number of data points per task. For a large number of data points per task, N_m or N_t, scalable GP approximations can be used to obtain efficient inference.
According to one embodiment, a direct calculation of the weights w_m ∈ ℝ is performed (instead of ascertaining them as part of the likelihood estimation according to (7) or by means of a separate optimization problem in the space spanned by the weights).
This can be particularly advantageous if there are a large number of meta-tasks and/or a large number of data points for each meta-task. The weights found in this way can then either be used directly or serve as a starting point for a local optimization of the weights.
One approach to this is to consider the posterior mean values of the meta-task models evaluated on the target task input parameter values, μ_m(X_t), m ∈ ℳ, as a basis of the space that is spanned by the possible observations of the target task. If this basis were orthonormal, one could simply project the observed target task results y_t onto this basis and obtain the weights as the portion of the target data vector expressed by the corresponding basis vector.
However, the μ_m(X_t), m ∈ ℳ, are generally not orthogonal and therefore not orthonormal. This problem can, however, be solved with a simple algorithm, as described below.
It starts with the introduction of a vectorial auxiliary variable y = y_t.
In a first step, w_1 = y·μ̂_1(X_t) is determined, where μ̂_1(X_t) = μ_1(X_t)/∥μ_1(X_t)∥. In a second step, the part of y that is described by this first projection is subtracted, which yields the residual y → y − w_1·μ̂_1(X_t). These two steps are repeated for all remaining μ_m(X_t), m ∈ ℳ, or until a norm of the auxiliary variable falls below a previously defined value. This value can be given in particular by a desired model accuracy. The norm can in particular be the L2 norm or the supremum norm.
Since two basis vectors usually overlap strongly, the weights of the first meta-task models obtained with this method tend to be much larger than the weights corresponding to m ≫ 1. In order to obtain balanced weights, this method can be repeated several times with different orders of the μ_m(X_t), and the average of the weights can then be calculated.
Algorithm 2 summarizes the procedure for the direct calculation (or estimation) of the weights. The usual keywords for, do and end are used.
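Since the listing itself is not reproduced here, the following is a sketch reconstructed from the description above (the stopping threshold and the number of random orders are illustrative choices):

```python
import numpy as np

def project_weights(y_t, mus, tol=1e-6):
    # Sequentially project the target observations onto the normalized
    # meta-task posterior means mu_m(X_t) and subtract each projection.
    y = y_t.astype(float).copy()
    w = np.zeros(len(mus))
    for m, mu in enumerate(mus):
        if np.linalg.norm(y) < tol:        # residual small enough: stop
            break
        mu_hat = mu / np.linalg.norm(mu)
        w[m] = y @ mu_hat
        y = y - w[m] * mu_hat
    return w

def averaged_weights(y_t, mus, n_orders=20, seed=0):
    # Repeat with shuffled orders and average, to balance the weights.
    rng = np.random.default_rng(seed)
    M = len(mus)
    acc = np.zeros(M)
    for _ in range(n_orders):
        order = rng.permutation(M)
        w = project_weights(y_t, [mus[i] for i in order])
        acc[order] += w                    # map weights back to original order
    return acc / n_orders

y_t = np.array([0.5, 0.4, -0.2])
mus = [np.array([1.0, 0.8, 0.1]), np.array([0.2, 0.9, -0.5])]
print(averaged_weights(y_t, mus))
```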
As described above, the approach described above can be used to optimize the configuration of a target system 100 (e.g., an apparatus 108 such as a machine). The following is an example of a sequence for optimizing and operating such a target system.
Within the framework of active learning, the sequence remains substantially the same. The acquisition function in step 8 is selected in such a way that it minimizes the uncertainty of the target model. For example, the acquisition function of the upper confidence bound with a very large exploration margin or more sophisticated acquisition functions such as the entropy of the surrogate model could be used. In step 13, instead of selecting the best parameter, the surrogate model itself is returned and then used for the desired applications, e.g., to control a motor on the basis of this model.
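Purely by way of illustration, such an acquisition function with a very large exploration margin could look as follows:

```python
import numpy as np

def active_learning_acquisition(mu, sigma, beta=100.0):
    # Upper confidence bound with a very large exploration margin: for
    # large beta, the selection is driven almost entirely by the model
    # uncertainty, which is what active learning aims to reduce.
    return mu + beta * sigma

mu = np.array([0.10, 0.50, 0.30])      # posterior means at the candidates
sigma = np.array([0.40, 0.05, 0.60])   # posterior standard deviations
i_next = np.argmax(active_learning_acquisition(mu, sigma))  # most uncertain
```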
In summary, according to various embodiments, a method as illustrated in FIG. 3 is provided.
In 301, for each of one or more technical reference systems, reference observations of results of the reference system for different values of configuration parameters are detected (e.g., measured). Here, it is to be noted that the values of the configuration parameters (and their number) on which the reference observations are based can be different for each reference system. For example, if there were only one configuration parameter p_1, reference system 1 could have been evaluated for p_1 ∈ {0.1, 0.2, 0.5} and reference system 2 for p_1 ∈ {0.3, 0.6}.
In 302, for each reference system, a relevant reference system model for the relationship between the values of the configuration parameters and the results provided by the reference system is conditioned on the reference observations detected for the reference system.
In 303, observations of results of the technical system to be configured are detected (e.g., measured) for different values of the configuration parameters for the technical system to be configured.
In 304, an a priori model for the relationship between the values of the configuration parameters and the results provided by the technical system to be configured is adjusted to the observations detected for the technical system to be configured, wherein the a priori model is formed from a weighted combination of the conditioned reference system models (e.g., as in the example above, supplemented by a residual term k_t, see equation (5)) (i.e., the a priori model includes the weighted combination of the conditioned reference system models).
In 305, an a posteriori model for the relationship between the values of the configuration parameters and the results provided by the technical system to be configured is ascertained (considering the configuration parameters for the technical system to be configured over their whole range of values, i.e., not only for the different values for which observations of the results have been detected) by conditioning the adjusted a priori model on the observations detected for the technical system to be configured.
In 306, the technical system to be configured is configured using the ascertained a posteriori model.
The results can be detected in the form of one or more output variables. In the case of a plurality of output variables, a separate model can be created for each output variable, and the union of these models can then be used as the model. A common goal can be defined in order to consider different goals together in one acquisition function. For example, a cost function is defined, which is then to be minimized by setting the input parameters with the aid of an acquisition function. This cost function combines all target variables in a single scalar function. The cost function can be a sum, for example. Target variables that are to be minimized are included in this sum with a positive sign. Target variables that are to be maximized accordingly have a negative sign. For target variables that are to reach a certain value, a distance between the target variable and the desired value is included in the cost function. The individual contributions can then be weighted.
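Purely by way of illustration, such a scalar cost function could look as follows (signs, weights and setpoints are illustrative):

```python
def combined_cost(values, specs):
    # specs: per target variable a (mode, weight[, setpoint]) tuple, where
    # mode is "min" (minimize), "max" (maximize) or "target" (reach a value).
    cost = 0.0
    for y, spec in zip(values, specs):
        if spec[0] == "min":
            cost += spec[1] * y            # positive sign: to be minimized
        elif spec[0] == "max":
            cost -= spec[1] * y            # negative sign: to be maximized
        else:                              # "target": penalize the distance
            cost += spec[1] * abs(y - spec[2])
    return cost

# Example: minimize roughness, maximize hardness, reach a layer thickness of 2.0.
print(combined_cost([0.3, 55.0, 1.8],
                    [("min", 1.0), ("max", 0.1), ("target", 2.0, 2.0)]))
```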
The method of FIG. 3 can be performed by one or more computers comprising one or more data processing units.
The method is therefore in particular computer-implemented according to various embodiments.
Various embodiments can receive and use time series of sensor data from various sensors such as video, radar, LiDAR, ultrasound, motion, thermal imaging, currents, voltages, temperatures, etc. (e.g., for the observations). Based on the sensor data, various technical systems, physical devices etc. can be configured, e.g. by a relevant control device.