The present invention relates to a method and a system for validating a trained artificial neural network (ANN) on the basis of a test data set.
Trained artificial neural networks (ANNs) usually have a high degree of redundancy in their weights. This means that many weights are not absolutely necessary for the neural network to function efficiently, in particular during inference. The number of weights strongly influences the complexity of the network architecture underlying the artificial neural network: as the number of weights increases, more and more computing power is required to run the artificial neural network, and also to train it. There is therefore a need to keep the number of weights per layer of the network as low as possible, while still ensuring that the ANN functions efficiently.
In order to allow an evaluation of the quality and/or performance of ANNs, validation methods are desirable.
An object of the present invention is to provide a method and a system for validating a trained artificial neural network (ANN) on the basis of a test data set.
The object may be achieved by a computer-implemented method for validating a trained artificial neural network (ANN) on the basis of a test data set according to certain features of the present invention. The object may be achieved by a system for validating a trained artificial neural network (ANN) on the basis of a test data set according to certain features of the present invention.
In the present case, a method for validating a trained artificial neural network (ANN) on the basis of a test data set is specified. According to an example embodiment of the present invention, the method comprises the following steps:
The present invention generates a partitioning of the input space into cells. In particular, the predicted class is constant in each cell. The present invention checks whether a data point is present in each cell. If this is the case, the algorithm or validation method preferably ends. If this is not the case, the parameters of the cells that do not contain a data point are passed to the simulation model. The simulation model then preferably generates at least one new data point in the cell(s) concerned. If necessary, the ANN is retrained iteratively with the newly generated data points. Preferably, new test data are generated, on the basis of which the retrained ANN can be evaluated or validated again.
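Purely for illustration, this loop can be sketched in a few lines of Python. Here, `partition`, `generate_point` (the simulation model) and `retrain` are hypothetical callables standing in for components the description leaves open, and the cell interface (`contains`, `parameters`) is likewise an assumption, not part of the claimed implementation:

```python
def validate_ann(ann, test_data, partition, generate_point, retrain, max_rounds=5):
    """Sketch of the validation loop: partition -> coverage check ->
    data generation for empty cells -> retraining -> renewed validation.

    `partition(ann)` is assumed to return cell objects offering a
    `contains(x)` test and a `parameters` attribute (hyperplane/hyperset
    information); `generate_point` stands in for the simulation model.
    """
    for _ in range(max_rounds):
        cells = partition(ann)
        empty = [c for c in cells
                 if not any(c.contains(x) for x, _ in test_data)]
        if not empty:                       # every cell holds a data point
            return True, ann, test_data     # -> ANN is considered validated
        for cell in empty:                  # complete the test data set
            x_new = generate_point(cell.parameters)
            test_data.append((x_new, None)) # label may come from the simulation
        ann = retrain(ann, test_data)       # iterative retraining
    return False, ann, test_data
```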
It is understood that the steps according to the present invention as well as other optional steps do not necessarily have to be carried out in the order shown, but can also be carried out in a different order. Other intermediate steps can also be provided. The individual steps can also comprise one or more sub-steps without departing from the scope of the method according to the present invention.
In other words, the method according to the present invention partitions the d-dimensional input space of the ANN into a finite number of cells (“faces”). Within each of these cells, the output of the ANN is preferably constant if the function of the ANN is to solve a classification task, and preferably linear if the function of the ANN is to solve a regression task. Pruning of the ANN takes place by means of the method.
A hyperplane is a mathematical term defined in a multidimensional space. In a two-dimensional space, a hyperplane is a straight line; in a three-dimensional space, it is a plane. In general, a hyperplane in an n-dimensional space is an (n−1)-dimensional face that divides the space into two parts. In ANNs, hyperplanes are used to define classes or decision boundaries by dividing the space into regions corresponding to different categories or states.
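This standard definition can be stated formally (generic notation, not specific to the present invention): a hyperplane with normal vector $w$ and offset $b$, together with the two half-spaces it separates, is

```latex
\[
  H     = \{\, x \in \mathbb{R}^{n} \mid \langle w, x \rangle + b = 0 \,\}, \quad
  H^{+} = \{\, x \in \mathbb{R}^{n} \mid \langle w, x \rangle + b > 0 \,\}, \quad
  H^{-} = \{\, x \in \mathbb{R}^{n} \mid \langle w, x \rangle + b < 0 \,\}.
\]
```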
A hyperset is an extension of the concept of a hyperplane. Instead of a single hyperplane, a hyperset is a collection of hyperplanes in the space. These hyperplanes can be used as a whole to define complex spatial structures. Hypersets are used in ANNs to model more complex decision boundaries that cannot be described simply by single hyperplanes.
The input space (also called feature space) refers to the space in which the input data of a neural network exist. Each dimension of the input space corresponds to a specific feature or property of the input data. For example, the pixel values of an image could form the input space in an image classification network.
The output space refers to the space of the output values of a neural network. This space depends on the type of network. In a classification network with multiple classes, the output space could contain vectors representing the probabilities for each class. In a neural network for regression, the output space could represent numerical values.
In a preferred embodiment of the present invention, the ANN has d input variables. Such input variables can comprise sensor signals, for example. The input variables can be preprocessed, for example normalized and/or transformed from a time domain to a frequency domain or from a frequency domain to a time domain and/or filtered and/or smoothed or similar. The term “d-dimensional” means that preferably d different signals are available as input variables; thus, an input space of dimension d is spanned. The network architecture further has a plurality of bias terms. In an ANN, the bias term is preferably a constant that is added to the weighted inputs of each neuron before the, preferably non-linear, activation function is applied. It can be interpreted as a kind of shift that influences the activation of the neuron. The bias helps to control the activation range and the overall ability of the network to adapt to the data. The bias term allows a neuron to be activated even if all the weighted inputs are equal to zero. This allows the network to respond more flexibly to different patterns and relationships in the data. In a multilayer neural network, each neuron typically has its own bias value. Partitioning the d-dimensional input space into a plurality of cells on the basis of the network architecture comprises at least the following step: determining hyperplanes and/or hypersets depending on the input variables, on at least a subset of the plurality of weights, and on a subset of the plurality of bias terms (see the sketch below). How exactly the partitioning of the input space as well as that of the subsequent spaces of the network layers is carried out is described in detail in the description of the drawings, to which explicit reference is made here.
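As an illustrative sketch (using NumPy; the function name and the dense-layer assumption are mine, not taken from the description), the hyperplanes of the first layer can be read directly off its weight matrix and bias vector:

```python
import numpy as np

def first_layer_hyperplanes(W: np.ndarray, b: np.ndarray):
    """Each row w_k of the first-layer weight matrix W (shape m x d),
    together with the bias component b_k, defines a hyperplane
        { x in R^d : <w_k, x> + b_k = 0 }
    that separates cells of the input space."""
    return [(W[k], b[k]) for k in range(W.shape[0])]

# Example: a 3-neuron first layer on a 2-dimensional input space.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
b = np.array([0.0, -0.5, 0.25])
planes = first_layer_hyperplanes(W, b)   # 3 hyperplanes (here: lines in R^2)
```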
In a preferred embodiment of the present invention, if at least one data point of the test data set is present in each of the cells, a validation result of the ANN is provided, on the basis of which a quality and/or goodness and/or performance of the ANN can be determined and/or evaluated. On the basis of the validation result, it can preferably be decided whether the ANN can be used for inference, for example by comparing the validation result with a predetermined threshold value (see the sketch below). This increases the performance and reliability of the ANN. In a preferred embodiment, the ANN is retrained on the basis of the completed test data set. Through retraining, the performance of the ANN is sustainably improved. This results in a reduced error rate during inference of the ANN.
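One possible, non-prescribed way to condense the per-cell coverage check into a scalar validation result and a threshold decision might look as follows (the 95% default is an arbitrary illustrative value):

```python
def validation_result(cells, test_points, contains):
    """Fraction of cells that contain at least one test data point."""
    covered = sum(1 for c in cells if any(contains(c, x) for x in test_points))
    return covered / len(cells)

def fit_for_inference(score: float, threshold: float = 0.95) -> bool:
    # Compare the validation result with a predetermined threshold value.
    return score >= threshold
```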
In a preferred embodiment of the present invention, an extended and/or new test data set is provided for the retrained ANN, on the basis of which new test data set another validation of the retrained ANN is carried out. In this way, the performance of the ANN can be improved continuously and/or iteratively.
In a preferred embodiment of the present invention, the retraining is carried out iteratively.
In a preferred embodiment of the present invention, the following applies to the d-dimensional input space: d ≤ 10, preferably d < 10, particularly preferably d < 6, where d is a positive natural number. Preferably, the ANN is therefore a small-dimensional network. For ANNs of higher dimension, partitioning would only be possible with a disproportionately higher computational effort due to the large number of input variables.
In a preferred embodiment of the present invention, the ANN is part of a complex machine learning model. This allows a part or component of the complex model to be optimized. This leads to an improvement of the overall model.
In a preferred embodiment of the present invention, the simulation model is designed to generate at least one data point, in particular by data augmentation, on the basis of the cell parameters, in particular on the basis of the hyperplane information and/or hyperset information. The simulation model can be a data augmentation model, for example.
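As one of many conceivable simulation models, a simple rejection-sampling data augmentation can generate a point inside a given cell from its hyperplane parameters and sign code. This sketch assumes full-dimensional cells (codes of +1/-1 only) and a bounded sampling box; it is an illustration, not the claimed simulation model:

```python
import numpy as np

def sample_point_in_cell(planes, target_code, d, bounds=(-1.0, 1.0),
                         max_tries=100_000, rng=None):
    """Rejection sampling: draw points in a box until one lies in the cell
    described by `target_code` (+1 / -1 per hyperplane).

    planes: list of (w_k, b_k) pairs; target_code: array of +-1 entries."""
    rng = rng or np.random.default_rng(0)
    for _ in range(max_tries):
        x = rng.uniform(bounds[0], bounds[1], size=d)
        signs = np.sign([np.dot(w, x) + b for w, b in planes])
        if np.array_equal(signs, target_code):
            return x
    return None   # cell may be empty or lie outside the sampled box
```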
In a second aspect of the present invention, a system for validating a trained artificial neural network (ANN) on the basis of a test data set is provided. According to an example embodiment of the present invention, the system comprises a provisioning device that is designed to carry out the following steps: providing the trained ANN with a network architecture, which represents an, in particular non-linear, chained function by means of which a d-dimensional input space can be mapped to an N-dimensional output space, and which further comprises a plurality of weights; and providing the test data set. The system further comprises an evaluation and/or computing device that is designed to carry out the following steps: partitioning the d-dimensional input space into a plurality of cells on the basis of the network architecture of the ANN, wherein the individual cells can each be separated from one another by at least one weight-specific hyperplane and/or hyperset; checking whether at least one data point of the test data set is present in each of the cells in order to validate the ANN; and, if no data point is present in at least one of the cells, generating at least one new data point by means of a simulation model on the basis of cell parameters of the at least one of the cells in order to complete the test data set.
The example embodiments mentioned for the method of the present invention apply in the same or a similar way to the system of the present invention.
The present invention also relates to a control and/or computing device that is designed to perform a regression task and/or a classification task by running the ANN provided, trained, validated and/or optimized according to the method of the present invention. The control and/or computing device can be included, purely by way of example, in an autonomous vehicle and/or a robotic system and/or an industrial machine. Such a control and/or computing device can be used, for example, as an embedded device in a plant or system. In general, the present method can be used particularly for small-dimensional artificial neural networks. For example, the present validated and/or optimized ANN can be used as a virtual sensor with inputs of up to ten, preferably up to five, dimensions. The ANN provided here can also be used for feature detection in larger ANNs, for example for autonomous edge detection. Other examples of virtual sensors in the vehicle are virtual temperature sensors (e.g., on the stator/rotor of an electric motor or on an injector magnet), mass estimators for an injection system, coking models and/or virtual sensors for exhaust gas concentrations. In principle, the present method, or the ANN provided thereby, can be applied to all virtual sensors that are used where physical installation is impossible, cost pressure is high, or the measurement setup is too complex for series production.
In addition, the ANN validated and/or optimized and provided here can be used to determine a correction factor for calculating the displayed current consumption of an engine in a vehicle. The present ANN offers an alternative to conventional methods that require a “running in” of an injector and may not always work. This behavior can be parameterized in a map. In addition, more stringent EU requirements call for the use of alternative methods. Thanks to the compactness and low complexity of the ANN validated and/or optimized here, such an alternative is provided.
In addition, the ANN validated and/or optimized and provided here can be used to detect a change point from a discrete gyroscope signal, for example for fall detection. It is advantageous in this case that the input vectors are small-dimensional, so that the present optimization method can be applied particularly well in order to provide an optimized ANN.
In addition, the ANN provided here can be used as an edge filter in image processing. This can be explained using an excavator bucket as an example, wherein the ANN can be used as part of the control loop of the excavator bucket. The transfer function of the excavator joystick to the cylinder position of the bucket can be modeled using such an ANN optimized here. Preferably, small-dimensional networks with a simple network architecture, for example of the size 3-15-15-1 and/or 2-15-7-1, are considered in this case. For such small ANNs, the partitions can be determined in less than 5 seconds, for example.
In particular, due to its low dimensionality, the present ANN can be used as a surrogate function in which the ANN represents a solution of a partial differential equation that could not otherwise be calculated in real time.
According to an example embodiment of the present invention, a computer program having program code is also provided to carry out at least parts of the method according to the present invention in any of its embodiments when the computer program is executed on a computer. In other words, a computer program (product) is provided comprising commands that, when the program is executed by a computer, cause the computer to carry out the method/steps of the method according to the present invention in one of its embodiments.
According to the present invention, a computer-readable data carrier having program code of a computer program is proposed to carry out at least parts of the method according to the present invention in any of its embodiments when the computer program is executed on a computer. In other words, the present invention relates to a computer-readable (memory) medium comprising commands that, when executed by a computer, cause the computer to perform the method/steps of the method according to the present invention in one of its embodiments.
The described embodiments and developments of the present invention can be combined with one another as desired.
Further possible embodiments, developments and implementations of the present invention also include combinations not explicitly mentioned of features of the present invention described above or in the following relating to the exemplary embodiments.
The figures are intended to impart further understanding of the example embodiments of the present invention. They illustrate example embodiments and, in connection with the description, serve to explain principles and concepts of the present invention.
Other embodiments of the present invention and many of the mentioned advantages are apparent from the figures. The illustrated elements of the drawings are not necessarily shown to scale relative to one another.
In the figures, identical reference signs denote identical or functionally identical elements, parts or components, unless stated otherwise.
The partitioning of the input space is described below in terms of its mathematical principles with reference to the figures.
The partitioning according to the present method preferably considers a multilayer neural network (ANN) with ReLU nonlinearities as the activation function. This means that the ANN defines a function $F: \mathbb{R}^{m_1} \to \mathbb{R}^{m_{k+1}}$.
The following preferably applies:

$$F(x) = A_k\,\phi\bigl(A_{k-1}\,\phi(\cdots\,\phi(A_1 x + b_1)\,\cdots) + b_{k-1}\bigr) + b_k,$$

where $\phi(t) = \max(0, t)$ denotes the ReLU function applied component by component, with weight matrices $A_1, \ldots, A_k$ and bias terms $b_1, \ldots, b_k$. Here, the $A_i$ are matrices, i.e., linear mappings between real vector spaces, $A_1: \mathbb{R}^{m_1} \to \mathbb{R}^{m_2}$, $A_2: \mathbb{R}^{m_2} \to \mathbb{R}^{m_3}$, ..., $A_k: \mathbb{R}^{m_k} \to \mathbb{R}^{m_{k+1}}$, and $b_1 \in \mathbb{R}^{m_2}$, ..., $b_k \in \mathbb{R}^{m_{k+1}}$ are vectors of the respective real vector spaces.
Overall, $F$ is preferably a chained mapping $F: \mathbb{R}^{m_1} \to \mathbb{R}^{m_{k+1}}$ from the $m_1$-dimensional Euclidean space $\mathbb{R}^{m_1}$ into the $m_{k+1}$-dimensional Euclidean space $\mathbb{R}^{m_{k+1}}$.
To reduce the notation, $d = m_1$ and $N = m_{k+1}$ are set, i.e., the network $F: \mathbb{R}^d \to \mathbb{R}^N$ examined here is a function from the $d$-dimensional Euclidean space $\mathbb{R}^d$ into the $N$-dimensional Euclidean space $\mathbb{R}^N$.
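The chained mapping $F$ translates directly into code. A generic NumPy sketch of such a ReLU network follows (an illustration, not the specific network of the invention):

```python
import numpy as np

def relu(t):
    return np.maximum(0.0, t)

def forward(x, As, bs):
    """F(x) = A_k . phi( ... phi(A_1 x + b_1) ... + b_{k-1}) + b_k,
    with the ReLU phi applied component-wise after every layer but the last."""
    for A, b in zip(As[:-1], bs[:-1]):
        x = relu(A @ x + b)
    return As[-1] @ x + bs[-1]

# Example: random 2-16-16-1 network (d = 2, N = 1, k = 3).
rng = np.random.default_rng(0)
As = [rng.normal(size=(16, 2)), rng.normal(size=(16, 16)), rng.normal(size=(1, 16))]
bs = [rng.normal(size=16), rng.normal(size=16), rng.normal(size=1)]
y = forward(np.array([0.3, -0.7]), As, bs)   # F(x) in R^1
```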
If a regression is modeled, the description of the function is preferably complete with $F(x)$. The ANN then preferably returns the value $F(x)$ as the output value or prediction value.
In the case of classification of the input signal $x$ into $m_{k+1}$ classes, a softmax function preferably follows the output $F(x)$.
As a result, the components of $F(x) = (f_1(x), f_2(x), \ldots, f_{m_{k+1}}(x))$ are preferably positive and normalized to one, i.e., for all input variables $x$ the following applies: $f_1(x), \ldots, f_{m_{k+1}}(x) \ge 0$ and $f_1(x) + \cdots + f_{m_{k+1}}(x) = 1$.
The classification is preferably carried out with an argmax function that returns the index $i$ of the vector $(f_1(x), \ldots, f_{m_{k+1}}(x))$ at which $f_i(x)$ is maximal.
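For illustration, the softmax normalization and the argmax classification of the output $F(x)$ can be written as:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)           # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(F_x):
    """Softmax normalizes F(x) to positive components summing to one;
    argmax returns the index of the maximal component, i.e., the class."""
    f = softmax(F_x)
    return int(np.argmax(f)), f
```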
After the principles have been explained, the partitioning of the input space is described in more detail below. The layers of the ANN preferably partition the input space into cells. In the case of a classification network, it is preferable to distinguish between the first layer (input layer), middle layers (also subsequent layers or layers following the input layer), and last layer (also output layer).
In the first layer of the network, consisting of a pair of weight matrix $A_1$ and bias term $b_1$, and a ReLU nonlinearity $\phi$ (activation function), an input vector $x$ is multiplied by the weight matrix $A_1$, the bias term $b_1$ is added, and the resulting vector $A_1 x + b_1$ is fed to a component-by-component ReLU $\phi(A_1 x + b_1)$. At the component level this means:

$$\phi(A_1 x + b_1)_k = \max\bigl(0, (A_1 x)_k + b_{1k}\bigr), \qquad k = 1, \ldots, m.$$
For easier legibility, $d = m_1$ and $m = m_2$ have been set.
In the following, the row vectors of $A_1$ are denoted by the vectors $h_1, \ldots, h_m$, so that $(A_1 x)_k = \langle h_k, x \rangle$.
These vectors and the bias term preferably each define open half-spaces $H_k^{+} = \{x \in \mathbb{R}^d : \langle h_k, x \rangle + b_{1k} > 0\}$ and $H_k^{-} = \{x \in \mathbb{R}^d : \langle h_k, x \rangle + b_{1k} < 0\}$.
Furthermore, hyperplanes $H_k^0 = \{x \in \mathbb{R}^d : \langle h_k, x \rangle + b_{1k} = 0\}$, $k = 1, \ldots, m$, are preferably introduced for the purpose of partitioning.
This system of half-spaces and hyperplanes preferably partitions the input space $\mathbb{R}^d$ into cells. This relationship is shown by way of example in the figures.
In other words, each element $x \in \mathbb{R}^{m_1}$ is assigned a coding $\operatorname{code}(x) \in \{-1, 0, 1\}^m$ in such a way that a $-1$ is inserted at the $k$-th position if $\langle h_k, x \rangle + b_{1k} < 0$, a $0$ if $\langle h_k, x \rangle + b_{1k} = 0$, and a $+1$ if $\langle h_k, x \rangle + b_{1k} > 0$. In this way, preferably each element of the input space is uniquely assigned a code.
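This coding is straightforward to compute. A sketch in which the rows of the matrix $H$ are the vectors $h_1, \ldots, h_m$:

```python
import numpy as np

def code(x, H, b1):
    """Sign coding of x with respect to the first-layer hyperplanes:
    -1 below, 0 on, +1 above the k-th hyperplane <h_k, x> + b_1k = 0."""
    return np.sign(H @ x + b1).astype(int)

# Example with three hyperplanes in a two-dimensional input space.
H  = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
b1 = np.array([0.0, -0.5, 0.25])
print(code(np.array([0.2, 0.9]), H, b1))   # -> [ 1  1 -1]
```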
The hyperplanes $H_1^0, H_2^0, \ldots, H_m^0$ of the first layer generate $n$-dimensional cells (also called $n$-faces), where $n = d$ denotes the dimension of the input space. Here, the $n$-cells are the connected components of the complement of the union of the hyperplanes, as shown by way of example in the figures.
The relatively open, connected components of the boundary of the $n$-faces are the $(n-1)$-cells (see the figures).
Recursively, the $k$-cells are thus defined via the boundaries of the $(k+1)$-cells, until $0$-cells are formed at the end, which are preferably points of the input space.
In each $n$-face, the output of the first layer of the ReLU ANN can be interpreted as a linear mapping.
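Because two inputs lie in the same $n$-face of the first layer exactly when their codes agree, the coverage check of the method can be phrased over codes. A sketch under this assumption (it finds the populated cells; the empty cells must still be enumerated separately):

```python
import numpy as np
from collections import defaultdict

def cells_covered_by(points, H, b1):
    """Group test points by their first-layer sign code; each non-empty
    group witnesses one populated cell of the partitioning."""
    groups = defaultdict(list)
    for x in points:
        key = tuple(np.sign(H @ x + b1).astype(int))
        groups[key].append(x)
    return groups   # codes of missing cells indicate where to generate data
```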
From the k-th Layer to the (k+1)-th Layer
Each further layer of a ReLU ANN preferably further partitions the $n$-cells generated by the previous layers. Each $n$-cell is preferably considered individually, as sketched below.
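A sketch of this per-cell refinement for one further layer: on a fixed cell with known ReLU activation pattern, layer one acts affinely, and the hyperplanes of layer two pull back to hyperplanes in the input space. The function name and interface are assumptions for illustration:

```python
import numpy as np

def pulled_back_hyperplanes(A1, b1, A2, b2, pattern):
    """On a fixed cell, layer 1 acts affinely: x -> D (A1 x + b1), where
    D = diag(pattern) zeroes the inactive neurons (pattern in {0, 1}^m).
    Each row of A2 then induces a hyperplane in the *input* space:
        <a_j, D (A1 x + b1)> + b2_j = 0,
    i.e., normal vector (a_j D A1) and offset (a_j D b1 + b2_j)."""
    D = np.diag(pattern)
    W = A2 @ D @ A1               # new normal vectors, one per row of A2
    c = A2 @ D @ b1 + b2          # new offsets
    return [(W[j], c[j]) for j in range(W.shape[0])]
```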
In the case of a regression, the partitioning process terminates at this point. Here, the last layer, i.e., the output layer, is preferably a linear mapping applied to the outputs of the penultimate layer. Preferably, no further partitioning of the input space is carried out.
In the previous steps, a partitioning $P_k$ of the input space into $n$-cells (in the input layer), $(n-1)$-cells (in the further layers), etc. was generated. The $n$-cells are also partitioned in the last layer, the output layer. Preferably, all $n$-cells are run through one by one, wherein the partitioning steps described above can be carried out for each cell.
Thus, the partitioning of the input space into cells or faces was described above, wherein the output of the ANN is constant in each cell (i.e., each $n$-face) of the partitioning.
According to the present invention, the computer-implemented method comprises at least the steps described above.
Priority is claimed to German Patent Application No. 10 2023 211 711.8, filed November 2023 (DE, national).