The present invention relates to a method for parameterizing a machine learning system, a training system, a control system, a method for operating the control system, a computer program, and a machine-readable memory medium.
A method for training a machine learning system is described in German Patent Application No. DE 20 2017 102 238 U, in which parameters of the machine learning system are trained with observation values and associated desired first output values, using a training set for the machine learning system.
In the following discussion, X denotes the input space and Y denotes the output space (target space). A training data set $S=((x_1,y_1),\ldots,(x_m,y_m))$ is provided. The data are drawn from a fixed, unknown distribution P over (X, Y). In addition, a loss function $l: Y\times Y\to\mathbb{R}$ is provided. For a function ƒ:X→Y, the expected loss is defined by $R[f] := \mathbb{E}_{(x,y)\sim P}\,[\,l(f(x),y)\,]$. The empirical risk is defined by $\hat{R}_S[f] := \frac{1}{m}\sum_{i=1}^{m} l(f(x_i),y_i)$.
This variable is also referred to below as parameter .
The task of a machine learning method is to learn a function ƒ:X→Y from a function class F that minimizes expected loss R[ƒ]. Since this is generally not possible, this task is replaced by the task of learning, based on a training data set S, a function ƒS:X→Y from a function class F that minimizes the empirical (expected) risk.
If a function ƒS has been obtained in this way, a key issue is how well ƒS generalizes for new data points $(x,y)\sim P$. This is characterized by the difference $R[f_S]-\hat{R}_S[f_S]$.
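The following is a minimal sketch of how the empirical risk and an estimate of the generalization gap could be computed. The 0-1 loss, the toy classifier, and the toy data are illustrative assumptions; since the expected loss R[ƒ] cannot be evaluated for an unknown distribution P, the sketch substitutes the average loss on a held-out sample, which is only an estimate.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, int]  # (input x, label y); scalar inputs are an assumption


def zero_one_loss(prediction: int, target: int) -> float:
    """Loss l(f(x), y): 0 for a correct prediction, 1 otherwise."""
    return 0.0 if prediction == target else 1.0


def empirical_risk(
    f: Callable[[float], int],
    samples: Sequence[Sample],
    loss: Callable[[int, int], float] = zero_one_loss,
) -> float:
    """Average loss of f over the given samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(loss(f(x), y) for x, y in samples) / len(samples)


def threshold_classifier(x: float) -> int:
    """Hypothetical learned function f_S."""
    return 1 if x > 0.5 else 0


# Hypothetical training set S and held-out sample drawn from the same source.
train = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
held_out = [(0.2, 0), (0.45, 1), (0.7, 1), (0.55, 0)]

train_risk = empirical_risk(threshold_classifier, train)
held_out_risk = empirical_risk(threshold_classifier, held_out)
estimated_gap = held_out_risk - train_risk  # stand-in for R[f_S] - R̂_S[f_S]
print(train_risk, held_out_risk, estimated_gap)
```

A large positive gap on held-out data is the symptom that the later parameter search is meant to avoid.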
A machine learning system, for example a neural network, may be described using a plurality of parameters. These parameters may be subdivided into architecture parameters, such as the depth, the number and form of the filters, and the selection of the nonlinearities and links; and optimization parameters, such as the increment, the size of the batches, and the number of iterations. In other words, architecture parameters characterize function ƒ, whereas optimization parameters characterize the optimization method that is used during the training.
The selection of these parameters may be based, for example, on empirical knowledge or the performance of the neural network on a validation data set. Since a neural network is generally parameterized by many more parameters than the number of provided data points, there is a risk that the network memorizes the training data. A network trained in this way may not be well suited for use in safety-critical functions, for example for automated driving, since its output may not be well controllable for new data points.
A method in accordance with an example embodiment of the present invention prevents a machine learning system from memorizing the training data during the training; i.e., architecture parameters and/or optimization parameters may be determined automatically by a parameter search. As a result of the architecture parameters and/or optimization parameters determined in this way, the machine learning system may generalize in an improved manner.
In a first aspect, the present invention relates to a method for computer-assisted parameterization of a machine learning system. Further aspects of the present invention are described herein. Advantageous refinements of the present invention are described herein.
Machine learning systems, in particular artificial neural networks, are considered for carrying out a classification task, i.e., the association of input signals x of an input space $X=\mathbb{R}^d$ with a class y from a number k of classes. This association takes place, for example, with the aid of a function $f:\mathbb{R}^d\to\mathbb{R}^k$, where $\mathbb{R}^k$ is a Euclidean space. The components of ƒ(x) correspond in each case to one of the classes, and in each case characterize a likelihood that associated class y is a correct classification of input signal x. An argmax function may be used to associate input signal x with a certain class. The argmax function outputs the coordinate of the maximum value of ƒ; i.e., y=N(x)=arg max ƒ(x).
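As a minimal sketch of the argmax association described above, the class index can be read off a score vector ƒ(x) as follows. The function name `classify` and the example scores are illustrative assumptions.

```python
from typing import Sequence


def classify(scores: Sequence[float]) -> int:
    """Return the index of the largest component of f(x), i.e. N(x) = arg max f(x)."""
    if not scores:
        raise ValueError("scores must contain at least one class")
    # max() over the indices, keyed by the score at each index;
    # in case of a tie, the lowest index is returned.
    return max(range(len(scores)), key=lambda i: scores[i])


example_scores = [0.1, 2.3, -0.4]  # hypothetical output f(x) for k = 3 classes
predicted_class = classify(example_scores)
print(predicted_class)
```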
Training data include a plurality of training points (xT,yT), which are pairs of an example of input data xT and an associated classification yT. If classification N(xT) of the machine learning system is correct, the value of the i-th coordinate of ƒ(xT), where i denotes the coordinate associated with classification yT, is the largest of the values. This means that $f(x_T)_i > f(x_T)_j$ for all j≠i.
A margin m is defined by $m = f(x_T)_i - \max_{j\neq i} f(x_T)_j$.
If the margin is positive, the classification is correct; if the margin is negative, the classification is incorrect.
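The sign rule above can be illustrated with a short sketch. It assumes the margin is the score of the labeled class minus the largest score among all other classes, which is consistent with a positive margin for correct classifications and a negative margin for incorrect ones. The function name and the scores are hypothetical.

```python
from typing import Sequence


def margin(scores: Sequence[float], true_class: int) -> float:
    """Score of the labeled class minus the best score among the other classes."""
    if len(scores) < 2:
        raise ValueError("a margin needs at least two classes")
    best_other = max(s for j, s in enumerate(scores) if j != true_class)
    return scores[true_class] - best_other


scores = [2.0, 0.5, -1.0]           # hypothetical f(x_T) for k = 3 classes
correct_margin = margin(scores, 0)  # label matches the arg max
wrong_margin = margin(scores, 1)    # label differs from the arg max
print(correct_margin, wrong_margin)
```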
The present invention is based on the observation that the histograms of the margins of training data follow different distributions, depending on whether or not the machine learning system has correctly learned the underlying structure of the relationship between input data xT and classification yT.
Training a machine learning system using training data (xT,yT) results in a distribution as illustrated in
Hyperparameters θH may be characterized in that they remain unchanged during the training of the machine learning system. Hyperparameters θH may include architecture parameters θA and/or optimization parameters θO, for example.
Architecture parameters θA characterize the structure of the machine learning system. If the machine learning system is a neural network, architecture parameters θA include, for example, a depth (i.e., number of layers) of the neural network and/or the number of filters and/or parameters that characterize the form of the filters, and/or parameters that characterize the nonlinearities of the neural network, and/or parameters that characterize which layers of the neural network are connected to which other layers of the neural network.
Optimization parameters θO are parameters that characterize the behavior of an optimization algorithm for adapting parameters θ that are adapted during the training. For example, these parameters may include a numerical increment and/or a size of data batches and/or a minimum or maximum number of iterations.
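One possible way to keep the two groups of hyperparameters apart in code is sketched below. All class names, field names, and default values are hypothetical; the immutable (frozen) data classes merely mirror the statement that hyperparameters remain unchanged during the training.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class ArchitectureParams:
    """Structure of the network (θA): depth, filters, nonlinearity."""
    depth: int = 4
    filters_per_layer: Tuple[int, ...] = (16, 32, 64, 64)
    nonlinearity: str = "relu"


@dataclass(frozen=True)
class OptimizationParams:
    """Behavior of the optimizer (θO): increment, batch size, iterations."""
    step_size: float = 1e-2
    batch_size: int = 32
    max_iterations: int = 1000


@dataclass(frozen=True)
class Hyperparameters:
    """θH = (θA, θO); frozen, so a training run cannot modify it."""
    architecture: ArchitectureParams = field(default_factory=ArchitectureParams)
    optimization: OptimizationParams = field(default_factory=OptimizationParams)


theta_h = Hyperparameters(optimization=OptimizationParams(batch_size=64))
print(theta_h)
```

A parameter search would then create a new `Hyperparameters` instance for each candidate instead of mutating an existing one.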
Varying degrees of learning success of the training of the machine learning system result for different values of hyperparameters θH. To preferably optimize the generalization of the machine learning system, it has been found that hyperparameters θH are to be selected in such a way that a correctly labeled data set XC=(xT,yT) may be satisfactorily learned; i.e., a distribution of margins m similar to that shown in
Specific embodiments of the present invention are explained in greater detail below with reference to the figures.
Control system 40 receives the sequence of sensor signals S of sensor 30 in an optional receiving unit 50, which converts the sequence of sensor signals S into a sequence of input signals x (alternatively, each sensor signal S may also be directly accepted as an input signal x). Input signal x may, for example, be a section or a further processing of sensor signal S. Input signal x may include, for example, image data or images, or individual frames of a video recording. In other words, input signal x is ascertained as a function of sensor signal S. Input signal x is supplied to a machine learning system 60, which is a neural network, for example.
Machine learning system 60 is preferably parameterized by parameters θ, which are stored in a parameter memory P and provided by same.
Machine learning system 60 ascertains output signals y from input signals x. Output signals y are supplied to an optional transforming unit 80, which ascertains therefrom control signals A that are supplied to actuator 10 in order to appropriately control actuator 10.
Actuator 10 receives control signals A, is appropriately controlled, and carries out a corresponding action. Actuator 10 may include a control logic system (not necessarily structurally integrated) which ascertains from control signal A a second control signal with which actuator 10 is then controlled.
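The signal flow from sensor signal S to control signal A can be sketched as a chain of three functions. This is only an illustration of the data flow; the cropping rule, the fixed weights standing in for parameters θ, and the control signal names are all invented for the example.

```python
from typing import List, Sequence


def receiving_unit(sensor_signal: Sequence[float]) -> List[float]:
    """Unit 50 (sketch): use a section of the sensor signal as input signal x."""
    return list(sensor_signal[:3])


def machine_learning_system(x: Sequence[float]) -> int:
    """System 60 (sketch): class scores from fixed placeholder weights, then arg max."""
    weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # placeholder for parameters θ
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=lambda i: scores[i])


def transforming_unit(y: int) -> str:
    """Unit 80 (sketch): map output signal y to a control signal A."""
    return {0: "maintain_speed", 1: "brake"}[y]


def control_step(sensor_signal: Sequence[float]) -> str:
    """One pass: sensor signal S -> input x -> output y -> control signal A."""
    x = receiving_unit(sensor_signal)
    y = machine_learning_system(x)
    return transforming_unit(y)


print(control_step([0.9, 0.1, 0.2, 5.0]))
print(control_step([0.1, 0.5, 0.6]))
```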
In further specific embodiments, control system 40 includes sensor 30. In yet further specific embodiments, control system 40 alternatively or additionally controls actuator 10.
In further preferred specific embodiments, control system 40 includes one or multiple processors 45 and at least one machine-readable memory medium 46 on which instructions are stored, which, when executed on processors 45, prompt control system 40 to carry out the method according to the present invention.
In alternative specific embodiments, a display unit 10a is provided as an alternative to or in addition to actuator 10.
Sensor 30 may be, for example, one or multiple video sensors and/or one or multiple radar sensors and/or one or multiple ultrasonic sensors and/or one or multiple LIDAR sensors and/or one or multiple position sensors (GPS, for example), preferably situated in motor vehicle 100. Alternatively or additionally, sensor 30 may include an information system that ascertains a piece of information concerning a state of the actuator system, for example a weather information system that ascertains an instantaneous or future state of the weather in surroundings 20.
Based on input signals x, machine learning system 60 may detect, for example, objects in the surroundings of the at least semi-autonomous robot. Output signal y may be a piece of information that characterizes where in the surroundings of the at least semi-autonomous robot objects are present. Control signal A may then be ascertained as a function of this information and/or corresponding to this information.
Actuator 10, which is preferably situated in motor vehicle 100, may be, for example, a brake, a drive, or a steering system of motor vehicle 100. Control signal A may then be ascertained in such a way that actuator or actuators 10 is/are controlled in such a way that motor vehicle 100 for example prevents a collision with the objects identified by machine learning system 60, in particular when objects of certain classes, for example pedestrians, are involved. In other words, control signal A may be ascertained as a function of the ascertained class and/or corresponding to the ascertained class.
Alternatively, the at least semi-autonomous robot may be some other mobile robot (not illustrated), for example one that moves by flying, swimming, diving, or stepping. The mobile robot may also be, for example, an at least semi-autonomous lawnmower or an at least semi-autonomous cleaning robot. In these cases as well, control signal A may be ascertained in such a way that the drive and/or steering system of the mobile robot are/is controlled in such a way that the at least semi-autonomous robot for example prevents a collision with the objects identified by machine learning system 60.
In a further alternative, the at least semi-autonomous robot may be a garden robot (not illustrated) that uses an imaging sensor and machine learning system 60 to ascertain a type or a state of plants in surroundings 20. Actuator 10 may then be an applicator of chemicals, for example. Control signal A may be ascertained, as a function of the ascertained type or the ascertained state of the plants, in such a way that a quantity of the chemicals is applied corresponding to the ascertained type or the ascertained state.
In yet further alternatives, the at least semi-autonomous robot may be a household appliance (not illustrated), in particular a washing machine, a stove, an oven, a microwave oven, or a dishwasher. By use of sensor 30, for example an optical sensor, a state of an object that is treated by the household appliance may be detected, for example in the case of the washing machine, a state of laundry situated in the washing machine. By use of machine learning system 60, a type or a state of this object may then be ascertained and characterized by output signal y. Control signal A may then be ascertained in such a way that the household appliance is controlled as a function of the ascertained type or the ascertained state of the object. For example, in the case of the washing machine, it may be controlled as a function of what material the laundry situated therein is made of. Control signal A may then be selected as a function of what material of the laundry has been ascertained.
Sensor 30 may then, for example, be an optical sensor that detects, for example, properties of manufactured products 12. It is possible for actuator 10, which controls manufacturing machine 11, to be actuated as a function of the ascertained properties of manufactured product 12, so that manufacturing machine 11 correspondingly carries out a subsequent processing step for this manufactured product 12. It is also possible for sensor 30 to ascertain the properties of manufactured product 12 that is processed by manufacturing machine 11, and as a function thereof to adapt a control of manufacturing machine 11 for a subsequent manufactured product.
Control system 40 ascertains a control signal A of personal assistant 250 as a function of the signals of sensor 30, for example by machine learning system 60 carrying out a gesture recognition. This ascertained control signal A is then transmitted to personal assistant 250, which is thus appropriately controlled. This ascertained control signal A may in particular be selected in such a way that it corresponds to a presumed desired control by user 249. This presumed desired control may be ascertained as a function of the gesture recognized by machine learning system 60. Control system 40 may then select control signal A for transmission to personal assistant 250 as a function of the presumed desired control, and/or select control signal A for transmission to personal assistant 250 corresponding to the presumed desired control.
This corresponding control may involve, for example, personal assistant 250 retrieving pieces of information from a database and reproducing them in a form that is perceptible to user 249.
Instead of personal assistant 250, a household appliance (not illustrated), in particular a washing machine, a stove, an oven, a microwave oven, or a dishwasher, may be provided to be appropriately controlled.
Artificial neural network 60 is configured to ascertain associated output signals y from input signals x that are supplied to it. These output signals y are supplied to evaluation unit 180.
Training system 140 includes a second parameter memory Q in which hyperparameters θH are stored.
A modification unit 160 ascertains, for example using the method illustrated in
Evaluation unit 180, for example with the aid of a cost function (loss function) that is a function of output signals y and desired output signals yT, may ascertain parameter , which characterizes a performance of machine learning system 60. Parameters θ may be optimized as a function of parameter .
In further preferred specific embodiments, training system 140 includes one or multiple processors 145 and at least one machine-readable memory medium 146 on which instructions are stored, which, when executed on processors 145, prompt training system 140 to carry out the method according to the present invention.
Initially 1000, hyperparameters θH are initialized, for example randomly or to fixedly predefinable values. A set of correctly labeled training data XC=(xT,yT) is subsequently provided by training data unit 150. Parameters θ are set to a predefinable initial value; for example, they may be set to randomly selected values.
A random permutation of the association of input signals xT and associated output signals, i.e., classifications, yT is then 1100 generated with the aid of an actual random number generator or a pseudorandom number generator. Corresponding to this random permutation, randomized classifications ŷT are ascertained by permutation of classifications yT, and not correctly labeled data set Xr=(xT,ŷT) is generated.
Alternatively, randomized classifications ŷT may also be generated in step 1100 by random extraction from the set of possible classes with the aid of the actual random number generator or the pseudorandom number generator, and not correctly labeled data set Xr=(xT,ŷT) is thus generated.
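Both ways of producing randomized classifications in step 1100 can be sketched with a pseudorandom number generator from the Python standard library. The function names, the seed, and the example labels are illustrative assumptions.

```python
import random
from typing import List, Sequence


def permute_labels(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Variant 1: reassign the existing labels to the inputs by a random permutation."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def draw_random_labels(count: int, num_classes: int, rng: random.Random) -> List[int]:
    """Variant 2: draw each label independently from the set of possible classes."""
    return [rng.randrange(num_classes) for _ in range(count)]


rng = random.Random(0)            # fixed seed only to make the sketch repeatable
true_labels = [0, 0, 1, 1, 2, 2]  # hypothetical classifications y_T
permuted = permute_labels(true_labels, rng)
drawn = draw_random_labels(len(true_labels), num_classes=3, rng=rng)
print(permuted, drawn)
```

The permutation keeps the number of examples per class unchanged, whereas independent draws generally do not.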
Input signals x=xT are then 1200 ascertained from the set of correctly labeled training data XC, supplied to machine learning system 60, and output signals y are ascertained from same. For this purpose, for each input signal x, k-dimensional variable ƒ(x) is ascertained, and output signal y is ascertained as the index of that component of ƒ(x) that has the maximum value.
Output signals y and actual, i.e., desired, output signals yT associated with input signals x are subsequently 1300 provided in evaluation unit 180.
Parameter is subsequently 1400 ascertained as a function of ascertained output signals y and desired output signals yT. New parameters θ′ that optimize parameter are then ascertained with the aid of an optimization method, for example a gradient descent method, by carrying out steps 1200 and 1300, in each case using new parameters θ′ and optionally with multiple iterations, until optimal new parameters θ′ have been ascertained. These parameters are then stored in first parameter memory P.
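The iterative adaptation of parameters θ by gradient descent can be illustrated with a deliberately small example. The linear model, the cross-entropy loss, the full-batch update, and the toy data are assumptions made for the sketch; the text above only names gradient descent as one possible optimization method.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (input x_T, class y_T)


def class_scores(theta: List[List[float]], x: Sequence[float]) -> List[float]:
    """k-dimensional variable f(x) of a linear model; last entry of each row is a bias."""
    return [sum(w * v for w, v in zip(row[:-1], x)) + row[-1] for row in theta]


def softmax(z: Sequence[float]) -> List[float]:
    top = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mean_loss(theta: List[List[float]], data: Sequence[Sample]) -> float:
    """Mean cross-entropy, used here as the performance parameter."""
    return -sum(math.log(softmax(class_scores(theta, x))[y]) for x, y in data) / len(data)


def train(data: Sequence[Sample], num_classes: int,
          step_size: float = 0.5, iterations: int = 200) -> List[List[float]]:
    """Full-batch gradient descent starting from all-zero parameters."""
    dim = len(data[0][0])
    theta = [[0.0] * (dim + 1) for _ in range(num_classes)]
    for _ in range(iterations):
        grad = [[0.0] * (dim + 1) for _ in range(num_classes)]
        for x, y in data:
            probs = softmax(class_scores(theta, x))
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    grad[c][j] += err * x[j]
                grad[c][dim] += err
        for c in range(num_classes):
            for j in range(dim + 1):
                theta[c][j] -= step_size * grad[c][j] / len(data)
    return theta


toy_data = [([0.0], 0), ([0.2], 0), ([0.8], 1), ([1.0], 1)]
initial_theta = [[0.0, 0.0], [0.0, 0.0]]
trained_theta = train(toy_data, num_classes=2)
print(mean_loss(initial_theta, toy_data), mean_loss(trained_theta, toy_data))
```

Here `step_size` and `iterations` play the role of optimization parameters, and they stay fixed while `theta` is adapted.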
Lastly 1500, margins m that have resulted for optimal new parameters θ′ are ascertained with the aid of ascertained k-dimensional variables ƒ(x), as well as an identification number that characterizes the statistical distribution of margins m, for example a portion of those margins m that are positive. Parameters θ are reset to a predefinable initial value; for example, they may be set to randomly selected values.
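The identification number mentioned above, taken here to be the portion of positive margins, could be computed as in the following sketch. The margin is assumed to be the score of the labeled class minus the largest other score; the function names and score vectors are hypothetical.

```python
from typing import Sequence


def margin(scores: Sequence[float], true_class: int) -> float:
    """Score of the labeled class minus the largest score of any other class."""
    best_other = max(s for j, s in enumerate(scores) if j != true_class)
    return scores[true_class] - best_other


def positive_margin_portion(
    score_vectors: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Portion of training points whose margin is strictly positive."""
    if len(score_vectors) != len(labels) or not labels:
        raise ValueError("need one label per score vector and at least one point")
    margins = [margin(s, y) for s, y in zip(score_vectors, labels)]
    return sum(1 for m in margins if m > 0) / len(margins)


# Hypothetical k-dimensional variables f(x) and the labels they are scored against.
score_vectors = [[2.0, 0.5], [0.0, 1.0], [3.0, 4.0], [1.0, 1.0]]
labels = [0, 1, 0, 0]
portion = positive_margin_portion(score_vectors, labels)
print(portion)
```

With correctly labeled data this portion corresponds to the training accuracy; other summaries of the margin histogram could be substituted.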
Steps 1200 through 1500 are now correspondingly repeated for not correctly labeled data set Xr.
For this purpose, input signals x=xT are initially 1600 ascertained from the set of not correctly labeled training data Xr, supplied to machine learning system 60, and output signals y are ascertained from same. For each input signal x, k-dimensional variable ƒ(x) is ascertained, and output signal y is ascertained as the index of that component of ƒ(x) that has the maximum value.
Output signals y and actual, i.e., desired, output signals ŷT associated with input signals x are subsequently 1700 provided in evaluation unit 180.
Parameter is subsequently 1800 ascertained as a function of ascertained output signals y and desired output signals ŷT. New parameters θ′ that optimize parameter are then ascertained with the aid of an optimization method, for example a gradient descent method, by carrying out steps 1600 and 1700, in each case using new parameters θ′ and optionally with multiple iterations until optimal new parameters θ′ have been ascertained.
Lastly 1900, second margins m′ that have resulted for optimal new parameters θ′ for not correctly labeled training data Xr are ascertained with the aid of ascertained k-dimensional variables ƒ(x), as well as an identification number that characterizes the statistical distribution of second margins m′, for example a portion of those second margins m′ that are positive.
A check is now made 2000 as to whether the ascertained portion is greater than a first threshold value and whether the ascertained second portion is less than a second threshold value. If this is not the case, hyperparameters θH are varied, for example randomly, on a predefinable discrete grid, and the method branches back to step 1000 and is carried out anew using varied hyperparameters θH (of course, hyperparameters θH are not reinitialized in step 1000). If all possible values of hyperparameters θH have been explored, the method terminates with an error message.
In contrast, if the ascertained portion is greater than a first threshold value and the ascertained second portion is less than a second threshold value, the instantaneous values of hyperparameters θH are stored in second parameter memory Q. The method thus ends.
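The overall search can be summarized in a short sketch: each candidate value of the hyperparameters is evaluated once with correct labels and once with randomized labels, and it is accepted only if both threshold conditions hold. The two callables stand for complete training runs followed by the margin statistic; the names and the numbers in the toy tables are invented placeholders, not measured results.

```python
from typing import Callable, Iterable, TypeVar

H = TypeVar("H")


def select_hyperparameters(
    candidates: Iterable[H],
    portion_correct_labels: Callable[[H], float],
    portion_random_labels: Callable[[H], float],
    first_threshold: float,
    second_threshold: float,
) -> H:
    """Return the first candidate that learns true labels but not random labels."""
    for theta_h in candidates:
        portion = portion_correct_labels(theta_h)        # steps 1200 to 1500
        second_portion = portion_random_labels(theta_h)  # steps 1600 to 1900
        if portion > first_threshold and second_portion < second_threshold:
            return theta_h                               # would be stored in memory Q
    raise RuntimeError("no hyperparameter value satisfied both thresholds")


# Placeholder tables standing in for real training runs (made-up numbers).
toy_correct = {"small": 0.60, "medium": 0.97, "large": 1.00}
toy_random = {"small": 0.10, "medium": 0.15, "large": 0.99}

chosen = select_hyperparameters(
    ["small", "medium", "large"],
    toy_correct.__getitem__,
    toy_random.__getitem__,
    first_threshold=0.9,
    second_threshold=0.5,
)
print(chosen)
```

In this toy setting the largest model is rejected because it also fits the randomized labels, which is the memorization behavior the check is intended to exclude.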
As an alternative to successively varying and evaluating hyperparameters θH, it is also possible to initially ascertain in each case the portion and the second portion for all possible values of hyperparameters θH on a predefinable discrete grid, and to select from all possible values of hyperparameters θH that value that best satisfies, for example Pareto-optimally, the condition that the ascertained portion is greater than a first threshold value and the ascertained second portion is less than a second threshold value.
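One possible reading of the Pareto-optimal selection is sketched below: a grid value is kept if no other value has both a portion that is at least as high and a second portion that is at least as low, with one of the two strictly better. This interpretation, the function name, and the example numbers are assumptions.

```python
from typing import List, Sequence, Tuple

Result = Tuple[str, float, float]  # (hyperparameter value, portion, second portion)


def pareto_front(results: Sequence[Result]) -> List[Result]:
    """Keep results not dominated in (high portion, low second portion)."""
    front = []
    for name, portion, second in results:
        dominated = any(
            other_p >= portion and other_s <= second
            and (other_p > portion or other_s < second)
            for _, other_p, other_s in results
        )
        if not dominated:
            front.append((name, portion, second))
    return front


# Made-up evaluation results for three grid values.
grid_results = [("a", 0.90, 0.10), ("b", 0.95, 0.30), ("c", 0.85, 0.20)]
front = pareto_front(grid_results)
print(front)
```

A final choice among the remaining values would still need a rule, for example preferring the value with the largest difference between the two portions.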
It is understood that the methods need not be implemented entirely in software as described. They may also be implemented in hardware, or in a mixed form of software and hardware.
Number | Date | Country | Kind |
---|---|---|---|
102018216078.3 | Sep 2018 | DE | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2019/071735 | 8/13/2019 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2020/057868 | 3/26/2020 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
20200125852 | Carreira | Apr 2020 | A1 |
Number | Date | Country |
---|---|---|
202017102238 | May 2017 | DE |
202018104373 | Aug 2018 | DE |
Entry |
---|
Liu, Xin, et al. “Self-error-correcting convolutional neural network for learning with noisy labels.” 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). IEEE, 2017. (Year: 2017). |
Bartlett, Peter L., Dylan J. Foster, and Matus J. Telgarsky. “Spectrally-normalized margin bounds for neural networks.” Advances in neural information processing systems 30 (2017). (Year: 2017). |
Arpit, et al.: “A Closer Look at Memorization in Deep Networks”, Proceedings of the 34th International Conference on Machine Learning (PMLR 70, 2017), Australia, pp. 1-10, https://arxiv.org/abs/1706.05394v2. |
Bartlett, et al.: “Spectrally-normalized margin bounds for neural networks”, arXiv e-prints, (2017), pp. 1-24, https://arxiv.org/abs/1706.08498v2. |
Elsayed, et al.: “Large Margin Deep Networks for Classification”, arXiv e-prints, (2018), pp. 1-25, https://arxiv.org/abs/1803.05598v1. |
Zhang, et al.: “Understanding deep learning requires re-thinking generalization”, arXiv e-prints, (2017), pp. 1-15, https://arxiv.org/abs/1611.03530v2. |
International Search Report for PCT/EP2019/071735, Issued Nov. 14, 2019. |
David Rolnick et al., “Deep Learning is Robust to Massive Label Noise,” Cornell University Library, 2018, pp. 1-10. ARXIV:1705.10694V3; Downloaded Nov. 30, 2020. |
Number | Date | Country | |
---|---|---|---|
20210271972 A1 | Sep 2021 | US |