The present application claims the benefit under 35 U.S.C. § 119 of German Patent Application No. DE 10 2023 202 443.8 filed on Mar. 20, 2023, which is expressly incorporated herein by reference in its entirety.
The present disclosure relates to methods for providing a neural network on a data processing device.
Neural networks can be used for numerous tasks, in particular control tasks such as driving assistance, robot control, or other controls of any type of machine. Different implementations of "the same" neural network (i.e., implementations that provide the same output from the same input) can have very different requirements regarding main memory, non-volatile memory, and execution time. Depending on the application and the available hardware, i.e., the data processing device (e.g., a microcontroller) that is intended to execute the neural network, these requirements can be met more or less well or are of greater or lesser importance. Procedures that allow a neural network to be suitably provided on a given data processing device are therefore desirable.
According to various embodiments of the present invention, a method for providing a neural network on a data processing device is provided, comprising: ascertaining, from a set of implementation variants of the neural network, a subset with a plurality of implementation variants of the neural network, wherein each implementation variant of the subset cannot be improved with respect to any of main memory requirement, non-volatile memory requirement and execution time, when executed on the data processing device, without impairing at least one of the other two of main memory requirement, non-volatile memory requirement and execution time, and wherein, for each of main memory requirement, non-volatile memory requirement and execution time when executed on the data processing device, the subset contains at least one implementation variant that is optimal in this respect within the set of implementation variants. The method further comprises selecting one of the ascertained implementation variants according to a user input that specifies a selection from the subset, and storing the selected implementation variant in the data processing device.
The above-described method allows efficient provision of a neural network on a data processing device, taking into account technical conditions of the data processing device and the application in question (e.g., the device to be controlled by the data processing device). A user is given the possibility of selecting an implementation variant that is optimal (within the scope of the ascertainment accuracy or the available implementations) with regard to the available data processing device and the application in question (and the resulting prioritization of main memory requirement, non-volatile memory requirement and execution time).
The data processing device uses the neural network, for example, for a control task, i.e., for controlling a device such as a robot.
Various exemplary embodiments of the present invention are specified below.
Exemplary embodiment 1 is a method for providing a neural network, as described above.
Exemplary embodiment 2 is a method according to exemplary embodiment 1, comprising ascertaining a set of layers of the neural network that, when the neural network is implemented according to a reference implementation, have a longer execution time than the other layers of the neural network on the data processing device, ascertaining different layer implementation variants for each layer of the set, ascertaining implementation variants of the neural network by combining the ascertained layer implementation variants to form an implementation variant of the neural network, wherein a corresponding predefined standard implementation is used for layers that are not part of the set, and ascertaining the subset of implementation variants by ascertaining the main memory requirement, non-volatile memory requirement and execution time for each of the ascertained implementation variants (and adding the ascertained implementation variant to the subset, so that the subset or the implementation variants contained therein meet the above conditions).
This allows the subset to be ascertained efficiently, in particular in that layer implementation variants are ascertained only for layers whose execution time (i.e., computing effort) contributes most to the total execution time, e.g., above a predefined threshold (e.g., 1%-5% of the total execution time). For example, layer implementation variants that are each optimal with regard to at least one of main memory requirement, non-volatile memory requirement and execution time can be combined in order to generate the subset of implementation variants (or at least candidates for the subset of implementation variants, which are then added to the subset such that the subset, or the implementation variants contained therein, meets the above conditions). For ascertaining the main memory requirement, the non-volatile memory requirement and the execution time, the layers are implemented in accordance with the ascertained layer implementations.
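The selection of the set of costly layers can be sketched in Python as follows (a minimal sketch; the layer names, timings, and the 2% threshold are illustrative assumptions, not values prescribed by the method):

```python
def costly_layers(layer_times, threshold=0.02):
    """Return the names of layers whose execution time under the reference
    implementation exceeds the given fraction of the total execution time.
    Only these layers receive alternative layer implementation variants;
    all other layers keep the predefined standard implementation."""
    total = sum(layer_times.values())
    return {name for name, t in layer_times.items() if t > threshold * total}

# Illustrative reference-implementation timings in microseconds (assumed values).
reference_times = {"conv_1": 1500, "dense_1": 900, "relu_1": 5, "softmax": 10}
```

With these assumed timings, only the convolution and dense layers exceed 2% of the total execution time and would receive alternative layer implementation variants.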
Exemplary embodiment 3 is a method according to exemplary embodiment 1 or 2, wherein all implementation variants of the set of implementation variants supply the same output from the output layer of the neural network (i.e., numerically identical) for the same input to the input layer of the neural network.
This ensures that no accuracy is lost by optimizing the implementation of a neural network according to the above method.
Exemplary embodiment 4 is a method according to one of exemplary embodiments 1 to 3, wherein the implementation variants differ in at least one of
These parameters effectively influence main memory requirement, non-volatile memory requirement and execution time without changing the output of the neural network. The data type determines, for example, whether the weights need more storage space but can be loaded directly into the processor.
Exemplary embodiment 5 is a method according to one of exemplary embodiments 1 to 4, comprising receiving a specification of a restriction of the data processing device with respect to at least one of non-volatile memory and main memory (e.g. by receiving a corresponding user input) and ascertaining the subset of implementation variants such that the implementation variants of the subset comply with the restrictions.
It can thus be ensured that the implementation variants from which the selection is made correspond to the capabilities of the data processing device.
Exemplary embodiment 6 is a method according to one of exemplary embodiments 1 to 5, comprising receiving a specification of an application request with respect to at least one of maximum computing time, maximum non-volatile memory requirement and maximum main memory requirement (e.g., by receiving a corresponding user input) and ascertaining the subset of implementation variants such that the implementation variants of the subset satisfy the application request.
It can thus be ensured that the implementation variants from which the selection is made correspond to the requirements of the particular application (i.e., the particular task of the neural network).
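The compliance checks of exemplary embodiments 5 and 6 can be sketched as a simple filter (a hedged Python sketch; the tuple layout (time, RAM, flash) and the keyword names are assumptions for illustration):

```python
def satisfies(variant, max_time=None, max_ram=None, max_flash=None):
    """Check one variant, given as (execution time, main memory requirement,
    non-volatile memory requirement), against optional upper bounds derived
    from device restrictions or application requests; None = unconstrained."""
    time, ram, flash = variant
    return ((max_time is None or time <= max_time)
            and (max_ram is None or ram <= max_ram)
            and (max_flash is None or flash <= max_flash))

def restrict(variants, **limits):
    """Keep only the implementation variants that comply with all limits."""
    return [v for v in variants if satisfies(v, **limits)]
```

For example, a main memory restriction of the data processing device would be applied before the subset is offered for selection, so that only compliant variants remain.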
Exemplary embodiment 7 is a computer system that is configured to carry out a method according to one of exemplary embodiments 1 to 6.
Exemplary embodiment 8 is a computer program comprising commands that, when executed by a processor, cause the processor to carry out a method according to one of exemplary embodiments 1 to 6.
Exemplary embodiment 9 is a computer-readable medium that stores commands that, when executed by a processor, cause the processor to carry out a method according to one of exemplary embodiments 1 to 6.
In the figures, similar reference signs generally refer to the same parts throughout the various views. The figures are not necessarily true to scale, with emphasis instead generally being placed on the representation of the main features of the present invention. In the following description, various aspects are described with reference to the figures.
The following detailed description relates to the figures, which show, by way of explanation, specific details and aspects of this disclosure in which the present invention can be executed. Other aspects may be used and structural, logical, and electrical changes may be performed without departing from the scope of protection of the present invention. The various aspects of this disclosure are not necessarily mutually exclusive, since some aspects of this disclosure may be combined with one or more other aspects of this disclosure to form new aspects.
Various examples are described in more detail below.
The device 100 can be any type of robot device and/or machine, for example a vehicle, a robot arm, a washing machine, a drilling machine, etc. The microcontroller 102 can have any control task in such a device, for example controlling a brake, a motor, etc.
The microcontroller 102 uses a neural network 103 for this purpose. This is stored (in the form of a specification of the neural network, in particular its weights) in a non-volatile memory 104 (e.g., flash memory) of the microcontroller 102, is loaded into a main memory 105 (i.e., a RAM) of the microcontroller 102 for execution (e.g., when the device is switched on), and is executed by a processor 106 (which can also be a plurality of individual processors or processor cores).
Microcontrollers are at the lower end of the performance scale with respect to computing power, flash size, and RAM size. Software for this hardware is therefore limited in three directions. Neural networks, on the other hand, are computationally intensive: they have a high main memory requirement due to the activations to be stored and a high flash requirement due to the trained parameters. Their use on microcontrollers limits the number of layers and the layer size. In order to save further computing time, RAM, and flash, int8-quantized networks are used instead of float32 networks in most cases.
It is furthermore possible to implement a given neural network in different ways. These implementations are bit-identical with respect to the numerical result (i.e., they supply the same results for the same input) but differ with respect to computing time, main memory requirement, and non-volatile memory requirement. The implementation can thus be further adapted to the requirements within the overall system in question (here, the device 100). Examples of the various requirements are:
The implementation of a neural network for microcontrollers is often carried out not manually but with specially developed tools, which, however, are not able to generate different implementations with different compromises between computing time, main memory requirement, and non-volatile memory requirement. Although a neural network can be compressed by quantization, this changes the numerical results of the neural network.
According to various embodiments, different implementations of a neural network are generated automatically, e.g., by a computer system 101. In contrast to quantization and pruning techniques, all implementations are numerically exactly identical (i.e., they deliver the same output for the same input). However, they differ in the required computing time, the main memory requirement, and the non-volatile memory requirement. Of all possible implementations, those that lie on the Pareto front (with respect to computing time, main memory requirement and non-volatile memory requirement) are selected. If there is an excessively large number of implementations on the Pareto front, very similar implementations are removed.
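The Pareto-front selection and the removal of very similar implementations can be sketched as follows (a Python sketch under the assumption that each variant is represented by its metric tuple (computing time, main memory requirement, non-volatile memory requirement), all to be minimized; the 5% thinning tolerance is an illustrative choice):

```python
def dominates(a, b):
    """a dominates b if a is no worse in every metric and strictly better
    in at least one; all three metrics are to be minimized."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(variants):
    """Keep the variants that are not dominated by any other variant."""
    return [v for i, v in enumerate(variants)
            if not any(dominates(w, v)
                       for j, w in enumerate(variants) if j != i)]

def thin(front, rel_tol=0.05):
    """Greedily drop variants whose metrics all lie within rel_tol of an
    already kept variant (the tolerance is an illustrative assumption)."""
    kept = []
    for v in sorted(front):
        if not any(all(abs(x - y) <= rel_tol * max(x, y, 1)
                       for x, y in zip(v, k))
                   for k in kept):
            kept.append(v)
    return kept
```

The quadratic dominance check is sufficient here because only a moderate number of implementation variants is generated per network.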
The resulting set of implementations is made available for selection to the user in question together with a list of computing time, main memory requirement and non-volatile memory requirement, e.g., on a screen 108 of the computer system 101. According to a user input, an implementation can then be selected and loaded onto the microcontroller 102 (via a cable connection, wirelessly, by means of a USB stick, etc.). For example, the computer system 101 generates C code for a neural network 103 in order ultimately to be able to install the trained neural network 103 (according to the selected implementation) on microcontrollers. According to one embodiment, the computer system 101 does not generate a single implementation here but rather a set of numerically identical implementations on the Pareto front (in the space spanned by computing time, main memory requirement and non-volatile memory requirement). For the implementations generated, the properties (computing time, main memory requirement, non-volatile memory requirement) are listed for the user, and/or the Pareto front is displayed graphically on the display 108.
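The listing of the properties and the subsequent selection according to a user input can be sketched as follows (a hedged Python sketch; the line format, the units, and the index-based selection are illustrative assumptions):

```python
def present(variants):
    """Render the selection list shown to the user: one line per variant
    with its computing time, main memory requirement and non-volatile
    memory requirement (units here are illustrative assumptions)."""
    return "\n".join(
        f"[{i}] time={t} us  RAM={r} B  flash={f} B"
        for i, (t, r, f) in enumerate(variants))

def select(variants, user_choice):
    """Return the implementation variant specified by the user input,
    here modeled as an index into the presented list."""
    return variants[user_choice]
```

The selected variant would then be handed to the code generation and transferred to the microcontroller.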
The user can then select the implementation that best corresponds to their requirements (or to those of the device 100). Possible techniques to generate different implementations are:
The various implementations can easily be sorted layer by layer with respect to the inference time (i.e., the execution time of a forward pass through the neural network). The non-volatile memory requirement depends on the overall configuration but can be estimated relatively well, since the required amount of storage for individual layers is relatively constant (two different highly optimized dense implementations, adapted to the input and output variables, have virtually the same non-volatile memory requirement if unused code parts can be optimized away). The main memory requirement of an implementation results from the set of activations to be stored and can be measured exactly. The parameters of different possible implementations can be determined and listed from the information for the individual layers.
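The aggregation of per-layer information into the three network-level properties can be sketched as follows (a Python sketch; representing each layer as a (time, activation RAM, flash) tuple and taking the largest activation buffer as the main memory requirement are simplifying assumptions, since actual activation lifetimes depend on the buffer scheduling):

```python
def network_properties(layers):
    """Aggregate per-layer estimates, each (time, activation RAM, flash),
    into the three network-level metrics. Time and flash add up across
    layers; as a simplifying assumption, main memory is taken as the
    largest per-layer activation buffer, since activation memory can
    typically be reused between layers."""
    time = sum(t for t, _, _ in layers)
    ram = max((r for _, r, _ in layers), default=0)
    flash = sum(f for _, _, f in layers)
    return time, ram, flash
```

Applying this to each combination of layer implementation variants yields the metric tuples from which the Pareto front is determined.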
The above-described approach accommodates the diverse requirements in various applications:
An optimization with respect to a single target variable, such as "optimize for code size" or "optimize for speed" as in the case of compilers, therefore falls short. At the same time, the approach described above avoids burdening the user with a large number of options that define the particular implementation, because the user can select an implementation from a Pareto surface (i.e., a Pareto front).
The determination of the Pareto surface can take place, for example, taking into account the following properties:
The computer system 101 determines the subset of implementation variants that are offered to the user for selection, for example as follows:
In summary, according to various embodiments, a method is provided as shown in
In 301, a subset with a plurality of implementation variants of the neural network is ascertained from a set of implementation variants of the neural network (i.e., a subset of the set of implementation variants, so that the subset itself contains a plurality of implementation variants; in the above examples, these are the implementation variants on the Pareto front), wherein
In 302, one of the ascertained implementation variants is selected according to a user input (e.g. text input or click on a corresponding display element) that specifies a selection from the subset.
In 303, the selected implementation variant is stored in the data processing device (so that it can be executed there, e.g., it is transmitted from a computer system carrying out the above steps to the data processing device and stored there, e.g. by a flash process).
According to various embodiments, the provision of the neural network on the data processing device comprises, or is given by, its implementation: storing the implementation variant in the data processing device can also be regarded as providing, and in particular implementing, the implementation variant (or the neural network according to the implementation variant) on the data processing device.
According to various embodiments, the subset forms a Pareto surface in the set of (e.g., possible or available or provided) implementation variants.
The method of
The method is therefore in particular computer-implemented according to various embodiments.
The approach of
For example, the neural network can process sensor signals from different sensors, such as video, radar, LiDAR, ultrasound, motion, thermal imaging, etc., for example in order to obtain sensor data regarding states of a system to be controlled (e.g., a robot and objects in its surroundings). The processing of the sensor data can comprise for example the classification of the sensor data or the performance of a semantic segmentation of the sensor data, for example in order to detect the presence of objects (in the environment in which the sensor data were obtained). For example, a robot can thus be controlled by means of the neural network, for example in order to achieve different manipulation tasks under different scenarios. In particular, embodiments are applicable to the control and monitoring of the performance of manipulation tasks, for example, in assembly lines.
| Number | Date | Country | Kind |
|---|---|---|---|
| 10 2023 202 443.8 | Mar 2023 | DE | national |