The present invention relates to a method for optimizing the operation of a calculator implementing a neural network. The present invention further relates to an associated computer program product. The present invention further relates to an associated readable information medium.
Machine learning systems are used in many applications. Such systems are in particular based on neural networks previously trained on a training database. The task for which the neural network was trained is then performed during an inference step.
However, such systems are very resource-intensive because of the large number of arithmetic operations, called multiply-accumulate (MAC) operations, to be carried out for processing a datum. The resulting energy consumption is in particular proportional to the number of bits needed for processing the datum. A large number of bits typically leads to better application performance, but also requires more intensive computational operators and memory accesses.
Hence there is a need for a method for optimizing the performance of machine learning systems implementing a neural network.
To this end, the subject matter of the invention is a method for optimizing the operation of a calculator implementing a neural network, the method being implemented by computer and comprising the following steps: providing a neural network, the neural network having parameters whose values can be modified during a training of the neural network; providing training data relating to the values taken by the parameters of the neural network during a training of the neural network on at least one test database; determining, depending on the training data, an implementation of the neural network on hardware blocks of the calculator so as to optimize a cost relating to the operation of said calculator implementing the neural network; and operating the calculator with the implementation determined for the neural network.
According to particular embodiments, the method comprises one or a plurality of the following features, taken individually or according to all technically possible combinations:
The present description further relates to a computer program product comprising a readable information medium on which a computer program comprising program instructions is stored, the computer program being loadable on a data processing unit and leading to the implementation of a method as described above when the computer program is run on the data processing unit.
The present description further relates to a readable information medium on which a computer program product as described above is stored.
Other features and advantages of the invention will become apparent upon reading the following description of the embodiments of the invention, given only as an example and with reference to the following drawings:
In a variant, the calculator on which the neural network is implemented is one and the same as the calculator 10.
The calculator 10 is preferentially a computer.
More generally, the calculator 10 is an electronic calculator suitable for handling and/or transforming data represented as electronic or physical quantities in registers and/or memories of the calculator 10 into other similar data corresponding to physical data in memories, registers or other types of display, transmission or storage devices.
The calculator 10 interacts with the computer program product 12.
As illustrated in
The computer program product 12 comprises an information medium 26.
The information medium 26 is a medium readable by the calculator 10, typically by the data processing unit 16. The readable information medium 26 is a medium suitable for storing electronic instructions and capable of being coupled to a bus of a computer system.
As an example, the readable information medium 26 is a diskette or a floppy disk, an optical disk, a CD-ROM, a magneto-optical disk, a ROM, a RAM, an EPROM, an EEPROM, a magnetic card or an optical card.
The computer program 12 containing program instructions is stored on the information medium 26.
The computer program 12 can be loaded on the data processing unit 16 and is suitable for leading to the implementation of a method for optimizing the operation of a calculator implementing a neural network, when the computer program 12 is implemented on the processing unit 16 of the calculator 10.
The operation of the calculator 10 will now be described with reference to
The optimization method is implemented by the calculator 10 in interaction with the computer program product, i.e. is implemented by a computer.
The optimization method comprises a step 100 of providing a neural network. The neural network has parameters, the values w of which can be modified during a training of the neural network, as described hereinafter.
A neural network is a set of neurons. The neural network comprises an ordered succession of layers of neurons, each of which takes the inputs thereof from the outputs of the preceding layer. More precisely, every layer comprises neurons taking the inputs thereof from the outputs of the neurons of the preceding layer.
In a neural network, the first layer of neurons is called the input layer while the last layer of neurons is called the output layer. The layers of neurons interposed between the input layer and the output layer are layers of hidden neurons.
Successive layers are connected by a plurality of synapses. Every synapse has a parameter, also called a synaptic weight. As mentioned above, the values w of the synapse parameters can be modified during a training of the neural network.
Every neuron is apt to perform a weighted sum of the values received from the neurons of the preceding layer, every value then being multiplied by the respective synaptic weight, and then to apply an activation function, typically a non-linear function, to said weighted sum, and to deliver to the neurons of the next layer, the value resulting from the application of the activation function. The activation function makes it possible to introduce a non-linearity in the processing performed by every neuron. The sigmoid function, the hyperbolic tangent function, the Heaviside function, the Rectified Linear Unit function (more often referred to as ReLU) are examples of activation functions.
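Purely as an illustration, and not as part of the claimed method, a minimal Python sketch of such a neuron, assuming the ReLU activation function and hypothetical names, could read as follows:

```python
# Minimal sketch of a single neuron: weighted sum of the inputs followed
# by an activation function (here ReLU). All names are illustrative.

def relu(x):
    # Rectified Linear Unit: returns max(0, x).
    return x if x > 0.0 else 0.0

def neuron_output(inputs, synaptic_weights, activation=relu):
    # Weighted sum of the values received from the preceding layer,
    # every value being multiplied by its respective synaptic weight.
    weighted_sum = sum(x * w for x, w in zip(inputs, synaptic_weights))
    # The activation function introduces the non-linearity.
    return activation(weighted_sum)

# Example with three inputs and three synaptic weights.
print(neuron_output([0.5, -1.0, 2.0], [0.8, 0.3, -0.1]))  # 0.0 (ReLU of about -0.1)
```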
The optimization method comprises a step 110 of providing training data relating to the values w taken by the parameters of the neural network during a training of the neural network on at least one test database.
Preferentially, the training data comprise at least the values w taken by the neural network parameters at the end of the neural network training (i.e. the final values obtained for the parameters).
Depending on the applications, the training data further comprise the values w taken by the neural network parameters during the training of the neural network on the test database. In this way it is possible to deduce a frequency of change and/or an amplitude of change of the values w of the parameters of the neural network during training.
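As a purely illustrative sketch (the data layout, one list of successive values w per parameter, the tolerance and the function names are assumptions), such a frequency and amplitude of change could be derived as follows:

```python
# Illustrative sketch: deriving a frequency of change and an amplitude of
# change for one parameter from the successive values w recorded during
# training. The data layout (a simple list of values) is an assumption.

def change_statistics(values_w, tolerance=1e-9):
    deltas = [b - a for a, b in zip(values_w, values_w[1:])]
    changes = [d for d in deltas if abs(d) > tolerance]
    # Frequency of change: fraction of training steps at which w moved.
    frequency = len(changes) / len(deltas) if deltas else 0.0
    # Amplitude of change: mean absolute variation when w did move.
    amplitude = sum(abs(d) for d in changes) / len(changes) if changes else 0.0
    return frequency, amplitude

# Example: a parameter modified only by small perturbations during training.
print(change_statistics([1.50, 1.50, 1.52, 1.49, 1.49, 1.50]))
```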
The test database includes e.g. data from a generic application (generic database), and other data from particular applications (application databases).
The optimization method comprises a step 120 of determining, depending on the training data, an implementation of the neural network on hardware blocks of a calculator so as to optimize a cost relating to the operation of said calculator implementing the neural network. The implementation of a neural network on a calculator refers to the assignment of calculator hardware blocks for carrying out operations on the neural network.
The implementation is determined by decomposing the values w of the neural network parameters into sub-values w0, . . . , wp and by assigning to each sub-value w0, . . . , wp, at least one hardware block from a set of hardware blocks of the calculator. Typically, the decomposition is such that the application of a function on the sub-values w0, . . . , wp of each decomposition, makes it possible to obtain the corresponding value w of the parameter.
Preferentially, the values w of every parameter are each suitable for being represented by a sequence of bits. Every bit in a sequence has a different weight depending on the position of the bit in the sequence. The bit having the greatest weight is called the most significant bit. The sub-values w0, . . . , wp resulting from the same decomposition are each represented by a sequence of one or more bits such that the most significant bit is different for every sub-value w0, . . . , wp. In other words, every sub-value w0, . . . , wp contributes differently to the corresponding value w of the parameter.
Preferentially, the cost to be optimized uses at least one performance metric of the calculator. The optimization then aims to optimize such a metric, i.e. either to maximize or to minimize the metric depending on the nature of the metric. The at least one performance metric is preferentially chosen from: the latency of the calculator on which the neural network is implemented (to be minimized), the energy consumption of the calculator on which the neural network is implemented (to be minimized), the number of inferences per second during the inference of the neural network implemented on the calculator (to be maximized), the quantity of memory used by all or part of the sub-values of the decomposition (to be minimized), and the surface area, after manufacture, of the integrated circuit (“chip”) embedding the calculator (to be minimized).
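Purely by way of illustration (the metric names and weighting coefficients below are assumptions, not part of the method), such a cost can be expressed as a weighted combination in which metrics to be maximized enter with a negative weight:

```python
# Illustrative sketch of a scalar cost combining several performance
# metrics of the calculator. Metric names and weights are assumptions.

def operation_cost(metrics, weights):
    # metrics/weights: dictionaries keyed by metric name. Metrics to be
    # maximized (e.g. inferences per second) are given a negative weight
    # so that minimizing the cost maximizes them.
    return sum(weights[name] * value for name, value in metrics.items())

example_metrics = {
    "latency_ms": 4.2,           # to be minimized
    "energy_mj": 12.0,           # to be minimized
    "inferences_per_s": 250.0,   # to be maximized
    "memory_bits": 1.6e6,        # to be minimized
}
example_weights = {
    "latency_ms": 1.0,
    "energy_mj": 0.5,
    "inferences_per_s": -0.01,
    "memory_bits": 1e-6,
}
print(operation_cost(example_metrics, example_weights))
```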
Preferentially, the assignment of every hardware block to a sub-value w0, . . . , wp is done according to the position of the hardware block in the calculator and/or to the type of the hardware block among the hardware blocks performing a storage function and/or to the type of the hardware block among the hardware blocks performing a calculation function, so as to optimize the operating cost.
The position of the hardware block in the calculator defines the access cost of the hardware block. For two identical memories e.g., the memory furthest from the calculator calculation unit has a higher access cost than the other memory which is closer.
The hardware blocks performing a storage function have, e.g., a different type according to the reconfiguration rate thereof and/or to the accuracy thereof. ROMs, e.g., have a lower reconfiguration rate than memories such as SRAM, DRAM, PCM or OXRAM type memories. The hardware block performing the storage function can also be the calculator as such, which, in such a case, is configured in hardware for performing an operation between a variable input value and a constant (which takes the value of the decomposition element).
The hardware blocks performing a calculation function have e.g. a different type depending on the nature of the calculation performed, e.g., matrix calculation versus event-driven calculation (also called “spike” calculation).
In an example of implementation, every value w of a parameter results from the addition of the sub-values w0, . . . , wp of the corresponding decomposition. In such a case, every value w of a parameter results from the sum of a sub-value w0, the so-called base weight, and of other sub-values w1, . . . , wp, the so-called perturbations. Every sub-value w0, . . . , wp is then represented by a number of bits equal to or different from that of the other sub-values w0, . . . , wp. In particular, the base weight is typically represented by the largest number of bits.
Such a decomposition allows the sub-values w0, . . . , wp to be represented as integers, fixed-point numbers or even floating-point numbers. In such a case, conversions are applied for making the representations of the different sub-values w0, . . . , wp uniform at the time of the addition.
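A minimal sketch of such an addition decomposition is given below; the split into a base weight and two perturbations, the chosen values and the function names are purely illustrative assumptions:

```python
# Illustrative sketch of an addition decomposition: the value w of a
# parameter is the sum of a base weight w0 (wide representation) and of
# small perturbations w1, ..., wp (narrow representations).

def decompose_addition(w, perturbations):
    # The base weight w0 absorbs whatever the perturbations do not carry.
    w0 = w - sum(perturbations)
    return [w0] + list(perturbations)

def recompose_addition(sub_values):
    # Conversions between integer, fixed-point and floating-point
    # representations would be applied here before the addition.
    return sum(sub_values)

sub_values = decompose_addition(1.75, perturbations=[0.125, 0.0625])
print(sub_values)                      # [1.5625, 0.125, 0.0625]
print(recompose_addition(sub_values))  # 1.75
```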
The addition decomposition, like the other types of decomposition described below, gives the possibility of using different memories for the storage of the sub-values w0, . . . , wp of the same decomposition. If, e.g., the values w of the parameters are often modified by low values which can be represented on a few bits, a memory with low precision but high write efficiency is of interest for storing the sub-values w0, . . . , wp corresponding to such variations. The other sub-values w0, . . . , wp are typically implemented in memories with low read consumption and potentially lower write efficiency, because said sub-values w0, . . . , wp are read often, but rarely modified. PCMs are, e.g., memories with low read consumption, but with high write consumption and a limited number of write cycles. ROMs are memories with low read consumption, but high write consumption, which are fixed during the manufacture of the chip.
It should also be noted that memories can have different physical sizes or different manufacturing costs. E.g., a memory with a high write cost but a very small physical size (so that more memory can be put on the same silicon surface), or a memory which is very easy and inexpensive to manufacture or to integrate into a standardized production process, is used for the base weights. The number of possible levels can also be a factor: PCMs, e.g., can often represent only 1 or 2 bits with sufficient reliability.
In a variant, the same type of memory is used for all the sub-values w0, . . . , wp of the decomposition, and the access costs differ only in the complexity of access. E.g., a small memory close to the calculation unit will be faster and less expensive to access than a large memory located further from the calculation unit but storing a larger number of bits. Such differences in access cost are, e.g., due to the resistance of the connection cables, or to the type of access (memory directly connected to the calculator by cables versus a complex addressing system and data bus requiring address calculations or introducing waiting times).
A specific advantage of the addition decomposition, compared to the concatenation decomposition, is that different types of representations can more easily be used for the sub-values. E.g., integer, fixed-point or floating-point values, or physical quantities (current, charge) which may not be in binary format, can be used.
In another example of implementation, every value w of a parameter results from the concatenation of the sub-values w0, . . . , wp of the corresponding decomposition. In such a case, every value w of a parameter results from the concatenation of a sub-value w0, called the base weight, and of other sub-values w1, . . . , wp, called perturbations. Every sub-value w0, . . . , wp then corresponds to bits whose significance differs from that of the other sub-values w0, . . . , wp. The concatenation is performed starting with the base weight and then with the perturbations in the order of significance of the perturbations. Such a decomposition allows the sub-values w0, . . . , wp to be represented as integers or fixed-point numbers.
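The sketch below illustrates such a concatenation decomposition on an unsigned integer value; the bit widths of the base weight and of the perturbations, and the function names, are purely illustrative assumptions:

```python
# Illustrative sketch of a concatenation decomposition of an unsigned
# integer value w into bit fields: a base weight w0 holding the most
# significant bits and perturbations holding the less significant bits.
# The bit widths chosen here are assumptions.

def decompose_concatenation(w, bit_widths):
    # bit_widths in sub-value order [w0 (base), w1, ..., wp], wp holding
    # the least significant bits of w.
    sub_values = []
    for width in reversed(bit_widths[1:]):        # extract wp first
        sub_values.append(w & ((1 << width) - 1))
        w >>= width
    sub_values.append(w)                          # base weight w0
    return list(reversed(sub_values))             # [w0, w1, ..., wp]

def recompose_concatenation(sub_values, bit_widths):
    w = sub_values[0]                             # start with the base weight
    for value, width in zip(sub_values[1:], bit_widths[1:]):
        w = (w << width) | value                  # append each perturbation
    return w

widths = [4, 2, 2]                    # w0: 4 bits, w1: 2 bits, w2: 2 bits
subs = decompose_concatenation(0b10110110, widths)
print([bin(s) for s in subs])                      # ['0b1011', '0b1', '0b10']
print(bin(recompose_concatenation(subs, widths)))  # '0b10110110'
```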
In addition to the above-mentioned advantages with the addition decomposition and which are applicable to the concatenation decomposition, concatenation decomposition gives the possibility of using different calculation units for every sub-value w0, . . . , wp of the decomposition. E.g. for an integer n-bit sub-value w0, . . . , wp, an integer n-bit multiplier is used. The type of calculation unit used can also be different. E.g. it is possible to use a calculation unit optimized for operations on the sub-values with the greatest contribution (in particular the base weight), and a different calculation unit optimized for operations on the sub-values with low contribution, such as event-driven calculation units. Event-driven calculation units are particularly of interest if the sub-values are rarely changed and are often zero.
In yet another example of implementation, every value w of a parameter results from a mathematical operation on the sub-values w0, . . . , wp of the corresponding decomposition. Such operation is e.g. embodied by a function F having as inputs the sub-values w0, . . . , wp of the decomposition and as output the value w of the parameter. The mathematical operation is e.g. the multiplication of the sub-values w0, . . . , wp for obtaining the value w of the parameter. In another example, the mathematical operation is an addition or a concatenation according to the above-described embodiments. Thus, such example of implementation is a generalization of the addition decomposition or of the concatenation decomposition.
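The sketch below illustrates this generalization with an assumed function F, here a multiplication of the sub-values, purely as an example:

```python
# Illustrative sketch of the general case: the value w of a parameter is
# obtained by applying a function F to the sub-values of its
# decomposition. Here F is a multiplication, purely as an example;
# addition and concatenation are other instances of the same scheme.

from functools import reduce
from operator import mul

def recompose(sub_values, F):
    # F takes the sub-values as inputs and outputs the value w.
    return F(sub_values)

def multiplicative_F(sub_values):
    return reduce(mul, sub_values, 1.0)

# A value w = 1.5 decomposed into a base factor and a corrective factor.
print(recompose([2.0, 0.75], multiplicative_F))  # 1.5
```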
In a first example of application, the cost to be optimized uses at least one performance metric (latency, energy consumption) of the calculator implementing the neural network, evaluated during a subsequent training of the neural network on another database. The first application thus aims to optimize the performance of the calculator during a subsequent training phase of the neural network. Such a first application is of interest in a scenario of real-time adaptation of the neural network, during which the values w of the parameters of the neural network are permanently modified according to the data provided to the learning system.
In the first example of application, the training data comprise both the values w taken by the neural network parameters at the end of the training on the test database (i.e. the final values obtained for the parameters), and the other values w taken by the neural network parameters during the neural network training on the test database. The test database typically comprises data similar to or representative of the data that the neural network will receive during the execution thereof on the calculator.
The initial values of the neural network parameters during subsequent training are defined according to the training data and typically correspond to the final values obtained after training.
In the first example of application, the decomposition of the values w of the parameters into sub-values w0, . . . , wp is then determined according to the frequency of change and/or the amplitude of change of said values w in the training data. The hardware blocks assigned to every sub-value w0, . . . , wp are also chosen so as to optimize the operating cost. Typically, when memories are assigned to the sub-values w0, . . . , wp, the assignment is performed according to the contribution of the sub-value w0, . . . , wp to the corresponding value of the parameter. Memories with a low access cost (close to the calculation unit) or a low reconfiguration rate are typically assigned to sub-values with a large contribution, since said sub-values are more often read (in training or inference) and less often modified. However, memories with a low access cost potentially have a high write cost, as is the case for ROMs. Thus, memories with a low write cost (but which hence potentially have a higher access cost) or a high reconfiguration rate are typically assigned to sub-values with a lower contribution, since said sub-values are more often modified than the other sub-values. Thus, in the first example, a higher read cost is accepted in order to obtain a low write cost for the types of memories storing sub-values w0, . . . , wp which are often modified.
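As a purely illustrative sketch (the memory categories, the threshold and the function names are assumptions, not part of the method), such an assignment rule could look like the following:

```python
# Illustrative sketch of assigning a memory type to each sub-value of a
# decomposition according to its contribution to the value w and to how
# often it is modified. Memory categories and threshold are assumptions.

def assign_memory(contribution_rank, change_frequency, threshold=0.1):
    # contribution_rank == 0 identifies the base weight w0.
    if contribution_rank == 0 and change_frequency <= threshold:
        # Large contribution, rarely modified: a low read/access cost
        # matters most (e.g. a ROM- or PCM-like memory close to the
        # calculation unit).
        return "low-read-cost memory"
    if change_frequency > threshold:
        # Often modified: a higher read cost is accepted in exchange for
        # cheap writes (e.g. an SRAM-like memory with a high
        # reconfiguration rate).
        return "write-efficient memory"
    return "general-purpose memory"

# Base weight w0 rarely changes; perturbation w1 changes at most steps.
print(assign_memory(contribution_rank=0, change_frequency=0.02))
print(assign_memory(contribution_rank=1, change_frequency=0.60))
```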
In the example of
In a second example of application, the cost to be optimized uses at least one performance metric (latency, energy consumption) of the calculator implementing the neural network, evaluated during a subsequent inference of the neural network on another database. The second application thus aims to optimize the performance of the calculator during a transfer learning process. Given a generic database and one or a plurality of application databases, the goal herein is to achieve optimized performance for all the application databases.
In the second example of application, the training data comprise both the values w taken by the neural network parameters at the end of the training on the test database (i.e. the final values obtained for the parameters), and the other values w taken by the neural network parameters during the neural network training on the test database. The test database typically comprises generic and application data for evaluating the impact of the application data on the modification of the values w of the neural network parameters. The initial values of the neural network parameters during a subsequent training are defined according to the training data and typically correspond to the final values obtained after the training on the generic data of the test database.
In the second example of application, the decomposition of the values w of the parameters into sub-values w0, . . . , wp is then determined according to the frequency of change of said values w and/or of the amplitude of change of said values w in the training data, in particular when using application data (transfer learning). The hardware blocks assigned to every sub-value w0, . . . , wp are also chosen so as to optimize the operating cost.
E.g., part of the sub-values of the decomposition is fixed (hardwired in the calculator). In such a case, the decomposition is chosen so that a sufficiently large number of application cases can be processed with the same fixed sub-values. Fixing some of the sub-values improves the latency and/or the energy consumption of the calculator. The optimization consists in finding a balance between performance optimization and adaptability to all the application databases. Indeed, if too large a portion of the sub-values is fixed, it will not be possible to modify the sub-values of the decomposition sufficiently so as to adapt the neural network to a sufficiently large number of application cases.
Typically, exactly as for the first example of application, when memories are assigned to the sub-values, the assignment is e.g. performed according to the contribution of the sub-value to the corresponding value w of the parameter. Memories with a low access cost (close to the calculation unit) or a low reconfiguration rate are typically assigned to sub-values with a large contribution, since said sub-values are more often read (in training or inference) and less often modified. However, memories with a low access cost potentially have a high write cost, as is the case for ROMs. Thus, memories with a low write cost (but which hence potentially have a higher access cost) or a high reconfiguration rate are typically assigned to sub-values with a lower contribution, since said sub-values are more often modified than the other sub-values.
More specifically, in the second example of application, the assignment is done e.g. by determining the sub-values to be fixed. However, it is generally not useful to change the weights after the training on the application database, so the write efficiency is less important than in the first example of application.
It is also stressed that the different application cases can be combined in the same system. One could, e.g., perform the continuous learning of the first example of application on a system with fixed sub-values as in the transfer learning scenario. In such a case, sufficient reconfigurability will be sought, together with efficiency in such reconfiguration.

In a third example of application, the cost to be optimized uses at least one performance metric of the calculator implementing the neural network during an inference of the neural network by considering only a part of the sub-values of every decomposition. The values w of the neural network parameters are in such a case fixed according to the training data.
The third application thus aims to achieve an adjustable-precision inference. The decomposition of the values w of the parameters into sub-values w0, . . . , wp is determined in particular according to the contribution of said sub-values w0, . . . , wp to the corresponding value w of the parameter. Typically, the idea is to use, during the inference phase, only the sub-values with the greatest contribution to the corresponding value w of the parameter, or even only the sub-value represented by the most significant bit. Such sub-values are then assigned memories with a low access cost or with features favoring read operations. The inference calculations are then performed only on said sub-values. In this way it is possible to perform an approximate inference with optimized performance (low consumption/latency).
If a better calculation precision is required, and a higher consumption/latency is accepted, other sub-values of the decomposition are taken into account in the calculations. This is e.g. the case in a method for recognizing elements in an image (rough processing, then fine processing). Such sub-values are typically assigned to memories with a higher access cost and/or with read features which are less optimized than those of the memories assigned to the most significant sub-values. Such assignments can nevertheless have other advantages, e.g. a lower write cost if the third example of application is combined with one of the other cases of use which requires regular rewriting (in particular the first example of application).
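A hedged sketch of such an adjustable-precision calculation, assuming an addition decomposition and hypothetical names, is given below; increasing the precision parameter brings additional sub-values into the calculation:

```python
# Illustrative sketch of adjustable-precision inference with an addition
# decomposition: only the `precision` sub-values with the greatest
# contribution to each weight take part in the calculation.

def adjustable_mac(inputs, decomposed_weights, precision=1):
    total = 0.0
    for x, sub_values in zip(inputs, decomposed_weights):
        # Keep only the most significant sub-values (base weight first).
        approximate_w = sum(sub_values[:precision])
        total += x * approximate_w
    return total

inputs = [1.0, 2.0]
decomposed_weights = [[0.5, 0.0625], [0.25, 0.03125]]  # [base, perturbation]
print(adjustable_mac(inputs, decomposed_weights, precision=1))  # rough: 1.0
print(adjustable_mac(inputs, decomposed_weights, precision=2))  # finer: 1.125
```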
In the third application example, the training data comprise the values w taken by the neural network parameters at the end of the training on the test database (i.e. the final values obtained for the parameters). The values w of the parameters of the neural network correspond in such case to the final values.
At the end of the determination step 120, a calculator is obtained, which has hardware blocks (memory, calculation unit) assigned to the sub-values w0, . . . , wp of the decompositions so as to optimize a performance metric (latency, energy consumption) of the calculator. To this end, where appropriate, the calculator is configured beforehand, or even manufactured on the basis of the hardware blocks assigned to the sub-values w0, . . . , wp of the decompositions.
The optimization method comprises a step 130 of operating the calculator with the implementation determined for the neural network.
The operation corresponds to a training or inference phase of the neural network. In operation, calculations are performed on the sub-values w0, . . . , wp of every decomposition according to the data received as input by the neural network.
In an example of implementation, during the operation step, the sub-values w0, . . . , wp of every decomposition are each multiplied by an input value and the results of the resulting multiplications are summed (addition decomposition) or accumulated (concatenation decomposition) for obtaining a final value. Every final value obtained is then used for obtaining the output or outputs of the neural network.
In another example of implementation, during the operation step, the sub-values w0, . . . , wp of every decomposition are summed (addition decomposition) or concatenated (concatenation decomposition) and the resulting sum or concatenation is multiplied by an input value for obtaining a final value. Every final value obtained is then used for obtaining the output or outputs of the neural network.
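For an addition decomposition, the two orderings are equivalent by distributivity; the sketch below (hypothetical names, illustrative values) shows both, the choice between them depending on the hardware blocks assigned to the sub-values:

```python
# Illustrative sketch of the two orderings for an addition decomposition:
# (a) multiply every sub-value by the input value, then sum the products;
# (b) sum the sub-values first, then multiply once by the input value.
# By distributivity both orderings give the same final value.

def mac_per_sub_value(input_value, sub_values):
    # (a) one multiplication per sub-value, possibly performed by a
    # different calculation unit for each sub-value, then a summation.
    return sum(input_value * s for s in sub_values)

def mac_recomposed(input_value, sub_values):
    # (b) the value w is recomposed first, then a single multiplication.
    return input_value * sum(sub_values)

sub_values = [1.5, 0.125, -0.0625]         # base weight and two perturbations
print(mac_per_sub_value(2.0, sub_values))  # 3.125
print(mac_recomposed(2.0, sub_values))     # 3.125
```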
More generally, as illustrated by
In the case of an addition decomposition, a decomposition of the variation Δw of the value w of the parameter is determined so as to optimize the cost of updating the corresponding sub-values w0, . . . , wp of the parameters. Such a decomposition for updating is illustrated in
The update presented for addition decomposition also applies to the generic case of decomposition resulting from a mathematical operation, as described above.
In the case of a concatenation decomposition, the value of the variation Δw of the parameter is e.g. simply added bit by bit, starting with the least significant bits. A communication mechanism between the memories on which the sub-values w0, . . . , wp are stored makes it possible to propagate the carries from one sub-value to the next. Such an update is illustrated in
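A hedged sketch of such an update, assuming the same illustrative bit widths as above and hypothetical names, is given below; the carries are propagated from the least significant bit field towards the base weight:

```python
# Illustrative sketch of updating a concatenation decomposition: the
# variation delta_w is added starting with the least significant bit
# field, and the carries are propagated from one memory (bit field) to
# the next, mimicking the communication mechanism between memories.

def update_concatenation(sub_values, bit_widths, delta_w):
    # sub_values = [w0 (base), w1, ..., wp]; bit_widths in the same
    # order, wp holding the least significant bits. Widths are assumed.
    updated = list(sub_values)
    carry = delta_w
    for i in range(len(updated) - 1, 0, -1):      # wp first, w1 last
        width = bit_widths[i]
        total = updated[i] + carry
        updated[i] = total & ((1 << width) - 1)   # bits kept locally
        carry = total >> width                    # carry sent upstream
    updated[0] += carry                           # base weight absorbs the rest
    return updated

widths = [4, 2, 2]                # same illustrative widths as above
subs = [0b1011, 0b01, 0b10]       # encodes w = 0b10110110 = 182
print([bin(s) for s in update_concatenation(subs, widths, delta_w=3)])
# ['0b1011', '0b10', '0b1']  i.e. w becomes 0b10111001 = 185
```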
Thus, the present method makes it possible to obtain an automatic learning system for which the parameters which can be trained are divided into a plurality of sub-values, the sub-values w0, . . . , wp then being processed by hardware blocks (memories, calculation units) chosen so as to optimize the performance of the calculator. Such a method can thus be used for optimizing the performance of machine learning systems implementing a neural network.
In particular, for real-time adaptation applications of the neural network (first application) and transfer learning applications (second application), the present method makes it possible to exploit the fact that the values w of the network parameters are only slightly modified, by small perturbations. Such small perturbations are typically much lower, in terms of dynamics, than the initial value of the parameter, have different statistical features, and have modification frequencies potentially higher than those of the base weights.
Such a method is also of interest for applications such as approximate inference (third application), thus making it possible to optimize the performance of the calculator when an approximate precision in the output data is acceptable.
In particular, the present method enables MAC type operations to be executed in an optimized manner through a decomposition of the values w of every parameter into a plurality of sub-values w0, . . . , wp, and by performing an optimized MAC operation on each of said sub-values. For this purpose, an optimized calculation or storage unit is e.g. chosen for every sub-value w0, . . . , wp of the decomposition.
A person skilled in the art will understand that the embodiments and variants described above can be combined so as to form new embodiments provided that same are technically compatible.
Number | Date | Country | Kind
21 10153 | Sep 2021 | FR | national