The present invention relates to a technique for learning of a neural network. A preferred application example is a technique for learning of AI (Artificial Intelligence) using deep learning.
In the brain of an organism, a large number of neurons are present, and each neuron receives signals from many other neurons and in turn outputs signals to many other neurons. A neural network such as a Deep Neural Network (DNN) attempts to realize such a brain mechanism with a computer, and is an engineering model that mimics the behavior of a biological neural network. One example of a DNN is the Convolutional Neural Network (CNN), which is effective for object recognition and image processing.
In recent years, CNNs have been applied to automated driving, and efforts to realize object recognition, behavior prediction, and the like have accelerated. However, a CNN generally requires a large amount of calculation, and in order to be mounted on an on-vehicle ECU (Electronic Control Unit) or the like, the weight of the CNN must be reduced. One way to reduce the weight of a CNN is bitwidth reduction of its operations. Qiu et al., "Going Deeper with Embedded FPGA Platform for Convolutional Neural Network," FPGA'16, describes a technique for realizing a CNN with low-bitwidth operations.
In Qiu et al., "Going Deeper with Embedded FPGA Platform for Convolutional Neural Network," FPGA'16, a sampling area (quantization area) for bitwidth reduction is set according to the distribution of weighting factors and feature maps for each layer. However, changes in the distribution of weighting factors and feature maps due to relearning after bitwidth reduction are not considered. Therefore, there is a problem that information loss due to overflow occurs when the distribution of weighting factors and feature maps changes during relearning and deviates from the sampling area set in advance for each layer.
The above-mentioned problem examined by the inventors is explained in detail in
As described above, in the weighting factor learning process, the weighting factor is optimized by repeating relearning. At this time, when learning is performed again using the weighting factor that has been reduced in bitwidth, the weighting factor changes, and the distribution of the weighting factors also changes as shown in
Therefore, an object of the present invention is to enable appropriate calculation while reducing the weight of a CNN by bitwidth reduction of operations.
A preferred aspect of the present invention is a neural network learning device including a bitwidth reducing unit, a learning unit, and a memory. The bitwidth reducing unit executes a first quantization that applies a first quantization area to a numerical value to be calculated in a neural network model. The learning unit performs learning with respect to the neural network model to which the first quantization has been executed. The bitwidth reducing unit executes a second quantization that applies a second quantization area to a numerical value to be calculated in the neural network model on which learning has been performed in the learning unit. The memory stores the neural network model to which the second quantization has been executed.
Another preferable aspect of the present invention is a neural network learning method that learns a weighting factor of a neural network by an information processing apparatus including a bitwidth reducing unit, a learning unit, and a memory. This method includes a first step of executing, by the bitwidth reducing unit, a first quantization that applies a first quantization area to a weighting factor of an arbitrary neural network model that has been input; a second step of performing, by the learning unit, learning with respect to the neural network model to which the first quantization has been executed; a third step of executing, by the bitwidth reducing unit, a second quantization that applies a second quantization area to a weighting factor of the neural network model on which the learning has been performed in the learning unit; and a fourth step of storing, by the memory, the neural network model to which the second quantization has been executed.
According to the present invention, it is possible to perform appropriate calculation while reducing the weight of a CNN by bitwidth reduction of operations.
An embodiment will be described below with reference to the drawings. However, the present invention should not be construed as being limited to the description of the embodiments shown below. Those skilled in the art can easily understand that specific configurations can be changed in a range not departing from the spirit or gist of the present invention.
In the configuration of the invention described below, the same portions or portions having similar functions are denoted by the same reference numerals in different drawings, and redundant description may be omitted. In the case where there are a plurality of elements having the same or similar functions, the same reference numerals may be described with different subscripts. However, in the case where it is not necessary to distinguish a plurality of elements, subscripts may be omitted and described.
In the present specification and the like, the expressions “first”, “second”, “third”, and the like are used to identify constituent elements and do not necessarily limit the number, order, or contents thereof. A number used to identify a component is used per context, and a number used in one context does not necessarily indicate the same configuration in another context. In addition, a component identified by a certain number is not precluded from also serving the function of a component identified by another number.
The positions, sizes, shapes, ranges, and the like of the components shown in the drawings and the like may not represent actual positions, sizes, shapes, ranges, and the like in order to facilitate understanding of the invention. For this reason, the present invention is not necessarily limited to the position, size, shape, range, etc. disclosed in the drawings and the like.
In this embodiment, the sampling area of the weighting factor is dynamically changed according to the change of the weighting factor during relearning after bitwidth reduction in (B). The dynamic change of the sampling area reduces the bitwidth while preventing overflow. Specifically, each time one iteration of relearning is performed, the weighting factor distribution of each layer is aggregated, and the range between the maximum value and the minimum value of the weighting factors is reset as the sampling area. Thereafter, as shown in
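As an illustration of the per-iteration resetting described above, a minimal sketch is given below. It assumes the weighting factors of each layer are available as NumPy arrays keyed by layer name; the function names and data layout are hypothetical and not part of the embodiment.

```python
import numpy as np

def reset_sampling_area(layer_weights):
    """Reset the sampling area of one layer to the range between the
    minimum and maximum of its current weighting factors."""
    return float(np.min(layer_weights)), float(np.max(layer_weights))

def reset_all_layers(model_weights):
    """Aggregate the weighting factor distribution layer by layer after
    one iteration of relearning and return a new sampling area per layer."""
    return {name: reset_sampling_area(w) for name, w in model_weights.items()}
```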
The process described in
The configuration of the information processing apparatus may be configured by a single computer, or any part of the input device, the output device, the processing device, and the storage device may be configured by another computer connected by a network. Also, functions equivalent to the functions configured by software can be realized by hardware such as a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC). Such an embodiment is also included in the scope of the present invention.
The configuration shown in
The operation based on the flowchart of
Step 100: As inputs, an original CNN model before bitwidth reduction and a sampling area initial value for performing low bitwidth quantization of the weighting factor of the original CNN model are provided. The sampling area initial value may be a random value or a preset fixed value.
Step 101: Based on the sampling area initial value, the weighting factor of the original CNN model is low-bitwidth quantized by a quantization circuit (P100) to generate a low-bitwidth quantized CNN model. In a specific example, when low-bitwidth quantization to n bits is performed, quantization is performed by dividing the sampling area into 2^n areas at equal intervals.
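A minimal sketch of this equal-interval quantization follows. The interface (a NumPy array of weighting factors, a sampling area given as minimum and maximum, and the bitwidth n) is an assumption made for illustration, not the circuit of the embodiment.

```python
import numpy as np

def quantize_uniform(weights, area_min, area_max, n_bits):
    """Divide the sampling area [area_min, area_max] into 2**n_bits equal
    intervals and map each weighting factor to the centre of its interval.
    Values outside the area are clipped here; the overflow check of
    step 102 is handled separately."""
    levels = 2 ** n_bits
    step = (area_max - area_min) / levels
    idx = np.clip(np.floor((weights - area_min) / step), 0, levels - 1)
    return area_min + (idx + 0.5) * step
```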
Step 102: A control circuit A (P101) determines whether the weighting factor of the low-bitwidth quantized CNN model deviates from the sampling area initial value (overflow). If an overflow occurs, the process proceeds to step 103. If an overflow does not occur, the low-bitwidth quantized CNN model is used as a low bitwidth model without overflow, and the process proceeds to step 104.
Step 103: If an overflow occurs, the sampling area is corrected so as to expand it by a predetermined value, and low-bitwidth quantization of the weighting factor is performed again by the quantization circuit (P100). Thereafter, the process returns to step 102 to determine again whether or not the weighting factor has overflowed.
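Steps 102 and 103 can be pictured as follows. The expansion by a fixed relative margin is an illustrative assumption; the embodiment only states that the area is expanded by a predetermined value.

```python
import numpy as np

def has_overflow(weights, area_min, area_max):
    """Steps 102/105: true if any weighting factor deviates from the sampling area."""
    return bool(np.any(weights < area_min) or np.any(weights > area_max))

def expand_area(area_min, area_max, margin=0.1):
    """Steps 103/106: expand the sampling area by a predetermined value.
    A 10% margin on each side is an illustrative choice."""
    width = area_max - area_min
    return area_min - margin * width, area_max + margin * width
```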
Step 104: In a relearning circuit (P102), 1 iteration relearning is performed for the low-bitwidth model without overflow. In the present embodiment, the CNN learning itself may follow the prior art.
Step 105: Because the distribution of the weighting factors may change due to relearning, a control circuit A (P106) determines whether the weighting factors have overflowed from the sampling area set in step 103. If an overflow occurs, the process proceeds to step 106. If an overflow does not occur, the process proceeds to step 108.
Step 106: If it is determined in step 105 that an overflow will occur, a sampling area resetting circuit (P104) corrects the sampling area again so as to expand it and prevents the overflow from occurring.
Step 107: A quantization circuit (P105) performs quantization again based on the sampling area set in step 106, thereby generating a bitwidth-reduced CNN model without overflow. Specifically, when low-bitwidth quantization to n bits is performed, quantization is performed by dividing the sampling area into 2^n areas at equal intervals.
Step 108: If the learning loss indicated by the loss function at the time of learning the bitwidth-reduced CNN model without overflow generated in step 107 is less than a threshold th, the processing is terminated and the model is output as a low-bitwidth CNN model. Conversely, if the learning loss is equal to or more than the threshold, the process returns to step 104 and the relearning process is continued. This determination is performed by a control circuit B (P103). The output low-bitwidth CNN model, or the low-bitwidth CNN model during relearning, is stored in an external memory (P107).
By the above processing, even when the weighting factor changes due to relearning, the bitwidth can be reduced while avoiding overflow. In the above example, the presence or absence of an overflow is checked, and the sampling area is corrected when an overflow occurs. However, the check for an overflow may be omitted, and the sampling area may instead be updated at every relearning. Alternatively, without being limited to overflow, the sampling area may be updated whenever the distribution of the weighting factors changes. By setting the sampling area to cover the maximum value and the minimum value and performing requantization regardless of overflow, an appropriate sampling area can be set even if the previously set sampling area was too wide. Also, in
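Putting steps 104 to 108 together, the control flow can be sketched as below for a single layer. The functions train_one_iteration and compute_loss stand in for the relearning circuit (P102) and the loss function, and are assumptions for this sketch; for simplicity requantization is performed every iteration, which matches the variant described above in which requantization is not limited to overflow.

```python
def relearning_loop(q_weights, area, n_bits, threshold,
                    train_one_iteration, compute_loss):
    """Illustrative control flow for steps 104-108 on a single layer,
    reusing quantize_uniform, has_overflow and expand_area from the
    sketches above."""
    area_min, area_max = area
    while True:
        weights = train_one_iteration(q_weights)                            # step 104
        if has_overflow(weights, area_min, area_max):                       # step 105
            area_min, area_max = expand_area(area_min, area_max)            # step 106
        q_weights = quantize_uniform(weights, area_min, area_max, n_bits)   # step 107
        if compute_loss(q_weights) < threshold:                             # step 108
            return q_weights, (area_min, area_max)
```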
When the configuration of
By the process described with reference to
The second embodiment shown in
An operation based on the flowchart of
Step 205: With respect to the low-bitwidth CNN model output in the first embodiment, it is determined whether the value of the weighting factor is equal to or more than an arbitrary threshold. If it is equal to or more than the threshold, the process proceeds to step 206; if it is less than the threshold, the process proceeds to step 207.
Step 206: If it is determined in step 205 that the value of the weighting factor is equal to or more than the threshold, the weighting factor is excluded as an outlier.
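As an illustration, steps 205 and 206 could be realized as follows. What is done with the excluded values downstream, for example omitting them when determining the sampling area so that a few extreme values do not stretch the quantization intervals, is an assumption made for this sketch.

```python
import numpy as np

def split_outliers(weights, threshold):
    """Steps 205-206: weighting factors equal to or more than the threshold
    are flagged as outliers and separated from the remaining factors."""
    outlier_mask = weights >= threshold
    return weights[~outlier_mask], outlier_mask

# Assumed usage: determine the sampling area from the remaining factors only.
# inliers, _ = split_outliers(layer_weights, threshold)
# area = (float(inliers.min()), float(inliers.max()))
```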
The configuration of
The third embodiment shown in
The operation of the configuration of
Step 301: Thinning of unnecessary neurons in the network is performed with respect to the original CNN model before bitwidth reduction.
Step 302: Fine tuning is applied to the thinned-out CNN model.
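The thinning in step 301 is not specified in detail in this embodiment. One common realization is magnitude-based pruning, sketched below as a possibility; the pruning criterion and ratio are illustrative assumptions, and fine tuning (step 302) is left to the existing learning procedure.

```python
import numpy as np

def thin_neurons(layer_weights, prune_ratio=0.5):
    """One possible realization of step 301: zero out the weighting factors
    of the output channels (neurons) with the smallest L1 norm."""
    # layer_weights assumed shaped (out_channels, ...) for a convolution layer
    norms = np.abs(layer_weights).reshape(layer_weights.shape[0], -1).sum(axis=1)
    n_prune = int(prune_ratio * len(norms))
    pruned = layer_weights.copy()
    pruned[np.argsort(norms)[:n_prune]] = 0.0
    return pruned
```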
When the configuration of
The first to third embodiments have been described taking the quantization of the weighting factor as an example. Similar quantization can be applied to the feature maps that are the input and output of the convolution operation. The feature maps refer to the object x with which the weighting factor is convolved and the result y of that convolution. Here, focusing on a certain layer of the neural network, the input/output relation is
y = w * x

where
y: output feature map (the input feature map of the next layer; in the case of the last layer, the output of the neural network),
w: weighting factor,
*: convolution operation,
x: input feature map (the output feature map of the previous layer; in the case of the first layer, the input to the neural network).

Thus, when the weighting factor changes due to relearning, the output feature map (that is, the input feature map of the next layer) also changes.
Therefore, by discretizing not only the weighting factor but also the object x to be convolved and the convolution result y, the calculation load can be further reduced. At this time, as in the case of the quantization of the weighting factors in the first to third embodiments, requantization of the feature map can be performed when there is a change in the distribution of the feature map or when there is an overflow. Alternatively, feature map requantization can be performed unconditionally at each relearning. Further, as in the second embodiment, outlier exclusion processing may also be performed in the quantization of the feature map. Alternatively, only the feature map may be quantized or requantized, without quantization or requantization of the weighting factor. By requantizing both the weighting factor and the feature map, the maximum calculation load reduction effect can be obtained while suppressing the decrease in recognition accuracy due to overflow.
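A minimal sketch of applying the same uniform quantization to the feature maps during a forward pass follows, reusing quantize_uniform from the sketch above. The per-layer bookkeeping of sampling areas and the placeholder conv function are assumptions for illustration.

```python
import numpy as np

def forward_quantized(x, w, x_area, w_area, n_bits, conv):
    """Quantize both the input feature map x and the weighting factor w
    before the convolution y = w * x, then quantize the output feature
    map from its current range. `conv` stands in for the convolution
    operation of the layer."""
    xq = quantize_uniform(x, x_area[0], x_area[1], n_bits)
    wq = quantize_uniform(w, w_area[0], w_area[1], n_bits)
    y = conv(wq, xq)
    # The output feature map is the input feature map of the next layer;
    # its sampling area can be reset from its min/max, as for the weights.
    y_area = (float(np.min(y)), float(np.max(y)))
    return quantize_uniform(y, y_area[0], y_area[1], n_bits), y_area
```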
As in the case of the weighting factors, the quantized feature maps are also implemented in an FPGA. In normal operation, it can be assumed that values with the same number of digits as in learning are input, so that the same information as in learning is handled. For example, when handling images of a standardized size, an appropriate setting can be made with the same quantization number during learning and during operation. Therefore, the amount of calculation can be effectively reduced.
According to the embodiments described above, it is possible to reduce the weight of the CNN by reducing the bitwidth of calculation and to suppress the information loss due to deviation of the numerical value to be calculated from the sampling area. The CNN learned by the apparatus or method of the embodiment has an equivalent logic circuit implemented in, for example, an FPGA. At this time, since the numerical value to be calculated is appropriately quantized, it is possible to reduce the calculation load while maintaining the calculation accuracy.
Priority: Japanese Patent Application No. 2018-128241, filed July 2018 (JP, national).