The present invention relates to an information processing apparatus, an information processing method, and a storage medium.
In recent years, research and development has been conducted on image recognition techniques using a neural network (NN). Recent NNs have a large number of layers and therefore require a large amount of calculation, while calculation resources are often limited. Thus, an efficient calculation method has been called for.
As the efficient calculation method, there has been known a method of quantizing data of an NN into a low precision numerical value and performing the calculation. The quantization makes it easier for the NN operation to be performed on devices with limited calculation resources.
According to a technique disclosed in “8-bit Inference with TensorRT”, Szymon Migacz, NVIDIA, May 8, 2017, for an NN whose learning is performed using high precision numerical values, a distribution of output values is obtained for each layer using a large amount of data, and a quantization parameter that minimizes the loss of the distribution after the quantization is selected.
According to one embodiment of the present invention, an information processing apparatus comprises: an obtaining unit configured to obtain information indicating a size of an output as a result of a first operation in a neural network that performs the first operation using a weight coefficient for input data and a second operation of quantizing a result of the first operation, in order to obtain data of an intermediate layer; and a control unit configured to control the first operation in the neural network to adjust the size of the output based on the information and a quantization parameter used for the quantization.
According to another embodiment of the present invention, an information processing method comprises: obtaining information indicating a size of an output as a result of a first operation in a neural network that performs the first operation using a weight coefficient for input data and a second operation of quantizing a result of the first operation, in order to obtain data of an intermediate layer; and controlling the first operation in the neural network to adjust the size of the output based on the information and a quantization parameter used for the quantization.
According to still another embodiment of the present invention, a non-transitory computer-readable storage medium stores a program which, when executed by a computer comprising a processor and a memory, causes the computer to perform operations comprising: obtaining information indicating a size of an output as a result of a first operation in a neural network that performs the first operation using a weight coefficient for input data and a second operation of quantizing a result of the first operation, in order to obtain data of an intermediate layer; and controlling the first operation in the neural network to adjust the size of the output based on the information and a quantization parameter used for the quantization.
Further features of the present invention will become apparent from the following description of exemplary embodiments (with reference to the attached drawings).
Hereinafter, embodiments will be described in detail with reference to the attached drawings. Note, the following embodiments are not intended to limit the scope of the claimed invention. Multiple features are described in the embodiments, but limitation is not made to an invention that requires all such features, and multiple such features may be combined as appropriate. Furthermore, in the attached drawings, the same reference numerals are given to the same or similar configurations, and redundant description thereof is omitted.
In general, a small quantization parameter for quantizing an output of an intermediate layer of an NN leads to a high risk of deterioration of the recognition accuracy of the NN due to truncation or rounding of the output values. On the other hand, a large quantization parameter leads to low resolution for the output values, which may also result in deteriorated recognition accuracy of the NN. Setting the quantization parameter individually for each layer can suppress the deterioration of the recognition accuracy, but is likely to result in a combinatorial explosion.
An embodiment of the present invention provides an information processing apparatus that suppresses deterioration of recognition accuracy even when a quantization parameter for an intermediate layer of a neural network including a quantization operation is set to be small.
The CPU 11 is a central processing unit and executes a control program stored in the ROM 12 and the RAM 13 to implement various types of control performed by functional units of the information processing apparatus 1 described below. In addition, the CPU 11 executes Single Instruction, Multiple Data (SIMD) instructions to collectively process 8-bit integer operations in the inference processing described below.
The ROM 12 is a nonvolatile memory, and stores data including a control program and various parameters. Here, the control program is executed by the CPU 11 to realize various types of control processing. The RAM 13 is a volatile memory, and temporarily stores an image as well as a control program and a result of executing the program.
The storage unit 14 is a rewritable secondary storage device such as a hard disk or a flash memory, and stores various types of data used for each processing according to the present embodiment. The storage unit 14 can store, for example, an image used for calculation of a quantization parameter as well as a control program and a result of processing thereof, and the like. These various types of information are output to the RAM 13 to be used for program execution by the CPU 11.
The input/output unit 15 functions as an interface with the outside. The input/output unit 15 obtains a user input, and may be, for example, a mouse and a keyboard, a touch panel, or the like. The display unit 16 is, for example, a monitor, and can display a processing result of a program, an image, and the like. The display unit 16 may be implemented as a touch panel together with the input/output unit 15, for example. The functional units of the information processing apparatus 1 are communicably connected to each other through the connection bus 17, and transmit and receive data to and from each other.
In the present embodiment, each processing described below is implemented by software using the CPU 11. However, the processing may be partially or entirely implemented by hardware as long as the processing can be similarly executed. As the hardware, a dedicated circuit (ASIC), a processor (reconfigurable processor or DSP), or the like may be used. The software for executing each processing may be obtained via a network or various storage media and executed by a processing apparatus such as a personal computer.
Here, a set of layers from the NN layer to the activation function is regarded as one unit layer. For example, the layer 401 includes a CNN layer 404, a normalization layer 405, and a ReLU layer 406 as one unit layer. The layer 402 is an intermediate layer having a layer configuration similar to that of the layer 401. The layer 403 includes an FC layer 410 and a ReLU layer 411 as one unit layer. Hereinafter, the output of a layer (intermediate layer) refers to the output of one unit layer. In the following description, when a layer i (1≤i) is described, i indicates the index of one unit layer.
The layer 401 is an input layer and performs a convolution operation on an input image. The layer 403 is an output layer that outputs a likelihood map of a specific object in the input image. This is merely an example; the number of layers may be different, or a layer executing processing different from those described above may be included. For example, the layers may include a pooling layer. While a learned model is used for the NN in this example, a model initialized by using a known NN weight initialization method as described in “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026-1034 may be used instead.
In the information processing apparatus 1 according to the present embodiment, input data is input to the NN and an inference result is output. Here, when the input data is input to the NN, the distribution calculation unit 203 obtains an output distribution through the first operation, which is an operation using a weight coefficient (hereinafter, simply referred to as a weight) in each intermediate layer. An output distribution Yi is information indicating the size of the output of a layer i, and may be, for example, the maximum value of the output of the layer i or the value at the top 99.9% of the output values of the layer i sorted in ascending order. The output distribution Yi may also be a value calculated by the following Formula (1) using an average μi and a standard deviation σi of the output values. In the formula, n can be set to a value conforming to a desired condition, for example 4 or 5. Thus, in the present embodiment, the output distribution is information calculated based on the distribution of the outputs obtained by the first operation, and in particular, may be calculated as information indicating an upper limit of the outputs excluding outliers. In the present embodiment, the output distribution is obtained from N×M output values, where N is the number of images in the mini batch of the input data and M is the number of output channels of the layer.
Yi=μi+nσi   Formula (1)
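For illustration only, the output distribution Yi of Formula (1) could be computed, for example, as in the following Python sketch; the function name and the array layout are assumptions made for this sketch and are not prescribed by the embodiment.

```python
import numpy as np

def output_distribution(layer_outputs, n=4):
    """Estimate the output distribution Yi of one layer (Formula (1)).

    layer_outputs: array of shape (N, M) holding the N x M output values
    (N images in the mini batch, M output channels of the layer).
    n places the upper limit n standard deviations above the average
    (for example, 4 or 5).
    """
    values = np.asarray(layer_outputs, dtype=np.float32).ravel()
    mu = values.mean()        # average of the output values
    sigma = values.std()      # standard deviation of the output values
    return float(mu + n * sigma)   # Yi = mu_i + n * sigma_i
```

The maximum value or the value at the top 99.9% mentioned above could be obtained analogously, for example with np.max(values) or np.percentile(values, 99.9).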
The information processing apparatus 1 according to the present embodiment can set the quantization parameter and perform the second operation of quantizing the data of the NN including the result of the first operation (layer output). If the layer output to be quantized exceeds the quantization parameter, it is likely that many output values are truncated or rounded off in the quantization. Thus, the recognition accuracy of the NN may be deteriorated. In view of this, the information processing apparatus 1 controls the first operation so as to adjust the size of the output from the layer based on the output distribution and the quantization parameter. In particular, by adjusting the weight of the NN to achieve a small output distribution with respect to the quantization parameter (for example, equal to or smaller than the quantization parameter), the deterioration of the recognition accuracy in the quantization can be suppressed without using a large quantization parameter.
The information processing apparatus 1 according to the present embodiment performs learning of the NN based on the quantization parameter, to achieve a small output distribution. Such an example will be described below.
In S302, the data obtaining unit 201 obtains a mini batch of images. This mini batch is input data including one or more images, and is a set of input images to be input to the NN obtained in S301. For example, the mini batch is assumed to be a set of 32 images (that is, the number of images included in the mini batch is N=32). While a model for detecting a recognition target in an image is used in the present embodiment, an image included in the mini batch may or may not include the recognition target.
In S303, the distribution calculation unit 203 inputs the mini batch images obtained in S302 to the model obtained in S301, and executes inference processing. Here, the distribution calculation unit 203 performs an operation using a weight coefficient in each layer of the NN for the input data, to obtain an output of the layer.
In S304, the distribution calculation unit 203 aggregates the output values from the respective layers obtained in the inference processing executed in S303, and obtains the output distribution Yi based on the aggregated output values. In S305, the distribution calculation unit 203 outputs a set {Yi} of values of the output distributions of the respective layers.
As described above, the information processing apparatus 1 according to the present embodiment suppresses deterioration of the recognition accuracy due to the quantization, through learning (by determining the weight of the NN) to make such an output distribution Yi small. In other words, the information processing apparatus 1 performs learning in such a manner that the output distribution exceeding the quantization parameter results in a large loss.
In S501, the regularization item calculation unit 206 obtains the set {Yi} of values of the output distribution. In S502, the parameter obtaining unit 205 obtains a quantization parameter q. In the present embodiment, the quantization parameter q is set in advance and is assumed to be q=4 in the following description, but a value calculated according to another parameter may be used as the quantization parameter q.
In S503, the coefficient calculation unit 210 calculates a coefficient C for the layer i using the output distribution Yi of the layer i and the obtained quantization parameter q. The coefficient C thus calculated is used in the loss calculation processing in S508 described below. The coefficient C is not particularly limited as long as it is a value that increases with Yi. For example, the coefficient C may be calculated by the following Formula (2) or Formula (3), and a power of Yi may be used instead of Yi in Formula (2) and Formula (3).
In S504, the correction amount calculation unit 211 calculates a correction amount D for correcting the regularization item using the output distribution Yi and the quantization parameter q. The correction amount D is used for the loss calculation processing in S508 described below. The correction amount D is determined, for example, by the following Formula (4) so as to be large when Yi exceeds the quantization parameter q.
In S505, the learning unit 208 obtains the model of the NN for which the learning is performed from the model obtaining unit 202. In S506, the supervisor obtaining unit 207 obtains a mini batch corresponding to the input image to be used as supervisory data. In S507, the supervisor obtaining unit 207 obtains correct answer data for the mini batch obtained in S506, and obtains the supervisory data as a combination of these. The correct answer data is data including information indicating a detection target region in the mini batch. Although image data that is the same mini batch as that used for calculating the output distribution in S302 is used in this example, the present invention is not particularly limited to this, and a different mini batch may be used.
In S508, the learning unit 208 executes inference processing with the mini batch obtained in S506 being an input, using the model obtained in S505, and calculates a loss (objective function) between the output and the correct answer data obtained in S507. When the task of the NN is a task of detecting a region, the loss function serving as the objective function may be a square error or a cross-entropy error. The learning unit 208 calculates a regularization item for each layer and adds the regularization item to the loss. The regularization item for the layer i may be given as λ(wi)^2 (an L2 regularization item), where wi is the weight of the layer i. Note that the regularization item may be given as an L1 regularization item or as a combination of the two. Here, λ is a coefficient applied to the regularization item and is set based on the coefficient C calculated in S503 and the correction amount D calculated in S504. For example, λ may be given as in the following Formula (5). In the following description, a simple description “loss” indicates the loss including the regularization item used by the learning unit 208.
In the formula, α and β are constants. With such a configuration, the learning is performed in such a manner that when Yi exceeds q, a larger excess leads to a larger loss. Thus, the learning of the NN proceeds in a direction in which the output value of the layer does not exceed the quantization parameter, whereby deterioration of the recognition accuracy due to quantization can be suppressed.
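Since the concrete forms of Formulas (2) to (5) may vary, the following Python sketch uses placeholder expressions that only preserve the stated properties: C increases with Yi, D becomes large when Yi exceeds q, and λ is built from C and D using the constants α and β. These expressions are assumptions for illustration, not the formulas of the embodiment.

```python
import numpy as np

def loss_with_regularization(task_loss, layer_weights, output_dists, q, alpha=1.0, beta=1.0):
    """Add per-layer regularization items lambda * (w_i)^2 to the task loss (S503 to S508).

    layer_weights: list of weight arrays w_i, one per unit layer.
    output_dists:  list of output distributions Y_i obtained in S305.
    The expressions for C, D and lambda below are illustrative stand-ins for
    Formulas (2) to (5); only their described behavior is preserved.
    """
    total = float(task_loss)
    for w_i, y_i in zip(layer_weights, output_dists):
        C = y_i / q                   # increases with Y_i (stand-in for Formula (2)/(3))
        D = max(0.0, y_i - q)         # large only when Y_i exceeds q (stand-in for Formula (4))
        lam = alpha * C + beta * D    # coefficient lambda built from C and D (stand-in for Formula (5))
        total += lam * float(np.sum(np.asarray(w_i) ** 2))   # L2 regularization item lambda * (w_i)^2
    return total
```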
In S509, the learning unit 208 calculates a gradient by backpropagation using the loss calculated in S508, and calculates an update amount of the weight of the model. In S510, the learning unit 208 updates the weight of the NN. Since a known NN learning method can basically be used for these steps, a detailed description thereof will be omitted. In S511, the model with the updated weight is output, and the processing is terminated. With such learning processing repeated until the learning loss or the recognition accuracy converges (to a desired precision), the weight of the NN model can be determined.
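Purely as an illustration of one learning iteration, the following sketch uses PyTorch-style automatic differentiation; the embodiment does not prescribe any particular framework, and the square-error objective and the regularization callable are assumptions of this sketch.

```python
import torch

def training_step(model, optimizer, images, targets, regularization_items):
    """One learning iteration: loss including regularization, backpropagation, weight update.

    regularization_items: callable taking the model and returning the sum of
    the per-layer regularization items (an assumption of this sketch).
    """
    optimizer.zero_grad()
    outputs = model(images)                                     # inference with the mini batch
    task_loss = torch.nn.functional.mse_loss(outputs, targets)  # e.g. square error as the objective function
    loss = task_loss + regularization_items(model)              # loss including the regularization items
    loss.backward()                                             # gradient by backpropagation
    optimizer.step()                                            # update the weight of the NN
    return loss.item()
```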
In this way, the learning unit 208 can perform learning of the NN to make the output distribution Yi small with respect to the quantization parameter q.
The quantization unit 209 quantizes the weight and the output of the NN resulting from the learning by the weight determination unit 204. A known technique can be used for the quantization of the NN, and thus a detailed description thereof will be omitted. In the quantization processing according to the present embodiment, it is assumed that a 32-bit single precision floating point value is quantized to an 8-bit integer value, but the types and bit widths are not limited to these as long as the quantization is executed.
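As one possible realization of the quantization described above, the following sketch assumes a symmetric linear scheme in which the range [-q, q] is mapped onto the signed 8-bit range and values outside it are clipped; this particular scheme is an assumption made for illustration.

```python
import numpy as np

def quantize_int8(x, q):
    """Quantize single precision values to 8-bit integers using the quantization parameter q.

    Values inside [-q, q] are represented with full int8 resolution; values
    outside that range are clipped, which corresponds to the truncation or
    rounding discussed above.
    """
    scale = q / 127.0
    q_x = np.clip(np.round(np.asarray(x, dtype=np.float32) / scale), -128, 127)
    return q_x.astype(np.int8), scale

def dequantize_int8(q_x, scale):
    """Map the quantized integers back to approximate real values."""
    return q_x.astype(np.float32) * scale
```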
With such a configuration, the information processing apparatus 1 first obtains the information indicating the size of the output of the first operation, which uses the weight coefficient for the input data, in the intermediate layer of the NN. Next, the information processing apparatus 1 can control the first operation so as to adjust the size of the output described above, based on the obtained information and the quantization parameter used for the quantization of the NN including the result of the first operation. Therefore, the deterioration of the recognition accuracy due to the quantization can be suppressed by reducing the size of the output of the calculation in the intermediate layer without increasing the quantization parameter. In addition, by setting the quantization parameter to a constant common to the layers, it is possible to reduce the processing load compared with a case where an individual quantization parameter is set for each layer, and to prevent a combinatorial explosion of quantization parameters.
In the first embodiment, an example has been described in which learning of the weights of the NN is performed so as to adjust the output distribution based on the output distribution and the quantization parameter. On the other hand, an information processing apparatus 6 according to the second embodiment adjusts the output distribution by correcting the weight of the NN based on the output distribution and the quantization parameter.
Since q=4 in the present embodiment, an output value of each layer exceeding 4 is rounded to 4 as a result of the quantization, which deteriorates the recognition accuracy. In view of this, the weight correction unit 601 corrects the weight of the NN (independently of the learning) to prevent the output distribution from exceeding the quantization parameter.
The weight correction unit 601 according to the present embodiment corrects the weight of the NN to set the output distribution to be equal to or smaller than the quantization parameter.
To reduce the output of the layer 701 to ¼, the weight parameters γ and β in Formula (6) may each be multiplied by ¼. In this case, the weight correction unit 601 corrects the weight of the NN by multiplying γ and β by ¼, and outputs the result as the weight of the layer 701.
Then, the weight correction unit 601 corrects the weight in a similar manner in the subsequent layers such as the layer 702.
Further, since the output of the layer 703 is 3.5, it is not necessary to change the value of its output distribution. However, since the output distribution is halved in the layer 702, it is necessary to double each of the weight w and the bias b of the FC layer in order to maintain the output value. Thus, the weight correction unit 601 corrects the weight of the NN by doubling each of the weight w and the bias b of the FC layer and outputting the result as the weight of the layer 703.
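A minimal sketch of this correction is shown below, assuming each unit layer exposes its normalization parameters γ and β (and the final FC layer its weight w and bias b) in a dictionary, and assuming the scaling factor is chosen as min(1, q/Yi); these data structures and the rule for the factor are assumptions made for illustration, consistent with the factors in the example above (¼ for the layer 701, ½ for the layer 702, no change for the layer 703).

```python
def correct_weights(norm_layers, fc_layer, output_dists, q):
    """Rescale weights so that each output distribution fits within q, and
    compensate at the final FC layer so that its output value is maintained.

    norm_layers:  list of dicts {'gamma': ..., 'beta': ...} for the unit layers
                  ending with a normalization layer (e.g. the layers 701 and 702).
    fc_layer:     dict {'w': ..., 'b': ...} for the final unit layer (e.g. the layer 703).
    output_dists: output distributions Y_i of the layers in norm_layers.
    """
    scale = 1.0
    for layer, y_i in zip(norm_layers, output_dists):
        scale = min(1.0, q / y_i)                 # e.g. 1/4 for the layer 701, 1/2 for the layer 702
        layer['gamma'] = layer['gamma'] * scale   # shrink the scale parameter gamma
        layer['beta'] = layer['beta'] * scale     # shrink the shift parameter beta
    # Following the description above, only the scaling of the layer immediately
    # preceding the FC layer is compensated: the FC layer's own output already
    # fits within q, so its weight and bias are divided by that last factor
    # (e.g. doubled when the preceding output distribution was halved).
    fc_layer['w'] = fc_layer['w'] / scale
    fc_layer['b'] = fc_layer['b'] / scale
    return norm_layers, fc_layer
```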
The quantization unit 209 may quantize the model of the NN with the weight thus corrected, or may quantize the model of the NN with the learning performed by the regularization item calculation unit 206 and the learning unit 208. The weight correction unit 601 may execute the correction processing when the value of the output distribution exceeds a predetermined value (for example, the quantization parameter). Furthermore, for example, the weight correction unit 601 may execute the weight correction processing after the learning of the model of the NN has been performed a predetermined number of times.
The weight correction processing by the weight correction unit 601 may be applied to an NN learned to make the value of the output distribution small as in the first embodiment, in a case where the reduction of the output distribution by the learning is insufficient. Further, the correction processing may be applied to an NN to which the learning of the first embodiment is not applied.
With such processing, the output distribution can be adjusted so as not to exceed the quantization parameter by correcting the weight of the NN. Therefore, the deterioration of the recognition accuracy due to the quantization can be suppressed by reducing the size of the output of the calculation in the intermediate layer without increasing the quantization parameter.
An information processing apparatus 8 according to the present embodiment quantizes the weight of the NN and corrects the regularization item used by the weight determination unit 204 based on the recognition accuracy of the NN for the detection target before and after the quantization. For example, the information processing apparatus 8 can adjust the degree of contribution of the regularization item at the time of learning by evaluating the degree of deterioration of the recognition accuracy due to quantization of the NN and correcting the regularization item in accordance with that degree of deterioration.
The evaluation data obtaining unit 802 obtains evaluation data that is data for evaluating the recognition accuracy of the NN for the detection target. This evaluation data is prepared in advance and is a set of a mini batch and correct answer data as in the supervisory data used in the first embodiment. The real number inference unit 801 executes inference processing (recognition of a detection target) with the mini batch included in the evaluation data being an input, by using the model of the NN after the learning by the learning unit 208.
The first evaluation unit 803 evaluates the recognition accuracy of the NN for the detection target. Here, it is assumed that the first evaluation unit 803 evaluates the value of a loss (E1) output by the inference processing executed by the real number inference unit 801, as the recognition accuracy. Alternatively, the first evaluation unit 803 may evaluate different information indicating the success rate of recognition, such as the accuracy rate or likelihood of recognition on the detection target, as the recognition accuracy for example. Hereinafter, a simple description “recognition accuracy” refers to recognition accuracy for a detection target.
The quantization inference unit 804 executes the inference processing with the mini batch included in the evaluation data being an input by using the model of the NN (used for the inference by the real number inference unit 801) whose weight has been quantized by the quantization unit 209.
The second evaluation unit 805 evaluates, for the detection target, the recognition accuracy of the NN with the quantized weight used by the quantization inference unit 804. The evaluation of the recognition accuracy by the second evaluation unit 805 is performed in a manner similar to the evaluation by the first evaluation unit 803, and it is assumed here that a loss E2 output by the inference is evaluated as the recognition accuracy.
The regularization item correction unit 806 corrects the regularization item based on the evaluation of the recognition accuracy by the first evaluation unit 803 and the evaluation of the recognition accuracy by the second evaluation unit 805. Here, the regularization item correction unit 806 may evaluate the deterioration degree of the recognition accuracy of the NN due to the quantization of the weight, by using the evaluation of the recognition accuracy by the first evaluation unit 803 and the evaluation of the recognition accuracy by the second evaluation unit 805, and correct the regularization item using this evaluation.
In the present embodiment, the regularization item correction unit 806 evaluates a deterioration degree F of the recognition accuracy of the NN due to the quantization of the weight, by using the following Formula (7). Since E1 and E2 are values of the loss function, a larger F corresponds to a larger deterioration of the recognition accuracy due to the quantization; that is, the deterioration degree is higher with a larger F.
F=E1+E2 Formula (7)
The regularization item correction unit 806 may correct the regularization item using the deterioration degree, by calculating a corrected coefficient λ′ with the value of the deterioration degree F applied to the coefficient of the regularization item, for example, by the following Formula (8). In this way, it is possible to correct the contribution of the regularization item at the time of learning in accordance with the deterioration degree of the recognition accuracy. Specifically, when the deterioration degree is low, the degree of contribution of the regularization item at the time of learning can be reduced, and when the deterioration degree is high, the degree of contribution of the regularization item at the time of learning can be increased.
λ′=Fλ Formula (8)
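The following sketch expresses Formulas (7) and (8) directly: the deterioration degree F is obtained from the losses E1 and E2 on the evaluation data, and the corrected coefficient λ′ is obtained by multiplying the current coefficient λ by F. The computation of E1 and E2 themselves is assumed to be performed elsewhere (by the first and second evaluation units).

```python
def corrected_regularization_coefficient(e1, e2, lam):
    """Correct the regularization coefficient based on the deterioration degree.

    e1:  loss E1 of the real-number NN on the evaluation data (first evaluation unit 803).
    e2:  loss E2 of the weight-quantized NN on the same data (second evaluation unit 805).
    lam: the coefficient lambda currently applied to the regularization item.
    """
    deterioration_degree = e1 + e2          # F = E1 + E2 (Formula (7))
    return deterioration_degree * lam       # lambda' = F * lambda (Formula (8))
```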
The regularization item correction processing does not need to be executed every time the learning unit 208 performs the update processing for the weight of the NN, and may be executed, for example, every time the learning has been performed a predetermined number of times.
With such a configuration, it is possible to correct the regularization item at the time of learning in accordance with a change in recognition accuracy before and after quantization of the NN. Therefore, it is possible to adjust the degree of contribution of the regularization item at the time of learning in accordance with the deterioration degree of the recognition accuracy of the NN due to quantization.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2022-078954, filed on May 12, 2022, which is hereby incorporated by reference herein in its entirety.