This application is the national phase entry of International Application No. PCT/CN2021/119513, filed on Sep. 22, 2021, which is based upon and claims priority to Chinese Patent Application No. 202110421738.5, filed on Apr. 20, 2021, the entire contents of which are incorporated herein by reference.
The present invention relates to a quantization method for a lightweight neural network (LNN).
Recently, a great deal of work has explored quantization techniques for traditional models. However, when these techniques are applied to lightweight networks, they incur a large loss of accuracy. For example, when MobileNetv2 is quantized, its accuracy on the ImageNet dataset drops from 73.03% to 0.1% (Benoit Jacob et al., Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 2704-2713). In another example, quantization causes a 2% loss of accuracy (Raghuraman Krishnamoorthi, Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper, CoRR, abs/1806.08342, 2018). To recover these losses of accuracy, much work resorts to retraining or training-time quantization techniques, but these techniques are time-consuming and require dataset support. To solve these problems, Nagel et al. proposed a data-free quantization (DFQ) algorithm. They attributed the poor performance of traditional quantization methods on models adopting depthwise separable convolutions (DSCs) to differences in weight distribution, and accordingly proposed cross-layer weight balancing to adjust the balance of weights between different layers. This technique, however, is only applicable to network models using the rectified linear unit (ReLU) as the activation function, whereas most current lightweight networks use ReLU6, and directly replacing ReLU6 with ReLU causes a significant loss of accuracy. Furthermore, the method proposed by Nagel et al. is not suitable for pure integer quantization.
The technical problems to be solved by the present invention are as follows: a simple combination of a lightweight neural network (LNN) and a quantization technique leads to significantly reduced accuracy or a long retraining time. In addition, many current quantization methods only quantize the weights and feature maps, while the offsets and quantization coefficients remain floating-point numbers, which is unfavorable for deployment on application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs).
In order to solve the above technical problems, the present invention adopts the following technical solution: a pure integer quantization method for an LNN, including the following steps: dividing a value of each pixel in each of the channels of a feature map of a current layer by the t-th power of a maximum value of each pixel in the channel, where t is an imbalance transfer coefficient ranging from 0 to 1, and then performing quantization; and multiplying, for weights of a next layer convolved with the feature map, the value of each of the channels by the t-th power of the maximum value of each pixel in the corresponding channel of the feature map, so that a calculation result remains unchanged and imbalances between the channels of the feature map of the current layer are transferred to the weights of the next layer.
Preferably, when t=0, no imbalance transfer is performed; and when t=1, all imbalances between the channels of the feature map of the current layer are transferred to the weights of the next layer.
Preferably, the current layer is any layer except a last layer in the LNN.
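For illustration only, the following is a minimal NumPy sketch of the above steps. The array shapes, the function name imbalance_transfer, and the use of absolute values when taking the channel maximum are assumptions of this sketch, not part of the claimed method:

```python
import numpy as np

def imbalance_transfer(feature_map, next_weights, t=1.0):
    """Transfer channel imbalances of a feature map into the next
    layer's weights before quantization.

    feature_map:  (C, H, W) activations of the current layer.
    next_weights: (K, C, kh, kw) weights of the next layer that are
                  convolved with this feature map.
    t:            imbalance transfer coefficient in [0, 1]; t=0 performs
                  no transfer, t=1 transfers all imbalances.
    """
    c = feature_map.shape[0]
    # Maximum pixel value of each channel (the absolute value keeps the
    # scale positive; with ReLU6 outputs the values are non-negative).
    m = np.abs(feature_map).reshape(c, -1).max(axis=1)
    scale = m ** t
    # Divide each channel of the feature map by max^t ...
    balanced_fm = feature_map / scale[:, None, None]
    # ... and multiply the matching input channel of the next layer's
    # weights by max^t, so the convolution result is unchanged.
    compensated_w = next_weights * scale[None, :, None, None]
    return balanced_fm, compensated_w
```

Because the scaling is folded into the next layer's weights before quantization, no extra floating-point work is introduced at inference time.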
The algorithm provided by the present invention is verified on SkyNet and MobileNetv2, respectively, achieving lossless INT8 quantization on SkyNet and the highest quantization accuracy to date on MobileNetv2.
The FIGURE is a schematic diagram of an imbalance transfer for 1×1 convolution.
The present invention will be described in detail below with reference to specific embodiments. It should be understood that these embodiments are only intended to describe the present invention, rather than to limit the scope of the present invention. In addition, it should be understood that various changes and modifications may be made to the present invention by those skilled in the art after reading the content of the present invention, and these equivalent forms also fall within the scope defined by the appended claims of the present invention.
The analysis and modeling of the quantization process of a neural network show that the balance of a tensor can be used as a predictive index of the quantization error. Guided by this predictive index, the present invention proposes a tunable imbalance transfer algorithm to optimize the quantization error of a feature map. The specific contents are as follows:
Under the current neural network computing paradigm, weights can be quantized channel by channel, but feature maps can only be quantized layer by layer. Therefore, the quantization error of the weights is small, while that of the feature maps is large.
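This gap can be reproduced numerically. The following small demonstration (synthetic data, simulated symmetric INT8 quantization; all names and shapes are illustrative assumptions) shows that one scale per tensor yields a markedly larger error than one scale per channel when the channel ranges are imbalanced:

```python
import numpy as np

def fake_quant(x, scale):
    """Symmetric INT8 quantize-dequantize with the given scale."""
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
# A feature map whose eight channels have strongly imbalanced ranges.
fm = rng.standard_normal((8, 16, 16))
fm *= (10.0 ** rng.uniform(-2, 2, size=8))[:, None, None]

# Layer-by-layer (per-tensor) quantization: a single scale for the tensor.
err_layer = np.mean((fm - fake_quant(fm, np.abs(fm).max() / 127)) ** 2)

# Channel-by-channel quantization: one scale per channel.
s = np.abs(fm).reshape(8, -1).max(axis=1) / 127
err_channel = np.mean((fm - fake_quant(fm, s[:, None, None])) ** 2)

print(err_layer, err_channel)  # the per-tensor error is markedly larger
```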
The present invention divides the value of each pixel in each of the channels of the feature map of a current layer in a neural network by the maximum value of each pixel in the channel, and then performs quantization, thereby achieving equivalent channel-by-channel quantization. In order to ensure that the calculation result remains unchanged, for the weights convolved with the feature map, the value of each of the channels is multiplied by the maximum value of each pixel in the corresponding channel of the feature map. As a result, the imbalances between the channels of the feature map of the current layer are all transferred to the weights of the next layer.
In fact, however, transferring all the imbalances between the channels of the feature map is not the optimal solution. In order to tune the level of the imbalance transfer, the present invention additionally introduces a hyperparameter, the imbalance transfer coefficient t. In the above steps, the value of each pixel in each of the channels of the feature map is divided by the t-th power of the maximum value of each pixel in the channel, where t ranges from 0 to 1. When t=0, no imbalance transfer is performed; when t=1, all the imbalances are transferred as described above. By tuning t, the present invention can obtain the optimal quantization accuracy. This operation is applicable to any network model and any convolution kernel size.
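The description leaves the tuning procedure for t open. As one hedged illustration of how t might be swept, the sketch below applies the transfer for several values of t to a 1×1 convolution (computed as an einsum over channels), verifies that the float result is unchanged, and measures the quantization error of the output; the MSE criterion, shapes, and variable names are assumptions of this example:

```python
import numpy as np

def fake_quant(x, scale):
    """Symmetric INT8 quantize-dequantize with the given scale."""
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
C, K, H, W = 8, 16, 14, 14
# Feature map A1 with imbalanced channel ranges, and 1x1-conv weights W2.
a1 = rng.standard_normal((C, H, W)) * (10.0 ** rng.uniform(-1, 1, C))[:, None, None]
w2 = rng.standard_normal((K, C))

# Reference float output of the 1x1 convolution (a matmul over channels).
ref = np.einsum('kc,chw->khw', w2, a1)

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    m = np.abs(a1).reshape(C, -1).max(axis=1) ** t
    a1_t = a1 / m[:, None, None]   # divide each channel by max^t
    w2_t = w2 * m[None, :]         # multiply the matching weight channel by max^t
    # In float the transfer is exact: the output is unchanged.
    assert np.allclose(np.einsum('kc,chw->khw', w2_t, a1_t), ref)
    # Quantize the feature map layer-wise and the weights channel-wise.
    a1_q = fake_quant(a1_t, np.abs(a1_t).max() / 127)
    w2_q = fake_quant(w2_t, (np.abs(w2_t).max(axis=1) / 127)[:, None])
    out = np.einsum('kc,chw->khw', w2_q, a1_q)
    print(t, np.mean((out - ref) ** 2))  # pick the t with the smallest error
```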
The FIGURE shows an imbalance transfer for 1×1 convolution. The dotted tensors share the same quantization coefficients. The value of each pixel in each of the channels of A1 is divided by the maximum value of each pixel in the channel, and the corresponding channel of W2 is multiplied by this maximum value. This operation ensures that the calculation result remains unchanged, while greatly increasing the balance of A1 without significantly decreasing the balance of the weights. Therefore, the quantization error of the feature map can be reduced, thereby improving the accuracy of the quantized model.
Foreign Application Priority Data

Number | Date | Country | Kind
---|---|---|---
202110421738.5 | Apr. 20, 2021 | CN | national
PCT Information

Filing Document | Filing Date | Country
---|---|---
PCT/CN2021/119513 | Sep. 22, 2021 | WO
PCT Publication Data

Publishing Document | Publishing Date | Country | Kind
---|---|---|---
WO2022/222369 | Oct. 27, 2022 | WO | A
U.S. Patent Documents

Number | Name | Date | Kind
---|---|---|---
10527699 | Cheng et al. | Jan 2020 | B1
20190042948 | Lee | Feb 2019 | A1
20190279072 | Gao | Sep 2019 | A1
20190294413 | Vantrease et al. | Sep 2019 | A1
20200401884 | Guo | Dec 2020 | A1
20210110236 | Shibata | Apr 2021 | A1
20220086463 | Coban | Mar 2022 | A1
Foreign Patent Documents

Number | Date | Country
---|---|---
105528589 | Apr 2016 | CN
110930320 | Mar 2020 | CN
111311538 | Jun 2020 | CN
111402143 | Jul 2020 | CN
111937010 | Nov 2020 | CN
112418397 | Feb 2021 | CN
112488070 | Mar 2021 | CN
112560355 | Mar 2021 | CN
113128116 | Jul 2021 | CN
WO-0074850 | Dec 2000 | WO
WO-2005048185 | May 2005 | WO
WO-2018073975 | Apr 2018 | WO
Other Publications

Cho et al., Per-Channel Quantization Level Allocation for Quantizing Convolutional Neural Networks, Nov. 2020, pp. 1-3.
Kang et al., Decoupling Representation and Classifier for Long-Tailed Recognition, Feb. 2020, pp. 1-16.
Polino et al., Model Compression via Distillation and Quantization, Feb. 2018, pp. 1-21.
Benoit Jacob, et al., Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, IEEE, 2018, pp. 2704-2713.
Raghuraman Krishnamoorthi, Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper, 2018, pp. 1-36.
Markus Nagel, et al., Data-Free Quantization Through Weight Equalization and Bias Correction, Qualcomm AI Research, 2019.
Liu Guanyu, et al., Design and Implementation of a Real-Time Defogging Hardware Accelerator Based on Image Fusion, Hefei University of Technology, Master's Dissertation, 2020, pp. 1-81.
U.S. Publication Data

Number | Date | Country
---|---|---
20230196095 A1 | Jun 2023 | US