The present application claims priority of the Chinese Patent Application No. 202011558175.6, filed on Dec. 25, 2020, the disclosure of which is incorporated herein by reference in its entirety as part of the present application.
Embodiments of the present disclosure relate to a quantization method and quantization apparatus for a weight of a neural network, and a storage medium.
Neural network models are widely used in fields such as computer vision, speech recognition, natural language processing, and reinforcement learning. However, neural network models are highly complex and thus can hardly be applied to edge devices (e.g., cellphones, smart sensors, wearable devices, etc.) with very limited computing speed and power.
A neural network implemented on the basis of a crossbar-enabled analog computing-in-memory (CACIM) system can reduce the complexity of neural network models, so that neural network models can be applied to edge devices. Specifically, the CACIM system includes a computing and storage unit that is capable of performing data computing where the data is stored, thereby saving the overhead caused by data transportation. In addition, the computing and storage unit in the CACIM system can perform multiplication and addition operations on the basis of Kirchhoff's current law and Ohm's law, thereby reducing the computing overhead of the system.
At least one embodiment of the present disclosure provides a quantization method for a weight of a neural network, where the neural network is implemented on the basis of a crossbar-enabled analog computing-in-memory system; the method includes: acquiring a distribution characteristic of the weight; and determining, according to the distribution characteristic of the weight, an initial quantization parameter for quantizing the weight to reduce a quantization error in quantizing the weight.
For example, the quantization method provided in at least one embodiment of the present disclosure further includes: quantizing the weight using the initial quantization parameter to obtain a quantized weight; and training the neural network using the quantized weight and updating the weight on the basis of a training result to obtain an updated weight.
For example, the quantization method provided in at least one embodiment of the present disclosure further includes: quantizing the weight using the initial quantization parameter to obtain a quantized weight; adding noise to the quantized weight to obtain a noised weight; and training the neural network using the noised weight and updating the weight on the basis of a training result to obtain an updated weight.
For example, in the quantization method provided in at least one embodiment of the present disclosure, training the neural network and updating the weight on the basis of the training result to obtain an updated weight include: performing forward propagation and backward propagation on the neural network; and updating the weight by using a gradient that is obtained by the backward propagation to obtain the updated weight.
For example, the quantization method provided in at least one embodiment of the present disclosure further includes: updating the initial quantization parameter on the basis of the updated weight.
For example, in the quantization method provided in at least one embodiment of the present disclosure, updating the initial quantization parameter on the basis of the updated weight includes: determining whether the updated weight matches the initial quantization parameter; in a case where the updated weight matches the initial quantization parameter, not updating the initial quantization parameter; and in a case where the updated weight does not match the initial quantization parameter, updating the initial quantization parameter.
For example, in the quantization method provided in at least one embodiment of the present disclosure, determining whether the updated weight matches the initial quantization parameter includes: performing a matching operation on the updated weight and the initial quantization parameter to obtain a matching operation result; comparing the matching operation result with a threshold range; in a case where the matching operation result is within the threshold range, determining that the updated weight matches the initial quantization parameter; and in a case where the matching operation result is not within the threshold range, determining that the updated weight does not match the initial quantization parameter.
At least one embodiment of the present disclosure further provides a quantization apparatus for a weight of a neural network, where the neural network is implemented on the basis of a crossbar-enabled analog computing-in-memory system; the apparatus includes a first unit and a second unit; the first unit is configured to acquire a distribution characteristic of the weight; and the second unit is configured to determine, according to the distribution characteristic of the weight, an initial quantization parameter for quantizing the weight to reduce a quantization error in quantizing the weight.
For example, the quantization apparatus provided in at least one embodiment of the present disclosure further includes a third unit and a fourth unit, the third unit is configured to quantize the weight using the initial quantization parameter to obtain a quantized weight; and the fourth unit is configured to train the neural network using the quantized weight and to update the weight on the basis of a training result to obtain an updated weight.
For example, the quantization apparatus provided in at least one embodiment of the present disclosure further includes a third unit, a fourth unit, and a fifth unit, the third unit is configured to quantize the weight using the initial quantization parameter to obtain a quantized weight; the fifth unit is configured to add noise to the quantized weight to obtain a noised weight; and the fourth unit is configured to train the neural network using the noised weight and to update the weight on the basis of a training result to obtain an updated weight.
For example, the quantization apparatus provided in at least one embodiment of the present disclosure further includes a sixth unit, the sixth unit is configured to update the initial quantization parameter on the basis of the updated weight.
For example, in the quantization apparatus provided in at least one embodiment of the present disclosure, the sixth unit is configured to determine whether the updated weight matches the initial quantization parameter; in a case where the updated weight matches the initial quantization parameter, the initial quantization parameter is not updated; and in a case where the updated weight does not match the initial quantization parameter, the initial quantization parameter is updated.
At least one embodiment of the present disclosure further provides a quantization apparatus for a weight of a neural network, where the neural network is implemented on the basis of a crossbar-enabled analog computing-in-memory system; the apparatus includes: a processor; and a memory including one or more computer program modules; the one or more computer program modules are stored in the memory and are configured to be executed by the processor, and the one or more computer program modules are used for implementing the quantization method provided in any one embodiment of the present disclosure.
At least one embodiment of the present disclosure further provides a storage medium for storing non-transitory computer-readable instructions, the non-transitory computer-readable instructions, when executed by a computer, implement the method provided in any one embodiment of the present disclosure.
In order to clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments are briefly described below; apparently, the described drawings relate only to some embodiments of the present disclosure and are not limitative of the present disclosure.
In order to make the objects, technical details, and advantages of the embodiments of the present disclosure apparent, the technical solutions of the embodiments will be described clearly and fully below in connection with the drawings related to the embodiments of the present disclosure. Apparently, the described embodiments are only a part, rather than all, of the embodiments of the present disclosure. Based on the described embodiments herein, all other embodiments obtained by those skilled in the art without any inventive work fall within the protection scope of the present disclosure.
Unless otherwise defined, all the technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which the present disclosure belongs. The terms “first,” “second,” etc., which are used in the present disclosure, are not intended to indicate any sequence, amount or importance, but distinguish various components. Likewise, the terms “a”, “an”, “one” or “the” etc., do not denote a limitation of quantity, but mean that there is at least one. The terms “comprise,” “comprising,” “include,” “including,” etc., are intended to specify that the elements or the objects stated before these terms encompass the elements or the objects and equivalents thereof listed after these terms, but do not preclude the other elements or objects.
The present disclosure is described below through several specific embodiments. To keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of well-known functions and well-known components may be omitted. When any component of an embodiment of the present disclosure appears in more than one drawing, the component is denoted by the same reference numeral in each drawing.
The implementation of a neural network using a crossbar-enabled analog computing-in-memory (CACIM) system requires mapping, that is, the weight of the neural network needs to be written to the computing and storage unit of the CACIM system. When performing the mapping described above, the weight can be quantized to reduce the precision of the weight, thereby reducing the mapping overhead. However, quantizing the weight introduces a quantization error, thereby affecting the performance of the neural network model. It should be noted that in a digital computing system, the precision of a weight is the number of bits used to represent the weight, whereas in the CACIM system, the precision of a weight is the number of levels of the analog devices used to represent the weight.
For example, in one example, the weight is a set of 32-bit floating-point numbers: [0.4266, 3.8476, 2.0185, 3.0996, 2.2692, 3.4748, 0.3377, 1.5991]; the quantization method of rounding towards negative infinity is used for quantizing the set of weight values, the quantized weight thus obtained is a set of 2-bit integers: [0, 3, 2, 3, 2, 3, 0, 1], and the difference between the weight and the quantized weight is the quantization error.
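As an illustration, this rounding-towards-negative-infinity quantization can be reproduced in a few lines of code; the following snippet is a minimal sketch assuming NumPy, not part of the disclosed method itself, and it computes the quantized weights and the resulting quantization error for the example values:

```python
import numpy as np

# Example 32-bit floating-point weights from the text.
weights = np.array([0.4266, 3.8476, 2.0185, 3.0996,
                    2.2692, 3.4748, 0.3377, 1.5991], dtype=np.float32)

# Rounding towards negative infinity (floor) gives 2-bit integers in 0..3.
quantized = np.floor(weights).astype(np.int8)
print(quantized)            # [0 3 2 3 2 3 0 1]

# The quantization error is the difference between the weight and the
# quantized weight.
error = weights - quantized
print(error)                # e.g., the first error is 0.4266 - 0 = 0.4266
```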
In one method for quantizing a weight of a neural network implemented by the CACIM system, the quantization method is designed on the basis of a digital computing system; for example, the quantization method is pre-defined as uniform quantization, rounding to the nearest number, or rounding towards negative infinity. However, such quantization methods do not fully consider the distribution characteristic of the weight of the neural network. A pre-defined quantization method solves an optimization problem with constraints and cannot obtain the minimum quantization error, thus leading to poor performance of the neural network model.
At least one embodiment of the present disclosure provides a quantization method for a weight of a neural network, where the neural network is implemented on the basis of a crossbar-enabled analog computing-in-memory system; the quantization method includes: acquiring a distribution characteristic of the weight; and determining, according to the distribution characteristic of the weight, an initial quantization parameter for quantizing the weight to reduce a quantization error in quantizing the weight.
Embodiments of the present disclosure also provide a quantization apparatus and a storage medium corresponding to the quantization method described above.
The quantization method and quantization apparatus for the weight of the neural network, and the storage medium provided by the embodiments of the present disclosure make use of the characteristic that the weight in the CACIM system is represented by an analog quantity, and propose a generalized quantization method based on the distribution characteristic of the weight. Such a quantization method does not pre-define the quantization scheme to be used (for example, it does not pre-define a quantization method designed for a digital computing system), but determines the quantization parameter for quantizing the weight according to the distribution characteristic of the weight so as to reduce the quantization error, so that the neural network model performs better under the same mapping overhead, and the mapping overhead is smaller under the same model performance.
Embodiments and examples of the present disclosure will be described in detail below in conjunction with the appended drawings.
Step S110: acquiring a distribution characteristic of the weight.
Step S120: determining, according to the distribution characteristic of the weight, an initial quantization parameter for quantizing the weight to reduce a quantization error in quantizing the weight.
For example, the crossbar-enabled analog computing-in-memory system uses a resistive random access memory cell as a computing and storage unit, and then uses a resistive random access memory cell array to implement the neural network.
It should be noted that, in the embodiments of the present disclosure, the specific type of the resistive random access memory cell is not limited. For example, the resistive random access memory cell may adopt a 1R structure, that is, the resistive random access memory cell includes only one resistive switching element. For another example, the resistive random access memory cell may also adopt a 1T1R structure, that is, the resistive random access memory cell includes one transistor and one resistive switching element.
For example, according to Ohm's law and Kirchhoff's current law, the output current on the j-th column of the crossbar array satisfies

$$I_j=\sum_{i=1}^{M} V_i G_{ij},$$

where $V_i$ is the input voltage applied to the i-th row, $G_{ij}$ is the conductance of the computing and storage unit located at the i-th row and the j-th column, and $M$ is the number of rows.
It should be noted that the example described above is only exemplary, and the embodiments of the present disclosure include but are not limited thereto.
In the embodiments of the present disclosure, the weight of the neural network can be represented by the conductance values of the resistive random access memory cell, that is, the weight of the neural network can be represented by an analog quantity, so that the quantization method for the weight may not be limited to quantization methods designed for a digital computing system.
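For illustration only, the analog multiply-and-add of the crossbar is mathematically equivalent to a matrix-vector product; the following minimal sketch (the array sizes are arbitrary assumptions) mirrors the formula $I_j=\sum_{i=1}^{M} V_i G_{ij}$ in NumPy:

```python
import numpy as np

# By Ohm's law, the cell at row i and column j contributes a current
# V[i] * G[i, j]; by Kirchhoff's current law, the currents on each column
# sum up, so the column currents are I[j] = sum_i V[i] * G[i, j].
M, N = 4, 3                    # number of rows and columns (arbitrary example)
V = np.random.rand(M)          # input voltages applied to the rows
G = np.random.rand(M, N)       # cell conductances representing the weights

I = V @ G                      # column currents: the multiply-and-add result
assert np.allclose(
    I, [sum(V[i] * G[i, j] for i in range(M)) for j in range(N)])
```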
For step S110, the distribution characteristic of the weight can be acquired by various means, and no limitation is made in the embodiments of the present disclosure in this regard.
For example, the distribution characteristic of the weight can be acquired directly. For another example, the weight of the neural network can be acquired first, and then the distribution characteristic of the weight can be acquired indirectly through computation.
For example, the acquiring may include multiple means to acquire data, such as reading and importing, etc. For example, the distribution characteristic of the weight may be pre-stored in a storage medium, and the distribution characteristic of the weight may be acquired by directly accessing and reading the storage medium.
For example, the distribution characteristic of the weight may include a probability density distribution of the weight.
For example, the probability density distribution of a set of weight values is shown in the accompanying drawings.
It should be noted that in the embodiments of the present disclosure, taking the probability density distribution of the weight as the distribution characteristic of the weight is only exemplary, and the embodiments of the present disclosure include but are not limited thereto. For example, other characteristics of the weight may also be used as the distribution characteristic of the weight; for example, the distribution characteristic of the weight may also include a cumulative probability density distribution of the weight.
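As a minimal sketch (the sample data and bin count below are illustrative assumptions), the probability density distribution of a set of weight values can be estimated with a normalized histogram:

```python
import numpy as np

# Hypothetical weight values; in practice these come from the neural network.
weights = 0.05 * np.random.randn(10000)

# A normalized histogram approximates the probability density distribution.
density, bin_edges = np.histogram(weights, bins=64, density=True)
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
# (bin_centers, density) can now serve as the distribution characteristic.
```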
For step S120, according to the distribution characteristic of the weight, the quantization parameter for quantizing the weight may be determined with the aim of reducing the quantization error in quantizing the weight, for example, with the aim of minimizing the quantization error.
For example, in some embodiments, the quantization parameter may be determined directly according to the distribution characteristic of the weight.
For example, in one example, the quantization parameter may be determined by using the Lloyd algorithm according to the distribution characteristic of the weight. For example, for the probability density distribution of the weight shown in the accompanying drawings, the quantization values and the cut-off points that minimize the quantization error can be determined by using the Lloyd algorithm.
It should be noted that in the embodiments of the present disclosure, the Lloyd algorithm is only exemplary, and the embodiments of the present disclosure include but are not limited thereto. For example, the quantization parameter may also be determined by other algorithms that aim at minimizing the quantization error; for example, the quantization parameter may be determined by using the K-means clustering algorithm according to the distribution characteristic of the weight.
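For illustration, the following is a minimal sketch of the Lloyd algorithm for a scalar quantizer; the function name, the sample data, and the iteration count are illustrative assumptions rather than the disclosed implementation. The sketch alternates between placing cut-off points halfway between neighboring quantization values and moving each quantization value to the mean of the weights falling in its region, which reduces the mean squared quantization error:

```python
import numpy as np

def lloyd(samples, n_levels=4, n_iters=100):
    # Initialize the quantization values on an evenly spaced grid.
    levels = np.linspace(samples.min(), samples.max(), n_levels)
    for _ in range(n_iters):
        # Cut-off points lie halfway between neighboring quantization values.
        cutoffs = 0.5 * (levels[:-1] + levels[1:])
        # Assign every sample to the region it falls into.
        regions = np.digitize(samples, cutoffs)
        # Move each quantization value to the mean of its region.
        for k in range(n_levels):
            if np.any(regions == k):
                levels[k] = samples[regions == k].mean()
    return levels, 0.5 * (levels[:-1] + levels[1:])

weights = 0.05 * np.random.randn(10000)          # hypothetical weight samples
quant_values, cut_points = lloyd(weights)        # 4 values, 3 cut-off points
```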
For another example, in some embodiments, the quantization parameter may also be determined indirectly according to the distribution characteristic of the weight.
For example, in one example, determining, according to the distribution characteristic of the weight, the initial quantization parameter for quantizing the weight to reduce the quantization error in quantizing the weight includes: acquiring a candidate distribution library, in which multiple distribution models are stored; selecting, according to the distribution characteristic of the weight, a distribution model corresponding to the distribution characteristic from the candidate distribution library; and determining, according to the distribution model as selected, the initial quantization parameter for quantizing the weight to reduce the quantization error in quantizing the weight.
For example, the candidate distribution library may be preset, and may be acquired by various means such as reading and importing. No limitation is made in the embodiments of the present disclosure in this regard.
For example, selecting, according to the distribution characteristic of the weight, a distribution model corresponding to the distribution characteristic from the candidate distribution library includes: analyzing the distribution characteristic of the weight, and selecting, from the candidate distribution library, the distribution model whose distribution characteristic is closest to that of the weight.
For example, by analyzing the probability density distribution of a set of weight values shown in the accompanying drawings, the distribution model corresponding to the distribution characteristic can be selected from the candidate distribution library, and the initial quantization parameter can then be determined according to the selected distribution model.
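For illustration, a minimal sketch of this indirect route is given below; the library contents, the use of SciPy, and the Kolmogorov-Smirnov statistic as the closeness measure are all illustrative assumptions:

```python
import numpy as np
from scipy import stats

# A candidate distribution library storing several distribution models.
candidate_library = {
    "gaussian": stats.norm,
    "laplace": stats.laplace,
    "uniform": stats.uniform,
}

def select_model(weights):
    # Pick the model whose fitted distribution is closest to the weights,
    # here measured by the Kolmogorov-Smirnov statistic.
    best_name, best_stat = None, np.inf
    for name, dist in candidate_library.items():
        params = dist.fit(weights)
        stat = stats.kstest(weights, dist.cdf, args=params).statistic
        if stat < best_stat:
            best_name, best_stat = name, stat
    return best_name

weights = 0.05 * np.random.randn(10000)   # hypothetical weight samples
print(select_model(weights))              # expected: "gaussian"
```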
In the embodiments of the present disclosure, by using the characteristic that the weight in the CACIM system is represented by an analog quantity, a generalized quantization method based on the distribution characteristic of the weight is proposed. Such a quantization method does not pre-define the quantization scheme to be used (for example, it does not pre-define a quantization method designed for a digital computing system), but determines the quantization parameter for quantizing the weight according to the distribution characteristic of the weight so as to reduce the quantization error, so that the neural network model performs better under the same mapping overhead, and the mapping overhead is smaller under the same model performance.
For example, the quantization method 100 provided by at least one embodiment of the present disclosure further includes steps S130 and S140.
Step S130: quantizing the weight using the initial quantization parameter to obtain a quantized weight.
Step S140: training the neural network using the quantized weight, and updating the weight on the basis of a training result to obtain an updated weight.
For step S130, the quantized weight with reduced precision can be obtained by quantizing the weight using the initial quantization parameter.
For example, in one example, the initial quantization parameter as determined includes four quantization values: [−0.0618, −0.0036, 0.07, 0.1998] and three cut-off points: [−0.0327, 0.0332, 0.1349]. Thus, the quantized weight, which is obtained by quantizing the weight using the initial quantization parameter, can be expressed as:

$$y = f(x) = \begin{cases} -0.0618, & x < -0.0327 \\ -0.0036, & -0.0327 \le x < 0.0332 \\ 0.07, & 0.0332 \le x < 0.1349 \\ 0.1998, & x \ge 0.1349 \end{cases}$$
For example, a set of weight values are [−0.0185, −0.0818, 0.1183, −0.0102, 0.1428], and a set of quantized weight values [−0.0036, −0.0618, 0.07, −0.0036, 0.1998] can be obtained after quantization by using y=f(x).
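For illustration, the piecewise function y = f(x) above can be sketched with np.digitize (a minimal sketch, not the disclosed implementation), reproducing the example values:

```python
import numpy as np

# The four quantization values and three cut-off points from the example.
quant_values = np.array([-0.0618, -0.0036, 0.07, 0.1998])
cut_points = np.array([-0.0327, 0.0332, 0.1349])

def f(x):
    # np.digitize maps each weight to the index of the region it falls in.
    return quant_values[np.digitize(x, cut_points)]

w = np.array([-0.0185, -0.0818, 0.1183, -0.0102, 0.1428])
print(f(w))    # [-0.0036 -0.0618  0.07   -0.0036  0.1998]
```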
For step S140, after the quantized weight is obtained, the neural network is trained using the quantized weight, for example, off-chip training can be performed, and the weight is updated on the basis of the training result.
For example, in one example, training the neural network, and updating the weight on the basis of the training result to obtain an updated weight include: performing forward propagation and backward propagation on the neural network; and updating the weight by using a gradient that is obtained by the backward propagation to obtain the updated weight.
For example, in the process of forward propagation, the input of the neural network is processed layer by layer to generate the output; in the process of backward propagation, by taking the sum of squares of the differences between the output and the expected output as the target function, the partial derivatives of the target function with respect to the weights are obtained layer by layer, which constitute the gradient of the target function with respect to the weight vector; and then the weight is updated on the basis of the gradient.
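As a minimal sketch of one such training iteration (a single linear layer with a squared-error target function; the shapes, data, and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))        # a batch of inputs
y_true = rng.standard_normal((8, 2))   # the expected outputs
W = 0.1 * rng.standard_normal((4, 2))  # the weight to be updated
lr = 0.01                              # learning rate

# Forward propagation: process the input to generate the output.
y = x @ W
# Target function: sum of squares of the differences from the expected output.
loss = 0.5 * np.sum((y - y_true) ** 2)
# Backward propagation: gradient of the target function with respect to W.
grad_W = x.T @ (y - y_true)
# Update the weight on the basis of the gradient.
W -= lr * grad_W
```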
In the foregoing embodiments of the present disclosure, only the influence of the quantization error on the performance of the neural network model is considered. However, both the write error and the read error of the weight may also degrade the performance of the neural network model, resulting in poor robustness. Therefore, in some other embodiments of the present disclosure, noise is added to the quantized weight, and off-chip training is performed using the quantized weight to which the noise is added, so that the updated weight as obtained has better robustness.
For example, the quantization method 100 provided by at least one embodiment of the present disclosure further includes steps S130′, S135 and S140′.
Step S130′: quantizing the weight using the initial quantization parameter to obtain a quantized weight.
Step S135: adding noise to the quantized weight to obtain a noised weight.
Step S140′: training the neural network using the noised weight, and updating the weight on the basis of a training result to obtain an updated weight.
For step S130′, it is similar to step S130, and no further detail will be provided herein.
For step S135, after obtaining the quantized weight, the noised weight can be obtained by adding noise to the quantized weight.
For example, in one example, after obtaining the quantized weight, the noised weight can be obtained by adding Gaussian noise to the quantized weight. For example, the mean value of the Gaussian noise can be 0, and the standard deviation can be the maximum of the absolute values of the quantized weight multiplied by a certain proportional coefficient, such as 2%.
For example, in a case where the set of quantized weight values obtained is [−0.0036, −0.0618, 0.07, −0.0036, 0.1998], the mean value of the Gaussian noise is 0, and the standard deviation is 0.1998×0.02=0.003996; then a set of noise values [0.0010, 0.0019, 0.0047, −0.0023, −0.0015] can be obtained, and by adding this set of noise values to the set of quantized weight values, a set of noised weight values [−0.0026, −0.0599, 0.0747, −0.0059, 0.1983] is obtained.
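For illustration, the noise addition of step S135 can be sketched as follows (a minimal sketch; the random values will differ from the worked example above):

```python
import numpy as np

# The quantized weight values from the example.
quantized = np.array([-0.0036, -0.0618, 0.07, -0.0036, 0.1998])

# Standard deviation: maximum absolute quantized weight times the
# proportional coefficient (2% here), i.e., 0.1998 * 0.02 = 0.003996.
sigma = 0.02 * np.max(np.abs(quantized))

# Gaussian noise with mean 0 and standard deviation sigma.
noise = np.random.normal(loc=0.0, scale=sigma, size=quantized.shape)
noised = quantized + noise    # the noised weight used for off-chip training
```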
For step S140′, it is similar to step S140, and the only difference lies in using the noised weight to replace the quantized weight for off-chip training. No further detail will be provided herein.
In the embodiments of the present disclosure, the noised weight obtained by adding the noise to the quantized weight is used for performing off-chip training, so that the updated weight as obtained has better robustness. In addition, in the embodiments of the present disclosure, off-chip training is performed by combining noise addition and quantization rather than performing them separately, thereby effectively reducing training costs.
For example, the quantization method 100 provided by at least one embodiment of the present disclosure further includes step S150.
Step S150: updating the initial quantization parameter on the basis of the updated weight.
For step S150, the initial quantization parameter can be adjusted according to the updated weight.
For example, in one example, the initial quantization parameter is updated each time the updated weight is obtained.
For example, in another example, updating the initial quantization parameter on the basis of the updated weight includes: determining whether the updated weight matches the initial quantization parameter; in a case where the updated weight matches the initial quantization parameter, not updating the initial quantization parameter; and in a case where the updated weight does not match the initial quantization parameter, updating the initial quantization parameter. In this example, the initial quantization parameter is updated only when the updated weight does not match the initial quantization parameter, thereby effectively reducing the update frequency.
For example, determining whether the updated weight matches the initial quantization parameter includes: performing a matching operation on the updated weight and the initial quantization parameter to obtain a matching operation result; comparing the matching operation result with a threshold range; in a case where the matching operation result is within the threshold range, determining that the updated weight matches the initial quantization parameter; and in a case where the matching operation result is not within the threshold range, determining that the updated weight does not match the initial quantization parameter.
For example, a matching operation A⊙B can be defined, where A and B are two matrices with the same dimension; A⊙B means performing an element-wise (point) multiplication on matrix A and matrix B and then summing all elements of the result. For example, assuming that the updated weight matrix is W and the corresponding quantized weight matrix is qW, the matching operation result can be defined as (W⊙qW)/(qW⊙qW), and the threshold range can be, for example, [0.9, 1.1]; after performing the matching operation, if the matching operation result is within the threshold range, the initial quantization parameter is not updated, and if the matching operation result is not within the threshold range, the initial quantization parameter is updated. It should be noted that the matching operation and the threshold range described above are only exemplary rather than limitations to the present disclosure.
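For illustration, the matching operation and the threshold comparison can be sketched as follows (a minimal sketch; the matrices and the helper names matmatch and needs_update are illustrative assumptions):

```python
import numpy as np

def matmatch(A, B):
    # A ⊙ B: element-wise (point) multiplication followed by summation.
    return np.sum(A * B)

def needs_update(W, qW, lo=0.9, hi=1.1):
    # Matching operation result (W ⊙ qW) / (qW ⊙ qW), compared with the
    # threshold range [lo, hi]; update only when the result is out of range.
    result = matmatch(W, qW) / matmatch(qW, qW)
    return not (lo <= result <= hi)

W = np.array([[0.11, -0.05], [0.07, 0.20]])    # updated weight matrix
qW = np.array([[0.10, -0.05], [0.07, 0.20]])   # quantized weight matrix
print(needs_update(W, qW))                     # False: result ≈ 1.02, in range
```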
In the foregoing embodiments and examples of the present disclosure, the off-chip training performed on the neural network is taken as an example for illustration, and the embodiments of the present disclosure include but are not limited thereto. For example, multiple rounds of training can also be performed on the neural network to update the weight and the quantization parameter.
For example, at least one embodiment of the present disclosure further provides a quantization apparatus 400 for a weight of a neural network implemented on the basis of a crossbar-enabled analog computing-in-memory system; the quantization apparatus 400 includes a first unit 410 and a second unit 420.
The first unit 410 is configured to acquire a distribution characteristic of the weight. For example, the first unit 410 implements the step S110; for the specific implementation method, reference may be made to relevant descriptions of the step S110, and no further detail will be provided herein.
The second unit 420 is configured to determine, according to the distribution characteristic of the weight, an initial quantization parameter for quantizing the weight to reduce a quantization error in quantizing the weight. For example, the second unit 420 implements the step S120; for the specific implementation method, reference may be made to relevant descriptions of the step S120, and no further detail will be provided herein.
For example, the quantization apparatus 400 provided by at least one embodiment of the present disclosure further includes a third unit 430 and a fourth unit 440.
The third unit 430 is configured to quantize the weight using the initial quantization parameter to obtain a quantized weight. For example, the third unit 430 implements the step S130; for the specific implementation method, reference may be made to relevant descriptions of the step S130, and no further detail will be provided herein.
The fourth unit 440 is configured to train the neural network using the quantized weight and to update the weight on the basis of a training result to obtain an updated weight. For example, the fourth unit 440 implements the step S140; for the specific implementation method, reference may be made to relevant descriptions of the step S140, and no further detail will be provided herein.
For example, the quantization apparatus 400 provided by at least one embodiment of the present disclosure further includes a third unit 430, a fourth unit 440, and a fifth unit 450.
The third unit 430 is configured to quantize the weight using the initial quantization parameter to obtain a quantized weight. For example, the third unit 430 implements the step S130′; for the specific implementation method, reference may be made to relevant descriptions of the step S130′, and no further detail will be provided herein.
The fifth unit 450 is configured to add noise to the quantized weight to obtain a noised weight. For example, the fifth unit 450 implements the step S135; for the specific implementation method, reference may be made to relevant descriptions of the step S135, and no further detail will be provided herein.
The fourth unit 440 is configured to train the neural network using the noised weight and to update the weight on the basis of the training result to obtain an updated weight. For example, the fourth unit 440 implements the step S140′; for the specific implementation method, reference may be made to relevant descriptions of the step S140′, and no further detail will be provided herein.
For example, the quantization apparatus 400 provided by at least one embodiment of the present disclosure further includes a sixth unit 460.
The sixth unit 460 is configured to update the initial quantization parameter on the basis of the updated weight. For example, the sixth unit 460 implements the step S150; for the specific implementation method, reference may be made to relevant descriptions of the step S150, and no further detail will be provided herein.
For example, in the quantization apparatus 400 provided by at least one embodiment of the present disclosure, the sixth unit 460 is configured to determine whether the updated weight matches the initial quantization parameter; in a case where the updated weight matches the initial quantization parameter, the initial quantization parameter is not updated, and in a case where the updated weight does not match the initial quantization parameter, the initial quantization parameter is updated. For example, the sixth unit 460 may determine whether to update the initial quantization parameter according to whether the updated weight matches the initial quantization parameter. For the specific implementation method, reference may be made to the relevant description in the example of the step S150, and no further detail will be provided herein.
It should be noted that the various units in the quantization apparatus 400 shown in the accompanying drawings are only exemplary divisions according to logical functions, and the embodiments of the present disclosure are not limited thereto.
In addition, as described above, although the quantization apparatus 400 is divided into units for performing corresponding processing respectively, it is clear to those skilled in the art that the processing performed by each unit may also be performed without any specific unit division or when there is no clear demarcation between units.
For example, the processor 510 may be a central processing unit (CPU), a digital signal processor (DSP), or any other form of processing unit with data processing capabilities and/or program execution capabilities, such as a field programmable gate array (FPGA); for example, the central processing unit (CPU) may be of an X86 or ARM architecture, or the like. The processor 510 may be a general-purpose processor or a special-purpose processor, and may control other components in the quantization apparatus 500 to perform desired functions.
For example, the memory 520 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disks, erasable programmable read-only memory (EPROM), compact disk read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer program modules can be stored on the computer-readable storage medium, and the processor 510 can run the one or more computer program modules to realize various functions of the quantization apparatus 500. Various application programs, as well as various data used and/or generated by the application programs, can also be stored in the computer-readable storage medium.
It should be noted that for the sake of clarity and brevity, the embodiments of the present disclosure do not present all components of the quantization apparatus 400, apparatus 500 and the storage medium 600. In order to realize the necessary functions of the quantization apparatus 400, apparatus 500 and the storage medium 600, those skilled in the art may provide and configure other components that are not shown according to specific requirements. No limitation is made in the embodiments of the present disclosure in this regard.
In addition, in the embodiments of the present disclosure, for the specific functions and technical effects of the quantization apparatus 400, apparatus 500 and the storage medium 600, reference may be made to the description about the quantization method 100, 200 or 300 hereinabove, and no further details will be provided herein.
The above are merely specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto; anyone skilled in the related arts may easily conceive variations or substitutions within the technical scope disclosed by the present disclosure, which should all be encompassed within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure should be defined by the appended claims.
Priority Application: Chinese Patent Application No. 202011558175.6, filed Dec. 25, 2020 (CN, national).
International Filing: PCT/CN2021/137446, filed Dec. 13, 2021 (WO).