The disclosure relates to a neural network, and more particularly to a neural network with flexible feature compression capability.
An artificial neural network (or simply “neural network”) is usually composed of multiple layers of artificial neurons. Each layer may perform a transformation on its input, and generate an output that serves as an input to the next layer. As an example, a convolutional neural network includes a plurality of convolutional layers, each of which may include multiple kernel maps and a set of batch normalization coefficients to perform convolution and batch normalization on an input feature map, and generate an output feature map to be used by the next layer.
However, the memory capacity of a neural network accelerator is usually limited and insufficient to store all of the kernel maps, the sets of batch normalization coefficients, and the feature maps that are generated during operation of the neural network, so an external memory is often used to store these data. Accordingly, operation of the neural network involves a large amount of data transfer between the neural network accelerator and the external memory, which results in significant power consumption and latency.
Therefore, an object of the disclosure is to provide a method for training a neural network, such that the neural network has flexible feature compression capability.
According to the disclosure, the neural network includes multiple neuron layers, one of which includes a weight set and has a data compression procedure that uses a data compression-decompression algorithm. The method includes steps of: A) by a neural network accelerator, training the neural network based on a first compression setting that corresponds to a first compression quality level, where a first set of batch normalization coefficients that corresponds to the first compression quality level is used in said one of the neuron layers during the training of the neural network in step A); B) by the neural network accelerator, outputting the first set of batch normalization coefficients that has been trained in step A), and optionally the weight set that has been trained in step A), for use by the neural network when the neural network is executed to perform decompression and multiplication-and-accumulation on a to-be-processed compressed feature map substantially based on the first compression quality level in said one of the neuron layers; C) by the neural network accelerator, training the neural network based on a second compression setting that corresponds to a second compression quality level different from the first compression quality level, where the weight set that has been trained in step A) and a second set of batch normalization coefficients that corresponds to the second compression quality level are used in said one of the neuron layers during the training of the neural network in step C); and D) by the neural network accelerator, outputting the weight set that has been trained in both of step A) and step C) for use by the neural network when the neural network is executed to perform decompression and multiplication-and-accumulation on the to-be-processed compressed feature map substantially based on any one of the first compression quality level and the second compression quality level in said one of the neuron layers, and the second set of batch normalization coefficients that has been trained in step C) for use by the neural network when the neural network is executed to perform decompression and multiplication-and-accumulation on the to-be-processed compressed feature map substantially based on the second compression quality level in said one of the neuron layers. At least one of the first compression quality level or the second compression quality level is a lossy compression level.
Another object of the disclosure is to provide a neural network system that has flexible feature compression capability. The neural network system includes a neural network accelerator and a memory device. In some embodiments, the neural network accelerator is configured to execute the neural network that has been trained using the method of this disclosure. The memory device is accessible to the neural network accelerator, and stores the weight set which has been trained in the method, the first set of batch normalization coefficients which has been trained in the method, and the second set of batch normalization coefficients which has been trained in the method. The neural network accelerator is configured to (a) select one of the first compression quality level and the second compression quality level for said one of the neuron layers, (b) store into said memory device a compressed input feature map that corresponds to said one of the neuron layers and that was compressed with the selected one of the first compression quality level and the second compression quality level, (c) load the compressed input feature map from said memory device for said one of the neuron layers, (d) decompress the compressed input feature map with respect to the selected one of the first compression quality level and the second compression quality level to obtain a decompressed input feature map, (e) load the weight set from said memory device, (f) use the weight set to perform an operation of multiplying and accumulating on the decompressed input feature map to generate a computed feature map, (g) load one of the first set of batch normalization coefficients and the second set of batch normalization coefficients that corresponds to the selected one of the first compression quality level and the second compression quality level from said memory device, and (h) use the loaded one of the first set of batch normalization coefficients and the second set of batch normalization coefficients to perform batch normalization on the computed feature map to generate a normalized feature map for use by the next neuron layer.
In some embodiments, the neural network accelerator is configured to cause a neural network that includes multiple neuron layers to perform corresponding operations. The memory device is accessible to the neural network accelerator, and stores a weight set corresponding to one of the neuron layers, and multiple sets of batch normalization coefficients corresponding to said one of the neuron layers. The weight set is adapted to multiple compression quality levels, and each of the sets of batch normalization coefficients is adapted for a respective one of the compression quality levels. The neural network accelerator is configured to (a) select one of the compression quality levels for said one of the neuron layers, (b) store into said memory device a compressed input feature map that corresponds to said one of the neuron layers and that was compressed with the selected one of the compression quality levels, (c) load the compressed input feature map from said memory device for said one of the neuron layers, (d) decompress the compressed input feature map with respect to the selected one of the compression quality levels to obtain a decompressed input feature map, (e) load the weight set from said memory device, (f) use the weight set to perform an operation of multiplying and accumulating on the decompressed input feature map to generate a computed feature map, (g) load one of the sets of batch normalization coefficients that is adapted for the selected one of the compression quality levels from said memory device, and (h) use the loaded one of the sets of batch normalization coefficients to perform batch normalization on the computed feature map to generate a normalized feature map for use by a next neuron layer, which is one of the neuron layers that immediately follows said one of the neuron layers.
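For illustration only, the following minimal sketch (in Python with PyTorch) mirrors the per-layer flow (a) to (h) described above. The dictionary standing in for the memory device, the toy quantization codec, and all names such as run_layer are assumptions made for this sketch and are not part of the disclosed accelerator.

```python
import torch
import torch.nn.functional as F

external_memory = {}   # stands in for the memory device accessible to the accelerator

def compress_fmap(fmap, quality):
    # Toy stand-in for the lossy feature-map compression: uniform quantization
    # whose step size grows as the quality level drops (illustrative only).
    step = (101 - quality) / 100.0
    return torch.round(fmap / step).to(torch.int16), step

def decompress_fmap(blob):
    coded, step = blob
    return coded.to(torch.float32) * step

def run_layer(layer_id, weight, bn_sets, quality):
    """Steps (c)-(h) for one neuron layer, plus (b) for the next layer."""
    blob = external_memory[("fmap", layer_id)]             # (c) load compressed input feature map
    x = decompress_fmap(blob)                               # (d) decompress (level-specific step size)
    y = F.conv2d(x, weight, padding=1)                      # (e)-(f) multiply-and-accumulate (convolution)
    gamma, beta, mean, var = bn_sets[quality]               # (g) BN set adapted for the selected level
    y = F.batch_norm(y, mean, var, weight=gamma, bias=beta, eps=1e-5)    # (h) batch normalization
    y = F.relu(y)                                           # activation, cf. step S5 below
    external_memory[("fmap", layer_id + 1)] = compress_fmap(y, quality)  # (b) store for the next layer
    return y
```

The key point illustrated is that the convolution weight is shared across compression quality levels, while the batch normalization coefficients are looked up per selected level.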
Other features and advantages of the disclosure will become apparent in the following detailed description of the embodiment(s) with reference to the accompanying drawings. It is noted that various features may not be drawn to scale.
Before the disclosure is described in greater detail, it should be noted that where considered appropriate, reference numerals or terminal portions of reference numerals have been repeated among the figures to indicate corresponding or analogous elements, which may optionally have similar characteristics.
Referring to the accompanying drawings, the embodiment of the neural network system includes an accelerator 1 and an external memory device 2 that is accessible to the accelerator 1. The accelerator 1 includes a computing unit 11 that performs the operations of the neuron layers of the neural network, and the memory capacity of the accelerator 1 is smaller than that of the external memory device 2.
In this embodiment, the computing unit 11 compresses the output feature map for one or more of the neuron layers, so as to reduce data transfer between the accelerator 1 and the external memory device 2, thereby reducing power consumption and latency of the neural network. Furthermore, the computing unit 11 is configured to selectively use, for each neuron layer that is configured to compress the output feature map, one of multiple predetermined compression quality levels to perform the data compression, and the batch normalization (BN) coefficients that correspond to such a neuron layer include multiple sets of BN coefficients that have been trained respectively with respect to the multiple predetermined compression quality levels, as shown in the accompanying drawings.
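As a further illustration of a neuron layer that keeps a single kernel map set but multiple sets of BN coefficients, one per predetermined compression quality level, the following is a minimal PyTorch sketch. The class name, channel arguments and example quality levels are assumptions for illustration and do not represent the embodiment's actual data layout.

```python
import torch.nn as nn

class MultiQualityConvBN(nn.Module):
    """One convolutional neuron layer with a shared kernel map set and one set
    of batch normalization coefficients per compression quality level."""

    def __init__(self, in_ch, out_ch, quality_levels=(80, 50)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)  # shared kernel maps
        self.bn = nn.ModuleDict({str(q): nn.BatchNorm2d(out_ch) for q in quality_levels})
        self.act = nn.ReLU()

    def forward(self, x, quality):
        # The BN coefficients are looked up by the compression quality level
        # selected for this layer's (decompressed) input feature map.
        return self.act(self.bn[str(quality)](self.conv(x)))
```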
Referring to the accompanying drawings, for one of the neuron layers that is configured with the data compression (hereinafter referred to as the neuron layer), the computing unit 11 performs steps S1 to S6 described below.
In step S1, the computing unit 11 selects one of the predetermined compression quality levels for the neuron layer, and loads a compressed input feature map that corresponds to the neuron layer from the external memory device 2. The compressed input feature map is an output of the preceding neuron layer (i.e., the one of the neuron layers that immediately precedes the neuron layer), and has been compressed using one of the predetermined compression quality levels that is the same as the predetermined compression quality level selected for the neuron layer. In this embodiment, the compression is performed using the JPEG compression method or a JPEG-like compression method (e.g., one in which some operations of JPEG compression, such as header encoding, are omitted), which is a lossy compression. It is noted that the compressed input feature map may be composed of a plurality of compressed portions, and the computing unit 11 may load one of the compressed portions at a time for the subsequent steps because of the limited memory capacity of the accelerator 1.
In step S2, the computing unit 11 decompresses the compressed input feature map with respect to the selected one of the predetermined compression quality levels to obtain a decompressed input feature map.
In step S3, the computing unit 11 loads, from the external memory device 2, a kernel map set that corresponds to the neuron layer and that has been trained with respect to each of the predetermined compression quality levels, and uses the kernel map set to perform convolution on the decompressed input feature map to generate a convolved feature map.
In step S4, the computing unit 11 loads one of the sets of batch normalization coefficients that has been trained with respect to the selected one of the predetermined compression quality levels from the external memory device 2, and uses the loaded set of batch normalization coefficients to perform batch normalization on the convolved feature map to generate a normalized feature map for use by the next neuron layer, which is one of the neuron layers that immediately follows the neuron layer.
In step S5, the computing unit 11 uses an activation function to process the normalized feature map to generate an output feature map. The activation function may be, for example, a rectified linear unit (ReLU), a leaky ReLU, a sigmoid linear unit (SiLU), a Gaussian error linear unit (GELU), other suitable functions, or any combination thereof.
In step S6, the computing unit 11 selects one of the predetermined compression quality levels for the next neuron layer, compresses the output feature map using the predetermined compression quality level that is selected for the next neuron layer, and stores the output feature map thus compressed into the external memory device 2. The output feature map thus compressed serves as the compressed input feature map for the next neuron layer. Step S6 is a data compression procedure that uses the JPEG or JPEG-like compression method in this embodiment, but this disclosure is not limited to any specific compression method.
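The compression of step S6 and the corresponding decompression of step S2 can be pictured with the following minimal sketch of the lossy part of a JPEG-like codec applied to a single feature-map channel: 8×8 block DCT, quantization with a Q-table, and the inverse operations. Entropy and header coding are omitted, and the helper names and the assumption that the channel dimensions are multiples of 8 are for illustration only.

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_channel(channel, qtable):
    """Lossy part of a JPEG-like compression for one feature-map channel.
    `qtable` is an 8x8 array of positive quantization step sizes."""
    h, w = channel.shape                        # assumed multiples of 8 in this sketch
    coeffs = np.empty((h, w), dtype=np.int32)
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            block = dctn(channel[i:i + 8, j:j + 8], norm="ortho")
            coeffs[i:i + 8, j:j + 8] = np.round(block / qtable)   # quantization (the lossy step)
    return coeffs

def decompress_channel(coeffs, qtable):
    """Inverse of the lossy operations above: dequantization and inverse DCT."""
    h, w = coeffs.shape
    channel = np.empty((h, w), dtype=np.float32)
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            channel[i:i + 8, j:j + 8] = idctn(coeffs[i:i + 8, j:j + 8] * qtable, norm="ortho")
    return channel
```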
The following describes training of the neural network so that a given neuron layer (hereinafter, the specific neuron layer) is adapted to multiple compression quality levels. Through steps S11 to S16, the accelerator 1 trains the neural network based on a first compression setting that indicates or corresponds to a first compression quality level (which is one of the predetermined compression quality levels), where a first set of batch normalization coefficients that corresponds to the first compression quality level is used in the specific neuron layer, so that a kernel map set of the specific neuron layer and the first set of batch normalization coefficients are trained. Subsequently, the accelerator 1 outputs the kernel map set and the first set of batch normalization coefficients that have been trained through steps S11 to S16 for use by the neural network when the neural network is executed to perform decompression and multiplication-and-accumulation (e.g., convolution) on a to-be-processed compressed feature map substantially based on the first compression quality level in the specific neuron layer. The term “substantially” as used herein generally means that an error of a given value or range is within 20%, preferably within 10%. For example, in practice, one may use the kernel map set and the first set of batch normalization coefficients that were trained with respect to a compression quality level of 80 to perform decompression and multiplication-and-accumulation (e.g., convolution) on the to-be-processed compressed feature map based on a compression quality level of 75, which falls within the aforesaid interpretation of “substantially” because the error is (80−75)/80=6.25%.
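As a small illustration of the above matching of a requested compression quality level to a trained one, the function below (a hypothetical helper, not part of the disclosure) selects the nearest trained level and checks it against the stated tolerance.

```python
def pick_trained_level(requested, trained_levels, tolerance=0.10):
    """Return the trained compression quality level closest to `requested`,
    provided the relative difference is within `tolerance`."""
    best = min(trained_levels, key=lambda q: abs(q - requested))
    if abs(best - requested) / best <= tolerance:
        return best
    raise ValueError("no trained compression quality level is substantially close")

# Example: pick_trained_level(75, [80, 50]) returns 80, since (80 - 75) / 80 = 6.25%.
```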
In step S11, the accelerator 1 performs first compression-related data processing on a first input feature map to obtain a first processed feature map, wherein the first compression-related data processing is related to data compression with the first compression quality level.
In step S12, the accelerator 1 performs first decompression-related data processing on the first processed feature map to obtain a second processed feature map, wherein the first decompression-related data processing is related to data decompression and corresponds to the first compression quality level.
Referring to the accompanying drawings, the accelerator 1 generates a quantization table (Q-table) based on the first compression quality level, and uses the Q-table thus generated to perform the first compression-related data processing and the first decompression-related data processing.
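A possible way to picture how the first compression-related and first decompression-related data processing are chained during training is the sketch below, in which the feature map is quantized and immediately dequantized in the DCT domain (the lossy part only), so that the layer sees the distortion introduced by the selected quality level. It reuses the illustrative compress_channel and decompress_channel helpers sketched above; the function name and the (channels, H, W) layout are assumptions.

```python
import numpy as np

def compression_roundtrip(fmap, qtable):
    """Compression-related followed by decompression-related data processing
    (lossy part only) for a feature map shaped (channels, H, W)."""
    return np.stack([decompress_channel(compress_channel(c, qtable), qtable)
                     for c in fmap])
```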
In step S13, the accelerator 1 uses the kernel map set to perform convolution on the second processed feature map to generate a first convolved feature map.
In step S14, the accelerator 1 uses the first set of batch normalization coefficients to perform batch normalization on the first convolved feature map to obtain a first normalized feature map for use by the next neuron layer, which is one of the neuron layers that immediately follows the specific neuron layer. The first set of batch normalization coefficients may include a set of scaling coefficients and a set of offset coefficients that are used to perform scaling and offset in the batch normalization performed on the first convolved feature map.
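For reference, the scaling coefficients and offset coefficients enter the conventional batch normalization computation as follows (a standard formulation, not specific to this disclosure):

$$ y = \gamma\,\frac{x - \mu}{\sqrt{\sigma^{2} + \varepsilon}} + \beta $$

where γ is a scaling coefficient, β is an offset coefficient, μ and σ² are the mean and variance statistics used for normalization, and ε is a small constant for numerical stability.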
In step S15, the accelerator 1 uses an activation function to process the first normalized feature map, and the first normalized feature map thus processed is used as an input feature map to the next neuron layer.
In step S16, after the neural network generates a final output, the accelerator 1 performs back propagation on the neural network that was used in steps S11 to S15 to modify, for each neuron layer, the corresponding kernel map set and the corresponding set of batch normalization coefficients (e.g., the kernel map set that was used in step S13 and the first set of batch normalization coefficients that was used in step S14 for the specific neuron layer).
Accordingly, each kernel map in the kernel map set and the first set of batch normalization coefficients for the specific neuron layer have been trained with respect to the first compression quality level.
After the neural network has been trained using a batch of training data for the first compression quality level, the accelerator 1 outputs the first set of batch normalization coefficients of the specific neuron layer that is adapted for the first compression quality level, and optionally the kernel map set (step S17). Then, through steps S21 to S26, the accelerator 1 trains the neural network based on a second compression setting that corresponds to a second compression quality level, which is another one of the predetermined compression quality levels and is different from the first compression quality level, where the kernel map set that has been trained through steps S11 to S16 and a second set of batch normalization coefficients that corresponds to the second compression quality level are used in the specific neuron layer.
In step S21, the accelerator 1 performs second compression-related data processing on a second input feature map to obtain a third processed feature map, where the second compression-related data processing is related to data compression with the second compression quality level.
In step S22, the accelerator 1 performs second decompression-related data processing on the third processed feature map to obtain a fourth processed feature map, where the second decompression-related data processing is related to data decompression and corresponds to the second compression quality level.
The accelerator 1 generates a Q-table based on the second compression quality level, and uses the Q-table thus generated to perform the second compression-related data processing and the second decompression-related data processing. Details of the second compression-related data processing and the second decompression-related data processing are similar to those of the first compression-related data processing and the first decompression-related data processing, and are not repeated herein for the sake of brevity.
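One common way to derive a Q-table from a compression quality level is the libjpeg-style scaling sketched below; the disclosed embodiment is not necessarily tied to this exact mapping, so the formula and the base_table parameter are assumptions for illustration.

```python
import numpy as np

def make_qtable(base_table, quality):
    """Scale an 8x8 base quantization table for a quality level in [1, 100]
    (libjpeg-style scaling; higher quality -> smaller quantization steps)."""
    quality = int(np.clip(quality, 1, 100))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    qtable = np.floor((np.asarray(base_table, dtype=np.float64) * scale + 50) / 100)
    return np.clip(qtable, 1, 255).astype(np.int32)
```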
In step S23, the accelerator 1 uses the kernel map set that has been modified in step S16 to perform convolution on the fourth processed feature map to generate a second convolved feature map.
In step S24, the accelerator 1 uses the second set of batch normalization coefficients to perform batch normalization on the second convolved feature map to obtain a second normalized feature map for use by the next neuron layer. The second set of batch normalization coefficients may include a set of scaling coefficients and a set of offset coefficients that are used to perform scaling and offset in the batch normalization performed on the second convolved feature map.
In step S25, the accelerator 1 uses the activation function to process the second normalized feature map, and the second normalized feature map thus processed is used as an input feature map to the next neuron layer.
In step S26, after the neural network generates a final output, the accelerator 1 performs back propagation on the neural network that was used in steps S21 to S25 to modify, for each neuron layer, the corresponding kernel map set and the corresponding set of batch normalization coefficients (e.g., the kernel map set that has been modified in step S16 and that was used in step S23, and the second set of batch normalization coefficients that was used in step S24 for the specific neuron layer).
Accordingly, each kernel map in the kernel map set and the second set of batch normalization coefficients for the specific neuron layer have been trained with respect to the second compression quality level. In step S27, the accelerator 1 outputs the kernel map set of the specific neuron layer that is adapted for the first compression quality level and the second compression quality level, and the second set of batch normalization coefficients of the specific neuron layer that is adapted for the second compression quality level.
In some embodiments, steps S11 to S16 may be performed iteratively with multiple mini-batches of a training dataset, and/or steps S21 to S26 may be performed iteratively with multiple mini-batches of a training dataset. A mini-batch is a subset of a training dataset. In some embodiments, a mini-batch may include 256, 512, 1024, 2048, 4096, or 8192 training samples, but this disclosure is not limited to these specific numbers. Batch gradient descent training is a special case in which the mini-batch size is set to the total number of examples in the training dataset, and stochastic gradient descent (SGD) training is another special case in which the mini-batch size is set to 1. In some embodiments, iterations of steps S11 to S16 and iterations of steps S21 to S26 do not need to be performed in any particular order. In other words, the iterations of steps S11 to S16 and the iterations of steps S21 to S26 may be performed in an interleaved manner (e.g., in the order of S11-S16, S21-S26, S11-S16, S21-S26, and so on, with steps S17 and S27 performed last). It is noted that step S17 is not necessarily performed prior to steps S21 to S26, and may be performed together with step S27 in other embodiments; this disclosure is not limited to a specific order of step S17 relative to steps S21 to S26.
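The interleaved training described above might look like the following minimal PyTorch sketch, assuming a model whose forward pass accepts the selected compression quality level (for example, one built from the MultiQualityConvBN sketch earlier) and data loaders that provide mini-batches; the compression-related and decompression-related data processing are assumed to be folded into the model, and all names are illustrative.

```python
import torch

def train_interleaved(model, loader_q1, loader_q2, q1=80, q2=50, lr=1e-2):
    """Alternate mini-batches for two compression quality levels
    (e.g., S11-S16 for q1, then S21-S26 for q2, repeated)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for (x1, y1), (x2, y2) in zip(loader_q1, loader_q2):
        for quality, (x, y) in ((q1, (x1, y1)), (q2, (x2, y2))):
            optimizer.zero_grad()
            outputs = model(x, quality)   # forward pass uses the BN set trained for `quality`
            loss = loss_fn(outputs, y)
            loss.backward()               # back propagation (steps S16 / S26)
            optimizer.step()              # updates the shared kernels and only this level's BN set
```

Because only the BN coefficient set selected for a given mini-batch participates in the forward pass, only that set receives gradients, while the shared kernel maps receive gradients from mini-batches of both quality levels.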
As a result, for the specific neuron layer, the kernel map set has been trained with respect to both the first compression quality level and the second compression quality level, the first set of batch normalization coefficients has been trained with respect to the first compression quality level, and the second set of batch normalization coefficients has been trained with respect to the second compression quality level. If needed, the specific neuron layer can be trained with respect to additional compression quality levels in a similar way, so that the kernel map set of the specific neuron layer is trained with respect to the additional compression quality levels and the specific neuron layer includes additional sets of batch normalization coefficients that are respectively trained with respect to the additional compression quality levels; this disclosure is not limited to only two compression quality levels. In addition, each neuron layer of the neural network can be trained in the same manner as the specific neuron layer, so that the neural network is adapted for multiple compression quality levels and has flexible feature compression capability.
Table 1 compares this embodiment with the prior art using two ResNet neural networks, denoted ResNet-A and ResNet-B, where the prior art uses only one set of batch normalization coefficients for different compression quality levels in a single neuron layer, while the embodiment of this disclosure uses different sets of batch normalization coefficients for different compression quality levels in a single neuron layer. Four compression settings corresponding to four quality levels were tested. Taking ResNet-A as an example, the prior art achieves 69.7%, 66.8%, 42.6%, and 14.9% accuracy at the four compression settings, respectively. In comparison, this embodiment achieves 69.8%, 69.1%, 66.6%, and 64%, which is up to 49.1 percentage points higher than the baseline (64%−14.9%=49.1 percentage points, at quality level 50). Experiments on ResNet-B likewise show that this embodiment allows one neural network to adapt to multiple (four in the example) compression quality levels better than the prior art.
To sum up, the embodiment of the neural network system according to this disclosure includes, for a single neuron layer, a kernel map set that has been trained with respect to multiple predetermined compression quality levels, and multiple sets of batch normalization coefficients that have been trained respectively for the multiple predetermined compression quality levels, and thus the neural network system has flexible feature compression capability. In some embodiments, during the training of the neural network, the compression-related training includes only the lossy part of the full compression procedure (i.e., the lossless part is omitted), and the decompression-related training includes only the inverse operations of the lossy part of the full compression procedure, so the overall time required for the training can be reduced.
In the description above, for the purposes of explanation, numerous specific details have been set forth in order to provide a thorough understanding of the embodiment(s). It will be apparent, however, to one skilled in the art, that one or more other embodiments may be practiced without some of these specific details. It should also be appreciated that reference throughout this specification to “one embodiment,” “an embodiment,” an embodiment with an indication of an ordinal number and so forth means that a particular feature, structure, or characteristic may be included in the practice of the disclosure. It should be further appreciated that in the description, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of various inventive aspects; such does not mean that every one of these features needs to be practiced with the presence of all the other features. In other words, in any described embodiment, when implementation of one or more features or specific details does not affect implementation of another one or more features or specific details, said one or more features may be singled out and practiced alone without said another one or more features or specific details. It should be further noted that one or more features or specific details from one embodiment may be practiced together with one or more features or specific details from another embodiment, where appropriate, in the practice of the disclosure.
While the disclosure has been described in connection with what is(are) considered the exemplary embodiment(s), it is understood that this disclosure is not limited to the disclosed embodiment(s) but is intended to cover various arrangements included within the spirit and scope of the broadest interpretation so as to encompass all such modifications and equivalent arrangements.
This application claims the benefits of U.S. Provisional Patent Application No. 63/345,918, filed on May 26, 2022, which is incorporated by reference herein in its entirety.