The present invention relates to a machine learning and inference system based on an artificial neural network, and more particularly to a multiplier-less method and system for a deep neural network with round-accumulate operations and weight pruning.
Machine learning based on a deep neural network (DNN) has been applied to various problems including, but not limited to, image classification, speech recognition, computational sensing, data analysis, feature extraction, signal processing, and artificial intelligence. There are many DNN architectures, such as the multi-layer perceptron, recurrent network, convolutional network, transformer network, attention network, long short-term memory, generative adversarial network, auto-encoder, U-shaped network, residual network, reversible network, loopy network, clique network, and variants thereof. Although the high performance of DNNs has been demonstrated in many practical systems, it remains a challenge to train and deploy a DNN on resource-limited hardware with small memory and processing power for real-time applications. This is partly because a typical DNN has a large number of trainable parameters, such as weights and biases, with many artificial neurons across deep hidden layers, which in turn require a large number of arithmetic multiplications and floating-point operations.
In the prior art, some memory-efficient DNN methods and systems were proposed to resolve part of this issue. One type of approach includes pruning, sparsification, or compaction of weights and nodes. Another type of approach is based on quantization of weights and activations. Both are known as network distillation techniques. Weight pruning forces some DNN weights to zero, and the corresponding edges are eliminated to realize a more compact DNN architecture. For example, L1/L2 regularization is used to obtain sparse weights that compress the network size. Recently, another simple approach to weight pruning was proposed based on the lottery ticket hypothesis (LTH). LTH pruning uses a trained DNN to create a pruning mask, and rewinds the weights of unpruned edges to their values at an early epoch. It was shown that LTH pruning can provide better performance with more compact DNNs.
Weight quantization compresses the DNN weights by converting high-precision values into low-precision values. For example, the DNN is first trained with double-precision floating-point weights, which typically require 64 bits to represent, and the weights are then quantized to half precision of 16 bits or to 8-bit integer values. This quantization can reduce the DNN memory requirement by a factor of 4 or 8 for deployment, leading to low-power systems. Even weight binarization or ternarization has been developed to realize low-memory DNNs. Binary and ternary DNNs show reasonable performance on some specific datasets, but they suffer performance loss on most real-world datasets. Other quantization techniques include rounding, vector quantization, codebook quantization, stochastic quantization, random rounding, adaptive codebooks, incremental quantization, additive powers-of-two, and powers-of-two quantization. Besides weight quantization, activations and gradients are quantized in some systems. To reduce the performance degradation caused by such network distillation, many training methods based on quantization- and pruning-aware training have been developed, besides dynamic quantization.
However, most distillation techniques suffer from significant performance degradation under resource limitations in general. In addition, there is no integrated way to realize a sparse and quantized DNN without losing performance. Accordingly, there is a need to develop a method and a system for low-power, low-complexity deployment of a DNN applicable to high-speed real-time processing.
A deep neural network (DNN) has recently been investigated for various applications including telecommunications, speech recognition, image processing, and so on. Although the high potential of DNNs has been successfully demonstrated in many applications, a DNN generally requires high computational complexity and high-power operation for real-time processing due to a large number of multiply-accumulate (MAC) operations. This invention provides a hardware-friendly DNN framework with round-accumulate (RAC) operations to realize multiplier-less operation based on powers-of-two quantization or a variant thereof. We demonstrate that quantization-aware training (QAT) for additive powers-of-two (APoT) weights can eliminate multipliers without causing visible performance loss. In addition, we introduce weight pruning based on a progressive version of the lottery ticket hypothesis (LTH) to sparsify the over-parameterized DNN weights for further complexity reduction. It is demonstrated that the invention can prune most weights, leading to power-efficient inference for resource-limited hardware implementations such as microprocessors or field-programmable gate arrays (FPGAs).
The method and system are based on the realization that rounding-aware training for a powers-of-two expansion can eliminate the need for multiplier components in the system without causing significant performance loss. In addition, the method and system provide a way to reduce the number of PoT weights based on knowledge distillation using a progressive compression of an over-parameterized DNN. This can realize high compression, leading to power-efficient inference for resource-limited hardware implementations. A compacting rank is optimized with an additional DNN model in a reinforcement learning framework. A rounding granularity is also successively decremented, and mixed-order PoT weights are obtained for low-power processing. Another student model is also designed in parallel for knowledge distillation, as sketched below, to find a Pareto-optimal trade-off between performance and complexity.
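As a non-limiting illustration, a soft-target distillation loss of the kind commonly used for student-teacher training can be sketched in PyTorch as follows; the temperature value and function name are illustrative assumptions, not the claimed procedure.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation (a sketch of one common choice):
    the student is trained to match the teacher's temperature-softened
    output distribution via the KL divergence."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * T * T
```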
The invention provides a way to realize a low-complexity, low-power DNN design that does not require multipliers. Our method uses progressive updates of LTH pruning and quantization-aware training. Hyperparameters are automatically optimized with Bayesian optimization, and multiple solutions are obtained by multi-seed training to find the Pareto-optimal trade-off between performance and complexity. The pruning ranking is optimized with an additional DNN model. The quantization granularity is also successively decremented, and mixed-order APoT weights are obtained for low-power processing. Another student model derived from a parent model is also designed in parallel for knowledge distillation. Embodiments include digital pre-distortion, MIMO equalization, nonlinear turbo equalization, and so on.
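For instance, an off-the-shelf Bayesian-style optimizer such as Optuna's TPE sampler could drive the hyperparameter search; in the sketch below, `train_and_evaluate` is an assumed helper returning a (performance, complexity) pair, so that the Pareto front appears in `study.best_trials`. The search space is illustrative only.

```python
import optuna

def objective(trial):
    # Hypothetical search space for the pruning/quantization schedule.
    lr        = trial.suggest_float('lr', 1e-4, 1e-1, log=True)
    prune_pct = trial.suggest_float('prune_pct', 0.1, 0.5)
    order     = trial.suggest_int('apot_order', 1, 3)  # PoT terms per weight
    # Assumed helper: returns (validation loss, number of surviving weights).
    loss, n_weights = train_and_evaluate(lr, prune_pct, order)
    return loss, n_weights

# Two objectives: Optuna keeps the Pareto-optimal trials in best_trials.
study = optuna.create_study(directions=['minimize', 'minimize'],
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=100)
pareto_front = study.best_trials  # performance/complexity trade-off
```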
Some embodiments of the present invention provide a hardware-friendly DNN framework for nonlinear equalization. Our DNN realizes multiplier-less operation based on powers-of-two quantization. We demonstrate that quantization-aware training (QAT) for additive powers-of-two (APoT) weights can eliminate multipliers without causing visible performance loss. In addition, we introduce weight pruning based on the lottery ticket hypothesis (LTH) to sparsify the over-parameterized DNN weights for further complexity reduction. We verify that progressive LTH pruning can prune most of the weights, yielding power-efficient inference in real-time processing.
In order to reduce the computational complexity of the DNN, we integrate APoT quantization into a DeepShift framework. In the original DeepShift, DNN weights are quantized into a signed PoT as w = ±2^u, where u is an integer to be trained. Note that PoT weights can eliminate multiplier operations from DNN equalizers, as multiplication by 2^u can be realized with bit shifting for fixed-point precision or with an addition to the exponent for floating-point precision. The present invention further improves on this with APoT weights, w = ±(2^u ± 2^v), where we use another trainable integer v < u. This requires one additional summation but no multiplication at all. In our QAT updating, we use straight-through rounding to find u and v after each epoch iteration once the pre-training phase is done. For some embodiments, a higher-order additive PoT is used to further reduce quantization errors.
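A minimal PyTorch sketch of this QAT step follows; the exponent range and the brute-force nearest-level search are illustrative assumptions, not the optimized implementation.

```python
import torch

def apot_quantize(w, u_min=-8, u_max=0):
    """Project each weight onto the nearest signed APoT value
    w_q = sign(w) * (2^u +/- 2^v) with integers v < u, and pass
    gradients through via the straight-through estimator."""
    # Build candidate magnitudes 2^u and 2^u +/- 2^v for u_min <= v < u <= u_max.
    levels = []
    for u in range(u_min, u_max + 1):
        levels.append(2.0 ** u)                 # plain PoT level
        for v in range(u_min, u):
            levels.append(2.0 ** u + 2.0 ** v)  # additive term
            levels.append(2.0 ** u - 2.0 ** v)  # subtractive term
    levels = torch.tensor(sorted(set(levels)))
    mag = w.abs().unsqueeze(-1)                 # (..., 1) vs. (L,) grid
    idx = (mag - levels).abs().argmin(dim=-1)   # nearest level per weight
    q = torch.sign(w) * levels[idx]
    # Straight-through rounding: forward uses q, backward passes the
    # gradient unchanged to the full-precision weight.
    return w + (q - w).detach()
```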
Even though our DNN equalizer does not require any multipliers, it still needs a relatively large number of addition operations. We introduce a progressive version of the LTH pruning method to realize low-power sparse DNN implementation. It is known that an over-parameterized DNN can be significantly sparsified without losing performance and that sparsified DNN can often outperform the dense DNN. In the progressive LTH pruning according to the invention, we first train the dense DNN via QAT for APoT quantization, starting from random initial weights. We then prune a small percentage of the edges based on the trained weights. We re-train the pruned DNN after rewinding the weights to the early-epoch weights for non-pruned edges. Rewinding, QAT updating, and pruning are repeated with a progressive increase of the pruning percentages. We use late rewinding of the first-epoch weights.
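One way to organize this progressive pruning loop is sketched below; `train_fn` is an assumed helper that runs QAT to completion on the masked model and returns the early-epoch weight snapshot used for late rewinding, and the per-round pruning percentage is illustrative.

```python
import torch

def progressive_lth(model, train_fn, prune_pcts=(0.2, 0.2, 0.2, 0.2),
                    rewind_epoch=1):
    """Progressive LTH pruning sketch: train, prune the smallest surviving
    weights, rewind unpruned weights to their first-epoch values, repeat.
    Each round removes prune_pcts[i] of the *surviving* weights, so the
    overall sparsity grows progressively."""
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
             if 'weight' in n}
    for pct in prune_pcts:
        # Train (with QAT) and capture the epoch-`rewind_epoch` snapshot.
        early_weights = train_fn(model, masks, stop_epoch=rewind_epoch)
        # Create the pruning mask from the trained weight magnitudes.
        for name, p in model.named_parameters():
            if name not in masks:
                continue
            alive = p.data[masks[name].bool()].abs()
            thresh = torch.quantile(alive, pct)
            masks[name] *= (p.data.abs() > thresh).float()
        # Late rewinding: reset surviving weights to first-epoch values.
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.copy_(early_weights[name] * masks[name])
    return model, masks
```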
Accordingly, the zero-multiplier sparse DNN can be used for many applications, including high-speed optical communications employing probabilistic shaping. In some embodiments, the APoT quantization achieves floating-point arithmetic performance without multipliers, and progressive LTH pruning eliminates 99% of the weights for power-efficient implementation.
According to some embodiments of the present invention, a computer-implemented method is provided for training a set of artificial neural networks. The method may be performed by one or more computing processors in association with a memory storing computer-executable programs. The method comprises the steps of: (a) initializing a set of trainable parameters of an artificial neural network, wherein the set of trainable parameters comprises a set of trainable weights and a set of trainable biases; (b) training the set of trainable parameters using a set of training data; (c) generating a pruning mask based on the trained set of trainable parameters; (d) rewinding the set of trainable parameters; (e) pruning a selected set of trainable parameters based on the pruning mask; and (f) repeating the above steps from (b) to (e) for a specified number of times to generate a set of sparse neural networks having an incremental sparsity.
Further, some embodiments of the present invention can provide a computer-implemented method for testing an artificial neural network. The method can be performed by one or more computing processors and comprises the steps of: feeding a set of testing data into a plurality of input nodes of the artificial neural network; propagating the set of testing data across the artificial neural network according to a set of pruning masks; and generating a set of output values from a plurality of output nodes of the artificial neural network.
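A minimal sketch of this testing method in PyTorch, assuming the `masks` produced by the training method above, might read:

```python
import torch

@torch.no_grad()
def masked_inference(model, masks, x_test):
    """Testing method sketch: apply the pruning masks to the trained
    weights (pruned edges become zero), then propagate the test data
    through the network to obtain the output values."""
    for name, p in model.named_parameters():
        if name in masks:
            p *= masks[name]      # zero out pruned edges in place
    return model(x_test)          # values at the output nodes
```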
Yet further, some embodiments provide a system deployed for an artificial neural network. The system includes at least one computing processor; at least one memory bank; at least one interface link; and at least one trained set of trainable parameters of the artificial neural network, wherein the at least one computing processor is caused to execute a training method and a testing method based on the at least one trained set of trainable parameters, the training method comprising the steps of: (a) initializing a set of trainable parameters of an artificial neural network, wherein the set of trainable parameters comprises a set of trainable weights and a set of trainable biases; (b) training the set of trainable parameters using a set of training data; (c) generating a pruning mask based on the trained set of trainable parameters; (d) rewinding the set of trainable parameters; (e) pruning a selected set of trainable parameters based on the pruning mask; and (f) repeating the above steps from (b) to (e) for a specified number of times to generate a set of sparse neural networks having an incremental sparsity.
Accordingly, the embodiments can realize multiplier-less sparse DNN computation with no performance degradation. It can be used for ultra-high-speed real-time applications requiring low-power and limited-resource deployment.
The accompanying drawings, which are included to provide a further understanding of the invention, illustrate embodiments of the invention and, together with the description, explain the principles of the invention.
Various embodiments of the present invention are described hereafter with reference to the figures. It should be noted that the figures are not drawn to scale and that elements of similar structures or functions are represented by like reference numerals throughout the figures. It should also be noted that the figures are only intended to facilitate the description of specific embodiments of the invention. They are not intended as an exhaustive description of the invention or as a limitation on the scope of the invention. In addition, an aspect described in conjunction with a particular embodiment of the invention is not necessarily limited to that embodiment and can be practiced in any other embodiments of the invention.
Some embodiments of the present disclosure provide a multiplier-less deep neural network (DNN) to mitigate fiber-nonlinear distortion of shaped constellations. The novel DNN achieves an excellent performance-complexity trade-off with progressive lottery ticket hypothesis (LTH) weight pruning and additive powers-of-two (APoT) quantization. Some other embodiments include, but are not limited to, inference and prediction for digital pre-distortion, channel equalization, channel estimation, nonlinear turbo equalization, speech recognition, image processing, biosignal sensing, and so on.
The optical communications system 100 under consideration is depicted in
Due to fiber nonlinearity, the residual distortion after linear equalization (LE) limits the achievable information rates and degrades the bit error rate performance.
To train the DNN, various loss functions can be used, including but not limited to softmax cross entropy, binary cross entropy, distance, mean-square error, mean absolute error, connectionist temporal classification loss, negative log-likelihood, Kullback-Leibler divergence, margin loss, ranking loss, embedding loss, hinge loss, Huber loss, and so on. Specifically, the trainable parameters of the DNN architecture are trained by the following steps of operations: feeding a set of training data into the input nodes of the DNN; propagating the training data across the layers of the DNN; pruning trainable parameters according to pruning masks; generating output values from the output nodes of the DNN; calculating loss values for the training data; updating the trainable parameters based on the loss values through backward message passing; and repeating those steps for a specified number of iterations.
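For illustration, one epoch of these training steps might look as follows in PyTorch, using binary cross entropy as one of the admissible loss functions; the gradient masking shown here is one way to keep pruned edges at zero during the update.

```python
import torch
import torch.nn.functional as F

def train_epoch(model, masks, loader, optimizer):
    """One pass of the training steps described above: forward propagation
    under the pruning masks, loss evaluation, backward message passing,
    and a parameter update that leaves pruned weights at zero."""
    for x, target in loader:       # target: bit labels in [0, 1]
        optimizer.zero_grad()
        output = model(x)          # values at the output nodes
        loss = F.binary_cross_entropy_with_logits(output, target)
        loss.backward()
        # Mask gradients so pruned weights stay zero after the step.
        for name, p in model.named_parameters():
            if name in masks and p.grad is not None:
                p.grad *= masks[name]
        optimizer.step()
```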
To compensate for the residual nonlinear distortion, we use DNN-based equalizers, which directly generate bit-wise soft-decision log-likelihood ratios (LLRs) for the decoder.
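As a non-limiting sketch, such an equalizer can be a small feed-forward network whose raw (pre-sigmoid) outputs are read directly as bit-wise LLRs for the soft-decision decoder; the tap count, hidden width, and bits per symbol below are illustrative assumptions.

```python
import torch.nn as nn

class LLREqualizer(nn.Module):
    """Illustrative DNN equalizer: maps a window of received I/Q samples
    to bit-wise soft decisions whose raw outputs serve as LLRs."""
    def __init__(self, taps=16, bits_per_symbol=6, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * taps, hidden), nn.ReLU(),  # I and Q per tap
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, bits_per_symbol),      # one LLR per bit
        )

    def forward(self, x):
        return self.net(x)  # bit-wise log-likelihood ratios
```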
Zero-Multiplier DNN with Additive Powers-of-Two (APoT) Quantization
In order to reduce the computational complexity of the DNN for real-time processing, the present invention provides a way to integrate APoT quantization into a DeepShift framework. In the original DeepShift, DNN weights are quantized into a signed PoT as w = ±2^u, where u is an integer to be trained. Note that PoT weights can fully eliminate multiplier operations from DNN equalizers, as multiplication by 2^u can be realized with bit shifting for fixed-point precision or with an addition to the exponent for floating-point (FP) precision.
Even though our DNN equalizer does not require any multipliers, it still needs a relatively large number of addition operations due to the over-parameterized DNN architecture having a huge number of trainable weights. We introduce a progressive version of the LTH pruning method to realize a low-power sparse DNN implementation. It is known from the LTH that an over-parameterized DNN can be significantly sparsified without losing performance and that the sparsified DNN can often outperform the dense DNN.
As discussed above, various DNN equalizers are provided for nonlinear compensation in optical fiber communications employing probabilistic amplitude shaping. We then proposed a zero-multiplier sparse DNN equalizer based on a trainable version of APoT quantization and LTH pruning techniques. We showed that APoT quantization can achieve floating-point arithmetic performance without using any multipliers, whereas conventional PoT quantization suffers from a severe penalty. We also demonstrated that progressive LTH pruning can eliminate 99% of the weights, enabling highly power-efficient implementation of DNN equalization for real-time fiber-optic systems.
When the computer-executable program is performed by the at least one computing processor 120, the program causes the at least one processor to execute a training method and a testing method based on the at least one trained set of trainable parameters, the training method comprising the steps of: (a) initializing a set of trainable parameters of an artificial neural network, wherein the set of trainable parameters comprises a set of trainable weights and a set of trainable biases; (b) training the set of trainable parameters using a set of training data; (c) generating a pruning mask based on the trained set of trainable parameters; (d) rewinding the set of trainable parameters; (e) pruning a selected set of trainable parameters based on the pruning mask; and (f) repeating the above steps from (b) to (e) for a specified number of times to generate a set of sparse neural networks having an incremental sparsity.
The above-described embodiments of the present invention can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. Such processors may be implemented as integrated circuits, with one or more processors in an integrated circuit component. Though, a processor may be implemented using circuitry in any suitable format.
Also, the embodiments of the invention may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
Use of ordinal terms such as “first” and “second” in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term).
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention.
Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
This application claims the benefit of U.S. Provisional Application No. 63/242,636, filed in September 2021.