This application claims priority under 35 U.S.C. § 119 to Korean Patent Application No. 10-2022-0046272 filed on Apr. 14, 2022, in the Korean Intellectual Property Office, the disclosure of which is incorporated by reference herein in its entirety.
Embodiments of the present disclosure described herein relate to a deep neural network lightweight device and an operating method thereof, and more particularly, relate to a deep neural network lightweight device based on batch normalization and sparsity regularization, and an operating method thereof.
Lightweighting a deep neural network is a technique for obtaining a high-accuracy neural network with a small amount of computation, and is required for scenarios such as mobile, Internet of Things (IoT), and edge environments. As the size of deep neural networks increases dramatically, various forms of redundancy are present in them. To address this issue, various techniques have been attempted, such as pruning and quantization.
Most modern neural networks use batch normalization layers. Some of them use a pruning method that trains the network so as to induce sparsity in the scale terms of the batch normalization layers. In this case, an L1 loss is commonly used for sparsity regularization. When the L1 loss is used, the scale terms of all of the batch normalization layers are reduced with the same gradient. For pruning, however, only the number of scale terms driven to zero matters; uniformly shrinking every scale term does not by itself reduce the size of the deep neural network. Accordingly, sparsity regularization methods better than L1 regularization continue to be studied.
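For illustration only, the short sketch below (using PyTorch, which is not an element of the present disclosure) demonstrates the limitation just described: under an L1 penalty, every batch-normalization scale term receives a gradient of the same magnitude regardless of its size.

```python
import torch

# Under an L1 penalty g(gamma) = |gamma|, the gradient with respect to
# every scale term is sign(gamma), so small and large scale terms are
# shrunk with the same strength.
gamma = torch.tensor([0.01, 0.5, 2.0], requires_grad=True)
gamma.abs().sum().backward()
print(gamma.grad)  # tensor([1., 1., 1.]) -- identical shrinkage pressure
```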
Embodiments of the present disclosure provide adaptive regularization based on a target pruning ratio and the scale terms of the current batch normalization when learning is performed based on sparsity regularization for lightweighting a deep neural network. In this way, it is possible to minimize the task loss when lightweighting a deep neural network.
According to an embodiment, a deep neural network lightweight device based on batch normalization includes a memory that stores at least one piece of data and at least one processor that executes a network lightweight module. When executing the network lightweight module, the processor performs learning on an input neural network based on sparsity regularization to adaptively determine at least one parameter of the sparsity regularization, performs pruning on the learning result, and performs fine tuning on the pruning result.
According to an embodiment, a deep neural network lightweight method based on batch normalization includes performing learning on an input neural network based on sparsity regularization, performing pruning on the learning result, and performing fine tuning on the pruning result. The performing of the learning based on the sparsity regularization includes adaptively determining at least one parameter of the sparsity regularization.
The above and other objects and features of the present disclosure will become apparent by describing in detail embodiments thereof with reference to the accompanying drawings.
Hereinafter, embodiments of the present disclosure will be described in detail and clearly with reference to the accompanying drawings, to such an extent that one of ordinary skill in the art may easily implement the present disclosure.
The processors 110 may function as a central processing unit of the deep neural network lightweight device 100. At least one of the processors 110 may drive the network lightweight module 200. The processors 110 may include, for example, at least one general-purpose processor such as a central processing unit (CPU) 111 or an application processor (AP) 112. Moreover, the processors 110 may further include at least one special-purpose processor such as a neural processing unit (NPU) 113, a neuromorphic processor 114, or a graphics processing unit (GPU) 115. The processors 110 may include two or more homogeneous processors. As another example, at least one (or at least another) of the processors 110 may be manufactured to implement various machine learning or deep learning modules.
At least one of the processors 110 may be used to learn the network lightweight module 200. At least one of the processors 110 may learn the network lightweight module 200 based on various pieces of data or information.
At least one (or at least another) of the processors 110 may execute the network lightweight module 200. The network lightweight module 200 may perform network lightweight based on batch normalization by performing machine learning or deep learning. For example, at least one (or at least another one) of the processors 110 may perform learning on an input neural network based on sparsity regularization to adaptively determine at least one parameter of sparsity regularization, by executing the network lightweight module 200. Moreover, at least one (or at least another) of the processors 110 may execute the network lightweight module 200 to perform pruning on the learning result and to perform fine tuning on the pruning result.
The network lightweight module 200 may be implemented in a form of instructions (or codes) executed by at least one of the processors 110. In this case, the at least one processor may store instructions (or codes) of the network lightweight module 200 in the memory 130.
As another example, at least one (or at least another) of the processors 110 may be manufactured to implement the network lightweight module 200. For example, the at least one processor may be a dedicated processor implemented in hardware based on the network lightweight module 200 generated by the learning of the network lightweight module 200.
As another example, at least one (or at least another) of the processors 110 may be manufactured to implement various machine learning or deep learning modules. For example, at least one (or at least another) of the processors 110 may perform learning on an input neural network based on sparsity regularization. In this case, the sparsity regularization may be transformed L1 (TL1) regularization. At least one (or at least another) of the processors 110 may calculate a task loss and a regularization loss, may perform backpropagation based on the calculation result, and may perform deep learning based on the backpropagation result.
Moreover, the at least one processor may implement the network lightweight module 200 by receiving information (e.g., instructions or codes) corresponding to the network lightweight module 200.
The network interface 120 may provide remote communication with an external device. The network interface 120 may perform wireless or wired communication with the external device. The network interface 120 may communicate with the external device through at least one of various communication schemes such as Ethernet, wireless-fidelity (Wi-Fi), long term evolution (LTE), and 5th generation (5G) mobile communication. For example, the network interface 120 may communicate with an external device of the deep neural network lightweight device 100.
The network interface 120 may receive calculation data, which is to be processed by the deep neural network lightweight device 100, from the external device. The network interface 120 may output result data, which is generated by the deep neural network lightweight device 100, to the external device. For example, the network interface 120 may store the result data in the memory 130.
The memory 130 may store data and process codes that are processed or to be processed by the processors 110. For example, in some embodiments, the memory 130 may store data to be input to the deep neural network lightweight device 100 or data generated or learned while the processors 110 execute a deep neural network.
The memory 130 may be used as a main memory device of the deep neural network lightweight device 100. The memory 130 may include a dynamic random access memory (DRAM), a static RAM (SRAM), a phase-change RAM (PRAM), a magnetic RAM (MRAM), a ferroelectric RAM (FeRAM), a resistive RAM (RRAM), or the like.
Referring to the corresponding drawing, when the TL1 regularization is used as the sparsity regularization, the graph shape of the TL1 regularization may vary depending on the scaling factor ‘γ’ of batch normalization. For example, when the scaling factor ‘γ’ is small, the steep gradient of Pa(γ) may quickly drive ‘γ’ to converge to ‘0’. When the scaling factor ‘γ’ is large, the gradient of Pa(γ) is gentle, so ‘γ’ may be affected more by the task loss.
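As a rough numerical illustration, the sketch below evaluates a transformed L1 penalty in the form commonly used in the literature, Pa(x) = (a + 1)·|x| / (a + |x|); this exact form is an assumption here, since the disclosure's own figures and equations are not reproduced in this text.

```python
def tl1(x: float, a: float) -> float:
    # Transformed L1 penalty, assumed form: P_a(x) = (a + 1)|x| / (a + |x|).
    return (a + 1.0) * abs(x) / (a + abs(x))

a = 0.2  # arbitrary example value of the shape parameter of Pa
for x in (0.0, 0.05, 0.1, 0.5, 1.0, 2.0):
    print(x, round(tl1(x, a), 3))
# The penalty rises steeply near zero (small scaling factors are quickly
# pushed toward 0) and flattens for large x (large scaling factors are
# governed mainly by the task loss).
```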
In operation S100, under the control of the processors 110, the deep neural network lightweight device 100 may perform neural network learning based on sparsity regularization. For example, under the control of the processors 110, the deep neural network lightweight device 100 may perform neural network learning on an input neural network based on sparsity regularization.
In operation S110, the deep neural network lightweight device 100 may perform pruning under the control of the processors 110. For example, under the control of the processors 110, the deep neural network lightweight device 100 may remove as many channels as correspond to a predetermined target pruning ratio ‘p’ by performing pruning on the result of operation S100.
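A minimal sketch of such a pruning step is shown below, assuming a PyTorch model whose prunable channels are gated by BatchNorm2d scaling factors; zeroing the scale and shift terms stands in for physically removing the channels, which would also require rebuilding the adjacent convolution layers.

```python
import torch
import torch.nn as nn

def prune_bn_channels(model: nn.Module, p: float) -> int:
    # Collect every batch-normalization scaling factor in the network.
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    # Threshold chosen so that a fraction p of the channels falls below it.
    th = torch.quantile(gammas, p)
    pruned = 0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            keep = m.weight.detach().abs() > th
            m.weight.data.mul_(keep)  # zero out the pruned scale terms
            m.bias.data.mul_(keep)    # and the corresponding shift terms
            pruned += int((~keep).sum())
    return pruned
```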
In operation S120, the deep neural network lightweight device 100 may perform fine tuning under the control of the processors 110. For example, under the control of the processors 110, the deep neural network lightweight device 100 may finely adjust parameters of a neural network by performing fine tuning on the result of operation S110. In this way, the deep neural network lightweight device 100 may restore the recognition ability of a neural network.
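A minimal sketch of the fine-tuning step follows; the optimizer, learning rate, and number of epochs are illustrative assumptions, not values taken from the disclosure.

```python
import torch

def fine_tune(model, loader, loss_fn, epochs=10, lr=1e-3):
    # Continue training the pruned network (without the sparsity term)
    # so that the remaining parameters compensate for the removed
    # channels and the recognition ability is restored.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model
```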
In operation S200, the deep neural network lightweight device 100 may determine, for each learning loop, whether a learning batch ‘x’ is received, under the control of the processors 110. For example, under the control of the processors 110, the deep neural network lightweight device 100 may perform a learning process (e.g., operation S210 to operation S230) of a neural network based on sparsity regularization, in response to an event that the learning batch ‘x’ is received. The deep neural network lightweight device 100 may terminate the learning process of the neural network in response to an event that the learning batch ‘x’ is not received.
In operation S210, the deep neural network lightweight device 100 may calculate a task loss of the received learning batch ‘x’ under the control of the processors 110. For example, under the control of the processors 110, the deep neural network lightweight device 100 may calculate the task loss from the received learning batch ‘x’, a weight ‘W’, and the scaling factor ‘γ’ of the batch normalization. In this case, the task loss may be calculated based on Equation 2.
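Equation 2 itself is not reproduced in this text. As an illustration under that caveat, the sketch below uses an ordinary cross-entropy classification loss as the task loss; the network output depends on the weight ‘W’ and the scaling factors ‘γ’ through the model's forward pass.

```python
import torch.nn.functional as F

def task_loss(model, batch_x, batch_y):
    # Illustrative task loss: cross-entropy between the output of the
    # network (which depends on W and gamma) and the labels of the
    # learning batch x. Equation 2 may define a different task loss.
    return F.cross_entropy(model(batch_x), batch_y)
```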
In operation S220, the deep neural network lightweight device 100 may calculate a regularization loss under the control of the processors 110. For example, under the control of the processors 110, the deep neural network lightweight device 100 may calculate the regularization loss from the scaling factor ‘γ’ of batch normalization. In this case, the regularization loss may be calculated based on Equation 3.
In this case, ‘λ’ may denote a coefficient of a sparsity regularization term, and g(γ) may denote a sparsity-induced penalty (e.g., g(γ)=|γ|) for a scaling factor.
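Following the description above (with g(γ) = |γ| as the example penalty), a regularization-loss sketch could look like the following; Equation 3 itself is not reproduced in this text, so the form λ·Σ g(γ) is inferred from the stated definitions.

```python
import torch
import torch.nn as nn

def regularization_loss(model: nn.Module, lam: float) -> torch.Tensor:
    # lambda times the sum of the sparsity-induced penalty g(gamma) over
    # all batch-normalization scaling factors, with g(gamma) = |gamma|.
    terms = [m.weight.abs().sum()
             for m in model.modules()
             if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d))]
    return lam * torch.stack(terms).sum()
```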
In operation S230, the deep neural network lightweight device 100 may calculate the total loss and then may perform backpropagation, under the control of the processors 110. For example, under the control of the processors 110, the deep neural network lightweight device 100 may calculate the total loss based on the task loss and the regularization loss, which are respectively calculated in operation S210 and operation S220, and then may perform backpropagation. In this case, the total loss may be calculated by adding the task loss and the regularization loss.
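Putting the two losses together, a single training step for operation S230 might look like the sketch below (PyTorch, with the same illustrative loss forms as above).

```python
import torch.nn as nn
import torch.nn.functional as F

def training_step(model, batch_x, batch_y, lam, optimizer):
    # Total loss = task loss + regularization loss, followed by
    # backpropagation (operation S230, sketched with illustrative losses).
    task = F.cross_entropy(model(batch_x), batch_y)
    reg = lam * sum(m.weight.abs().sum() for m in model.modules()
                    if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)))
    loss = task + reg
    optimizer.zero_grad()
    loss.backward()   # backpropagate the total loss
    optimizer.step()
    return float(loss)
```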
In operation S300, the deep neural network lightweight device 100 may assign a parameter ‘th’ by calculating the value of the scaling factor ‘γ’ corresponding to a target pruning ratio ‘p’, under the control of a processor. For example, under the control of the processor, the deep neural network lightweight device 100 may sort all of the scaling factors ‘γ’, may calculate the value corresponding to the target pruning ratio ‘p’ among the sorted scaling factors, and may assign that value to the parameter ‘th’.
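A sketch of this threshold assignment is shown below; gathering the scaling factors from the BatchNorm layers of a PyTorch model is an assumption about the implementation, not a requirement of the disclosure.

```python
import torch
import torch.nn as nn

def assign_threshold(model: nn.Module, p: float) -> float:
    # Sort every batch-normalization scaling factor and take the value
    # located at the target pruning ratio p as the parameter 'th'.
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d))])
    sorted_gammas, _ = torch.sort(gammas)
    index = int(p * (sorted_gammas.numel() - 1))
    return float(sorted_gammas[index])
```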
In operation S310, the deep neural network lightweight device 100 may calculate a parameter ‘a’ from the assigned parameter ‘th’ under the control of the processor. The parameter ‘a’ may be calculated based on Equation 4.
In this case, a result of Equation 4 may be “a=2th+th²”.
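Since Equation 4 itself is not reproduced in this text, the sketch below simply encodes its stated result.

```python
def compute_a(th: float) -> float:
    # Stated result of Equation 4: a = 2*th + th**2.
    return 2.0 * th + th ** 2
```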
In operation S320, the deep neural network lightweight device 100 may calculate a regularization loss from the calculated ‘a’ under the control of the processor. For example, under the control of the processor, the deep neural network lightweight device 100 may calculate the regularization loss from Pa(γ).
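A sketch of operation S320 follows, again assuming the transformed L1 form Pa(x) = (a + 1)·|x| / (a + |x|) for the penalty; the disclosure's own definition of Pa may differ.

```python
import torch
import torch.nn as nn

def adaptive_regularization_loss(model: nn.Module, a: float,
                                 lam: float) -> torch.Tensor:
    # Regularization loss computed from P_a(gamma), with 'a' derived from
    # the threshold 'th' (operations S300-S310), so the loss adapts to the
    # target pruning ratio and the current scaling factors.
    terms = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            g = m.weight.abs()
            terms.append(((a + 1.0) * g / (a + g)).sum())
    return lam * torch.stack(terms).sum()
```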
Through operation S300 to operation S320, the regularization loss may not be fixed, but may be determined adaptively. That is, the regularization loss may be adaptively determined by the target pruning ratio ‘p’ and the scaling factor ‘γ’ of current batch normalization.
The first region R1 corresponds to the case of “|γ|<th”. In this region, the scaling factors ‘γ’ that fall within the target pruning ratio ‘p’ may quickly converge to ‘0’, and the total loss may be focused on the sparsity regularization. The second region R2 corresponds to the case of “|γ|>th”. In this region, the total loss may be focused on the task loss.
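The numerical sketch below illustrates this split under the same assumed penalty form: scaling factors inside R1 see a much steeper slope of Pa than those inside R2.

```python
def tl1_slope(gamma: float, a: float, eps: float = 1e-6) -> float:
    # Numerical slope of the assumed penalty P_a(x) = (a + 1)|x| / (a + |x|).
    p = lambda x: (a + 1.0) * abs(x) / (a + abs(x))
    return (p(gamma + eps) - p(gamma - eps)) / (2.0 * eps)

th = 0.1                 # example threshold from operation S300
a = 2.0 * th + th ** 2   # stated result of Equation 4
print(tl1_slope(0.5 * th, a))  # region R1 (|gamma| < th): steep slope
print(tl1_slope(5.0 * th, a))  # region R2 (|gamma| > th): gentle slope
```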
The above description refers to embodiments for implementing the present disclosure. The present disclosure may include not only the embodiments described above but also embodiments in which a design is simply or easily changed. In addition, the present disclosure may include technologies that are easily changed and implemented by using the above embodiments.
According to an embodiment of the present disclosure, when a deep neural network lightweight device is used, sparsity regularization in the learning stage of a deep neural network may be adaptively adjusted depending on a target pruning ratio. Accordingly, the performance of the main task may be improved. In addition, performance degradation may be prevented by minimizing the change in the network after pruning.
While the present disclosure has been described with reference to embodiments thereof, it will be apparent to those of ordinary skill in the art that various changes and modifications may be made thereto without departing from the spirit and scope of the present disclosure as set forth in the following claims.