This application claims the priority benefit of Taiwan application serial no. 111124592, filed on Jun. 30, 2022. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The invention relates to machine learning/deep learning, and particularly relates to a simplification device and a simplification method for a neural network model used in deep learning.
In applications of neural networks, it is often necessary to perform multilayer matrix multiplication and addition. For example, a multilayer perceptron (MLP) has multiple linear operation layers. Each linear operation layer generally performs matrix multiplication by using a weight matrix and an activation matrix, the multiplication result may be added to a bias matrix, and the result of the addition is used as an input of a next linear operation layer.
Along with the increasing size and complexity of neural network models, the number of linear operation layers increases, and the size of the matrix involved in each layer increases as well. Without upgrading hardware specifications and improving the computing architecture, the time (or even the power consumption) required for inference keeps increasing. In order to speed up neural network inference, how to simplify the original trained neural network model while keeping the simplified trained neural network model equivalent to the original trained neural network model is one of the important technical issues in this field.
The information disclosed in this Background section is only for enhancement of understanding of the background of the described technology and therefore it may contain information that does not form the prior art that is already known to a person of ordinary skill in the art. Further, the information disclosed in the Background section does not mean that one or more problems to be resolved by one or more embodiments of the invention was acknowledged by a person of ordinary skill in the art.
The invention is directed to a simplification device and a simplification method for neural network model, which simplify an original trained neural network model into an equivalent model with fewer linear operation layers.
In an embodiment of the invention, the simplification method for neural network model is configured to simplify an original trained neural network model to a simplified trained neural network model, wherein the simplified trained neural network model includes at most two linear operation layers. The simplification method includes: receiving the original trained neural network model; calculating a first new weight of the at most two linear operation layers of the simplified trained neural network model by using a plurality of original weights of the original trained neural network model; and generating the simplified trained neural network model based on the first new weight.
In an embodiment of the invention, the simplification device includes a memory and a processor. The memory stores a computer readable program. The processor is coupled to the memory to execute the computer readable program. The processor executes the computer readable program to realize the above-mentioned simplification method for neural network model.
In an embodiment of the invention, a non-transitory storage medium is used for storing a computer readable program, wherein the computer readable program is executed by a computer to realize the above-mentioned simplification method for neural network model.
Based on the above description, the simplification method for neural network model according to the embodiments of the invention may simplify the original trained neural network model with multiple linear operation layers into the simplified trained neural network model with at most two linear operation layers. In some embodiments, the simplification method converts the original trained neural network model into an original mathematical function, and performs an iterative analysis operation on the original mathematical function to simplify the original mathematical function to a simplified mathematical function, where the simplified mathematical function has a first new weight. Generally, each weight of a trained neural network model may be considered as a constant. By using a plurality of original weights (constants) of the original trained neural network model, the simplification method may pre-calculate the first new weight to serve as a weight of a linear operation layer of the simplified trained neural network model. Under the premise that the simplified trained neural network model is equivalent to the original trained neural network model, the number of linear operation layers of the simplified trained neural network model is much smaller than that of the original trained neural network model. Therefore, the inference time of the neural network may be effectively shortened.
To make the aforementioned more comprehensible, several embodiments accompanied with drawings are described in detail as follows.
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
The term “couple” used in the full text of the disclosure (including the claims) refers to any direct or indirect connection. For example, if a first device is described to be coupled to a second device, it is interpreted as that the first device is directly coupled to the second device, or the first device is indirectly coupled to the second device through other devices or connection means. “First”, “second”, etc. mentioned in the specification (including the claims) are merely used to name discrete components and should not be regarded as limiting the upper or lower bound of the number of the components, nor are they used to define a manufacturing order or setting order of the components. Moreover, wherever possible, components/members/steps using the same reference numbers in the drawings and description refer to the same or like parts. Components/members/steps using the same reference numbers or using the same terms in different embodiments may cross-refer to related descriptions.
The following embodiments exemplify a neural network simplification technology based on matrix operation reconstruction. The following embodiments may simplify a plurality of successive linear operation layers into at most two layers. This reduction of the number of linear operation layers may greatly reduce computational requirements, thereby reducing energy consumption and shortening the inference time.
In some application examples, the computer readable program may be stored in a non-transitory storage medium (not shown). In some embodiments, the non-transitory storage medium includes, for example, a read only memory (ROM), a tape, a disk, a card, a semiconductor memory, a programmable logic circuit and/or a storage device. The storage device includes a hard disk drive (HDD), a solid-state drive (SSD), or other storage devices. The simplification device 200 (for example, a computer) may read the computer readable program from the non-transitory storage medium, and temporarily store the computer readable program in the memory 210. In other application examples, the computer readable program may also be provided to the simplification device 200 via any transmission medium (a communication network or broadcast waves, etc.). The communication network is, for example, the Internet, a wired communication network, a wireless communication network, or other communication media.
In step S320, the processor 220 may pre-calculate new weights and new biases of the at most two linear operation layers of the simplified trained neural network model (in some applications, there may be no bias). Namely, the new weights and new biases of the at most two linear operation layers of the simplified trained neural network model are also constants. Therefore, a user may use the simplified trained neural network model with at most two linear operation layers to perform inference, and the inference result is equivalent to that of the original trained neural network model with more layers.
For example, it is assumed that the original trained neural network model is denoted as y=(x@w1+b1)@w2+b2, where y represents an output of the original trained neural network model, x represents an input of the original trained neural network model, @ represents any linear operation (such as a matrix multiply operation, a matrix add operation, a matrix multiply-accumulate operation, or other linear matrix operations), w1 and b1 respectively represent an original weight and an original bias of a first linear operation layer of the original trained neural network model, and w2 and b2 respectively represent an original weight and an original bias of a second linear operation layer of the original trained neural network model. According to practical applications, the original biases b1 and/or b2 may be 0 or other constants.
The processor 220 may simplify the original trained neural network model y=(x@w1+b1)@w2+b2 of two layers to a simplified trained neural network model y=x@WI+BI of a single linear operation layer, where y represents an output of the simplified trained neural network model, x represents an input of the simplified trained neural network model, WI represents a first new weight, and BI represents a new bias of the simplified trained neural network model. Simplification details are described in the next paragraph.
The original trained neural network model y=(x@w1+b1)@w2+b2 may be expanded as y=x@w1@w2+b1@w2+b2. Namely, the processor 220 may pre-calculate WI=w1@w2 to determine the first new weight WI of the simplified trained neural network model y=x@WI+BI. The processor 220 may also pre-calculate BI=b1@w2+b2 to determine a new bias BI of the simplified trained neural network model y=x@WI+BI. Therefore, the simplified trained neural network model y=x@WI+BI with a single linear operation layer may be equivalent to the original trained neural network model y=(x@w1+b1)@w2+b2 with two linear operation layers.
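As an illustrative check, this equivalence may be verified numerically. The following is a minimal Python/NumPy sketch of the two-layer example; the shapes, random constants, and variable names are assumptions chosen for demonstration, and @ is NumPy's matrix multiply operator, matching the notation above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes; the trained weights and biases are constants.
x  = rng.standard_normal((4, 3))                 # model input
w1 = rng.standard_normal((3, 5))
b1 = rng.standard_normal((4, 5))
w2 = rng.standard_normal((5, 2))
b2 = rng.standard_normal((4, 2))

# Original trained model: two linear operation layers.
y_original = (x @ w1 + b1) @ w2 + b2

# Pre-calculated constants of the simplified model y = x @ WI + BI.
WI = w1 @ w2          # first new weight
BI = b1 @ w2 + b2     # new bias

y_simplified = x @ WI + BI
assert np.allclose(y_original, y_simplified)
```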
For another example, it is assumed that the original trained neural network model is denoted as y=((x@w1+b1)T@w2+b2)T@w3, where ( )T represents a matrix transpose operation, w1 and b1 respectively represent an original weight and an original bias of the first linear operation layer of the original trained neural network model, w2 and b2 respectively represent an original weight and an original bias of the second linear operation layer of the original trained neural network model, and w3 represents an original weight of a third linear operation layer of the original trained neural network model. In the example, an original bias of the third linear operation layer is assumed to be 0 (i.e., the third linear operation layer has no bias).
The processor 220 may simplify the original trained neural network model y=((x@w1+b1)T@w2+b2)T@w3 of three linear operation layers to a simplified trained neural network model y=WII@(x@WI+BI) of at most two linear operation layers, where WI represents the first new weight of the first linear operation layer of the simplified trained neural network model, and BI represents the first new bias of the first linear operation layer of the simplified trained neural network model. The processor 220 may calculate a second new weight WII of the second linear operation layer of the simplified trained neural network model by using at least one original weight of the original trained neural network model. The processor 220 may further calculate the first new bias BI by using at least one original weight and at least one original bias of the original trained neural network model. Simplification details are described in the next paragraph.
The original trained neural network model y=((x@w1+b1)T@w2+b2)T@w3 may be expanded as y=(w2)T@x@w1@w3+(w2)T@b1@w3+(b2)T@w3, and rewritten as y=(w2)T@x@w1@w3+(w2)T@b1@w3+(w2)T@((w2)T)−1@(b2)T@w3, where ((w2)T)−1 represents an inverse matrix of (w2)T. Therefore, the original trained neural network model may be organized as y=(w2)T@[x@w1@w3+b1@w3+((w2)T)−1@(b2)T@w3]. Namely, the processor 220 may pre-calculate WII=(w2)T to determine the second new weight WII of the simplified trained neural network model y=WII@(x@WI+BI). The processor 220 may pre-calculate WI=w1@w3 to determine the first new weight WI of the simplified trained neural network model y=WII@(x@WI+BI). The processor 220 may further pre-calculate BI=b1@w3+((w2)T)−1@(b2)T@w3 to determine the first new bias BI of the simplified trained neural network model y=WII@(x@WI+BI). Therefore, the simplified trained neural network model y=WII@(x@WI+BI) with at most two linear operation layers may be equivalent to the original trained neural network model y=((x@w1+b1)T@w2+b2)T@w3 with three linear operation layers.
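A minimal NumPy sketch of this three-layer example follows; the shapes are assumptions, and w2 is assumed square and invertible because the rewrite above multiplies by ((w2)T)−1:

```python
import numpy as np

rng = np.random.default_rng(1)

x  = rng.standard_normal((4, 3))
w1 = rng.standard_normal((3, 5))
b1 = rng.standard_normal((4, 5))
w2 = rng.standard_normal((4, 4))   # assumed square and invertible
b2 = rng.standard_normal((5, 4))
w3 = rng.standard_normal((5, 2))

# Original trained model: three linear operation layers with transposes.
y_original = ((x @ w1 + b1).T @ w2 + b2).T @ w3

# Pre-calculated constants of the simplified model y = WII @ (x @ WI + BI).
WII = w2.T
WI  = w1 @ w3
BI  = b1 @ w3 + np.linalg.inv(w2.T) @ b2.T @ w3

y_simplified = WII @ (x @ WI + BI)
assert np.allclose(y_original, y_simplified)
```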
The simplification method shown in FIG. 4 may be performed by the processor 220 of the simplification device 200.
In step S420 shown in FIG. 4, the processor 220 may convert the original trained neural network model into an original mathematical function, for example, y=(( . . . ((xT0@w1+b1)T1@w2+b2)T2 . . . )Tn−1@wn+bn)Tn, where each of T0 to Tn represents “transpose” or “not transpose”, w1 to wn represent the original weights of the n linear operation layers, and b1 to bn represent the original biases of the n linear operation layers.
In step S430, the processor 220 may perform an iterative analysis operation on the original mathematical function to simplify the original mathematical function to a simplified mathematical function, where the simplified mathematical function has at most two new weights. The iterative analysis operation includes n iterations. In a first iteration of the n iterations, the input x of the original mathematical function is used as a starting point, and the processor 220 may extract (xT0@w1+b1)T1 corresponding to the first linear operation layer 510_1 from the original mathematical function. In the first iteration, the processor 220 may define X1 as x, and check T0. When T0 represents “transpose”, the processor 220 may define F1 as (X1)T (i.e., the transposed X1), define F′1 as F1@w1+b1, and check T1, where ( )T represents a transpose operation. When T0 represents “transpose” and T1 represents “transpose”, the processor 220 may define Y1 as (F′1)T (i.e., the transposed F′1), such that Y1=(w1)T@X1+(b1)T. When T0 represents “transpose” and T1 represents “not transpose”, the processor 220 may define Y1 as F′1 such that Y1=(X1)T@w1+b1.
In the first iteration, when T0 represents “not transpose”, the processor 220 may define F1 as X1, define F′1 as F1@w1+b1, and check T1. When T0 represents “not transpose” and T1 represents “transpose”, the processor 220 may define Y1 as (F′1)T (i.e., transposed F′1) such that Y1=(w1)T@(X1)T+(b1)T. When T0 represents “not transpose” and T1 represents “not transpose”, the processor 220 may define Y1 as F′1 such that Y1=X1@w1+b1. After the first iteration, the processor 220 may use Y1 to replace (xT0@w1+b1)T1 in the original mathematical function, so that the original mathematical function becomes y=(( . . . (Y1@w2+b2)T2 . . . )Tn−1@wn+bn)Tn.
In a second iteration of the n iterations, Y1 is taken as the starting point, and the processor 220 may extract (Y1@w2+b2)T2 corresponding to the second linear operation layer from the original mathematical function. The processor 220 may define X2 as Y1, define F2 as X2, define F′2 as F2@w2+b2, and check T2. When T2 represents “transpose”, the processor 220 may define Y2 as (F′2)T (i.e., the transposed F′2), such that Y2=(w2)T@(X2)T+(b2)T. When T2 represents “not transpose”, the processor 220 may define Y2 as F′2 such that Y2=X2@w2+b2. After the second iteration, the processor 220 may replace (Y1@w2+b2)T2 in the original mathematical function with Y2, so that the original mathematical function becomes y=(( . . . Y2 . . . )Tn−1@wn+bn)Tn. The same procedure is applied by analogy until the end of the n iterations. After the n iterations are complete, the processor 220 may generate a simplified mathematical function. The simplified mathematical function may be y=x@WI+BI or y=WII@(x@WI+BI)+BII, where WI and BI represent a first new weight and a first new bias of one linear operation layer, and WII and BII represent a second new weight and a second new bias of a next linear operation layer.
In step S440, the processor 220 may calculate the new weight WI, the new weight WII, the new bias BI and/or the new bias BII by using the original weights w1 to wn and/or the original biases b1 to bn of the original trained neural network model. The iterative analysis operation uses some or all of these original weights w1 to wn to pre-calculate a first constant to serve as the first new weight WI (such as a new weight of the linear operation layer 521 shown in a middle part of FIG. 5).
In step S450, the processor 220 may convert the simplified mathematical function into a simplified trained neural network model. For example, the processor 220 may convert the simplified mathematical function y=WII@(x@WI+BI)+BII into the simplified trained neural network model shown in the middle part of FIG. 5.
In step S715 shown in FIG. 7, the processor 220 may define X1 as the input x of the original mathematical function. In step S720, the processor 220 may check whether there is a “preceding transpose” in the current linear operation layer (for example, check T0 in the first iteration).
When a judgment result of step S720 is “yes” (the current linear operation layer has the preceding transpose), for example, when T0 represents “transpose” in the first iteration, the processor 220 may perform step S725 to define Fi as (Xi)T (i.e., the transposed Xi). In step S730, the processor 220 may define F′i as Fi@wi+bi. In step S735, the processor 220 may check whether there is a “succeeding transpose” in the current linear operation layer (for example, check T1 in the first iteration).
When the judgment result of step S735 is “yes” (the current linear operation layer has the succeeding transpose), for example, in the first iteration, when T1 indicates “transpose”, the processor 220 may perform step S740 to define Yi as (F′i)T (i.e., the transposed F′i), such that Yi=(wi)T@Xi+(bi)T. When the judgment result of step S735 is “none” (the current linear operation layer has no succeeding transpose), for example, in the first iteration, when T1 indicates “not transpose”, the processor 220 may proceed to step S745 to define Yi as F′i, such that Yi=(Xi)T@wi+bi.
When the judgment result of step S720 is “none” (the current linear operation layer has no preceding transpose), for example, in the first iteration, when T0 indicates “not transpose”, the processor 220 may perform step S750 to define Fi as Xi. In step S755, the processor 220 may define F′i as Fi@wi+bi. In step S760, the processor 220 may check whether there is the “succeeding transpose” in the current linear operation layer (for example, check T1 in the first iteration). Step S760 may be deduced with reference to the relevant description of step S735, and details thereof are not repeated.
When the judgment result of step S760 is “yes”, for example, in the first iteration, when T1 indicates “transpose”, the processor 220 may proceed to step S765 to define Yi as (F′i)T (i.e., the transposed F′i) such that Yi=(wi)T@(Xi)T+(bi)T. When the judgment result of step S760 is “none”, for example, in the first iteration, when T1 indicates “not transpose”, the processor 220 may proceed to step S770 to define Yi as F′i, such that Yi=Xi@wi+bi.
After any one of steps S740, S745, S765 and S770 ends, the processor 220 may proceed to step S775 to determine whether all linear operation layers of the original trained neural network model have been traversed. When there is still a linear operation layer in the original trained neural network model that has not been subjected to the iterative analysis (the determination result of step S775 is “No”), the processor 220 may proceed to step S780 to increase i by 1 and define Xi as Yi−1. After step S780 ends, the processor 220 may perform step S720 again to perform a next iteration of the n iterations.
When all of the linear operation layers in the original trained neural network model have been subjected to the iterative analysis (the determination result of step S775 is “Yes”), the processor 220 may proceed to step S785 to define the output y as Yi. Taking n iterations as an example, step S785 may define the output y as Yn. The processor 220 may perform step S790 to calculate at most two sets of new weights WI and/or WII of the simplified mathematical function by using a plurality of the original weights w1 to wn and/or a plurality of the original biases b1 to bn of the original trained neural network model. WI and WII represent two weight matrices. In step S450, the processor 220 may convert the simplified mathematical function into the simplified trained neural network model. Therefore, the processor 220 may simplify the original trained neural network model of n linear operation layers to the simplified trained neural network model of at most two linear operation layers, for example, y=WII@(x@WI+BI)+BII or y=x@WI+BI.
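Before the concrete example below, the n iterations of steps S715 to S785 may be sketched as a simple loop. The following Python sketch is a hypothetical rendering under stated assumptions: each linear operation layer is encoded as a (pre_T, w, b, post_T) tuple of NumPy arrays and flags, the walk is performed numerically, and the pre-calculation of step S790 is left to the algebra described above:

```python
def iterate_layers(x, layers):
    """Numerically walk the n iterations of steps S715 to S785.

    `layers` is an assumed encoding: one (pre_T, w, b, post_T) tuple per
    linear operation layer, where pre_T is the "preceding transpose"
    checked in step S720 and post_T is the "succeeding transpose"
    checked in step S735 or S760. x and the weights are NumPy arrays.
    """
    Xi = x                                     # step S715: define X1 as x
    for pre_T, wi, bi, post_T in layers:       # iterations i = 1 .. n
        Fi = Xi.T if pre_T else Xi             # step S725 or S750
        F_prime = Fi @ wi + bi                 # step S730 or S755: F'i
        Yi = F_prime.T if post_T else F_prime  # step S740/S765 or S745/S770
        Xi = Yi                                # step S780: define X(i+1) as Yi
    return Xi                                  # step S785: define y as Yn

# Example: y = ((x@w1+b1)^T @ w2 + b2)^T @ w3 + b3 corresponds to
# layers = [(False, w1, b1, True), (False, w2, b2, True), (False, w3, b3, False)].
```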
For example, it is assumed that the original mathematical function is y=((x@w1+b1)T@w2+b2)T@w3+b3. In the first iteration (i=1), the input x of the original mathematical function is taken as a starting point, and the processor 220 may extract the first linear operation layer (x@w1+b1)T from the original mathematical function. In step S715, the processor 220 may define X1 as x. Since there is no “preceding transpose” in the current linear operation layer, the processor 220 may proceed to step S750 to define F1 as X1. In step S755, the processor 220 may define F′1 as F1@w1+b1. Since the current linear operation layer has a “succeeding transpose”, the processor 220 may perform step S765 to define Y1 as (F′1)T (i.e., the transposed F′1), such that Y1=(w1)T@(X1)T+(b1)T. Since there is still a linear operation layer in the original trained neural network model that has not been subjected to the iterative analysis, the processor 220 may perform step S780 to increase i by 1 (i.e., i=2) and define X2 as Y1.
The processor 220 may execute step S720 again to perform a second iteration. In the second iteration (i=2), X2 is taken as the starting point, and the processor 220 may extract the second linear operation layer (X2@w2+b2)T from the original mathematical function y=(X2@w2+b2)T@w3+b3. Since there is no “preceding transpose” in the current linear operation layer, the processor 220 may proceed to step S750 to define F2 as X2. In step S755, the processor 220 may define F′2 as F2@w2+b2. Since the current linear operation layer has a “succeeding transpose”, the processor 220 may execute step S765 to define Y2 as (F′2)T (i.e., the transposed F′2), such that Y2=(w2)T@(X2)T+(b2)T. Since there is still a linear operation layer in the original trained neural network model that has not been subjected to the iterative analysis, the processor 220 may execute step S780 to increase i by 1 (i.e., i=3) and define X3 as Y2.
The processor 220 may execute step S720 again to perform a third iteration. In the third iteration (i=3), X3 is taken as the starting point, and the processor 220 may extract a third linear operation layer X3@w3+b3 from the original mathematical function y=X3@w3+b3. Since there is no “preceding transpose” in the current linear operation layer, the processor 220 may proceed to step S750 to define F3 as X3. In step S755, the processor 220 may define F′3 as F3@w3+b3. Since there is no “succeeding transpose” in the current linear operation layer, the processor 220 may proceed to step S770 to define Y3 as F′3, such that Y3=X3@w3+b3. Since all linear operation layers in the original trained neural network model have been subjected to the iterative analysis, the processor 220 may proceed to step S785 to define the output y as Y3.
After completing 3 iterations, the original mathematical function turns into y=((w2)T@((w1)T@(x)T+(b1)T)T+(b2)T)@w3+b3. The transformed original mathematical function may be expanded as y=(w2)T@x@w1@w3+(w2)T@b1@w3+(b2)T@w3+b3. In some embodiments, y=(w2)T@x@w1@w3+(w2)T@b1@w3+(b2)T@w3+b3 may be sorted into y=(w2)T@[x@w1@w3+b1@w3]+(b2)T@w3+b3. Namely, the processor 220 may pre-calculate WII=(w2)T, WI=w1@w3, BI=b1@w3, and BII=(b2)T@w3+b3. Since w1, w2, w3, b1, b2, and b3 are all constants, WI, WII, BI, and BII are also constants. Based on this, the processor 220 may determine the first new weight WI, the second new weight WII, the first new bias BI and the second new bias BII of the simplified mathematical function y=WII@(x@WI+BI)+BII.
In some other embodiments, y=(w2)T@x@w1@w3+(w2)T@b1@w3+(b2)T@w3+b3 may be rewritten as y=(w2)T@x@w1@w3+(w2)T@b1@w3+(w2)T@((w2)T)−1@(b2)T@w3+b3, and further sorted as y=(w2)T@[x@w1@w3+b1@w3+((w2)T)−1@(b2)T@w3]+b3. Namely, the processor 220 may pre-calculate WII=(w2)T, WI=w1@w3, BI=b1@w3+((w2)T)−1@(b2)T@w3, and BII=b3. Therefore, the processor 220 may determine the first new weight WI, the second new weight WII, the first new bias BI, and the second new bias BII of the simplified mathematical function y=WII@(x@WI+BI)+BII.
Therefore, the processor 220 may simplify the original trained neural network model y=((x@w1+b1)T@w2+b2)T@w3+b3 with three linear operation layers to the simplified trained neural network model y=WII@(x@WI+BI)+BII with at most two linear operation layers. The simplified trained neural network model y=WII@(x@WI+BI)+BII with at most two linear operation layers may be equivalent to the original trained neural network model y=((x@w1+b1)T@w2+b2)T@w3+b3 with three linear operation layers.
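As an illustrative check, both groupings may be verified numerically to produce the same output as the original model. The following minimal NumPy sketch assumes illustrative shapes, with w2 square and invertible as the second grouping requires:

```python
import numpy as np

rng = np.random.default_rng(2)

x  = rng.standard_normal((4, 3))
w1 = rng.standard_normal((3, 5))
b1 = rng.standard_normal((4, 5))
w2 = rng.standard_normal((4, 4))   # assumed square and invertible
b2 = rng.standard_normal((5, 4))
w3 = rng.standard_normal((5, 2))
b3 = rng.standard_normal((4, 2))

y_original = ((x @ w1 + b1).T @ w2 + b2).T @ w3 + b3

WII = w2.T
WI  = w1 @ w3

# First grouping: BI = b1 @ w3 and BII = (b2)^T @ w3 + b3.
y1 = WII @ (x @ WI + b1 @ w3) + (b2.T @ w3 + b3)

# Second grouping: BI absorbs ((w2)^T)^-1 @ (b2)^T @ w3 and BII = b3.
BI = b1 @ w3 + np.linalg.inv(w2.T) @ b2.T @ w3
y2 = WII @ (x @ WI + BI) + b3

assert np.allclose(y_original, y1) and np.allclose(y_original, y2)
```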
The above embodiments may also be applied to trained neural network models with residual connections. For example, in yet other embodiments, it is assumed that the original mathematical function (original trained neural network model) is y=((x@w1+b1)T@w2+b2)T@w3+x. After completing 3 iterations and reorganizing the result, the original mathematical function turns into y=(w2)T@[x@w1@w3+b1@w3+((w2)T)−1@(b2)T@w3]+x. Namely, the processor 220 may pre-calculate the first new weight WI, the second new weight WII, and the first new bias BI in the simplified mathematical function y=WII@(x@WI+BI)+x, i.e., WII=(w2)T, WI=w1@w3, and BI=b1@w3+((w2)T)−1@(b2)T@w3 (in this example, the second new bias BII is 0).
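A corresponding NumPy sketch for the residual case follows; here the shapes are assumptions chosen so that the pre-residual output matches the shape of x, and w2 is again assumed square and invertible:

```python
import numpy as np

rng = np.random.default_rng(3)

# w3 maps back to x's column count so that the residual addition + x
# is well defined; w2 is assumed square and invertible.
x  = rng.standard_normal((4, 3))
w1 = rng.standard_normal((3, 5))
b1 = rng.standard_normal((4, 5))
w2 = rng.standard_normal((4, 4))
b2 = rng.standard_normal((5, 4))
w3 = rng.standard_normal((5, 3))

y_original = ((x @ w1 + b1).T @ w2 + b2).T @ w3 + x   # residual connection

WII = w2.T
WI  = w1 @ w3
BI  = b1 @ w3 + np.linalg.inv(w2.T) @ b2.T @ w3

y_simplified = WII @ (x @ WI + BI) + x                # BII is 0 in this example
assert np.allclose(y_original, y_simplified)
```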
In summary, under the premise that the simplified trained neural network model is equivalent to the original trained neural network model, the number of linear operation layers of the simplified trained neural network model is much smaller than that of the original trained neural network model. Therefore, the inference time of the neural network may be effectively shortened.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed embodiments without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the invention covers modifications and variations provided they fall within the scope of the following claims and their equivalents.