This application claims the benefit of Korean Patent Applications No. 10-2019-0138793, filed Nov. 1, 2019, and No. 10-2020-0046724, filed Apr. 17, 2020, which are hereby incorporated by reference in their entireties into this application.
The present invention relates generally to machine-learning and signal-processing technology, and more particularly to technology for optimizing a quantized machine-learning algorithm.
In conventional machine-learning and nonlinear-signal-processing technology, signal-processing operations are performed based on floating-point operations. However, conventional machine-learning and nonlinear-signal-processing technology is not suitable for fields that require small, lightweight hardware, because such technology uses multiple operation modules in order to provide real-time operation, and because a computation module for floating-point operations is larger and more complex than a computation module for integer operations.
Accordingly, research on quantization of processing data is underway in various engineering fields. Quantization of processing data may confer many advantages when an engineering solution is implemented, such as a decrease in the number of bits, improved calculation speed, and improved availability. For example, because a learning equation in a quantized domain is implemented by updating the least significant bit of a parameter, this is equivalent to applying a fixed learning rate to an update parameter. However, a general stochastic steepest-descent algorithm having a fixed step size cannot avoid performance degradation, because it converges to the optimum point only in a weak sense, such as convergence in distribution.
Meanwhile, Korean Patent Application Publication No. 10-2018-0043154, titled “Method and apparatus for neural network quantization” relates to a neural network quantization method, and discloses a method for quantizing the parameters of a neural network, which includes determining the diagonals of a second-order partial derivative matrix (Hessian matrix) of the loss function of the network parameters of the neural network and assigning Hessian weights to the network parameters using the determined diagonals as part of the step of quantizing the network parameters.
An object of the present invention is to implement an optimization algorithm capable of minimizing a quantization error in machine-learning and nonlinear-signal-processing fields using quantization and exhibiting excellent performance on lightweight hardware.
Another object of the present invention is to implement a machine-learning algorithm capable of providing sufficient optimization performance even on low-performance hardware.
In order to accomplish the above objects, an apparatus for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention may include one or more processors and executable memory for storing at least one program executed by the one or more processors. The at least one program may set the learning rate of the quantized machine-learning algorithm using at least one of an Armijo rule and golden search methods, calculate a quantized orthogonal compensation search vector from the search direction vector of the quantized machine-learning algorithm, compensate for the search performance of the quantized machine-learning algorithm using the quantized orthogonal compensation search vector, and calculate an optimized quantized machine-learning algorithm using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.
Here, the at least one program may set the learning rate through a learning-rate-setting function predefined by the Armijo rule using the gradient vector of the objective function of the search direction vector.
Here, the at least one program may set any one of a first candidate value, acquired by increasing the minimum candidate value of the learning rate by a golden ratio, and a second candidate value, acquired by decreasing the maximum candidate value of the learning rate by the golden ratio, as the learning rate.
Here, the at least one program may set any one of the first candidate value and the second candidate value as the learning rate when the difference value between the first candidate value and the second candidate value is equal to or less than a preset value.
Here, the at least one program may select a vector in the direction orthogonal to the direction opposite the largest component vector of the search direction vector and quantize the selected vector, thereby calculating the quantized orthogonal compensation search vector.
Here, when the solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the at least one program may make the solution escape from the local minimum point using the quantized orthogonal compensation search vector.
Also, in order to accomplish the above objects, a method for optimizing a quantized machine-learning algorithm, performed by an apparatus for optimizing the quantized machine-learning algorithm, according to an embodiment of the present invention may include setting the learning rate of the quantized machine-learning algorithm using at least one of an Armijo rule and golden search methods; calculating a quantized orthogonal compensation search vector from the search direction vector of the quantized machine-learning algorithm and compensating for the search performance of the quantized machine-learning algorithm using the quantized orthogonal compensation search vector; and calculating an optimized quantized machine-learning algorithm using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.
Here, setting the learning rate may be configured to set the learning rate through a learning-rate-setting function predefined by the Armijo rule using the gradient vector of the objective function of the search direction vector.
Here, setting the learning rate may be configured to set any one of a first candidate value, acquired by increasing the minimum candidate value of the learning rate by a golden ratio, and a second candidate value, acquired by decreasing the maximum candidate value of the learning rate by the golden ratio, as the learning rate.
Here, setting the learning rate may be configured to set any one of the first candidate value and the second candidate value as the learning rate when the difference value between the first candidate value and the second candidate value is equal to or less than a preset value.
Here, compensating for the search performance may be configured to select a vector in the direction orthogonal to the direction opposite the largest component vector of the search direction vector and to quantize the selected vector, thereby calculating the quantized orthogonal compensation search vector.
Here, compensating for the search performance may be configured such that, when the solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the solution is made to escape from the local minimum point using the quantized orthogonal compensation search vector.
The above and other objects, features and advantages of the present invention will be more clearly understood from the following detailed description, taken in conjunction with the accompanying drawings.
The present invention will be described in detail below with reference to the accompanying drawings. Repeated descriptions and descriptions of known functions and configurations that have been deemed to unnecessarily obscure the gist of the present invention will be omitted below. The embodiments of the present invention are intended to fully describe the present invention to a person having ordinary knowledge in the art to which the present invention pertains. Accordingly, the shapes, sizes, etc. of components in the drawings may be exaggerated in order to make the description clearer.
Hereinafter, a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.
Table 1 shows mathematical symbols used for explaining a quantized machine-learning algorithm according to an embodiment of the present invention.
The definition of learning quantization and main quantization operations according to an embodiment of the present invention may be described as follows.
First, in order to define quantization of a variable x∈R, a round-off operation for obtaining an integer may be defined as shown in Equation (1).
x ≡ ⌊x⌋ + ϵ (ϵ∈R[0,1)) (1)
In Equation (1), the symbol ⌊x⌋∈Z indicates a round-off operation, which may be defined as the greatest integer that does not exceed x. Using this, the Gauss symbol may be defined as shown in Equation (2).
[x] ≡ ⌊x+0.5⌋ = x + 0.5 − ϵ (2)
When the operation of rounding x to the nearest number is defined as [x] using Equation (2), the round-off error, ϵ, may be defined as ϵ ∈ R(−0.5, 0.5]. Therefore, when an arbitrary integer n ∈ Z is given, the relationships of addition and multiplication may be represented as shown in Equation (3).
[x+n] = ⌊x+n+0.5⌋ = ⌊x+0.5+n⌋ = ⌊x+0.5⌋ + n = [x] + n

[n·[x]] = ⌊n·⌊x+0.5⌋+0.5⌋ = n·⌊x+0.5⌋ = n·[x], [n·x] ≠ n[x] (3)
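By way of illustration, the round-off properties of Equations (1) to (3) may be verified with a short Python sketch; the helper names floor_int and gauss_round are illustrative and not part of the disclosure.

```python
# A minimal sketch of the round-off properties of Equations (1)-(3).
import math

def floor_int(x: float) -> int:
    """Round-off operation |_x_|: the greatest integer not exceeding x."""
    return math.floor(x)

def gauss_round(x: float) -> int:
    """Gauss symbol [x] = |_x + 0.5_|: rounding to the nearest integer."""
    return math.floor(x + 0.5)

x, n = 2.7, 3
assert gauss_round(x + n) == gauss_round(x) + n               # [x + n] = [x] + n
assert gauss_round(n * gauss_round(x)) == n * gauss_round(x)  # [n.[x]] = n.[x]
print(gauss_round(n * x), n * gauss_round(x))                 # 8 vs 9: [n.x] != n[x]
```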
Also, when an arbitrary real-number sequence {xk} (∀k∈N, xk∈R) is given, if the round-off operation of each xk is represented using ϵk as in Equation (1), Equation (4) may be obtained.
Using Equations (1) and (2), a quantization operation may be defined as shown in Equation (5).
In Equation (5), Qp denotes a quantization coefficient, which may determine the level of quantization. For example, when quantization to a fixed-point number at the level of 10⁻³ is attempted, Qp=10³ may be set. For convenience, the quantization coefficient may be set to a positive integer (Qp∈Z, Qp>0). In order to represent quantization of a specific number without the Gauss symbol, a round-off error is used, whereby Equation (5) may be replaced with Equation (6).
In Equation (6), xQ satisfies xQ∈R, but QpxQ=[Qp·x]∈Z. That is, in the case of xQ, quantization may be performed to obtain a fixed-point number format, rather than simply an integer.
In Equation (6), the range of the round-off error ε may be represented as ε ∈ R(−0.5Qp⁻¹, 0.5Qp⁻¹] = R(−5·(10Qp)⁻¹, 5·(10Qp)⁻¹]. If Qp is large enough, the average value for the distribution of the round-off error depending on Qp may satisfy the condition shown in Equation (7).
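By way of illustration, the quantization operation of Equations (5) and (6) and its round-off error bound may be sketched in Python as follows; the function name quantize and the sample values are illustrative assumptions.

```python
# A minimal sketch of the quantization operation of Equations (5)-(6).
import math

def quantize(x: float, qp: int) -> float:
    """x^Q = [Qp * x] / Qp: a fixed-point value whose scaled form Qp * x^Q is an integer."""
    return math.floor(qp * x + 0.5) / qp

qp = 10 ** 3                            # quantization at the 10^-3 level
xq = quantize(3.14159, qp)              # 3.142
assert abs(xq - 3.14159) <= 0.5 / qp    # round-off error within (-0.5/Qp, 0.5/Qp]
```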
The addition and multiplication of quantized values have the characteristics shown in Equations (4) and (5). However, because the result of division may not be represented as a quantized value, the following method is applied. That is, when division of two integers x and a (x, a ∈ Z) is represented using a quotient and a remainder, division of x by a may be represented as shown in Equation (8).
In Equation (8), the quotient may be ⌊x/a⌋, and the remainder may be x − a·⌊x/a⌋. When the Gauss symbol is applied to Equation (8) in order to prove the quotient and the remainder, Equation (9) may be obtained. Equation (9) shows that this relationship between the quotient and the remainder is satisfied.
Meanwhile, when the rounding operation is applied to Equation (8), Equation (10) may be obtained.
As a result, the conditional expression shown in Equation (11) may be obtained.
In Equation (11), the condition under which the number added to the quotient becomes 1 or 0 may be represented as shown in Equation (12). In order for the added number to become 1 based on the definition of the Gauss symbol, the condition shown in Equation (12) should be satisfied. Particularly, when |x| < |a| is satisfied, the expression may be expanded, in which case the operation may be represented as shown in Equation (13).
Through Equations (5) and (13), quantization of division may be derived as Equation (14). First, the function, g(x,a)∈{0,1}, may be defined as shown in Equation (14).
When quantization for division is interpreted using Equation (14), it may be represented as shown in Equation (15).
Based on the definition of learning quantization and main quantization operations according to an embodiment of the present invention, it may be understood that, when quantization based on the distribution of a remainder value is performed, the smallest value is 1 or 0.
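By way of illustration, the division described in Equations (8) to (15) may be sketched as follows. Because the exact form of g(x, a) in Equation (14) is not reproduced above, the sketch uses a rounding-to-nearest correction term that yields 1 or 0, consistent with the description; the function name quantized_div is illustrative.

```python
# A hedged sketch of quantized division for positive integers: divide x by a
# via a quotient and a remainder, with a 0/1 correction term standing in for
# the condition of Equation (12).
def quantized_div(x: int, a: int) -> int:
    q, r = divmod(x, a)         # x = a*q + r with 0 <= r < a (for a > 0)
    g = 1 if 2 * r >= a else 0  # the added number becomes 1 or 0
    return q + g

assert quantized_div(7, 2) == 4   # [7/2] = [3.5] = 4
assert quantized_div(6, 4) == 2   # [6/4] = [1.5] = 2
```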
Referring to the accompanying drawings, in the method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention, the learning rate may be set at step S10.
When the above-described quantization method is expanded to a vector, this may be defined in such a way that quantization is applied to the elements of each vector. At step S10, using the method of applying quantization to the elements of each vector, the quantization characteristics of a learning equation may be analyzed.
For example, it may be assumed that the learning equation shown in Equation (16) is given for x(t)∈Rn.
xt+1 = xt − λtht (16)
In Equation (16), ht∈Rn is a search direction vector, λt∈R(0,1] is a learning rate, and t∈R is a parameter related to time. In Equation (16), when quantization of xt is defined as xtQ ≡ (xt)Q, the quantized learning equation shown in Equation (17) may be defined.
xt+1Q = (xt − λtht)Q (17)
In Equation (17), λt denotes a learning rate. Using ht, xt or xtQ may be updated to xt+1Q through the update term of Equation (18).
λtht=(λtht)Q (18)
Additionally, it is assumed that the following basic condition for the learning rate is satisfied.
λi=arg min ƒ(xi−λhi) (19)
Here, when Equation (17), which is a quantized learning equation, is rewritten by substituting the quantized xtQ for xt and by applying Equation (18), which is a quantized update term, Equation (20) may be obtained.
xt+1Q = (xtQ − (λtht)Q)Q = xtQ − (λtht)Q (20)
In Equation (20), because the learning rate λt∈R and the search direction vector ht∈Rn are a scalar and a vector, respectively, it is relatively difficult to implement quantization satisfying λtht=(λtht)Q. Therefore, even though the optimized learning rate is found through a line search algorithm, it is required to recalculate it so as to enable quantization.
The most intuitive quantization of the update term is to set λt = 1/Qp. As a result thereof, λtht = Qp⁻¹ht is satisfied, and the quantized update term may be represented as shown in Equation (21).
Based on this, the function z(kt)∈Z, z(kt)>0, which outputs an arbitrary positive integer for the internal iteration index kt in the t-th iteration, is defined, and using this function, the learning rate λt is set to λt = z(kt)·Qp⁻¹. Accordingly, the update term may be represented as shown in Equation (22).
If quantization is defined well by appropriately defining Equation (22) (or if quantization can be defined through the internal characteristics of a calculator), then, because z(kt)∈Z is satisfied, Equation (23) may be obtained.
In Equation (23), z(kt)∈Z(0,Qp) is satisfied. When the well-defined quantized search direction is assumed to be a basic quantization search direction vector htQ, Equation (24) may be defined.
When Equation (22) is rewritten using Equation (24), Equation (25) may be obtained.
(λtht)Q=z(kt)htQ (25)
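By way of illustration, the quantized update of Equations (20) to (25) may be sketched in Python. The helpers to_grid and quantized_step and the sample values are illustrative assumptions; both the parameter vector and the basic quantization search direction vector are kept on the 1/Qp fixed-point grid, so the integer multiple z(kt)htQ requires no re-quantization.

```python
# A hedged sketch of x_{t+1}^Q = x_t^Q - z(k_t) * h_t^Q with lambda_t = z(k_t)/Qp.
import numpy as np

QP = 1000  # quantization coefficient Qp

def to_grid(v):
    """Quantize onto the 1/Qp fixed-point grid: vQ with Qp*vQ an integer (Equation (6))."""
    return np.floor(QP * np.asarray(v, dtype=float) + 0.5) / QP

def quantized_step(x_q, h_q, z_kt: int):
    """One update of Equations (20) and (25), with no re-quantization needed."""
    return x_q - z_kt * h_q  # an integer multiple of a grid vector stays on the grid

x_q = to_grid([1.5, -2.3])                 # quantized parameter x_t^Q
h_q = to_grid([0.003, -0.001])             # basic quantization search direction h_t^Q
x_next = quantized_step(x_q, h_q, z_kt=5)  # learning rate lambda_t = 5 / Qp
```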
That is, at step S10, the learning rate of the quantized machine-learning algorithm may be set using at least one of an Armijo rule and golden search methods.
Here, at step S10, using the gradient vector of the objective function of the search direction vector, the learning rate may be set based on a learning-rate-setting function predefined by the Armijo rule.
Here, at step S10, any one of a first candidate value, which is acquired by increasing the minimum candidate value of the learning rate by a golden ratio, and a second candidate value, which is acquired by decreasing the maximum candidate value of the learning rate by the golden ratio, may be set as the learning rate.
Here, at step S10, when the difference value between the first candidate value and the second candidate value is equal to or less than a preset value, any one of the first candidate value and the second candidate value may be set as the learning rate.
Also, in the method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention, quantization search performance may be compensated for at step S20.
That is, at step S20, a quantized orthogonal compensation search vector may be calculated from the search direction vector of the quantized machine-learning algorithm, and the search performance of the quantized machine-learning algorithm may be compensated for using the quantized orthogonal compensation search vector.
Here, at step S20, a vector in the direction orthogonal to the direction opposite the largest component vector of the search direction vector is selected, and the selected vector is quantized, whereby the quantized orthogonal compensation search vector may be calculated.
Here, at step S20, when the solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the solution may be made to escape from the local minimum point using the quantized orthogonal compensation search vector.
Also, in the method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention, an optimized learning algorithm may be calculated at step S30.
That is, at step S30, the optimized quantized machine-learning algorithm may be calculated using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.
Referring to the accompanying drawings, the process of setting the learning rate based on the Armijo rule may be described as follows.
First, at step S110, a quantization parameter may be set.
Also, at step S120, a learning-rate-setting parameter may be set.
Also, at step S130, a search direction vector may be calculated.
The search direction vector ht is the gradient vector of an objective function ƒ(x) and is represented as ht=∇ƒ(xt), and the function for setting the learning rate may be represented as shown in Equation (26).
ϕ(βᵏ) = βᵏα∥ht∥² (26)
Also, at step S140, when the error of the search direction vector is equal to or less than a preset value, the optimization of the quantized machine-learning algorithm is terminated, but when the error is greater than the preset value, the learning rate may be set at step S150.
At step S150, the learning rate may be set using Equation (27) to which the Armijo rule is applied.
Finally, the quantization parameter may be configured with positive integers, as shown in Equation (28).
Qp = η·ρⁿ, η, ρ, n ∈ Z++ (28)
Because λt=ϕ(βᵏ) is satisfied in the definition of the learning rate, the Armijo rule may be modified so as to satisfy the characteristics of the quantized learning rate in Equation (23). First, assuming that the search direction vector is the basic quantization search direction vector and is represented as (ht)Q = QphtQ, Equation (29) may be obtained.
Equation (29) may be solved as shown in Equation (30).
Because α and β are arbitrary values satisfying α,β∈R(0,1), an integer ζ that is equal to or less than η (that is, ζ≤η) is taken for α in Equation (30), whereby the condition for α may be satisfied as shown in Equation (31).
Also, when β=ρ⁻¹ is set, the condition for β may be satisfied as shown in Equations (32) and (33).
Therefore, when the objective function ƒ:Rn→R is configured so as to satisfy ƒ:Zn→Z on Zn and when h0 is configured so as to satisfy h0=−∇ƒ(x0)∈Zn, the learning rate λtQ based on the Armijo rule may be calculated using only the integer operations shown in Equation (34), and may then be applied to the quantized machine-learning algorithm at step S150.
Also, at step S160, the quantized learning equation may be updated using the set learning rate.
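By way of illustration, one common backtracking form of the Armijo rule corresponding to step S150 may be sketched as follows; because the exact Equation (27) is not reproduced above, the sufficient-decrease test shown here is a standard formulation, and the function name armijo_rate is illustrative.

```python
# A hedged sketch of an Armijo backtracking line search (cf. step S150):
# shrink the step by beta until sufficient decrease holds.
import numpy as np

def armijo_rate(f, x, h, alpha=0.5, beta=0.5, max_iter=30):
    """Return lambda_t = beta**k satisfying
    f(x) - f(x - lam * h) >= lam * alpha * ||h||^2 (sufficient decrease)."""
    lam = 1.0
    for _ in range(max_iter):
        if f(x) - f(x - lam * h) >= lam * alpha * h.dot(h):
            return lam
        lam *= beta
    return lam

f = lambda v: v.dot(v)        # toy objective
x = np.array([1.0, -2.0])
h = 2 * x                     # gradient direction h_t = grad f(x_t)
lam = armijo_rate(f, x, h)    # 0.5 for this example
```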
Referring to the accompanying drawings, the process of setting the learning rate based on the golden search method may be described as follows.
The golden search method is a method for finding the optimized learning rate, like the Armijo rule, and a learning rate satisfying Equation (35) may be set.
First, at step S210, a quantization parameter may be set.
Also, at step S220, a search range for applying the golden search method may be set.
Also, at step S230, an initial condition for applying the golden search method may be set.
Here, at step S230, when quantization is not applied, a0=0 and b0=1 may be set and ht=QphtQ may be set.
Also, at step S240, the golden search method may be applied.
Here, at step S240, the i-th search range li may be defined as shown in Equation (36).

li ≡ bi − ai (36)
Here, ai and bi may be the minimum candidate value and the maximum candidate value for the learning rate λ.
Here, at step S240, when the value of the search range li is made to approach 0 by increasing the minimum candidate value by a golden ratio and decreasing the maximum candidate value by the golden ratio using the golden search method, one of the minimum candidate value and the maximum candidate value may be set as the learning rate.
The golden ratio used for the golden search may be F0=0.618.
Also, at steps S250 to S280, the minimum candidate value and the maximum candidate value may be updated.
Here, the update of the minimum candidate value and the maximum candidate value may be represented as shown in Equation (37).
a′i = ai + (1−F0)li, b′i = bi − (1−F0)li (37)
Because ai and bi are set to ai,bi∈R(0,1) in Equation (37), quantization thereof is required.
Using Equations (24) and (25), the quantized learning equation shown in Equation (38) may be calculated.
xt+1Q = xtQ − (λtht)Q = xtQ − (λtQphtQ)Q = xtQ − (λtQp)QhtQ (38)
In Equation (38), because (λtQp)Q is a value that falls within the range of Z(0,Qp), when λtQ ≡ (λtQp)Q is set, Equation (39) may be obtained.
aiQ ≡ (aiQp)Q, biQ ≡ (biQp)Q, aiQ, biQ ∈ Z(0,Qp) (39)
Equation (39) may be solved as a quantized operation using the golden search method. To this end, the update of Equation (37) may be scaled by Qp and then quantized, as shown in Equation (40).

a′iQp = aiQp + Qp(1−F0)li

(a′iQp)Q = (aiQp + Qp(1−F0)li)Q (40)
The error resulting from quantization is taken into consideration, and because li=biQ−aiQ∈Z is satisfied as the result of quantization, Equation (41) may be obtained.
a′iQ = (aiQp)Q + (Qp(1−F0)li)Q (41)
Because this is the operation for setting the learning rate, when the quantization error is ignored, Equations (39) and (41) may be solved as shown in Equation (42).
a′iQ = aiQ + (Qp(1−F0)li)Q, b′iQ = biQ − (Qp(1−F0)li)Q (42)
Because the learning rate set through the above process satisfies λtQ∈Z(0,Qp), the quantized learning equation may be represented as shown in Equation (43).

xt+1Q = xtQ − λtQhtQ (43)
Referring to the accompanying drawings, the process of compensating for the quantization search performance at step S20 may be described as follows.
Here, at step S20, a point that makes the objective function smaller is searched for in a direction other than the search direction vector, whereby optimization performance degradation may be overcome.
First, at step S310, a quantization parameter may be set.
Also, at step S320, an initial parameter may be set.
Also, at step S330, a search direction vector may be calculated.
Also, at step S340, when the error of the search direction vector is equal to or less than a preset value, the optimization of the quantized machine-learning algorithm is terminated, but when the error is greater than the preset value, a learning rate may be set at step S350.
At step S350, the learning rate may be set using the Armijo rule and the golden search method, as described with reference to
Also, at step S360, whether the quantized learning equation using the set learning rate reaches a local minimum point is determined. When the quantized learning equation is determined not to reach the local minimum point, the search direction vector may be compensated for at step S370.
Here, at step S360, with regard to performance degradation, it may be checked whether the optimization parameter xt cannot converge near the local minimum point because the appropriate learning rate enabling arrival at the local minimum point cannot be applied due to quantization, or whether a better minimum point cannot be found.
For the search direction vector ht∈Rn, the quantized vector (ht)Q=QphtQ∈Rn may be represented as shown in Equation (44).
Here, at step S370, the different direction is the direction orthogonal to the search direction. Because a vector orthogonal to the search direction vector may have various vector directions, the vector in the direction orthogonal to the direction opposite the largest component vector of the search direction vector may be selected.
In Equation (44), ei is the unit orthogonal vector of Euclidean space Rn and satisfies ∀i,j∈Z[0,n), ∥ei∥=1 and eiTej=0 for i≠j. Assume that the largest component, among the components {vi} of the quantized vector (ht)Q, is vm=max{∥vi∥}, and that the index thereof is m=argmaxi{∥vi∥}. Here, when the vector acquired by setting vm=0 in the quantized vector (ht)Q is denoted by {circumflex over (v)}, Equation (45) may be obtained.
Therefore, the search direction vector may be divided into two orthogonal vectors, as shown in Equation (46).
At step S370, vector z may be calculated as shown in Equation (47) in order to obtain the vector in the direction orthogonal to the largest component in the existing search direction vector (ht)Q.
zt = −vmem + r{circumflex over (v)} (47)
In Equation (47), r∈R is a proportional constant for {circumflex over (v)}, and through this value, the orthogonal vector zt may be calculated. Using the orthogonality between the vector zt and the vector (ht)Q, r may be calculated as shown in Equation (48).
However, because |r|<1 is satisfied, when this value is applied to the learning equation without change, the operation is not performed using an integer value.
Accordingly, the compensated search vector may become a general real number vector, rather than a vector configured with quantized values. Therefore, at step S370, the compensated search vector may be calculated in consideration of quantization of the proportional constant.
In Equation (47), because (ht)Q=QphtQ is satisfied, when the equation is solved using vm=QpvmQ, Equation (49) may be obtained.
In Equation (49), when the quantized vector {circumflex over (v)}Q is defined in the same manner as vmQem, Equation (50) may be obtained.
Based on Equations (46) and (48) and (ht)Q=QphtQ, Equation (51) may be obtained.
Accordingly, when the coefficient of {circumflex over (v)}Q is solved using Equations (11) and (25), Equation (52) may be obtained.
Accordingly, when the compensated search vector zt is quantized using Equations (51) and (52), Equation (53) may be obtained.
Therefore, Equation (53) may be solved to Equation (54) using Equation (4).
In Equation (54), the two terms are the quotient and the remainder of the corresponding division, respectively. The part corresponding to the remainder may be simplified as shown in Equation (55).
The quantized orthogonal compensation search vector may be represented as shown in Equation (56).
That is, at step S370, the quantized compensation search vector zt, which is orthogonal to the search direction vector, may be calculated.
Here, at step S370, when the quantized learning equation is not able to escape from a local minimum point, the quantized learning equation may be made to escape from the local minimum point using the quantized orthogonal compensation search vector.
Because the quantized compensation search vector has a scale of a quantization coefficient Qp, this may be calculated as the basic quantization compensation search vector, as shown in Equation (57).
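By way of illustration, the construction of the orthogonal compensation vector described in Equations (44) to (48) may be sketched as follows; the sign convention and the proportional constant r are chosen here so that zt is orthogonal to (ht)Q, and the final integer quantization of Equations (49) to (57) is omitted for brevity. The function name compensation_vector is illustrative.

```python
# A hedged sketch: zero out the largest component v_m of the quantized search
# direction, call the remainder v_hat, and choose z_t = -v_m * e_m + r * v_hat
# with r fixed by the orthogonality z_t . (h_t)^Q = 0.
import numpy as np

def compensation_vector(h_quant: np.ndarray) -> np.ndarray:
    m = int(np.argmax(np.abs(h_quant)))  # index of the largest component
    v_m = h_quant[m]
    v_hat = h_quant.copy()
    v_hat[m] = 0.0                       # vector with v_m set to 0
    r = v_m ** 2 / v_hat.dot(v_hat)      # assumes at least two nonzero components
    z = r * v_hat
    z[m] = -v_m
    return z

h_q = np.array([3.0, -1.0, 2.0])
z = compensation_vector(h_q)
assert abs(z.dot(h_q)) < 1e-9            # orthogonal to the search direction
```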
Also, at step S380, the quantized learning equation may be calculated.
That is, at step S380, based on the quantized compensation search vector defined in Equation (57), the quantized compensation search vector is multiplied by the quantized learning rate calculated using the quantized Armijo rule or the quantized line search algorithm, whereby the quantized learning equation shown in Equation (58) may be calculated.
The quantized learning equation shown in Equation (58) may be easily combined with the existing machine-learning or nonlinear algorithm.
xt+1Q = xtQ − λtQhtQ, htQ = ztQ (58)
Referring to the accompanying drawings, the apparatus for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention may be described as follows.
The apparatus for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention includes one or more processors 1110 and executable memory 1130 for storing at least one program executed by the one or more processors 1110. The at least one program may set the learning rate of the quantized machine-learning algorithm using at least one of an Armijo rule and golden search methods, calculate a quantized orthogonal compensation search vector from the search direction vector of the quantized machine-learning algorithm, compensate for the search performance of the quantized machine-learning algorithm using the quantized orthogonal compensation search vector, and calculate an optimized quantized machine-learning algorithm using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.
Here, the at least one program may set the learning rate through a learning-rate-setting function predefined by the Armijo rule using the gradient vector of the objective function of the search direction vector.
Here, the at least one program may set any one of a first candidate value, which is acquired by increasing the minimum candidate value of the learning rate by a golden ratio, and a second candidate value, which is acquired by decreasing the maximum candidate value of the learning rate by the golden ratio, as the learning rate.
Here, when the difference value between the first candidate value and the second candidate value is equal to or less than a preset value, the at least one program may set any one of the first candidate value and the second candidate value as the learning rate.
Here, the at least one program may select a vector in a direction orthogonal to the direction opposite the largest component vector of the search direction vector and quantize the selected vector, thereby calculating the quantized orthogonal compensation search vector.
Here, when the solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the at least one program may make the solution escape from the local minimum point using the quantized orthogonal compensation search vector.
The present invention may implement an optimization algorithm capable of minimizing a quantization error in machine-learning and nonlinear-signal-processing fields using quantization and exhibiting excellent performance on lightweight hardware.
Also, the present invention may implement a machine-learning algorithm capable of providing sufficient optimization performance even on low-performance hardware.
As described above, the apparatus and method for optimizing a quantized machine-learning algorithm according to the present invention are not limitedly applied to the configurations and operations of the above-described embodiments, but all or some of the embodiments may be selectively combined and configured, so that the embodiments may be modified in various ways.
Number | Date | Country | Kind
10-2019-0138793 | Nov. 1, 2019 | KR | national
10-2020-0046724 | Apr. 17, 2020 | KR | national