APPARATUS AND METHOD FOR OPTIMIZING QUANTIZED MACHINE-LEARNING ALGORITHM

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of Korean Patent Applications No. 10-2019-0138793, filed Nov. 1, 2019, and No. 10-2020-0046724, filed Apr. 17, 2020, which are hereby incorporated by reference in their entireties into this application.

BACKGROUND OF THE INVENTION
1. Technical Field

The present invention relates generally to machine-learning and signal-processing technology, and more particularly to technology for optimizing a quantized machine-learning algorithm.

2. Description of the Related Art

In conventional machine-learning and nonlinear-signal-processing technology, a signal-processing operation is performed based on floating-point operations. However, the conventional machine-learning and nonlinear-signal-processing technology is not suitable for fields in which small and lightweight hardware is required because such technology uses multiple operation modules in order to provide real-time operation and because the size and complexity of a computation module provided for floating-point operations are greater than the size and complexity of a computation module provided for integer operations.

Accordingly, research on quantization of processing data is underway in various engineering fields. Quantization of processing data may confer many advantages, such as a decrease in the number of bits, calculation speed improvement, and availability improvement, when an engineering solution is implemented. For example, because a learning equation of a quantized domain is implemented by updating the least significant bit of a parameter, it is the same as applying a fixed learning speed to an update parameter. However, a general stochastic steepest-descent algorithm having a fixed step size cannot avoid performance degradation because of convergence on the optimum point in weak topology, such as convergence with first-order distribution.

Meanwhile, Korean Patent Application Publication No. 10-2018-0043154, titled “Method and apparatus for neural network quantization” relates to a neural network quantization method, and discloses a method for quantizing the parameters of a neural network, which includes determining the diagonals of a second-order partial derivative matrix (Hessian matrix) of the loss function of the network parameters of the neural network and assigning Hessian weights to the network parameters using the determined diagonals as part of the step of quantizing the network parameters.

SUMMARY OF THE INVENTION

An object of the present invention is to implement an optimization algorithm capable of minimizing a quantization error in machine-learning and nonlinear-signal-processing fields using quantization and exhibiting excellent performance on lightweight hardware.

Another object of the present invention is to implement a machine-learning algorithm capable of providing sufficient optimization performance even on low-performance hardware.

In order to accomplish the above objects, an apparatus for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention may include one or more processors and executable memory for storing at least one program executed by the one or more processors. The at least one program may set the learning rate of the quantized machine-learning algorithm using at least one of an Armijo rule and golden search methods, calculate a quantized orthogonal compensation search vector from the search direction vector of the quantized machine-learning algorithm, compensate for the search performance of the quantized machine-learning algorithm using the quantized orthogonal compensation search vector, and calculate an optimized quantized machine-learning algorithm using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.

Here, the at least one program may set the learning rate through a learning-rate-setting function predefined by the Armijo rule using the gradient vector of the objective function of the search direction vector.

Here, the at least one program may set any one of a first candidate value, acquired by increasing the minimum candidate value of the learning rate by a golden ratio, and a second candidate value, acquired by decreasing the maximum candidate value of the learning rate by the golden ratio, as the learning rate.

Here, the at least one program may set any one of the first candidate value and the second candidate value as the learning rate when the difference value between the first candidate value and the second candidate value is equal to or less than a preset value.

Here, the at least one program may select a vector in the direction orthogonal to the direction opposite the largest component vector of the search direction vector and quantize the selected vector, thereby calculating the quantized orthogonal compensation search vector.

Here, when the solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the at least one program may make the solution escape from the local minimum point using the quantized orthogonal compensation search vector.

Also, in order to accomplish the above objects, a method for optimizing a quantized machine-learning algorithm, performed by an apparatus for optimizing the quantized machine-learning algorithm, according to an embodiment of the present invention may include setting the learning rate of the quantized machine-learning algorithm using at least one of an Armijo rule and golden search methods; calculating a quantized orthogonal compensation search vector from the search direction vector of the quantized machine-learning algorithm and compensating for the search performance of the quantized machine-learning algorithm using the quantized orthogonal compensation search vector; and calculating an optimized quantized machine-learning algorithm using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.

Here, setting the learning rate may be configured to set the learning rate through a learning-rate-setting function predefined by the Armijo rule using the gradient vector of the objective function of the search direction vector.

Here, setting the learning rate may be configured to set any one of a first candidate value, acquired by increasing the minimum candidate value of the learning rate by a golden ratio, and a second candidate value, acquired by decreasing the maximum candidate value of the learning rate by the golden ratio, as the learning rate.

Here, setting the learning rate may be configured to set any one of the first candidate value and the second candidate value as the learning rate when the difference value between the first candidate value and the second candidate value is equal to or less than a preset value.

Here, compensating for the search performance may be configured to select a vector in the direction orthogonal to the direction opposite the largest component vector of the search direction vector and to quantize the selected vector, thereby calculating the quantized orthogonal compensation search vector.

Here, compensating for the search performance may be configured such that, when the solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the solution is made to escape from the local minimum point using the quantized orthogonal compensation search vector.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objects, features and advantages of the present invention will be more clearly understood from the following detailed description, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flowchart illustrating a method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating an optimization algorithm using a quantized Armijo rule according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating an optimization algorithm using a golden search according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating an optimization algorithm using a compensated search vector according to an embodiment of the present invention; and

FIG. 5 is a view illustrating a computer system according to an embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will be described in detail below with reference to the accompanying drawings. Repeated descriptions and descriptions of known functions and configurations that have been deemed to unnecessarily obscure the gist of the present invention will be omitted below. The embodiments of the present invention are intended to fully describe the present invention to a person having ordinary knowledge in the art to which the present invention pertains. Accordingly, the shapes, sizes, etc. of components in the drawings may be exaggerated in order to make the description clearer.

Hereinafter, a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention.

Table 1 shows mathematical symbols used for explaining a quantized machine-learning algorithm according to an embodiment of the present invention.

TABLE 1

mathematical

symbol
description

R
real number space

Z
integer space

N
natural number space, 0 is not included

Rⁿ
n-dimensional vector space of real numbers

Zⁿ
n-dimensional vector space of integers

R(a, b]
{x|∀x ∈ R, a < x ≤ b}

R[a, b)
{x|∀x ∈ R, a ≤ x < b}

Z[a, b]
{x|∀x ∈ Z, a ≤ x ≤ b}

Z(a, b)
{x|∀x ∈ Z, a < x < b}

δ(x)
Dirac-Delta function, if x = 0, δ(x) = 1, and if x ≠ 0,

δ(x) = 0

custom-character

_Q_px
average value of x based on probability distribution for

quantization error caused by quantization coefficient Q_p

The definition of learning quantization and main quantization operations according to an embodiment of the present invention may be described as follows.

First, in order to define quantization of a variable x∈R, a round-off operation for obtaining an integer may be defined as shown in Equation (1).

x≡└x┘+ϵ (ϵ∈R[0,1)) (1)

In Equation (1), the symbol └x┘∈Z indicates a round-off operation, which may be defined as the integer that is nearest to x, among integers less than x. Using this, the Gauss symbol may be defined as shown in Equation (2).

[x]+└x+0.5┘=x+0.5−∇ custom-character x+ϵ (2)

When the operation of rounding x to the nearest number is defined as [x] using Equation (2), the round-off error, ϵ, may be defined as ϵ∈R(−0.5, 0.5┘. Therefore, when an arbitrary integer n E Z is given, the relationships of addition and multiplication may be represented as shown in Equation (3).

[x+n]=└x+n+0.5┘=└x+0.5+n┘=└x+0.5┘+n=[x]+n[n·[x]]=└n−└x+0.5┘+0.5┘=n·└x+0.5┘=n·[x], [n·x]≠n[x] (3)

Also, when an arbitrary real-number sequence {x_k} (∀k∈N, x_k∈R) is given, if the round-off operation of each x_kis represented using ϵ_kas in Equation (1), Equation (4) may be obtained.

$\begin{matrix} ⌊ \sum_{k = 1}^{n} x_{k} ⌋ = ⌊ \sum_{k = 1}^{n} (⌊ x_{k} ⌋ + ϵ_{k}) ⌋ = \sum_{k = 1}^{n} ⌊ x_{k} ⌋ + ⌊ \sum_{k = 1}^{n} ϵ_{k} ⌋ & (4) \end{matrix}$

Using Equations (1) and (2), a quantization operation may be defined as shown in Equation (5).

$\begin{matrix} x^{Q} \overset{Δ}{=} \frac{1}{Q_{p}} ⌊ Q_{p} \cdot (x + 0.5) ⌋ & (5) \end{matrix}$

In Equation (5), Q_pdenotes a quantization coefficient, which may determine the level of quantization. For example, when quantization to a fixed-point number at the level of 10⁻³is attempted, Q_p=10³may be set. For convenience, the quantization coefficient may be set to a positive integer (Q_p∈Z, Q_p>0). In order to represent quantization of a specific number without the Gauss symbol, a round-off error is used, whereby Equation (5) may be replaced with Equation (6).

$\begin{matrix} x^{Q} = \frac{1}{Q_{p}} [Q_{p} \cdot x] = \frac{1}{Q_{p}} ⌊ Qp \cdot (x + 0.5) ⌋ = \frac{1}{Qp} (Qp \cdot (x + ɛ)) = x + ɛ & (6) \end{matrix}$

In Equation (6), x^Qsatisfies x^Q∈R, but Q_px^Q=[Q_p·x]∈Z. That is, in the case of x^Q, quantization may be performed to obtain a fixed-point number format, rather than simply an integer.

In Equation (6), the range of the round-off error e may be represented as ε∈R(−0.5Q_p⁻¹,0.5Q_p⁻¹]=R(−5·(10Q_p)⁻¹,5·(10Q_p)⁻¹]. If Q_pis large enough that the average value for the distribution of the round-off error depending on Q_psatisfies E_Q_pε=0, the relationship shown in Equation (7) may be satisfied for x_k, which is the sample data of x.

$\begin{matrix} \begin{matrix} 𝔼_{Q_{p}} x^{Q} \overset{Δ}{=} \lim_{N \to \infty} \frac{1}{N} \sum_{k = 0}^{N - 1} x_{k}^{Q} = \lim_{N \to \infty} \frac{1}{N} \sum_{k = 0}^{N - 1} (x_{k} + ɛ_{k}) \\ = 𝔼_{Q_{p}} x + 𝔼_{Q_{p}} ɛ = 𝔼_{Q_{p}} x . \end{matrix} & (7) \end{matrix}$

The addition and multiplication of quantized values have the characteristics shown in Equations (4) and (5). However, because the result of division may not be represented as a quantized value, the following method is applied. That is, when division of two integers x and a (x, a∈z) is represented using a quotient and a remainder, division of x by a may be represented as shown in Equation (8).

$\begin{matrix} \frac{x}{a} = ⌊ \frac{x}{a} ⌋ + \frac{1}{a} (x - a ⌊ \frac{x}{a} ⌋) & (8) \end{matrix}$

In Equation (8), the quotient may be

$⌊ \frac{x}{a} ⌋,$

and the remainder may be

$x - a ⌊ \frac{x}{a} ⌋ .$

When the Gaussian operation is applied to Equation in order to prove the quotient and the remainder, Equation (9) may be obtained.

$\begin{matrix} \begin{matrix} ⌊ \frac{x}{a} ⌋ = ⌊ ⌊ \frac{x}{a} ⌋ + \frac{1}{a} (x - a ⌊ \frac{x}{a} ⌋) ⌋ \\ = ⌊ \frac{x}{a} ⌋ + ⌊ \frac{1}{a} (x - a ⌊ \frac{x}{a} ⌋) ⌋ \end{matrix} & (9) \end{matrix}$

Equation (9) shows that

$(x - a ⌊ \frac{x}{a} ⌋) < a$

is satisfied.

Meanwhile, when the rounding operation is applied to Equation (8), Equation (10) may be obtained.

$\begin{matrix} \begin{matrix} [\frac{x}{a}] = ⌊ \frac{x}{a} + 0.5 ⌋ \\ = ⌊ ⌊ \frac{x}{a} ⌋ + \frac{1}{a} (x - a ⌊ \frac{x}{a} ⌋) + \frac{a}{2 a} ⌋ \\ = ⌊ \frac{x}{a} ⌋ + ⌊ \frac{1}{a} (x - a (⌊ \frac{x}{a} ⌋ - \frac{1}{2})) ⌋ \end{matrix} & (10) \end{matrix}$

As a result the conditional expression shown in Equation 11 ma be obtained.

$\begin{matrix} [\frac{x}{a}] = {\begin{matrix} ⌊ \frac{x}{a} ⌋ & x < a (⌊ \frac{x}{a} ⌋ + \frac{1}{2} ⌋) \Rightarrow a \cdot ɛ > 0 \\ ⌊ \frac{x}{a} ⌋ + 1 & x \geq a (⌊ \frac{x}{a} ⌋ + \frac{1}{2} ⌋) \Rightarrow a \cdot ɛ \leq 0 \end{matrix} & (11) \end{matrix}$

In Equation (11), the condition that the number added to

$⌊ \frac{x}{a} ⌋$

becomes 1 or 0 may be represented as shown in Equation (12). In order for the number to become 1 based on the definition of the Gauss symbol, the condition shown in Equation (12) should be satisfied.

$\begin{matrix} x - a (⌊ \frac{x}{a} ⌋ - 0.5) \geq a & (12) \\ \Rightarrow x \geq a \cdot ⌊ \frac{x}{a} ⌋ + 0.5 a \\ \Rightarrow x \geq a \cdot (\frac{x}{a} - ϵ) + 0.5 a ∵ by (eq 01 : \sec_{01}) x = ⌊ x ⌋ + ϵ \\ \Rightarrow x \geq x - 0.5 a + ϵ + 0.5 a ∵ by (eq 02 : \sec_{01}) ɛ = 0.5 - ϵ \\ \Rightarrow a ɛ \leq 0 \end{matrix}$

Particularly, when

$\langle \frac{x}{a} \rangle < 1$

is satisfied because |x|<|a| is satisfied, this may be expanded to

$Q_{p} \langle \frac{x}{a} \rangle < Q_{p},$

in which case the operation may be represented as shown in Equation (13).

$\begin{matrix} Q_{p} \cdot \frac{x}{a} = ⌊ \frac{Q_{p} x}{a} ⌋ + \frac{1}{a} (Q_{p} x - a ⌊ \frac{Q_{p} x}{a} ⌋) & (13) \end{matrix}$

Through Equations (5) and (13), quantization of division may be derived as Equation (14). First, the function, g(x,a)∈{0,1}, may be defined as shown in Equation (14).

$\begin{matrix} g (x, a) = ⌊ \frac{1}{a} (Q_{p} x - a (⌊ \frac{Q_{p} x}{a} ⌋ - \frac{1}{2})) ⌋ = {\begin{matrix} 1 & a \cdot ɛ \leq 0 \\ 0 & a \cdot ɛ > 0 \end{matrix} & (14) \end{matrix}$

When quantization for division is interpreted using Equation (14), this may be represented as shown in Equation (15):

$\begin{matrix} \begin{matrix} \frac{1}{Q_{p}} [Q_{p} \cdot \frac{x}{a}] = \frac{1}{Q_{p}} ⌊ ⌊ \frac{Q_{p} x}{a} ⌋ + \frac{1}{a} (Q_{p} x - a (⌊ \frac{Q_{p} x}{a} ⌋ - \frac{1}{2})) ⌋ \\ = \frac{1}{Q_{p}} ⌊ \frac{Q_{p} x}{a} ⌋ + \frac{1}{Q_{p}} ⌊ \frac{1}{a} (Q_{p} x - a (⌊ \frac{Q_{p} x}{a} ⌋ - \frac{1}{2})) ⌋ \\ = \frac{1}{Q_{p}} (⌊ \frac{Q_{p} x}{a} ⌋ + g (x, a)) \end{matrix} & (15) \end{matrix}$

Based on the definition of learning quantization and main quantization operations according to an embodiment of the present invention, it may be understood that, when quantization based on the distribution of a remainder value is performed, the smallest value is 1 or 0.

Referring to FIG. 1, in the method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention, a learning rate may be set at step S10.

When the above-described quantization method is expanded to a vector, this may be defined in such a way that quantization is applied to the elements of each vector. At step S10, using the method of applying quantization to the elements of each vector, the quantization characteristics of a learning equation may be analyzed.

For example, it may be assumed that the learning equation shown in Equation (16) is given for x(t)∈Rⁿ.

x
_t+1
=x
_t−λ_th_t (16)

In Equation (16), h_t∈Rⁿis a search direction vector, λ_t∈R(0,1] is a learning rate, and t∈R is a parameter related to time. In Equation (16), when quantization of x_tis defined as x_t^Q≡[x]_t, the quantized learning equation shown in Equation (17) may be defined.

x
_t+1
^Q=(x_t−λ_th_t)^Q (17)

In Equation (17), λ_tdenotes a learning rate. Using h_t, x_tor x_t^Qmay be updated to x_t+1^Qthrough the update term of Equation (18).

λ_th_t=(λ_th_t)^Q (18)

Additionally, it is assumed that the following basic condition for the learning rate is satisfied.

λ_i=arg min ƒ(x_i−λh_i) (19)

Here, when Equation (17), which is a quantized learning equation, is rewritten by substituting the quantized x_t^Qfor x_tand by applying Equation (18), which is a quantized update term, Equation (20) may be obtained.

x
_t+1
^Q=(x_t^Q−(λ_th_t)^Q)^Q=x_t^Q−(λ_th_t)^Q (20)

In Equation (20), because the learning rate λ_t∈R and the search direction vector h_t∈Rⁿare a scalar and a vector, respectively, it is relatively difficult to implement quantization satisfying λ_th_t=(λ_th_t)^Q. Therefore, even though the optimized learning rate is found through a line search algorithm, it is required to recalculate it so as to enable quantization.

The most intuitive quantization of the update term is to set λ_t=1/Q_p. As the result thereof,

$λ_{t} h_{t} = \frac{1}{Q_{p}} h_{t}$

is satisfied, and the quantized update term may be represented as shown in Equation (21).

$\begin{matrix} {(λ_{t} h_{t})}^{Q} = \frac{1}{Q_{p}} (Q_{p} λ_{t} h_{t} + Q_{p} ɛ) = \frac{1}{Q_{p}} h_{t} + ɛ = {(\frac{1}{Q_{p}} h_{t})}^{Q} & (21) \end{matrix}$

Based on this, the function z(k_t)∈Z, z(k_t)>0, which outputs an arbitrary positive integer for the internal iteration index k_tin the t-th iteration, is defined, and using this function, the learning rate λ_tis set to λ_t=z(k_t)Q_p⁻¹. Accordingly, the update term may be represented as shown in Equation (22).

$\begin{matrix} {(λ_{t} h_{t})}^{Q} = \frac{z (k_{t})}{Q_{p}} h_{t} + ɛ = {(\frac{z (k_{t})}{Q_{p}} h_{t})}^{Q} & (22) \end{matrix}$

If quantization is defined well so as to satisfy

$\frac{1}{Q_{p}} h_{t} \in Z$

by appropriately defining Equation (22) (or if quantization can be defined through the internal characteristics of a calculator), because z(k_t)∈Z is satisfied, Equation (23) may be obtained.

$\begin{matrix} {(\frac{z (k_{t})}{Q_{p}} h_{t})}^{Q} = z (k_{t}) {(\frac{1}{Q_{p}} h_{t})}^{Q} & (23) \end{matrix}$

In Equation (23), z(k_t)∈Z(0,Q_p) is satisfied. When the well-defined

$\frac{1}{Q_{p}} h_{t} \in Z$

is assumed to be a basic quantization search direction vector, Equation (24) may be defined.

$\begin{matrix} h_{t}^{Q} \overset{Δ}{=} {(\frac{1}{Q_{p}} h_{t})}^{Q}, h_{t}^{Q} \in Z & (24) \end{matrix}$

When Equation (22) is rewritten using Equation (24), Equation (25) may be obtained.

(λ_th_t)^Q=z(k_t)h_t^Q (25)

That is, at step S10, the learning rate of the quantized machine-learning algorithm may be set using at least one of an Armijo rule and golden search methods.

Here, at step S10, using the gradient vector of the objective function of the search direction vector, the learning rate may be set based on a learning-rate-setting function predefined by the Armijo rule.

Here, at step S10, any one of a first candidate value, which is acquired by increasing the minimum candidate value of the learning rate by a golden ratio, and a second candidate value, which is acquired by decreasing the maximum candidate value of the learning rate by the golden ratio, may be set as the learning rate.

Here, at step S10, when the difference value between the first candidate value and the second candidate value is equal to or less than a preset value, any one of the first candidate value and the second candidate value may be set as the learning rate.

Also, in the method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention, quantization search performance may be compensated for at step S20.

That is, at step S20, a quantized orthogonal compensation search vector may be calculated from the search direction vector of the quantized machine-learning algorithm, and the search performance of the quantized machine-learning algorithm may be compensated for using the quantized orthogonal compensation search vector.

Here, at step S20, a vector in the direction orthogonal to the direction opposite the largest component vector of the search direction vector is selected, and the selected vector is quantized, whereby the quantized orthogonal compensation search vector may be calculated.

Here, at step S20, when the solution of the quantized machine-learning algorithm is not able to escape from a local minimum point, the solution may be made to escape from the local minimum point using the quantized orthogonal compensation search vector.

Also, in the method for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention, an optimized learning algorithm may be calculated at step S30.

That is, at step S30, the optimized quantized machine-learning algorithm may be calculated using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.

FIG. 2 is a flowchart illustrating an optimization algorithm using a quantized Armijo rule according to an embodiment of the present invention.

Referring to FIG. 2, the process of setting the learning rate of a quantized machine-learning algorithm to which the Armijo rule is applied at step S10 is specifically illustrated.

First, at step S110, a quantization parameter may be set.

Also, at step S120, a learning-rate-setting parameter may be set.

Also, at step S130, a search direction vector may be calculated.

The search direction vector h_tis the gradient vector of an objective function ƒ(x) and is represented as h_t=∇ƒ(x_t), and the function for setting the learning rate may be represented as shown in Equation (26).

ϕ(β^k)=β^kα∥h_t∥² (26)

Also, at step S140, when the error of the search direction vector is equal to or less than a preset value, the optimization of the quantized machine-learning algorithm is terminated, but when the error is greater than the preset value, the learning rate may be set at step S150.

At step S150, the learning rate may be set using Equation (27) to which the Armijo rule is applied.

$\begin{matrix} λ_{t} \overset{Δ}{=} \underset{k \in N}{argmax} [β^{k} \langle f (x_{t} + β^{k} h_{t}) - f (x_{t}) \leq - β^{k} α { h_{t} }^{2} = ϕ (β^{k})] α, β \in R (0, 1) & (27) \end{matrix}$

Finally, the quantization parameter may be configured with positive integers, as shown in Equation (28).

Q
_p=η·ρⁿη,ρ, n∈Z⁺⁺ (28)

Because λ_t=ϕ(β^k) is satisfied in the definition of the learning rate, the Armijo rule may be modified so as to satisfy the characteristics of the quantized learning rate in Equation (23). First, assuming that the search direction vector is the basic quantization search direction vector and is represented as h_t custom-character (h_t)^Q=Q_ph_t^Q, Equation (29) may be obtained.

$\begin{matrix} \begin{matrix} ϕ (β^{k}) = - β^{k} α { h_{t} }^{2} \\ = - β^{k} α { {(h_{t})}^{Q} }^{2} \\ = - β^{k} α { Q_{p} h_{t}^{Q} }^{2} \\ = - β^{k} {αQ}_{p}^{2} { h_{t}^{Q} }^{2} ∵ Q_{p} \in Z, Q_{p} > 0 \\ = - (β^{k} Q_{p}) \cdot ({αQ}_{p}) { h_{t}^{Q} }^{2} \\ = - (β^{k} {ηρ}^{n}) \cdot (α {ηρ}^{n}) { h_{t}^{Q} }^{2} \\ = - (β^{k} ρ^{n}) \cdot (α η^{2} ρ^{n}) { h_{t}^{Q} }^{2} \end{matrix} & (29) \end{matrix}$

Equation (29) may be solved as shown in Equation (30).

$\begin{matrix} ϕ ({\overline{β}}^{\overline{k}}) \equiv - ({\overline{β}}^{\overline{k}}) \cdot \overline{α} { h_{t}^{Q} }^{2}, 0 < {\overline{β}}^{\max \overline{k}} \leq \frac{Q_{p}}{η}, 0 < \overline{α} \leq η \cdot Q_{p} & (30) \end{matrix}$

Because α and β are arbitrary values satisfying α,β∈R(0,1), an integer that is equal to or less than η is taken (that is, ζ≤η) for α in Equation (30), whereby the condition for a may be satisfied as shown in Equation (31).

$\begin{matrix} \overline{α} = ζ \cdot Q_{p} \in Z, η < \overline{α} \leq η \cdot Q_{p} ∵ α = \frac{ζ}{η} & (31) \end{matrix}$

Also, when β=ρ⁻¹is set, β may be represented as shown in Equation (32) for the fixed value n.

$\begin{matrix} {\overline{β}}^{\overline{k}} = β^{k} ρ^{n} = ρ^{- k} ρ^{n} = p^{n - k}, 0 < \overline{k} \leq n & (32) \end{matrix}$

Therefore, β and k are defined as β=ρ, k=n−k, and because max k=n is satisfied, Equation (33) may be obtained.

$\begin{matrix} {\overline{β}}^{\max \overline{k}} = ρ^{n} = \frac{Q_{p}}{η} & (33) \end{matrix}$

Therefore, when the objective function ƒ:Rⁿ→R is configured so as to satisfy ƒ:Zⁿ→Z for zⁿand when h₀is configured so as to satisfy h₀=−∇ƒ(x₀)∈Z, the learning rate λ_t^Qbased on the Armijo rule may be calculated using only the integer operations shown in Equation (34), and may then be applied to the quantized machine-learning algorithm at step S150.

$\begin{matrix} λ_{t}^{Q} \overset{Δ}{=} \underset{\overline{k} \in N}{argmax} {{\overline{β}}^{\overline{k}} \langle f (x_{t} + {\overline{β}}^{\overline{k}} h_{t}) - f (x_{t}) \leq - {\overline{β}}^{\overline{k}} \overline{α} { h_{t}^{Q} }^{2} = ϕ ({\overline{β}}^{\overline{k}})} & (34) \end{matrix}$

Also, at step S160, the quantized learning equation may be updated using the set learning rate.

FIG. 3 is a flowchart illustrating an optimization algorithm using a golden search according to an embodiment of the present invention.

Referring to FIG. 3, the process of setting the learning rate of a quantized machine-learning algorithm to which a golden search method is applied at step S10 is specifically illustrated.

The golden search method is a method for finding the optimized learning rate, like the Armijo rule and a learning rate satisfying Equation 35 may be set.

$\begin{matrix} \min_{λ \geq 0} {f (x_{i} + λ h_{i})} \Rightarrow Let ϕ (λ) = f (x_{i} + λ h_{i}) - f (x_{i}) \Rightarrow \min_{λ} ϕ (λ) & (35) \end{matrix}$

First, at step S210, a quantization parameter may be set.

Also, at step S220, a search range for applying the golden search method may be set.

Also, at step S230, an initial condition for applying the golden search method may be set.

Here, at step S230, when quantization is not applied, a₀=0 and b₀=1 may be set and h_t=Q_ph_t^Qmay be set.

Also, at step S240, the golden search method may be applied.

Here, at step S240, the i-th search range 1, may be defined as shown in Equation (36).

l
_i
custom-character
b
_i
−a
_i (36)

Here, a_iand b_imay be the minimum candidate value and the maximum candidate value for the learning rate A.

Here, at step S240, when the value of the search range 1, is made to approach 0 by increasing the minimum candidate value by a golden ratio and decreasing the maximum candidate value by the golden ratio using the golden search method, one of the minimum candidate value and the maximum candidate value may be set as the learning rate.

The golden ratio used for the golden search may be F₀=0.618.

Also, at steps S250 to S280, the minimum candidate value and the maximum candidate value may be updated.

Here, the update of the minimum candidate value and the maximum candidate value may be represented as shown in Equation (37).

a′
_i
=a
_i+(1−F₀)l_i, b′_i=b_i−(1−F₀)l_i (37)

Because a_iand b_iare set to a_i,b_i∈R(0,1) in Equation (37), quantization thereof is required.

Using Equations (24) and (25), the quantized learning equation shown in Equation (38) may be calculated.

x
_t+1
^Q
=x
_t
^Q−(λ_th_t)^Q=x_t^Q−(λ_tQ_ph_t^Q)^Q=x_t^Q−(λ_tQ_p)^Qh_t^Q (38)

In Equation (38), because (λ_tQ_p)^Qis a value that falls within the range of z(0,Q_p), when λ_t^Q custom-character (λ_tQ_p) is set, Equation (39) may be obtained.

a
_i
^Q≡(a_iQ_p)^Q, b_i≡(b_iQ_p)^Q, a_i,b_i∈Z(0,Q_p) (39)

Equation (39) may be solved as a quantized operation using the golden search method. To this end, F=1−F₀is set first, and the equation for the minimum candidate in Equation (37) is multiplied by Q_p, whereby Equation (40) may be obtained.

a′
_i
Q
_p
=a
_i
Q
_p
+Q
_p(1−F₀)l_i

a′
_i
Q
_p
=a
_i
Q
_p
+Q
_p

Fl
_i

(a′_iQ_p)^Q=(a_iQ_p+Q_pFl_i)^Q (40)

The error resulting from quantization is taken into consideration, and because l_i=b_i^Q−a_i^Q∈Z is satisfied as the result of quantization, Equation (41) may be obtained.

a′
_i
^Q=(a_iQ_p)^Q+(Q_pF)^Ql_i+O(ε) (41)

Because this is the operation for setting the learning rate, when the quantization error is ignored, Equations (39) and (41) may be solved as shown in Equation (42).

a′
_i
^Q
=a
_i
^Q+(Q_pF)^Ql_i, b′_i^Q=b_i^Q−(Q_pF)^Ql_i∀i∈N, a_i^Q,b_i^Q∈Z(0,Q_p) (42)

Because the learning rate set through the above process satisfies λ_t^Q^L∈Z(0,Q_p), when this is applied to the quantization unit search vector h_t^Q, the following quantized learning equation may be obtained.

x
_t+1
^Q
=x
_t
^Q−λ_t^Q^Lh_t^Q (43)

FIG. 4 is a flowchart illustrating an optimization algorithm using a compensated search vector according to an embodiment of the present invention.

Referring to FIG. 4, the process of compensating for the search performance of a quantized machine-learning algorithm at step S20 is specifically illustrated.

Here, at step S20, a point that makes an objective function smaller in a different direction other than a search direction vector is searched for, whereby optimization performance degradation may be overcome.

First, at step S310, a quantization parameter may be set.

Also, at step S320, an initial parameter may be set.

Also, at step S330, a search direction vector may be calculated.

Also, at step S340, when the error of the search direction vector is equal to or less than a preset value, the optimization of the quantized machine-learning algorithm is terminated, but when the error is greater than the preset value, a learning rate may be set at step S350.

At step S350, the learning rate may be set using the Armijo rule and the golden search method, as described with reference to FIG. 2 and FIG. 3.

Also, at step S360, whether the quantized learning equation using the set learning rate reaches a local minimum point is determined. When the quantized learning equation is determined not to reach the local minimum point, the search direction vector may be compensated for at step S370.

Here, at step S360, with regard to performance degradation, it may be checked whether the optimization parameter x, cannot converge near the local minimum point because the appropriate learning rate enabling arrival at the local minimum point cannot be applied due to quantization or whether a better minimum point cannot be found.

For the search direction vector h_t∈Rⁿ, the quantized vector (h_t)^Q=Q_ph_t^Q∈Rⁿmay be represented as shown in Equation (44).

$\begin{matrix} {(h_{t})}^{Q} = \sum_{i = 0}^{n - 1} v_{i} e_{i}, \forall i \in Z [0, n - 1], v_{i} \in Z & (44) \end{matrix}$

Here, at step S370, the different direction is the direction orthogonal to the search direction. Because a vector orthogonal to the search direction vector may have various vector directions, the vector in the direction orthogonal to the direction opposite the largest component vector of the search direction vector may be selected.

In Equation (44), e_iis the unit orthogonal vector of Euclidian space Rⁿand satisfies ∀i,j∈Z[0,n), ∥e_i∥=1, e_i^Te_j=0 i≠j. Assume that the largest component, among the components {v_i} of the quantized vector (h_t)^Q, is v_m=max{∥v_i∥}, and that the index thereof is m=argmax_i{∥v_i∥}. Here, when the vector acquired by setting v_m=0 in the quantized vector (h_t)^Qis v and when the vector in which all of the components excluding v_mare 0 is 0, v and {circumflex over (v)} may be defined as shown in Equation (45).

$\begin{matrix} \overline{v} = \sum_{i = 0}^{n - 1} (1 - δ (i - m)) v_{i} e_{i}, \hat{v} = \sum_{i = 0}^{n - 1} δ (i - m) v_{i} e_{i} & (45) \end{matrix}$

Therefore, the search direction vector may be divided into two orthogonal vectors v and {circumflex over (v)}, as shown in Equation (46).

$\begin{matrix} {(h_{t})}^{Q} = \overline{v} + \hat{v} = \sum_{i = 0}^{n - 1} (1 - δ (i - m) + δ (i - m)) v_{i} e_{i} = \sum_{i = 0}^{n - 1} v_{i} e_{i} & (46) \end{matrix}$

At step S370, vector z may be calculated as shown in Equation (47) in order to obtain the vector in the direction orthogonal to the largest component in the existing search direction vector (h_t)^Q.

z
_t
=v+r·{circumflex over (v)}, r∈R (47)

In Equation (47), r∈R is a proportional constant for {circumflex over (v)}, and through this value, the orthogonal vector z may be calculated. Using the orthogonality between the vector z and the vector (h_t)^Q, r may be calculated as shown in Equation (48).

$\begin{matrix} \begin{matrix} 0 = 〈 {(h_{t})}^{Q}, z 〉 = 〈 \overline{v} + \hat{v}, \overline{v} + r \hat{v} 〉 \\ = {\overline{v}}^{2} + (r + 1) 〈 \hat{v}, \overline{v} 〉 + r {\hat{v}}^{2} ∵ 〈 \hat{v}, \overline{v} 〉 = 0 \\ = {\overline{v}}^{2} + r {\hat{v}}^{2} \end{matrix} & (48) \\ ∴ r = - \frac{{\overline{v}}^{2}}{{\hat{v}}^{2}} = - \frac{{ {(h_{t})}^{Q} }^{2} - v_{m}^{2}}{v_{m}^{2}} = 1 - \frac{{ {(h_{t})}^{Q} }^{2}}{v_{m}^{2}} \end{matrix}$

However, because |r|<1 is satisfied, when this value is applied to the learning equation without change, the operation is not performed using an integer value.

Accordingly, the compensated search vector may become a general real number vector, rather than a vector configured with quantized values. Therefore, at step S370, the compensated search vector may be calculated in consideration of quantization of the proportional constant.

In Equation (47), because (h_t)^Q=Q_ph_t^Qis satisfied, when the equation is solved using v_m=Q_pv_m^Q, Equation (49) may be obtained.

$\begin{matrix} \hat{v} = \sum_{i = 0}^{n - 1} δ (i - m) v_{i} e_{i} = Q_{p} \cdot v_{m}^{Q} e_{m} & (49) \end{matrix}$

In Equation (49), when v_m^Qe_m custom-character {circumflex over (v)}^Qis set, Equation (50) may be obtained.

$\begin{matrix} \begin{matrix} z_{t} = \overline{v} + r \cdot \hat{v} + r \cdot Q_{p} {\hat{v}}^{Q} = \overline{v} + Q_{p} r \cdot {\hat{v}}^{Q} \\ = \overline{v} + Q_{p} {\hat{v}}^{Q} - Q_{p} {\hat{v}}^{Q} + Q_{p} r \cdot {\hat{v}}^{Q} \\ = (\overline{v} + \hat{v}) + Q_{p} (r - 1) {\hat{v}}^{Q} ∵ \hat{v} = Q_{p} {\hat{v}}^{Q} \end{matrix} & (50) \end{matrix}$

Based on Equations (46) and (48) and (h_t)^Q=Q_ph_t^Q, Equation (51) may be obtained.

$\begin{matrix} z_{t} = Q_{p} h_{t}^{Q} - Q_{p} \frac{{ {(h_{t})}^{Q} }^{2}}{v_{m}^{2}} {\hat{v}}^{Q} & (51) \end{matrix}$

Accordingly, when the coefficient of {circumflex over (v)}^Qis solved using Equations (11) and (25), Equation (52) may be obtained.

$\begin{matrix} Q_{p} \cdot \frac{{ {(h_{t})}^{Q} }^{2}}{v_{m}^{2}} = Q_{p} \cdot \frac{\sum_{i = 0}^{n - 1} v_{i}^{2}}{v_{m}^{2}} = \sum_{i = 0}^{n - 1} \frac{Q_{p} v_{i}^{2}}{v_{m}^{2}} & (52) \end{matrix}$

Accordingly, when the compensated search vector z_tis quantized using Equations (51) and (52), Equation (53) may be obtained.

$\begin{matrix} \begin{matrix} {(z_{t})}^{Q} = {(Q_{p} h_{t}^{Q} - Q_{p} \frac{{ {(h_{t})}^{Q} }^{2}}{v_{m}^{2}} {\hat{v}}^{Q})}^{Q} \\ = Q_{p} h_{t}^{Q} - {(\sum_{i = 0}^{n - 1} \frac{Q_{p} v_{i}^{2}}{v_{m}^{2}})}^{Q} {\hat{v}}^{Q} \\ = Q_{p} h_{t}^{Q} - ⌊ \sum_{i = 0}^{n - 1} \frac{Q_{p} v_{i}^{2}}{v_{m}^{2}} + 0.5 ⌋ {\hat{v}}^{Q} \\ = Q_{p} h_{t}^{Q} - ⌊ \sum_{i = 0}^{n - 1} {⌊ \frac{Q_{p} v_{i}^{2}}{v_{m}^{2}} ⌋ + \frac{1}{v_{m}^{2}} (Q_{p} v_{t}^{2} - v_{m}^{2} ⌊ \frac{Q_{p} v_{i}^{2}}{v_{m}^{2}} ⌋)} + 0.5 ⌋ {\hat{v}}^{Q} \end{matrix} & (53) \end{matrix}$

Therefore, Equation (53) may be solved to Equation (54) using Equation (4).

$\begin{matrix} {(z_{t})}^{Q} = Q_{p} h_{t}^{Q} - (\sum_{i = 0}^{n - 1} ⌊ \frac{Q_{p} v_{i}^{2}}{v_{m}^{2}} ⌋ + ⌊ \sum_{i = 0}^{n - 1} \frac{1}{v_{m}^{2}} (Q_{p} v_{t}^{2} - v_{m}^{2} ⌊ \frac{Q_{p} v_{i}^{2}}{v_{m}^{2}} ⌋ + 0.5 ⌋) {\hat{v}}^{Q} & (54) \end{matrix}$

In Equation (54),

$⌊ \frac{Q_{p} v_{i}^{2}}{v_{m}^{2}} ⌋$

and

$Q_{p} v_{i}^{2} - v_{m}^{2} ⌊ \frac{Q_{p} v_{i}^{2}}{v_{m}^{2}} ⌋$

are the quotient and remainder of

$\frac{Q_{p} v_{i}^{2}}{v_{m}^{2}} .$

The part corresponding to the remainder may be simplified as shown in Equation (55).

$\begin{matrix} Rem (\frac{Q_{p} v_{i}^{2}}{v_{m}^{2}}) = \frac{1}{v_{m}^{2}} (Q_{p} v_{i}^{2} - v_{m}^{2} ⌊ \frac{Q_{p} v_{i}^{2}}{v_{m}^{2}} ⌋) & (55) \end{matrix}$

The quantized orthogonal compensation search vector may be represented as shown in Equation (56).

$\begin{matrix} {(z_{t})}^{Q} = Q_{p} h_{t}^{Q} - (\sum_{i = 0}^{n - 1} ⌊ \frac{Q_{p} v_{i}^{2}}{v_{m}^{2}} ⌋ + [Rem (\frac{Q_{p} v_{i}^{2}}{v_{m}^{2}})]) {\hat{v}}^{Q} & (56) \end{matrix}$

That is, at step S370, the quantized compensation search vector z_t, which is orthogonal to the search direction vector, may be calculated.

Here, at step S370, when the quantized learning equation does not escape from a local minimum point using the quantized orthogonal compensation search vector, the quantized learning equation may be made to escape from the local minimum point using the quantized compensation search vector.

Because the quantized compensation search vector has a scale of a quantization coefficient Q_p, this may be calculated as the basic quantization compensation search vector, as shown in Equation (57).

$\begin{matrix} z_{t}^{Q} \overset{Δ}{=} \frac{1}{Q_{p}} {(z_{t})}^{Q} & (57) \end{matrix}$

Also, at step S380, the quantized learning equation may be calculated.

That is, at step S380, based on the quantized compensation search vector defined in Equation (57), the quantized compensation search vector is multiplied by the quantized learning rate calculated using the quantized Armijo rule or the quantized line search algorithm, whereby the quantized learning equation shown in Equation (58) may be calculated.

The quantized learning equation shown in Equation (58) may be easily combined with the existing machine-learning or nonlinear algorithm.

x
_t+1
^Q
=x
_t
^Q−λ_t^Qh_t^Q, h_t^Q=z_t^Q (58)

FIG. 5 is a view illustrating a computer system according to an embodiment of the present invention.

Referring to FIG. 5, the apparatus for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention may be implemented in a computer system 1100 including a computer-readable recording medium. As illustrated in FIG. 5, the computer system 1100 may include one or more processors 1110, memory 1130, a user-interface input device 1140, a user-interface output device 1150, and storage 1160, which communicate with each other via a bus 1120. Also, the computer system 1100 may further include a network interface 1170 connected to a network 1180. The processor 1110 may be a central processing unit or a semiconductor device for executing processing instructions stored in the memory 1130 or the storage 1160. The memory 1130 and the storage 1160 may be any of various types of volatile or nonvolatile storage media. For example, the memory may include ROM 1131 or RAM 1132.

The apparatus for optimizing a quantized machine-learning algorithm according to an embodiment of the present invention includes one or more processors 1110 and executable memory 1130 for storing at least one program executed by the one or more processors 1110. The at least one program may set the learning rate of the quantized machine-learning algorithm using at least one of an Armijo rule and golden search methods, calculate a quantized orthogonal compensation search vector from the search direction vector of the quantized machine-learning algorithm, compensate for the search performance of the quantized machine-learning algorithm using the quantized orthogonal compensation search vector, and calculate an optimized quantized machine-learning algorithm using the learning rate and the quantized machine-learning algorithm, the search performance of which is compensated for.

Here, the at least one program may set any one of a first candidate value, which is acquired by increasing the minimum candidate value of the learning rate by a golden ratio, and a second candidate value, which is acquired by decreasing the maximum candidate value of the learning rate by the golden ratio, as the learning rate.

Here, when the difference value between the first candidate value and the second candidate value is equal to or less than a preset value, the at least one program may set any one of the first candidate value and the second candidate value as the learning rate.

Here, the at least one program may select a vector in a direction orthogonal to the direction opposite the largest component vector of the search direction vector and quantize the selected vector, thereby calculating the quantized orthogonal compensation search vector.

The present invention may implement an optimization algorithm capable of minimizing a quantization error in machine-learning and nonlinear-signal-processing fields using quantization and exhibiting excellent performance on lightweight hardware.

Also, the present invention may implement a machine-learning algorithm capable of providing sufficient optimization performance even on low-performance hardware.

As described above, the apparatus and method for optimizing a quantized machine-learning algorithm according to the present invention are not limitedly applied to the configurations and operations of the above-described embodiments, but all or some of the embodiments may be selectively combined and configured, so that the embodiments may be modified in various ways.

Number	Date	Country	Kind
10-2019-0138793	Nov 2019	KR	national
10-2020-0046724	Apr 2020	KR	national

APPARATUS AND METHOD FOR OPTIMIZING QUANTIZED MACHINE-LEARNING ALGORITHM

Information

Publication Number

Date Filed

Date Published

Inventors

Original Assignees

CPC

International Classifications

Abstract

Description

Claims

Priority Claims (2)