The present invention relates to a technique for solving an optimization problem.
Commonly, an optimization problem involves computation to find a solution that minimizes the value of a function. When there are multiple solutions and one with a desirable structure is sought, a term that imposes a constraint or regularization is added to the function to be minimized, and the solution that minimizes the sum of the two terms is computed. For example, ridge regression and sparse logistic regression, which are often used in statistics, each solve an optimization problem whose objective is the sum of two terms. The Douglas-Rachford method is known as a method of computing a solution that minimizes the sum of two terms (NPL 1).
When there are two structures postulated in a solution, a minimization problem minimizing the sum of three terms is solved. Such optimization problems under multiple structures arise in support-vector machines, compressed sensing, estimation of sparse covariance matrices, and so on. Several methods have been proposed for solving optimization problems under multiple structures (NPL 2 to 4).
The method of NPL 1 determines a solution that minimizes an optimization function expressed as the sum of two terms. Although it is useful even when the optimization function is ill-conditioned, it cannot find a solution of an optimization problem under multiple structures, where two structures are postulated in the solution. The methods of NPL 2 to 4 can deal with optimization problems under multiple structures. However, when the function to be minimized is ill-conditioned, they take a long time to obtain a solution.
In view of the problems described above, an object of the present invention is to provide a technique whereby a solution of an optimization problem under multiple structures can be obtained at high speed even when a function to be minimized is ill-conditioned.
To solve the problems described above, one aspect of the present invention relates to a computing device that computes an optimal solution of an optimization function f+g+h represented by a sum of three functions f, g, and h, including: a first computing unit that computes a proximal point of a function F+h representing the optimization function f+g+h, the function F+h being a sum of a function F=f+g represented by a sum of two functions f and g and a function h; a second computing unit that computes an approximate proximal point of the function F; and a convergence determination unit that determines whether or not a predetermined termination condition is satisfied based on the proximal point computed by the first computing unit and the approximate proximal point computed by the second computing unit, and causes the first computing unit and the second computing unit to repeatedly compute the proximal point and the approximate proximal point until the predetermined termination condition is satisfied.
According to the present invention, a solution of an optimization problem under multiple structures can be obtained at high speed even when a function to be minimized is ill-conditioned.
The following embodiment discloses a computing device that calculates an optimal solution of an optimization problem under multiple structures. More particularly, the computing device according to the following embodiment computes an optimal solution of an optimization problem defined by three functions
$f:\mathbb{R}^{n}\to\mathbb{R},\quad g,h:\mathbb{R}^{d}\to\mathbb{R}\cup\{\infty\}$ [Formula 1]
and a matrix
$A\in\mathbb{R}^{n\times d}$ [Formula 2]
the optimization problem being
$\min_{x\in\mathbb{R}^{d}}\; f(Ax)+g(x)+h(x)$ [Formula 3]
With the computing device according to the following embodiment, an optimal solution can be obtained at high speed even when the function to be minimized f(Ax)+g(x)+h(x) is ill-conditioned.
First, the computing device according to one embodiment of the present invention will be described with reference to
As illustrated in
The memory unit 110 stores parameters that specify a target optimization problem. Specifically, the memory unit 110 stores three functions that configure an optimization function,
$f:\mathbb{R}^{n}\to\mathbb{R},\quad g,h:\mathbb{R}^{d}\to\mathbb{R}\cup\{\infty\}$ [Formula 4]
a matrix,
$A\in\mathbb{R}^{n\times d}$ [Formula 5]
and a parameter
$\gamma\in\mathbb{R}_{>0}$ [Formula 6]
to be used in a computing process to be described later. Here, γ is a positive real number and may be set as suited. For example, γ may be 1 (γ=1). The respective functions, matrix, parameters and others are input from outside in advance and stored in the memory unit 110.
Of the three functions f, g, and h given above, the function f is the function to be minimized. The functions g and h impose constraints and regularization on the function f to be minimized, i.e., they represent structures postulated in the solution. The function that is the object of optimization is expressed as follows:
$f(Ax)+g(x)+h(x)$ (1) [Formula 7]
The initialization unit 120 sets the value of a first point $z_1$ of a point sequence $\{z_t\}$ (t being an index that represents the number of repetitions) to be used for the computation of a proximal point in the process that follows. $z_1$ is a d-dimensional real vector. The initialization unit 120 sets the value of each element of the vector $z_1$ to a suitable real number. The initialization unit 120 also sets the number of repetitions t to 1 (t=1).
The first computing unit 130 computes a proximal point $\mathrm{prox}_{\gamma h}(z_t)$ of $z_t$ relating to the function h. More specifically, the first computing unit 130 defines F(x) as f(Ax)+g(x),
$F(x):=f(Ax)+g(x)$ [Formula 8]
so that Expression (1), which is the function that is the object of minimization, is regarded as the sum of the two functions F(x) and h(x),
$F(x)+h(x)$ [Formula 9]
and, following the Douglas-Rachford method, obtains the proximal point $\mathrm{prox}_{\gamma h}(z_t)$ and sets it as $x_t$.
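For reference, the proximal point used here is commonly defined as $\mathrm{prox}_{\gamma h}(z)=\arg\min_{x}\{h(x)+\tfrac{1}{2\gamma}\lVert x-z\rVert^{2}\}$. The text does not fix a particular h, so the following minimal Python sketch assumes two illustrative choices, an L1 penalty and a nonnegativity constraint, whose proximal points have closed forms. The names `prox_l1` and `prox_nonneg` and the weight `lam` are hypothetical and are not taken from the embodiment.

```python
import math

def prox_l1(z, gamma, lam=1.0):
    """Proximal point of h(x) = lam * ||x||_1 (assumed example of h).

    The closed form is element-wise soft-thresholding with threshold gamma * lam.
    """
    thr = gamma * lam
    return [math.copysign(max(abs(zi) - thr, 0.0), zi) for zi in z]

def prox_nonneg(z, gamma):
    """Proximal point of the indicator of {x : x >= 0} (assumed example of h).

    For an indicator function the proximal point is the projection onto the
    set, so gamma has no effect on the result.
    """
    return [max(zi, 0.0) for zi in z]

# Illustrative inputs only.
x_l1 = prox_l1([3.0, -0.5, 1.0], gamma=1.0)
x_nn = prox_nonneg([3.0, -0.5, 1.0], gamma=1.0)
```

Either function could play the role of the first computing unit's computation of $x_t$ from $z_t$ when h has the corresponding form.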
The second computing unit 140 computes a point $u_t$ (here, $u_t=2x_t-z_t$) using the proximal point $x_t$ determined by the first computing unit 130, and computes an approximate proximal point $y_t$ of $u_t$ relating to the function F(x) above, i.e., a point $y_t$ that approximates the proximal point $\mathrm{prox}_{\gamma F}(u_t)$. For this computation, the second computing unit 140 in this embodiment uses a primal-dual method. The process of the primal-dual method will be described later in detail.
The convergence determination unit 150 computes the next point $z_{t+1}$ (here, $z_{t+1}=z_t+y_t-x_t$) using $x_t$ determined by the first computing unit 130, $y_t$ determined by the second computing unit 140, and the current $z_t$. If a predetermined termination condition is satisfied, the convergence determination unit 150 terminates the process and outputs the solution $x_t$. If the predetermined termination condition is not satisfied, the convergence determination unit 150 increments t by 1 to cause the first computing unit 130 to repeat the computation of the proximal point. For example, the termination condition may be that a predefined evaluation function representing the accuracy of the current solution $x_t$ has reached a preset threshold, or that the number of repetitions t has reached a preset threshold. Examples of the evaluation function reaching a preset threshold include the amount of decrease in training error $f(x_{t-1})-f(x_t)$ being smaller than a predefined threshold, the amount of decrease in validation error being smaller than a predefined threshold, and the minimum validation error calculated from the solutions $x_1,\ldots,x_t$ not being updated for a preset number of iterations.
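The example termination conditions listed above could be combined as in the following minimal Python sketch. The function name `should_stop`, its parameters, and the default thresholds are all hypothetical. The embodiment only states that such conditions may be used, not how they are combined, so the logical OR below is an assumption.

```python
def should_stop(t, train_errors, val_errors,
                max_iter=1000, min_decrease=1e-6, patience=10):
    """Return True if any of the example termination conditions holds.

    t            -- current number of repetitions
    train_errors -- training errors f(x_1), ..., f(x_t) recorded so far
    val_errors   -- validation errors recorded so far (may be empty)
    """
    # Condition: the number of repetitions has reached a preset threshold.
    if t >= max_iter:
        return True
    # Condition: the decrease in training error is below a threshold.
    if len(train_errors) >= 2:
        if train_errors[-2] - train_errors[-1] < min_decrease:
            return True
    # Condition: the best validation error has not been updated for
    # `patience` consecutive iterations.
    if len(val_errors) > patience:
        best_index = val_errors.index(min(val_errors))
        if len(val_errors) - 1 - best_index >= patience:
            return True
    return False
```

A caller would append the newest error values to the lists once per outer iteration and then call `should_stop` before incrementing t.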
The computing device 100 may typically be realized by a computing device such as a server, and may be made up of, for example, a drive device, an auxiliary memory device, a memory device, a processor, an interface device, and a communication device mutually connected via a bus B. Various computer programs, including the programs that implement the various functions and processes in the computing device 100, may be provided by a recording medium such as a CD-ROM (Compact Disk-Read Only Memory), a DVD (Digital Versatile Disk), or a flash memory. The program may be installed from the recording medium to the auxiliary memory device via the drive device when the recording medium storing the program is set in the drive device. Note that the program need not necessarily be installed from a recording medium, and may be downloaded from any external device via a network or the like. The auxiliary memory device stores the installed program, as well as necessary files and data. Upon receiving a program launch instruction, the memory device reads out the program and data from the auxiliary memory device and stores them. The processor executes the various functions and processes of the computing device 100 described above in accordance with the program stored in the memory device and various data such as parameters necessary for executing the program. The interface device is used as a communication interface for connection with a network or an external device. The communication device executes various communication processes for communication with a network such as the Internet.
It should be noted that the computing device 100 is not limited to the hardware structure described above and may be implemented by any other suitable hardware configurations.
Next, the optimal solution computing process according to one embodiment of the present invention will be described with reference to
At step S101, the memory unit 110 stores the three functions f, g, and h, matrix A, and parameter γ that configure the optimization function input to the computing device 100.
At step S102, the initialization unit 120 sets the index t of the point sequence $\{z_t\}$ to 1 (t=1), and initializes $z_1$ to the zero vector.
At step S103, the first computing unit 130 computes the proximal point $\mathrm{prox}_{\gamma h}(z_t)$ of $z_t$ relating to the function h by the Douglas-Rachford method, and assigns it to $x_t$.
At step S104, the second computing unit 140 computes, by the primal-dual method, the approximate proximal point of $u_t$ relating to f+g, the sum of the functions f and g, and assigns it to $y_t$.
At step S105, the convergence determination unit 150 computes $z_t+y_t-x_t$ and assigns the value to $z_{t+1}$.
At step S106, the convergence determination unit 150 determines whether or not a predetermined termination condition is satisfied, and if the termination condition is satisfied (S106: Yes), the process goes to step S107, where the computing device 100 outputs the solution xt. On the other hand, if the termination condition is not satisfied (S106: No), the convergence determination unit 150 increments the index t by 1, and the process returns to step S103 and steps S103 to S106 described above are repeated.
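Steps S102 to S107 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `three_term_solver` is hypothetical, the stopping rule (the norm of $y_t-x_t$ falling below `tol`, or an iteration cap) is one assumed choice of the "predetermined termination condition", and the toy problem uses an exact closed-form proximal point of F in place of the primal-dual computation of step S104.

```python
import math

def three_term_solver(prox_h, approx_prox_F, d, max_iter=500, tol=1e-10):
    """Outer loop sketch: x_t = prox_h(z_t), y_t ~ prox_F(2x_t - z_t),
    z_{t+1} = z_t + y_t - x_t."""
    z = [0.0] * d                                   # S102: z_1 is the zero vector
    x = list(z)
    for _ in range(max_iter):
        x = prox_h(z)                               # S103
        u = [2.0 * xi - zi for xi, zi in zip(x, z)]
        y = approx_prox_F(u)                        # S104
        z = [zi + yi - xi for zi, yi, xi in zip(z, y, x)]   # S105
        residual = math.sqrt(sum((yi - xi) ** 2 for yi, xi in zip(y, x)))
        if residual <= tol:                         # S106 (assumed condition)
            break
    return x                                        # S107

# Toy problem (assumed): A = I, f(v) = 0.5*||v - b||^2, g(x) = 0.5*||x||^2,
# h = indicator of x >= 0, gamma = 1.
b = [1.0, -1.0]
prox_h = lambda z: [max(zi, 0.0) for zi in z]
# Exact prox of F = f + g for this toy problem: minimize
# 0.5*(x-b)^2 + 0.5*x^2 + 0.5*(x-u)^2 per coordinate, giving x = (b + u)/3.
exact_prox_F = lambda u: [(bi + ui) / 3.0 for bi, ui in zip(b, u)]

x_star = three_term_solver(prox_h, exact_prox_F, d=2)
```

For this toy problem the constrained minimizer is (0.5, 0), which `x_star` approximates. In the embodiment, `approx_prox_F` would instead be the primal-dual routine of the second computing unit 140.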
Next, the process of the primal-dual method at step S104 according to one embodiment of the present invention will be described in detail with reference to
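First, the second computing unit 140 initializes $\beta_t$ using $y_{t-1}$ and $\beta_{t-1}$ by
$\beta_t\leftarrow(1-\theta)\beta_{t-1}+\theta\nabla f(Ay_{t-1})$ [Formula 10]
and initializes $y_t$ using the initialized $\beta_t$ by
$y_t\leftarrow\mathrm{prox}_{\gamma g}(u_t-\gamma A^{\top}\beta_t)$. [Formula 11]
Here, ∇f represents the gradient of the function f, and θ∈(0, 1) is a parameter determined by backtracking.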
At step S202, the second computing unit 140 updates $\beta_t$ by:
$\beta_t\leftarrow(1-\theta)\beta_t+\theta\nabla f(Ay_t)$. [Formula 12]
At step S203, the second computing unit 140 updates $y_t$ by:
$y_t\leftarrow\mathrm{prox}_{\gamma g}(u_t-\gamma A^{\top}\beta_t)$. [Formula 13]
At step S204, the second computing unit 140 computes a primal-dual gap $G(y_t,\beta_t)$ by:
$G(y,\beta)=f(Ay)+f^{*}(\beta)-\langle Ay,\beta\rangle$. [Formula 14]
Here, $f^{*}$ represents the convex conjugate of the function f, and the symbol $\langle\cdot,\cdot\rangle$ represents the standard inner product in Euclidean space.
At step S205, if the current $(y_t,\beta_t)$ satisfies the termination condition based on the primal-dual gap (S205: Yes), the second computing unit 140 terminates the process and gives the current $y_t$ to the convergence determination unit 150. On the other hand, if the condition is not satisfied (S205: No), the second computing unit 140 increments the index t by 1, and the process returns to step S202 of updating $\beta_t$. In this way, the second computing unit 140 updates $y_t$ and $\beta_t$ repeatedly until the predetermined termination condition is satisfied, i.e., until the primal-dual gap becomes equal to or lower than a preset error.
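The primal-dual updates of Formulas 10 to 14 can be illustrated with the following minimal Python sketch. It makes several assumptions that are not in the embodiment: f is the least-squares loss $f(v)=\tfrac12\lVert v-b\rVert^{2}$ (so that $\nabla f(v)=v-b$ and $f^{*}(\beta)=\tfrac12\lVert\beta\rVert^{2}+\langle\beta,b\rangle$), g is $\tfrac{\lambda}{2}\lVert x\rVert^{2}$ (so that $\mathrm{prox}_{\gamma g}(w)=w/(1+\gamma\lambda)$), and θ is held at a fixed value instead of being determined by backtracking. The names `approx_prox_F`, `matvec`, and `rmatvec` are hypothetical.

```python
def matvec(A, x):
    """Return A x for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, v):
    """Return A^T v."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

def approx_prox_F(u, A, b, lam, gamma, y_prev, beta_prev,
                  theta=0.5, eps=1e-12, max_inner=1000):
    """Approximate prox_{gamma F}(u) for F(x) = 0.5*||Ax-b||^2 + 0.5*lam*||x||^2."""
    def update_beta(beta, y):
        # beta <- (1 - theta) * beta + theta * grad f(A y), with grad f(v) = v - b
        grad = [vi - bi for vi, bi in zip(matvec(A, y), b)]
        return [(1.0 - theta) * bi + theta * gi for bi, gi in zip(beta, grad)]

    def update_y(beta):
        # y <- prox_{gamma g}(u - gamma * A^T beta), with prox_{gamma g}(w) = w/(1+gamma*lam)
        w = [ui - gamma * ci for ui, ci in zip(u, rmatvec(A, beta))]
        return [wi / (1.0 + gamma * lam) for wi in w]

    def gap(y, beta):
        # G(y, beta) = f(Ay) + f*(beta) - <Ay, beta>
        Ay = matvec(A, y)
        f_val = 0.5 * sum((p - q) ** 2 for p, q in zip(Ay, b))
        f_conj = 0.5 * sum(t * t for t in beta) + sum(t * q for t, q in zip(beta, b))
        return f_val + f_conj - sum(p * t for p, t in zip(Ay, beta))

    beta = update_beta(beta_prev, y_prev)    # initialization (Formula 10)
    y = update_y(beta)                       # initialization (Formula 11)
    for _ in range(max_inner):
        beta = update_beta(beta, y)          # S202 (Formula 12)
        y = update_y(beta)                   # S203 (Formula 13)
        if gap(y, beta) <= eps:              # S204-S205: gap at or below preset error
            break
    return y, beta

# Illustrative inputs only.
A = [[2.0, 0.0], [0.0, 1.0]]
y_hat, beta_hat = approx_prox_F(u=[1.0, 1.0], A=A, b=[1.0, -1.0], lam=1.0,
                                gamma=1.0, y_prev=[0.0, 0.0], beta_prev=[0.0, 0.0])
```

For these inputs the exact proximal point solves $(A^{\top}A+\lambda I+I/\gamma)\,y=A^{\top}b+u/\gamma$, i.e. y = (0.5, 0), which `y_hat` approximates. Under the least-squares assumption the gap simplifies to $\tfrac12\lVert Ay-b-\beta\rVert^{2}$, so it is zero exactly when β equals ∇f(Ay).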
Next, the results of numerical experiments according to the present invention and prior art will be described with reference to
An optimization problem of a kernel support vector machine was solved by various methods using six real datasets shown in
While one embodiment of the present invention has been described in detail above, the present invention is not limited to the specific embodiment described above and various modifications and alterations are possible within the scope of the subject matter of the present invention set forth in the claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2018-087056 | Apr 2018 | JP | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/JP2019/014460 | 4/1/2019 | WO | 00 |