Optimization method and optimization apparatus

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to solving a constrained optimization problem.

2. Description of the Related Art

Recently, there has been a demand for solving constrained optimization problems in various industrial fields. A constrained optimization problem is defined as a problem to be solved by finding a point at which the value of an objective function is maximized or minimized within a subset of points subject to constraints. In other words, solving a constrained optimization problem is to formulate the problem and select the optimal solution from among possible solutions.

A constrained optimization problem may be applied to automated design, such as determining design parameters for obtaining a desired specification. More specifically, a constrained optimization problem may applied to finding a molecular structure having a minimum level of energy, determining the interface shape of two types of liquids, or a liquid and a gas, held in a container at a constant volume, or determining optimal lens shape, lens spacing, and lens material for obtaining a desired optical characteristic.

A typical constrained optimization problem of minimizing the value of a real function E(r)=0 subject to the constraint S(r)=0 will be described below. Here, r is an n-dimensional real vector
$r = (\begin{matrix} r_{1} \\ r_{2} \\ ⋮ \\ r_{n} \end{matrix})$

The problem of maximizing the real function can be solved by inverting the positive and negative signs of the same objective function.

In general, an algorithm for solving an unconstrained optimization problem finds a point r at which a function E(r) is minimized, starting from an initial vector r₍₀₎, according to data related to a gradient vector T=−∇E(r).

However, the solutions of a constrained optimization problem cannot be obtained by such a simple algorithm. Therefore, various methods for solving a constrained optimization problem have been developed. Some typical methods include the penalty method and the multiplier method (the method of Lagrangian undetermined multipliers). These methods are described in Chapter 9 of “Shinban suchikeisan hando-bukku (New Edition of Numerical Calculation Handbook)” (Tokyo: Ohmsha, 1990) edited by Yutaka Ohno and Kazuo Isoda.

The penalty method adds a penalty function P(r) to the objective function E(r) so that the value of the function becomes 0 when the constraint holds and becomes significantly large when the constraint is violated. In other words, in the penalty method, a newly introduced objective function Ω(r)=E(r)+P(x) is minimized. A typical penalty function may be P(r)=ωS(r)², where ω is a positive number that is adjusted to a suitable value in the process of finding the local minimum.

The multiplier method makes a newly introduced objective function Ω(r, λ)=E(r)+λS(r) stationary by introducing an undetermined multiplier λ. The objective function Ω(r, λ)=E(r)+λS(r) is subject to the constraint
$\frac{\partial Ω (r, λ)}{\partial λ} = 0$

which is a stationary condition of Ω(r) with respect to λ.

However, these known methods for solving a constrained optimization have the following problems. For the penalty method, selecting the penalty function becomes a problem. In the case of P(r)=ωS(r)², unless the value of ω is adjusted satisfactorily, the results of the computation do not converge to the correct value.

For the method of Lagrangian undetermined multipliers, although a local minimum of E(r) is to be found, in actuality, in general, a saddle point of Ω(r) is obtained instead of the local minimum of Ω(r). Various algorithms for finding a saddle point have been proposed, but they are not as easy as the algorithms for finding local minima.

SUMMARY OF THE INVENTION

The present invention provides a method for easily solving a constrained optimization problem.

The present invention provides a method for converging the solution of the constrained optimization problem to a correct value.

According to one aspect of the present invention, a constrained optimization method includes steps of linearly approximating a point of interest to a first hypersurface given as a constraint and finding the extreme value of an objective function by moving a point of interest along a curved line having second-order osculation with the plane tangential to a second hypersurface associated with the first hypersurface, the second hypersurface passing through the point of interest.

According to another aspect of the present invention, a constrained optimization apparatus includes a linearly approximating unit for approximating a point of interest to a first hypersurface given as a constraint and a finding unit for finding the extreme value of an objective function by moving the point of interest along a curved line having second-order osculation with a plane tangential to a second hypersurface associated with the first hypersurface, the second hypersurface passing through the point of interest.

According to another aspect of the present invention, a computer-readable program controls a computer to perform optimization and includes code for causing the computer to perform steps of linearly approximating a point of interest to a first hypersurface given as a constraint and finding the extreme value of an objective function by moving the point of interest along a curved line having second-order osculation with a plane tangential to a second hypersurface associated with the first hypersurface, the second hypersurface passing through the point of interest.

Further features and advantages of the present invention will become apparent from the following description of example embodiments (with reference to the attached drawings).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart illustrating an optimization algorithm according to a first embodiment of the present invention.

FIGS. 2A and 2B are flow charts illustrating an optimization algorithm according to a second embodiment of the present invention.

FIGS. 3A and 3B are flow charts illustrating an optimization algorithm according to a third embodiment of the present invention.

FIG. 4 illustrates the system configuration of a computer system for employing the embodiments of the present invention.

FIG. 5 is a functional block diagram of a computer system for employing the embodiments of the present invention.

FIG. 6 shows the calculation results according to a first example application.

FIG. 7 shows the calculation results according to a third example application.

FIG. 8 shows the calculation results according to a fourth example application.

FIG. 9 shows the calculation results according to a fifth example application.

FIG. 10 shows the calculation results according to a sixth example application.

DESCRIPTION OF THE EMBODIMENTS

Example embodiments of the present invention will now be described in detail with reference to the drawings.

First Embodiment

The method for optimization according to this embodiment includes two steps: linearly approaching the hypersurface S(r)=0 and finding an extreme value along a curved line having second-order osculation in the projection direction of the gradient vector of the plane tangential to the hypersurface S(r)=c. More specifically, the method includes the steps illustrated in the flow chart of FIG. 1. Each step is described below.

Step S11: Setting the initial value for a point r: The initial value of a point r is set as r₀by r:=r₀. In this document, “:=” is a symbol indicating that the character(s) written on the right of this symbol (in the case where a formula is written, the value obtained by the formula) is set to the variable (a data item whose value can be changed in a computer program) written on the left of this symbol.

Step S12: Linearly approximating the hypersurface S(r)=0:

A straight line in the gradient direction of the hypersurface S(r)=0 is represented as r(t)=r−t∇S(r), where t is a parameter. Since S(r−t∇S(r))=S(r)−t<∇S(r), ∇S(r)>+o(t) in the proximity of the point r, the point r approximates the hypersurface S(r)=0 by setting t and r as
$t := \frac{S (r)}{{\langle \nabla S (r) \rangle}^{2}},$

and

r:=r−t∇S(r).

Here, “<, >” represents the inner product in a vector space and “∥” represents the norm derived from the inner product.

Step S13: Calculating the unit normal vector: The unit normal vector n orthogonal to the hypersurface S(r)=c passing through the point r is calculated by
$n := \frac{\nabla S (r)}{\langle \nabla S (r) \rangle} .$

Step S14: Projecting the gradient vector −∇E(r) of the real function E(r) onto the tangent plane:

The direction G that is a projection direction of the gradient vector −∇E(r) of the real function E(r) orthogonal to the plane tangential to the hypersurface S(r)=c is calculated by G:=−∇E(r)+<n, ∇E(r)+>n.

Step S15: Determining convergence:

If the magnitude of G is smaller than ε, the real function E(r) is assumed to have been minimized, and the process is ended.

Step S16: Calculating the geodesic line:

The geodesic line that passes through the point r on the hypersurface S(r)=c and is directed in the direction G is represented as r(t)=r+tG+t²λn+o(t²), where t is a parameter and λ is a constant. Here, λ is determined as shown below. In the proximity of the point r,

S(r+ε)=S(r)+<ε, ∇S(r)>+1/2<ε, Jε>+o(ε²)

Therefore,
$S (r + tG + t^{2} λ n) = S (r) + 〈 tG + t^{2} λ n, \nabla S (r) 〉 + 1 / 2 〈 tG, JtG 〉 + o (t^{2}) = S (r) + t^{2} (λ 〈 n, \nabla S (r) 〉 + 1 / 2 〈 G, JG 〉) + o (t^{2}) .$

In the equations above, <G, ∇S(r)>=0. Here, J is a matrix having ij components represented by
$J_{ij} = \frac{\partial^{2} S (r)}{\partial r_{1} \partial r_{j}} .$

If the value of λ is determined so that the coefficient of t²equals 0, then
$λ = - \frac{1}{2} \cdot \frac{〈 G, JG 〉}{〈 n, \nabla S (r) 〉} .$

Step S17: Updating the point r along the quadratic curve:

For the real function E(r), the value along the quadratic curve r(t)=r+tG+t²λn having second-order osculation with the geodesic line passing through the point r and directed in the direction G is
$E (r (t)) = E (r) + 〈 tG + t^{2} λ n, \nabla E (r) 〉 + t^{2} / 2 〈 G, KG 〉 + o (t^{2}) = E (r) + tb + t^{2} a + o (r^{2}) .$

Here, K is a matrix having ij components represented by
$K_{ij} = \frac{\partial^{2} E (r)}{\partial r_{i} \partial r_{j}},$

and a and b are calculated as

a=λ<n, ∇E(r)>+1/2<G, KG>, and
b=<G, ∇E(r)>.

Therefore, if a>0, the value of the real function E(r(t)) is minimized at t=−b/2a. The point r is preferably be updated by approximating the minimal value so that the value of t does not become too large. In other words,

S:=−b/2a
$t := {\begin{matrix} s & if s \leq d / \langle G \rangle \\ d / \langle G \rangle & otherwise \end{matrix},$
r:=r+tG+t²λn.

Here, d is the upper limit of t|G|.

On the other hand, if a<0, the real function E(r(t)) does not have a local minimum. Therefore, the point r can be moved in the direction that reduces the value of the function E(r(t)) so that the point r is not moved a large distance. In other words, t:=d/|G| and r:=r+tG+t²λn.

Since b≦0, the cases described above can be combined to derived the following computational expressions:

s:=−b/2a $\begin{matrix} s := - b / 2 a \\ t := {\begin{matrix} s \\ d / \langle G \rangle \end{matrix} \begin{matrix} if 0 \leq s \leq d / \langle G \rangle \\ otherwise \end{matrix}, and \\ r := r + tG + t^{2} λ n . \end{matrix}$

if 0≦s≦d/|G|, and otherwise

r:=r+tG+t²λn.

Step S18: The process returns to Step S12.

Second Embodiment

The method for optimization according to this embodiment includes two steps: linearly approximating the hypersurface S(r) and finding extreme values along curved lines having second-order osculation to the plane tangential to the hypersurface S(r)=c in the projection direction of the unit vectors of each axial direction of the coordinate system. More specifically, the method includes the steps illustrated in the flow chart of FIG. 2. Each step is described below.

Step S21: Setting the initial value of the point r: Each component of r₍₀₎is r_1(o), r₂₍₀₎, . . . , r_n(0). These initial values r₁₍₀₎, r₂₍₀₎, . . . , r_n(0)are set in the variables r₁, r₂, . . . , r_nwhereas
$r = (\begin{matrix} r_{1} \\ r_{2} \\ ⋮ \\ r_{n} \end{matrix}), r_{1}, r_{2}, \dots, r_{n} \in R .$

r₁, r₂, . . . , r_n∈R.

Step S22: Linearly approximating the hypersurface S(r)=0:

A straight line in the gradient direction of the hypersurface S(r)=0 is represented as r(t)=r−t∇S(r), where t is a parameter. Since S(r−t∇S(r))=S(r)−t<∇S(r), ∇S(r)>+o(t) in the proximity of the point r, the point r approximates the hypersurface S(r)=0 by setting S, t, and r as

S(r−t∇S(r))≈0 $t := \frac{S (r)}{{\langle \nabla S (r) \rangle}^{2}},$

and

r:=r−t∇S(r)

Here, “<, >” represents the inner product in a vector space and “∥” represents the norm derived from the inner product.

Step S23: Calculating a unit normal vector:

The unit normal vector n orthogonal to the hypersurface S(r)=c passing through the point r is calculated by
$n := \frac{\nabla S (r)}{\langle \nabla S (r) \rangle} .$

Then, Steps 24 to 26 are carried out in sequence for each direction i=1, . . . , n.

Step 24: Projecting the unit vector e_iof direction i onto the tangent plane:

If G_iis the vector of the unit vector e_iof a direction i projected onto the tangent plane, then G_i:=e_i−<n, e_i>n.

Step 25: Determining an unknown parameter of the updated amount of the point r:

The updated amount of the point r is t_iG_i+t_i²λ_in, where t is a parameter and G_iis the projection vector of the unit vector projected in the direction i onto the plane tangential to hypersurface S(r)=c. Here, if the updated point r is represented as r(t_i), then r(t_i)=r+t_iG_i+t_i²λ_in+o(t_i²), where λ_iis a constant determined as shown below. In the proximity of the point r,
$\begin{matrix} S (r + t_{i} G_{i} + t_{i}^{2} λ_{i} n) = S (r) + < t_{i} G_{i} + t_{i}^{2} λ_{i} n, \nabla S (r) > + 1 / 2 < t_{i} G_{i}, \\ {Jt}_{i} G_{i} > + o (t_{i}^{2}) \\ = S (r) + t_{i}^{2} (λ_{i} n, \nabla S (r) > + 1 / 2 < G_{i}, {JG}_{i} > + o (t_{i}^{2}) . \end{matrix}$

In the equations above, <G_i, ∇S(r)>=0. Here, J is a matrix having ij components represented by
$J_{ij} = \frac{\partial^{2} S (r)}{\partial r_{i} \partial r_{j}} .$

If the value of λ_iis determined so that the coefficient of t_i²equals 0, then
$λ_{i} := - \frac{1}{2} \frac{< G_{i}, {JG}_{i} >}{< n, \nabla S (r) >} .$

Step 26: Updating the point r(i=1, . . . , n):

For the real function E(r), the value along the quadratic curve r(t_i)=r+t_iG_i+t_i²λ_in+o(t_i²) having second-order osculation and passing through the point r is
$\begin{matrix} E (r + t_{i} G_{i} + t_{i}^{2} λ_{i} n) = E (r) + < t_{i} G_{i} + t_{i}^{2} λ_{i} n, \nabla E (r) > + t_{i}^{2} / 2 < G_{i}, \\ {KG}_{i} > + o (t_{i}^{2}) \\ = E (r) + t_{i} b + t_{i}^{2} a + o (t_{i}^{2}) . \end{matrix}$

Here, K is a matrix having ij components represented by
$K_{ij} = \frac{\partial^{2} E (r)}{\partial r_{i} \partial r_{j}}$

and a and b are calculated as

a=λ_i<n, ∇E(r)>+1/2<G_i, KG_i>
b=<G_i, ∇E(r)>.

Therefore, if a>0, the value of the real function E(r(t_i)) is minimized at t_i=−b/2a. The point r is preferably updated by approximating the minimal value so that the value of t_idoes not become too large. In other words,

S:=−b/2a $\begin{matrix} S := - b / 2 a \\ t_{i} := {\begin{matrix} s \\ d_{i} / \langle G \rangle \end{matrix} \begin{matrix} if \langle s \rangle \leq d_{i} / \langle G_{i} \rangle \\ otherwise \end{matrix} \\ r := r + t_{i} G_{i} + t_{i}^{2} λ_{i} n \end{matrix}$

if |s|≦d_i/|G_i|otherwise

r:=r+t_iG_i+t_i²λ_in

where, d_iis the upper limit of t_i|G_i|.

On the other hand, if a<0, the real function E(r(t_i)) does not have a local minimum. Therefore, the point r can be moved in the direction that reduces the value of the function E(r(t_i)) so that the point r is not moved a large distance. In other words, t_i:=d_i/|G_i| and r(t_i):=r+t_iG_i+t_i²λ_in.

Step 27: The process returns to Step 24 and processes the next i.

Step S28: Determining convergence:

If |G_i| for each direction i(i=1, . . . , n) is smaller than the convergence ε, the real function E(r(t_i)) is assumed to have been minimized, and the process is ended.

Step 29: The process returns to Step 22.

Third Embodiment

The optimization algorithm according to this embodiment includes two steps: linearly and asymptotically approximating a hypersurface S(r)=0 as a constraint and finding an extreme value in the projection direction onto a plane tangential to the hypersurface S(r)=c.

The optimization algorithm according to this embodiment is extremely effective in solving the problem of determining the interface shape of two types of liquids, or a liquid and a gas, held in a container at a constant volume.

This kind of problem is formulated into a problem of minimizing the sum of the surface energy and the gravitational energy of the liquids (or liquid and gas). When generalized, such a constrained optimization problem can be perceived as a problem of finding the local minimum of the formulated real function E(r) subject to the constraint S(r)=0.

Accordingly, the optimization algorithm according to this embodiment includes the steps illustrated in the flow chart of FIG. 3. Each step is described below.

Step S31: Setting the initial value for a point r: The initial value of a point r is set as r₀(r:=r₀).

Step S32: Linearly approximating the hypersurface S(r):

A straight line in the gradient direction of the hypersurface S(r) is represented as r(t)=r−t∇S(r), where t is a parameter. Since S(r−t∇S(r))=S(r)−t<∇S(r), ∇S(r)>+o(t) in the proximity of the point r, the point r approximates the hypersurface S(r) by setting t and r as
$t := \frac{S (r)}{{\langle \nabla S (r) \rangle}^{2}},$

and

r:=r−t∇S(r).

Here, “<, >” represents the inner product in a vector space and “∥” represents the norm derived from the inner product.

Step S33: Calculating the unit normal vector:

The unit normal vector n orthogonal to the hypersurface S(r)=c passing through the point r is calculated by
$n := \frac{\nabla S (r)}{\langle \nabla S (r) \rangle} .$

Step S34: Projecting the gradient vector −∇E(r) of the real function E(r) onto the tangent plane:

The direction G that is the projection direction of the gradient vector −∇E(r) of the real function E(r) orthogonal to the plane tangential to the hypersurface S(r)=c is calculated by G:=−∇E(r)+<r, ∇E(r)+>n.

Step 35: Geodesic line:

The geodesic line that passes through the point r on the hypersurface S(r)=c and is directed in the direction G is represented as r(t)=r+tG, where t is a parameter. Here, the value of the real function E(r) along this geodesic line is
$\begin{matrix} E (r + tG) = E (r) + 〈 tG, \nabla E (r) 〉 + t^{2} / 2 〈 G, KG 〉 + o (t^{2}) \\ = E (r) + \frac{1}{2} 〈 G, KG 〉 {(t + \frac{〈 G, \nabla E (r) 〉}{〈 G, KG 〉})}^{2} - \\ \frac{1}{2} \frac{{〈 G, \nabla E (r) 〉}^{2}}{〈 G, KG 〉} + o (t^{2}) . \end{matrix}$

Here, K is a matrix having ij components represented by
$K_{ij} = \frac{\partial^{2} E (r)}{\partial r_{i} \partial r_{j}} .$

In the equations above, if <G, KG>>0, the value of the real function E(r) is minimized at
$t := - \frac{〈 G, \nabla E (r) 〉}{〈 G, KG 〉} .$

The point r is preferably updated by approximating the minimal value so that the value of t does not become too large. In other words,
$s := - \frac{〈 G, \nabla E (r) 〉}{〈 G, KG 〉}$ $t := {\begin{matrix} s & if s \leq Lmax / \langle G \rangle \\ Lmax / \langle G \rangle & otherwise \end{matrix}$

if s≦Lmax/|G|otherwise

r:=r+tG.

On the other hand, if <G, KG><0, the real function E(r) does not have a local minimum. Therefore, the point r can be moved in the direction that reduces the value of the function E(r) so that the point r is not moved a large distance. In other words,
$t := \frac{Lmax}{\langle G \rangle}$
r:=r+tG.

Since <G, ∇E(r)>≦0, the two equations described above can be combined to derive the following expressions:
$s := - \frac{〈 G, \nabla E (r) 〉}{〈 G, KG 〉}$ $t := {\begin{matrix} s & if 0 \leq s \leq Lmax / \langle G \rangle \\ Lmax / \langle G \rangle & otherwise \end{matrix}$

if 0≦s≦Lmax/|G|otherwise

r:=r+tG

where Lmax is the upper limit of t|G|.

Step 36: The process returns to Step 32.

Step S37: Determining convergence:

If the |G| is smaller than the convergence ε, the real function E(r) is assumed to have been minimized, and the process is ended.

FIG. 4 illustrates the system configuration of a computer system adopting the method for solving a constrained optimization problem according to the embodiments of the present invention. The above-described optimization algorithm is executed in such a computer system illustrated in FIG. 4.

In FIG. 4, a multi-purpose computer 22 is equipped with a central processing unit (CPU), a read only memory (ROM), a random access memory (RAM), a floppy disk drive 26, a compact disk read only memory (CD-ROM) drive 27, and a hard disk device 28. The computer 22 is connected to a monitor 23, and an input/output device, such as a keyboard 24 and a mouse 25, allowing various data to be input to and output from the computer 22.

The program, including an optimization algorithm, for solving an optimization problem according to this embodiment is recorded on the floppy disk 29. The floppy disk 29 is loaded into the floppy disk drive 26 and then the program for solving an optimization problem stored on the floppy disk 29 is downloaded into the computer 22. The program is stored into the RAM and is executed. The program may be recorded on media other than a floppy disk, for example, a CD-ROM or a hard disk.

While executing the program for solving an optimization problem, the minimal value at each predetermined time interval is displayed on the monitor 23 so that the found minimal value can be monitored in real time by an observer.

FIG. 5 is a functional block diagram of a computer system for adopting the method for solving a constrained optimization problem according to the embodiments of the present invention.

In FIG. 5, the processing unit 30 includes a computing unit 30a, a computing-process storage unit 30b, an intermediate-data storage unit 30c, a data output unit 30d, and an input unit 30e. The input unit 30e is used to input the program for solving an optimization problem to the processing unit 30. The processing unit 30 is connected to an input device 31, a printer 32, a monitor 23, and an external storage device 33.

The processing unit 30, illustrated in FIG. 5, is installed into the computer 22, illustrated in FIG. 4. The computing unit 30a includes at least the above-mentioned CPU. The computing-process storage unit 30b is included in the RAM. The program for solving an optimization problem is read out from the external storage device 33, such as the floppy disk drive 26, the CD-ROM drive 27, or the hard disk device 28, and is stored in the RAM.

The computing unit 30a executes the program for solving an optimization problem stored in the computing-process storage unit 30b. At the same time, the computing unit 30a stores intermediate computed results and the final computed results in the intermediate-data storage unit 30c and saves the final computed results in the external storage device 33. The computing unit 30a prints out the final computed results stored in the intermediate-data storage unit 30c from the printer 32 via the data output unit 30d. The computing unit 30a also displays, in real time, the final computed results on the monitor 23 via the data output unit 30d.

The input device 31 includes the keyboard 24 and the mouse 25, as illustrated in FIG. 4. The input device 31 controls the execution of various application programs such as the program for solving an optimization problem according to the present invention. The input device 31 is also used for inputting various commands and data. The input commands and data are input to the processing unit 30 via the data input unit 30e.

First Example Application

A first example application of the embodiments of the present invention is described next.

The constrained optimization algorithm according to the first example application is based on the first embodiment of the present invention and includes two steps: linearly approximating the hypersurface S(r)=0 and finding an extreme value along a curved line having second-order osculation with the geodetic line of the hypersurface S(r)=c passing through a point r. Here, for the two-dimensional real vector
$r = (\begin{matrix} x \\ y \end{matrix}),$

x, y∈R,

the real function E(r) is to be minimized subject to the constraint condition S(r)=0 where
$S (r) = \frac{x^{2}}{a^{2}} + \frac{y^{2}}{b} - 1$
E(r)=x²+y².

This problem can be solved according to the steps described below.

First Step: Setting the initial value for a point r:

The initial values x₍₀₎and y₍₀₎are set as x:=x₍₀₎and y:=y₍₀₎.

Second Step: Linearly approximating the hypersurface S(r)=0:
$S := \frac{x^{2}}{a^{2}} + \frac{y^{2}}{b} - 1$ $S_{x} := \frac{2 x}{a^{2}}$ $S_{y} := \frac{2 y}{b^{2}}$ $t := \frac{S}{S_{x}^{2} + S_{y}^{2}}$
x:=x−tS_x
y:=y−tS_y

Third Step: The unit normal vector n of the hypersurface S(r)=c passing through the point r:
$n_{x} := \frac{S_{x}}{\sqrt{S_{x}^{2} + S_{y}^{2}}}$ $n_{y} := \frac{S_{y}}{\sqrt{S_{x}^{2} + S_{y}^{2}}}$

Fourth Step: Projecting the gradient vector of the real function E(r) to the tangent plane:

E_x:=2x
E_y:=2y
G_x=−E_x+(n_xE_x+n_yE_y)n_x
G_y:=−E_y+(n_xE_x+n_yE_y)n_y

Fifth Step: Determining convergence:

If |G|<ε, then the process is ended.

Sixth Step: Coefficient of the geodetic line equation:
$λ := - \frac{G_{x}^{2} / a^{2} + G_{y}^{2} / b^{2}}{n_{x} S_{x} + n_{y} S_{y}}$

Seventh Step: Updating the point r along the quadratic curve:

a:=λ(n_xE_x+n_yE_y)+G_x²+G_y²
b:=G_xE_x+G_y+E_y
s:=−b/2a $t := {\begin{matrix} s & if 0 \leq t \leq d / \sqrt{G_{x}^{2} + G_{y}^{2}} \\ d / \sqrt{G_{x}^{2} + G_{y}^{2}} & otherwise \end{matrix}$
x:=x+tG_x+t²λn_x
y=y+tG_y+t²λn_y

Eighth Step: The process returns to the second step.

As an example of the computation, the results of executing the above-described steps, where a=2, b=1, x₀=2.5, y₀=0.5, d=0.5, and ε=10⁻⁸are illustrated in FIG. 6.

As illustrated in FIG. 6, the values start from the initial values (2.5, 0.5) and converge to the values (0.0, 1.0) that minimize the real function E(x, y).

Second Example Application

The constrained optimization algorithm according to a second example application is similar to the first example application in that it is based on the first embodiment of the present invention and includes two steps: linearly approximating the hypersurface S(r)=0 and finding an extreme value along a curved line having second-order osculation with the geodetic line of the hypersurface S(r)=c passing through a point r. Here, p that minimizes the real function

E(ρ)=T_r[(3ρ²+2ρ³)H]

subject to the constraint condition S(ρ)=0 and the value of the real function E(ρ) are to be determined, where ρ and H are real symmetric matrices of m×n and N_eis a positive value, S(ρ) is a function defined as

S(ρ)=T_r[3ρ²−2ρ³]−N_e,

and T_ris the trace of the matrix. In this case, the problem of determining the ground state of an independent Ne electron system denoted in a Hamiltonian matrix H is formulated using a density matrix (X.-P. Li, R. W. Nues, and D. Vanderbilt, Physical Review B, Vol. 47, p. 10891, 1993).

A known method for solving this problem is the method using a chemical potential (which can be understood as a type of multiplier method) (SY Qui, C. Z. Wang, K. M. Ho, and C. T. Chan, Journal of Physics: Condensed Matter, Vol. 6, p. 9153, 1994).

If the real symmetric matrix p is a vector having m(m−1)/2 independent components and the inner product of two symmetric matrixes ρ₁and ρ₂is defined as <ρ₁, ρ₂>=T_r(ρ₁ρ₂), this problem can be solved by applying the optimization algorithm according to this embodiment through the steps described below.

First Step: Setting the initial value for the real symmetric matrix ρ:

The initial matrix ρ₍₀₎is used to set ρ:=ρ₍₀₎.

Second Step: Linearly approximating the

hypersurface S(ρ)=0:

S:=T_r[3ρ²−2ρ³]−N_e
D:=6(ρ−ρ²) $t := \frac{S}{T_{r} (D^{2})}$
ρ:=ρ−tD

Third Step: The unit normal vector (m×m real symmetric matrix) N to the hypersurface S(ρ)=c passing through the point ρ:

D:=6(ρ−ρ²)
α:=T_r(D²) $N := \frac{D}{\sqrt{α}}$

Fourth Step: Projecting the gradient vector of the real function E(r) to the tangent plane:

F=−3 (ρH−Hρ)+2 (ρ²H+ρHρ+Hρ²)
G:=F−T_r(NF)N

Fifth Step: Determining convergence:

If {square root}{square root over (T_r(G²))}<ε, then the process is ended.

Sixth Step: Coefficient of the geodetic line equation:
$λ := - \frac{3 T_{r} (G^{2}) - 6 T_{r} (ρ G^{2})}{\sqrt{α}}$

Seventh Step: Updating the point r along the quadratic curve:
$a := 3 T_{r} [G^{2} H] - 2 T_{r} [ρ G^{2} H + G ρ G H + G^{2} ρ H] + \frac{λ}{\sqrt{α}} T_{r} [D^{2} H]$
b:=−T_r[GF]
s:=−b/2a $t := {\begin{matrix} s & if 0 \leq t \leq d / \sqrt{T_{r} (G^{2})} \\ d / \sqrt{T_{r} (G^{2})} & otherwise \end{matrix}$
ρ:=ρ+tG+t²λN

Eighth Step: The process returns to the second step.

By using an algorithm that includes a step of linearly approximating the hypersurface S(r)=0 and a step of finding an extreme value along a curved line having second-order osculation with the geodetic line of the hypersurface S(r)=c passing through a point r, the solution to a constrained optimization problem can be derived through a simple calculation process.

Third Example Application

The constrained optimization algorithm according to a third example application is based on the second embodiment of the present invention and includes two steps: linearly approximating the hypersurface S(r) and finding extreme values along curved lines having second-order osculation to the plane tangential to the hypersurface S(r)=c in the projection direction of the unit vectors of each axial direction of the coordinate system. Here, for the two-dimensional real vector
$r = (\begin{matrix} x \\ y \end{matrix}),$

x, y∈R,

the real function E(r) is to be minimized subject to the constraint condition S(r)=0 where
$S (r) = \frac{x^{2}}{a^{2}} + \frac{y^{2}}{b} - 1$
E(r)=x²+y².

This problem can be solved according to the steps described below.

First Step: Setting the initial value for a point r:

The initial values x₍₀₎and y₍₀₎are set as x:=x₍₀₎and y:=y₍₀₎.

Second Step: Linearly approximating the

hypersurface S(r)=0:
$S := \frac{x^{2}}{a^{2}} + \frac{y^{2}}{b} - 1$ $S_{x} := \frac{2 x}{a^{2}}$ $S_{y} := \frac{2 y}{b^{2}}$ $t := \frac{S}{S_{x}^{2} + S_{y}^{2}}$
x:=x−tS_x
y:=y−tS_y

Fourth Step: Projecting the unit vector e_iof a direction i onto the tangent plane (i=1, 2):

- for i=1,
  
  G_1x=1−n_x²
  G_1y=−n_xn_y
- for i=2,
  
  G_2x=−n_xn_y
  G_2y=1−n_y²

Fifth Step: Determining an unknown parameter λ_i:

- for i=1,
  $λ_{1} := - \frac{b^{2} {(1 - n_{x}^{2})}^{2} + a^{2} n_{x}^{2} n_{y}^{2}}{a^{2} b^{2} (n_{x} S_{x} + n_{y} S_{y})}$
- for i=2,
  $λ_{2} := - \frac{b^{2} n_{x}^{2} n_{y}^{2} + {a^{2} (1 - n_{y}^{2})}^{2}}{a^{2} b^{2} (n_{x} S_{x} + n_{y} S_{y})}$

Sixth Step: Updating the point r(i=1, 2):

a:=1 $b : {\begin{matrix} 2 x & (if i = 1) \\ 2 y & (if i = 2) \end{matrix}$
s:=−b/2a $t_{i} := {\begin{matrix} s & if \langle s \rangle \leq d_{i} / \langle G_{i} \rangle \\ d_{i} / \langle G_{i} \rangle & otherwise \end{matrix}$

if |s|≦d_i/|G_i|otherwise

x=x+t_iG_ix+t_i²λ_in_x
y:=y+t_iG_iy+t_i²λ_in_y

Seventh Step: The process returns to the fourth step.

Eighth Step: Determining convergence:

If |G_i| for each of the directions i(i=1, 2) is smaller than the convergence ε, the real function E(r) is assumed to have been minimized, and the process is ended.

If this process is carried out, where a=2, b=1, x₀=2.5, y₀=0.5, d=0.1, and ε=10⁻⁷, the values start from the initial values (2.5, 0.5) and converge to the values (0.0, 1.0) that minimize the real function E(x, y), as illustrated in FIG. 7.

Fourth Example Application

The constrained optimization algorithm according to a fourth example application is similar to the third example application in that it is based on the second embodiment of the present invention and includes two steps: linearly approximating the hypersurface S(r) and finding extreme values along curved lines having second-order osculation to the plane tangential to the hypersurface S(r)=c in the projection direction of the unit vectors of each axial direction of the coordinate system. Here, for the three-dimensional real vector
$r = (\begin{matrix} x \\ y \\ z \end{matrix}),$

x, y, z∈R,

the real function E(r) is to be minimized subject to the constraint condition S(r)=0 where
$S (r) = \frac{x^{2}}{a^{2}} + \frac{y^{2}}{b^{2}} + \frac{z^{2}}{c^{2}} - 1$
E(r)=x²+y²+z².

This problem can be solved according to the steps described below.

First Step: Setting the initial value for a point r:

The initial values x₍₀₎y₍₀₎, and z₍₀₎are set as x:=x₍₀₎, y:=y₍₀₎and z:=z₍₀₎.

Second Step: Linearly approximating the hypersurface S(r)=0:
$S := \frac{x^{2}}{a^{2}} + \frac{y^{2}}{b^{2}} + \frac{z^{2}}{c^{2}} - 1$ $S_{x} := \frac{2 x}{a^{2}}$ $S_{y} := \frac{2 y}{b^{2}}$ $S_{z} := \frac{2 z}{c^{2}}$ $t := \frac{S}{S_{x}^{2} + S_{y}^{2} + S_{z}^{2}}$
x:=x−tS_x
y:=y−tS_y
z:=z−tS_z

Third Step: The unit normal vectors:
$n_{x} := \frac{S_{x}}{\sqrt{S_{x}^{2} + S_{y}^{2} + S_{z}^{2}}}$ $n_{y} := \frac{S_{y}}{\sqrt{S_{x}^{2} + S_{y}^{2} + S_{z}^{2}}}$ $n_{z} : = \frac{S_{z}}{\sqrt{S_{x}^{2} + S_{y}^{2} + S_{z}^{2}}}$

Fourth Step: Projecting the unit vector e_iin direction i onto the tangent plane (i=1, 2, 3):

- for i=1,
  
  G_1x=1−n_x²
  G_1y=−n_xn_y
  G_1z=−n_xn_z
- for i=2,
  
  G_2x=−n_xn_y
  G_2y=1−n_y²
  G_2z=−n_yn_z
- for i=3,
  
  G_3x=−n_xn_z
  G_3y=−n_yn_z
  G_3z=1−n_z²

Fifth Step: Determining an unknown parameter λ_i:

- for i=1,
  $λ_{1} := - \frac{b^{2} {c^{2} (1 - n_{x}^{2})}^{2} + a^{2} c^{2} n_{x}^{2} n_{y}^{2} + a^{2} b^{2} n_{x}^{2} n_{z}^{2}}{a^{2} b^{2} c^{2} (n_{x} S_{x} + n_{y} S_{y} + n_{z} S_{z})}$
- for i=2,
  $λ_{2} := - \frac{b^{2} c^{2} n_{x}^{2} n_{y}^{2} + a^{2} {c^{2} (1 - n_{y}^{2})}^{2} + a^{2} b^{2} n_{y}^{2} n_{z}^{2}}{a^{2} b^{2} c^{2} (n_{x} S_{x} + n_{y} S_{y} + n_{z} S_{z})}$
- for i=3,
  $λ_{3} := - \frac{b^{2} c^{2} n_{x}^{2} n_{z}^{2} + a^{2} c^{2} n_{y}^{2} n_{z}^{2} + a^{2} {b^{2} (1 - n_{z}^{2})}^{2}}{a^{2} b^{2} c^{2} (n_{x} S_{x} + n_{y} S_{y} + n_{z} S_{z})}$

Sixth Step: Updating the point r(i=1, 2, 3):

a:=1 $b := {\begin{matrix} 2 x & (if & i = 1) \\ 2 y & (if & i = 2) \\ 2 z & (if & i = 3) \end{matrix}$
s:=−b/2a $t_{i} := {\begin{matrix} s & if \langle s \rangle \leq d_{i} / \langle G_{i} \rangle \\ d_{i} / \langle G_{i} \rangle & otherwise \end{matrix}$

if |s|≦d_i/|G_i|otherwise

x:=x+t_iG_ix+t_i²λn_x
y:=y+t_iG_iy+t_i²λ_in_y
z:=z+t_iG_iz+t_i²λ_in_z

Seventh Step: The process returns to the fourth step.

Eighth Step: Determining convergence:

If |G_i| for each of the directions i(i=1, 2, 3) is smaller than the convergence ε, the real function E(r) is assumed to have been minimized, and the process is ended.

If this process is carried out, where a=3, b=2, c=1, x₀=2.5, y₀=2.5, z₀=2.5, d=0.1, and ε=10⁻⁷, the values start from the initial values (2.5, 2.5, 2.5) and converge to the values (0.0, 0.0, 1.0) that minimize the real function E(x, y, z), as illustrated in FIG. 8.

By using an algorithm that includes a step of linearly approximating the hypersurface S(r) and a step of finding an extreme value along a curved line having second-order osculation with the geodetic line of the hypersurface S(r)=c passing through a point r, the solution to a constrained optimization problem can be derived through a simple calculation process.

Fifth Example Application

The optimization algorithm according to this embodiment includes two steps: linearly and asymptotically approximating a hypersurface S(r)=0 as a constraint wherein
$r := r - \frac{S (r)}{{\langle \nabla S (r) \rangle}^{2}} \nabla S (r)$

and finding an extreme value in the projection direction onto a plane tangential to the hypersurface S(r)=c wherein

G:=−∇E(r)+<r, ∇E(r)>n $s := - \frac{〈 G, \nabla E (r) 〉}{〈 G, KG 〉} (K_{ij} = \frac{\partial^{2} E (r)}{\partial r_{i} \partial r_{j}})$ $t := {\begin{matrix} s & if s \leq L \max / \langle G \rangle \\ L \max / \langle G \rangle & otherwise \end{matrix}, and r := r + tG .$

if s≦Lmax/|G|

otherwise

r:=r+tG.

Here, for the two-dimensional real vector r=(x, y), the real function E(r) is to be minimized subject to the constraint condition S(r)=0 where

E(r)=x²+y²
S(r)=ax+by−1.

This problem can be solved according to the steps described below.

First Step: Setting the initial value for a point r:

The initial values x₍₀₎and y₍₀₎are set as x:=x(₀) and y:=y₍₀₎.

Second Step: Linearly approximating the hypersurface S(r)=0:

S:=ax+by−1
S_x:=a
S_y:=b $t := \frac{S}{S_{x}^{2} + S_{y}^{2}}$
x:=x−tS_x
y:=y−tS_y

Fourth Step: Projecting the gradient vector of the real function E(r) to the tangent plane:

E_x:=2x
E_y:=2y
G_x=−E_x+(n_xE_x+n_yE_y)n_x
G_y:=−E_y+(n_xE_x+n_yE_y)n_y

Fifth Step: Determining convergence:

If |G|<ε, then the process is ended.

Sixth Step: Updating the point r along the quadratic curve:

α=G_x²+G_y²
β=G_xE_x+G_yE_y
s:=−β/2α $t := {\begin{matrix} s & if 0 \leq t \leq L \max / | G | \\ L \max / \langle G \rangle & otherwise \end{matrix}$

if 0≦t≦Lmax/|G|otherwise

x:=x+tG_x
y:=y+tG_y

Seventh Step: The process returns to the second step.

The computational results of carrying out the above-described steps, where a=2, b=1, x₀=1.5, y₀=1.8, and ε=10⁻⁸, are illustrated in FIG. 9.

As illustrated in FIG. 9, the values start from the initial values (1.5, 1.8) and converge to the values (0.4, 0.2) that minimize the real function E(x, y).

Sixth Example Application

The sixth example application of the optimization algorithm according to the third embodiment of the present invention is similar to the fifth example application in that, for the three-dimensional real vector r=(x, y, z), the real function E(r) is to be minimized subject to the constraint condition S(r)=0 where

E(r)=x²+y²+z²
S(r)=ax+by+cz−1.

This problem can be solved according to the steps described below.

First Step: Setting the initial value for a point r:

The initial values x₍₀₎, y₍₀₎, and z₍₀₎are set as x:=x₍₀₎, y:=y₍₀₎, and z:=z₍₀₎.

Second Step: Linearly approximating the hypersurface S(r)=0:

S:ax+by+cz−1
S_x:=a
S_y=b
S_z:=c $t := \frac{S}{S_{x}^{2} + S_{y}^{2} + S_{z}^{2}}$
x:=x−tS_x
y:=y−tS_y
z:=z−tS_z

Third Step: The unit normal vectors:
$\begin{matrix} n_{x} := \frac{S_{x}}{\sqrt{S_{x}^{2} + S_{y}^{2} + S_{z}^{2}}} \\ n_{y} := \frac{S_{y}}{\sqrt{S_{x}^{2} + S_{y}^{2} + S_{z}^{2}}} \\ n_{z} := \frac{S_{z}}{\sqrt{S_{x}^{2} + S_{y}^{2} + S_{z}^{2}}} \end{matrix}$

Fourth Step: Projecting the gradient vector of the real function E(r) to the tangent plane:

E_x=2x
E_y=2y
E_z=2z
G_x=−E_x+(n_xE_x+n_yE_y+n_zE_z)n_x
G_y:=−E_y+(n_xE_x+n_yE_y+n_zE_z)n_y
G_z:=−E_z+(n_xE_x+n_yE_y+n_zE_z)n_z

Fifth Step: Determining convergence:

If |G|<ε, then the process is ended.

Sixth Step: Updating the point r along the quadratic curve:

α=G_x²+G_y²+G_z²
β=G_xE_x+G_yE_y+G_zE_z
s:=−β/2α $t := {\begin{matrix} s & if 0 \leq t \leq L \max / | G | \\ L \max / \langle G \rangle & otherwise \end{matrix}$

if 0≦t≦Lmax/|G|otherwise

x:=x+tG_x
y:=y+tG_y
z:=y+tG_z

Seventh Step: The process returns to the second step.

The computational results of carrying out the above-described steps, where a=2, b=1, c=1, x₀=1.5, y₀=1.8, z₀=0.8, and ε=10⁻⁸, the values start from the initial values (1.5, 1.8, 0.8) and converge to the values (⅓, ⅙, ⅙) that minimize the real function E(x, y, z), as illustrated in FIG. 10.

As described above, by carrying out the steps according to the fifth and sixth example applications, the solution to a constrained optimization problem can be derived through a simple calculation process.

The present invention is not limited to the above-described embodiments and may be applied to multi-purpose computers, such as personal computers and mainframes that can be used for various purposes, such as scientific computations, business transactions, or controls, in accordance with the software executed. Moreover, the present invention may be applied in a computer that stores an optimization algorithm according to the present invention and that is mainly used for solving optimization problems.

The present invention may be applied to any kind of structure so long as the functions according to the embodiments of the present invention can be realized. For example, the structure of the software and hardware according to the embodiments of the present invention are interchangeable.

The present invention may be realized by supplying a storage medium (or recording medium) storing the program code for the software for realizing the functions of the above-described embodiments to a system or apparatus and by reading out and executing the program code stored on the storage medium (or recording medium) by a computer (CPU or micro-processing unit (MPU)). In this case, the program code read out from the storage medium (or recording medium) realizes the functions of the embodiments, and, thus, the program code is included in the scope of the present invention. The software program code is included in the scope of the present invention when the functions of the above-described embodiments are realized by executing the read-out program code by a computer and also when the functions of the above-described embodiments are realized by an operating system (OS) operating on a computer fully or partially carrying out a process based on the program code.

The program code read out from the storage medium (or recording medium) is included in the scope of the present invention also when the functions according to the above-described embodiments are realized by storing the program code in a memory included in a function expansion board of a computer or a function expansion unit connected to a computer and carrying out the entire process or part of the process according to the software program code by a CPU of the function expansion board or the function expansion unit. When the present invention is applied to the above-described storage medium (or recording medium), the program code corresponding to the above-described flow charts will be stored in the storage medium (or recording medium).

Although the present invention has been described with reference to exemplary embodiments with a certain degree of particularity, many apparently widely different embodiments of the invention can be made without departing from the spirit and the scope thereof. It is to be understood that the invention is not limited to the specific embodiments thereof except as defined in the appended claims.

This application claims priority from Japanese Patent Application No. 2004-073299 filed Mar. 15, 2004, which is hereby incorporated by reference herein.

Optimization method and optimization apparatus

Information

Publication Number

Date Filed

Date Published

Inventors

CPC

US Classifications

International Classifications

Abstract

Description

Claims

Priority Claims (1)