1. Field of the Invention
The present invention relates to an optimization method for determining a minimum value of an optimization function under constraints given by equations, to an apparatus therefor, and to a program therefor.
2. Description of the Related Art
A problem of minimizing a real-valued function E(r) under constraints specified by the equations is considered as,
Si(r)=0,i=1, . . . ,m, (1)
where it is assumed that r is a n-dimensional real vector, that is:
The problem of maximizing the real-valued function reduces to the above-described problem by inverting the sign of the function.
When there are no constraints, as a commonly used algorithm for solving such a problem, starting from an appropriate initial vector r(0), a point r at which the function E(r) reaches a minimum is searched for by using a gradient vector F=−∇E(r).
In recent years, the conjugate gradient method, including the Fletcher-Reeves method, the Polak-Ribiere method, etc., has also been used. The details of these algorithms are described in “Computational Methods in Optimization”, by E. Polak, 1971, published by Academic Press. These methods have the advantages that, because a search direction orthogonal to the direction of a previously performed search is selected, the search is performed efficiently, and the algebraic properties of the Hestnes-Stiefel conjugate gradient method can be taken advantage of where the function E(r) can be approximated by a second order. Accordingly, these methods are frequently used.
When constraints are added, since such simple algorithms do not succeed, various contrivances have been made. As representative examples, the penalty method and the multiplication method (the Lagrangian method of undetermined coefficients) are well known. These methods are described in detail in, for example, Iwanami Lectures on Applied Mathematics, “Optimization Methods”, by Hiroshi FUJITA, Hiroshi KONNO, and Kunio TANABE, 1994, published by Iwanami Shoten.
In the penalty method, to express constraints, a penalty function P(r) which becomes 0 when the constraint is satisfied and which becomes a very large number when the constraint is not satisfied is introduced so that a new function,
Ω(r)=E(r)+P(r) (3)
is minimized. A specific form of the penalty function may be considered as,
where ωi is a positive number and is adjusted to an appropriate value during the process of searching for a minimum value.
In the multiplication method, an undetermined multiplier λi is introduced so that a new function of r and λi,
is made stationary. The constraint is expressed as a stationary condition of Ω(r, λi) with respect to λi,
In addition to these methods, an algorithm using a dynamic system in which a vector field is considered inside a space under consideration and the optimum point is an asymptotic solution of the vector field is also being investigated (described in the above-described “Optimization Methods” in the Iwanami Lectures on Applied Mathematics.)
In addition, recently, Smith et al. have proposed an algorithm in which an admissible set, that is, a set of points which satisfy constraints, is regarded as a Riemannian manifold, and based on that, a Newtonian method and a conjugate gradient method have been considered (“Optimization Techniques on Riemannian Manifolds”, S. T. Smith, AMS, Fields Institute Communications, Vol. 3, 1994, pp. 113–136). Furthermore, regarding a case in which the admissible set is a Grassmann manifold or a Stiefel manifold, a more detailed algorithm has been proposed “The Geometry of Algorithms with Orthogonality Constraints”, A. Edelman, T. A. Arias, S. T. Smith, SIAM J. Matrix Anal. Appl. Vol. 20, 1998, pp. 303–353). Since this algorithm becomes basically the same as the case in which there are no constraints, it has the feature that an additional variable or an adjustment parameter, such as those which appear in other methods, is unnecessary. This technique provides a new point of view with respect to optimization methods with constraints, and a wide range of applications to eigenvalue problems, etc., can be expected.
As described in the conventional methods, as an algorithm for an optimization problem under equation constraints, the multiplication method and the penalty method are widely used. The former becomes a saddle search problem, and the calculation algorithm becomes complex compared to an extremal-value search. The latter has a problem in that the selection of a penalty parameter must be contrived according to the problem.
Furthermore, in the algorithm using the above-described dynamic system, a contrivance for setting an appropriate dynamic system is required.
Furthermore, in the above-described algorithm of Smith et al., during the process of searching for the optimum point, a point is moved along a geodesic line on an admissible set. Smith et al. specifically form a solution of a geodesic line equation with regard to the cases of the Grassmann manifold and the Stiefel manifold. From the viewpoint of principles, if a point is moved in accordance with this solution, the point should stay within the admissible set. In practice, however, there is a problem in that, due to errors in numerical calculations, the point deviates from the admissible set. Furthermore, for a case in which the admissible set is not one of these manifolds, a mathematical expression of a specific solution of a geodesic line equation is not given.
Accordingly, an object of the present invention is to provide a method which is capable of determining a solution of an optimization problem with constraints using a simple procedure, and an apparatus therefor.
According to one aspect, the present invention which achieves this objective relates to a method for determining a minimum value of an optimization function under constraints given by equations, the method comprising the steps of: an approaching step of regarding a set of points which satisfy the constraints as a Riemannian manifold within a finite-dimensional real-vector space and of approaching the Riemannian manifold from an initial position within the real-vector space; an orbit generating step of truncating an exponential map with respect to a geodesic line equation for a tangent vector on the Riemannian manifold by a finite order and of generating an approximate geodesic line as a one-dimensional orbit obeying the truncated exponential map; and an approximate parallel-translation step of performing approximate parallel-translation on the tangent vector on the Riemannian manifold and on the orbit generated in the orbit generating step by finite-order approximation of the exponential map for the parallel translation of the tangent vector.
According to another aspect, the present invention which achieves this objective relates to an apparatus for determining a minimum value of an optimization function under constraints given by equations, the apparatus comprising: approaching means for regarding a set of points which satisfy the constraints as a Riemannian manifold within a finite-dimensional real-vector space and of approaching the Riemannian manifold from an initial position within the real-vector space; orbit generating means for truncating an exponential map regarding a geodesic line equation with respect to a tangent vector on the Riemannian manifold at a finite order and of generating an approximate geodesic line as a one-dimensional orbit obeying the truncated exponential map; and approximate parallel-translation means for performing approximate parallel-translation on the tangent vector on the Riemannian manifold and on the orbit generated in the orbit generating step by finite-order approximation of the exponential map regarding the parallel translation of the tangent vector.
According to still another aspect, the present invention which achieves this objective relates to a computer-readable program for determining a minimum value of an optimization function under constraints given by equations, the program comprising code for causing the computer to perform: an approaching step of regarding a set of points which satisfy the constraints as a Riemannian manifold within a finite-dimensional real-vector space and of approaching the Riemannian manifold from an initial position within the real-vector space; an orbit generating step of truncating an exponential map with respect to a geodesic line equation for a tangent vector on the Riemannian manifold by a finite order and of generating an approximate geodesic line as a one-dimensional orbit obeying the truncated exponential map; and an approximate parallel-translation step of performing approximate parallel-translation on the tangent vector on the Riemannian manifold and on the orbit generated in the orbit generating step by finite-order approximation of the exponential map with respect to the parallel translation of the tangent vector.
Other objectives and advantages besides those discussed above shall be apparent to those skilled in the art from the description of a preferred embodiment of the invention which follows. In the description, reference is made to accompanying drawings, which form a part thereof, and which illustrate an example of the invention. Such example, however, is not exhaustive of the various embodiments of the invention, and therefore reference is made to the claims which follow the description for determining the scope of the invention.
A preferred embodiment of the present invention will be described below in detail with reference to the attached drawings.
Here, a problem of determining a minimum value of an optimization function under constraints given by the above-described equations is formularized mathematically.
An n-dimensional real-vector space Rn is considered, and (ei|i=1, . . . , n) is assumed to be the base thereof. It is assumed that (ei|i=1, . . . , n) is not necessarily orthogonal and takes, with respect to the inner product <,>, the value of
bij=<ei, ej>i,j=1, . . . ,n. (7)
In practice, in an optimization problem which is a target in a finite element method, the base is not orthogonalized. Furthermore, it is assumed that a C∞-class function indicating m constraints (1≦m≦n):
Sk:Rn→R(k=1, . . . ,m), (8)
and a C∞-class function (object function) bounded to the below:
E:Rn→R, (9)
are given. It is assumed that the constraints are given by Sk(x)=0, where k=1, . . . , m by using the above-described functions. A set which satisfies the constraints is called an “admissible set” and is written as:
M={x∈Rn|Sk(x)=0,for k=1, . . . ,m}. (10)
At this time, it is assumed that the regularity condition
rankJ(x)=m, (11)
is satisfied in the neighborhood of M within Rn. Here, J(x) is a Jacobian matrix, and the ki components thereof are defined on the basis of the following:
The Euclidean space Rn can be regarded as a flat Riemannian manifold in which b is a metric. On the basis of the regularity of the constraints, M also becomes an m-dimensional sub-Riemannian-manifold in which b|M, obtained by restricting the metric of Rn onto M, is a metric. Therefore, the optimization function with constraints in consideration is formularized as an optimization problem on a Riemannian manifold M: “determine x∈ M which minimizes E(x)”.
Next, describing the manifold M by using vector analysis in Rn is shown. As the coordinates for representing the points on an (n−m)-dimensional Riemannian manifold, a local coordinate (ξi|i=1, . . . , n−m) can be used. Here, by using the fact that M is a submanifold of Rn, M is represented by using the positional vector of Rn. In order to distinguish this coordinate (xi|i=1, . . . , n−m) from a local coordinate in M, here, this is called an “extrinsic coordinate”. From the viewpoint of numerical calculations, when an extrinsic coordinate is used, there is an advantage in that an efficient algorithm can be constructed by using the storage of algorithms on linear algebra. M, as a Riemannian manifold, and the properties in the vicinity thereof are described using the extrinsic coordinate.
Accordingly, here, a gradient, directional derivative, and Hessian in Rn are defined as tools therefor. When the representations of vector analysis thereof in Rn are used, it becomes possible to describe the quantities on M in a form which is independent of the selection of the extrinsic coordinate, thus offering good prospects.
In the following description, for subscripts regarding the extrinsic coordinates, the convention of Einstein is used. That is, when the same subscript appears two times, the sum from 1 to n will be taken as regard the subscript. Furthermore, ij components of an inverse matrix of a matrix (bij|i,j=1, . . . , n) will be written as bij.
The gradient, the directional derivative, and the Hessian are defined below.
When x is a point within Rn and f is a C28-class function on Rn,f: Rn→R, with respect to an arbitrary vector Z=Ziei, a vector Y uniquely exists so that it satisfies
This vector Y is called the “gradient” of f in x, and is written as Gradx f. When the gradient is shown using components, the gradient becomes as follows:
For the convenience of calculations, the following quantities are introduced:
Fk(x):=GradxSk(k=1, . . . ,m),
Qkl(x):=<Fk(x),Fl(x)>(k,l=1, . . . ,m), (15)
However, the existence of the matrix Qkl(x) depends on the regularity of the constraints as a matrix of Qkl(x), but the regularity of the constraints is ensured on the admissible set M and is also ensured in the vicinity of M within the Rn. In many physical examples, the function Sk(x) is a sufficiently relaxed and blunt function, and M itself has a sufficiently relaxed and blunt shape. Therefore, within a region to be considered below, it may be considered that Qkl(x) exists.
Next, directional derivative is introduced. At the point x within Rn, the directional derivative (Y·Grad)xf of the function f∈ C28(Rn) along the vector Y=Yiei is defined as follows:
Furthermore, the directional derivative (Y·Grad)xZ of the vector field Z(x)=Zi(x)ei along the vector Y=Yiei is defined as follows:
It is a well-known fact that the definition of this directed differentiation does not depend on the selection of the extrinsic coordinate.
Next, the Hessian is defined. When x is a point within Rn and f is a C∞-class function on Rn, f: Rn→R, a real-valued function Hessx f, in which vectors Y=Yiei and Z=Ziei are arguments, is defined as,
This function Hessx f is called “Hessian” of f at x.
Next, a description is given of an approaching algorithm onto the admissible set M by using the above-described definitions of the gradient, directed differentiation, and Hessian. x is assumed to be a point within Rn and a point in the neighborhood of M. In the neighborhood system, a family {Sk} of functions can be approximated as components of a vector space along the normal direction in which M is a zero point. That is, if a point y on M is assumed to be a point on M and to be a point closest to x, an approximation can be made with good accuracy as shown below:
where each μk is an n-dimensional vector quantity, and the components are shown as μk=(μk1, μk2, . . . , μkn), Since y is assumed to be a point on M, Sk(y)=0. Since the following is satisfied on the basis of the assumption
the following is obtained:
When this is approximated by a first order Taylor expansion, the following is obtained:
With the assistance of the quantity of equation (15), the following is obtained:
When the linear independence of the neighborhood system is considered, the following is obtained,
<Fk(x),μ1>=δkl. (21)
When this equation is solved with respect to μk, based on equation (15), the following is obtained:
Therefore, when the projection of x onto M is performed at the point x within Rn and in the neighborhood of M, the vector corresponding to the amount of change thereof is defined as follows:
Next, as a property of an embedded manifold, the Riemannian geometrical properties of the admissible set M are described by using a well-known technique. As is well known in the discussion of integrability, a necessary and sufficient condition for the vector Y on M to become a tangent vector of M is
(dSk)x(Y)=0 or <Gradx Sk,Y>. (24)
When a vector Y on Rn at a point x on M is given, for an arbitrary tangent vector U∈TxM at the point x, V∈TxM, which satisfies the relation,
<U,V>=<U,Y> (25)
uniquely exists, and this can be written as:
A projection V of the vector Y onto a tangent space TxM is written as Projx Y.
When tangent vector fields U,V∈x(M) on M are given, the covariant derivative ∇UV∈x(M) of V along the U direction is given as,
∇UV=Projx[U·GradxV]. (27)
When this is rewritten using the definition of Projx of equation (26), the following is obtained:
Here, by performing U,Grad on the relation <Fk, V>=0 which is obtained due to the fact that V is a tangent vector field on M, the following relation is obtained,
With the aid of this relation,
∇UV=(U·Grad)xV−Γ(U,V), (30)
Next, the parallel translation is described. The condition for the vector field V on M to be parallel along a smooth curve {γ(t)|t∈[0,1]} is that the covariant derivative of V along the tangential direction of the curve at each point γ(t) on the curve is 0, that is:
({dot over (γ)}(t)·Grad)xV=Γ({dot over (γ)}(t),V). (32)
Here, by using
the parallel translation equation of the vector field V(t) on the well-known curve {γ(t)|t∈[0,1]} on M becomes
The solution of the parallel translation equation with respect to the initial condition V(0)=G is given by equation,
V(t)=G−tΓ(H,G)+o(t), (35)
where H={dot over (γ)}(0).
The fact that the curve {γ(t)|t∈[0,1]} on M is a geodesic line means that {dot over (γ)}(t) satisfies the parallel translation equation. Therefore, the geodesic line equation becomes
The solution of the geodesic line equation with respect to the initial condition γ(0)=x0 is given on the basis of the following equation
where H={dot over (γ)}(0).
Next, an outline of the algorithm of Smith is described. In the first place, in general, the concept of the direction on a Riemannian manifold is not well defined. When a tangent vector at a particular point on a Riemannian manifold is defined and it is parallel-translated on a particular closed curve passing through that point, in general, the direction of the initial vector differs from the direction of the vector which has passed around along the closed curve. It is well known that the amount of change thereof depends on the scalar curvature within the adopted closed curve.
For this reason, the attempt that a conjugate gradient method is formed on a manifold cannot offer prospects that all the good properties of the conjugate gradient method with respect to a quadratic form are satisfied. However,
Therefore, in a case where the energy integration is a quadratic form, it is a natural idea that the conjugate gradient method on a manifold is defined to become an algorithm similar to those of Hestenes, Lanczos, and Stiefel. In practice, in the case of a system in which a model of a physical phenomenon is formed, it is empirically known that, often, the energy function has such properties. Accordingly, the algorithm of Smith is described with reference to
The algorithm of Smith:
In step S71, the initial position is set to x=x0.
In step S72, if the point x does not exist on the admissible manifold M, the point x is moved onto the admissible manifold M.
In step S73, it is determined whether or not the energy is converged. It is set in advance that, at the first time, this becomes “NO”.
In step S74, the following is calculated:
Gnew:=ProjxGradxE. (38)
In step S75, a determination is made as to the number of iterations. For the first iteration, the process proceeds to step S76, where G:=Gnew and H:=−Gnew are set.
In step S78,
Then, the process returns to step S73, where it is determined whether the energy has converged. When the amount of change of the energy is smaller than a desired value, it is determined that the energy has converged, and the processing is terminated. When it is determined that the energy has not converged, Gnew is calculated in step S74, and then a determination in step S75 is made. For the second and subsequent iterations, the process proceeds to step S77, where the following are calculated:
Thereafter, the process proceeds to step S78 again. The above-described steps are performed in accordance with the flowchart.
However, when numerical calculations are performed using the algorithm of Smith, due to numerical errors, τ0(G) and τ0(H) obtained as a result of parallel translation are no longer tangent vectors, and x is no longer a point on M. For this reason, approaching the admissible manifold M performed in step S72 must also be performed each time in step S78. To begin with, in the manner described above, this algorithm is not validated mathematically as a conjugate gradient method with respect to general constraints and the energy function, and accurately calculating parallel translation, etc., has no significant meaning.
On the other hand, the operations for τ0(G), τ0(H), etc., are calculations requiring cost in terms of computer use, and a long calculation time is required.
Therefore, in this embodiment, substitution by replacing the above operations with the second-order approximation described below is proposed. In the neighborhood of the minimum value on the admissible manifold M, when the energy function along M can be sufficiently regarded as a quadratic form, this approximation matches the exact calculation result. Therefore, it has a validity comparable to the algorithm of Smith, and the calculation time is substantially reduced.
A method of solving a problem according to this embodiment is now described with reference to
In step S1, the initial position is set as x=x0.
In step S2, the following operation of bringing x close to M is performed:
x:=x+ApprM(x). (40)
In step S3, it is determined whether the energy E has converged. Hereafter, steps S2 to S8 are performed until the energy E converges.
In step S4, a new tangent vector Gnew is calculated on the basis of the following:
Gnew:=ProjxGradxE. (41)
In step S5, the number of iterations is checked. For the first iteration, the process proceeds to step S6, where the tangent vector G is set as G:=Gnew and the searching-direction vector H is set as H:=−Gnew.
Next, in step S8,
and t at which E(t) reaches a minimum on C is determined.
(2) The tangent vector G which is approximately parallel-translated by t along the geodesic line is set as
τ(G):=Projx(G−tΓ(H,G)). (42)
(3) The search vector H which is approximately parallel-translated by t along the geodesic line is set as:
τ(H):=Projx(H−tΓ(H,H)) (43)
(4) The point x is moved by t along the geodesic line:
The process returns to step S2, where the point is moved onto the admissible manifold M, after which, in step S3, it is determined whether convergence has occurred. For this determination, for example, it is determined whether the energy difference is greater or smaller than a desired value. When it is determined that the energy has sufficiently converged, the program (subroutine or function) is terminated.
In step S5, Gnew is calculated, after which the determination in step S6 is made. For the second and subsequent iterations, the process proceeds to step S8, where the calculations of the following substitutions are performed:
and the process proceeds to step S8. This operation is performed until the energy converges.
Next, a description is given of the configuration of a computer for realizing the optimization method described with reference to the flowchart in
The computer loads, into the memory 118, an executable program for an optimization method in accordance with the algorithm of the present invention from a storage medium, such as the hard disk 117 or the floppy disk 110, allocates a necessary calculation area in the memory 118, and performs calculations, in accordance with the procedure corresponding to the flowchart in
description is now given below on the basis of specific examples.
As Example 1, the following geometrical problem is considered. The space to be considered is an n-dimensional real-vector space Rn (ei|i=1, . . . , n), and the standard orthogonal base is (ei|i=1, . . . , n), that is, bij=<ei, ej>=δ(i,j=1, . . . , n). The admissible set M is set as:
Furthermore, the energy functional is set as:
In this case, as a conditional equation representing the constraints, the following may be used:
For the sake of convenience, here, when a dual basis of ei:=bijej is introduced, the following are obtained:
The Hessian is also calculated as follows:
HessxS1(U,V)=2δijUiVj, (50)
HessxS2(U,V)=0.
When calculations are performed in accordance with the algorithm of
As Example 2, an optimization algorithm of the above-described embodiment is applied to the problem of the shape of droplets on a plane.
The constraints are such that the volume of the droplets is fixed, and an admissible manifold which satisfies the constraints is given on the basis of the following equation:
M={(r1,r2, . . . rn)∈Rn|S(r)=0}, (51)
where
However, by setting
the following are set:
si(r)=π/3(ri2 sin2 θi+riri+1 sin θi sin θl+1+ri+12 sin2 θi+1),
h1(r)=ri cos θi−ri+1 cos θi+1.
Furthermore, as an energy functional, the following may be used:
At this time, as geometrical quantities for using the above-described algorithm, quantities such as those shown below are specifically determined. However, since there is one conditional equation, S(r) is regarded as being the same as S1(r).
For example, the Hessian of S(r) is determined as shown below:
Hessxs1(U,V)=2sijUiVj, (51)
where
Furthermore, the individual terms are given on the basis of the following:
Similarly, regarding the energy functional, by rewriting the energy functional as follows:
The directional derivative thereof, etc., are calculated as follows.
Similarly, the Hessian Hessx E(U,V)=2eijUiVj of the energy functional can also be calculated as,
When calculations are performed in accordance with the algorithm in
As described above, the algorithm of this embodiment can be contained as a subroutine program on a medium such as the floppy disk 110 or the hard disk 117 of
In the above-described embodiment, in an n-dimensional Euclidean space on a real number field, solution of a geodesic line equation is specifically constructed with second-order accuracy with respect to a case where constraints are specified by m independent equations, and an algorithm in which correction when the solution deviates from an admissible set is taken into consideration is proposed.
When the method of the above-described embodiment is used, the number of calculations becomes much smaller than that in which a geodesic line equation is solved exactly. On the other hand, when calculations are performed in accordance with an exact solution of the geodesic line equation, in practice, a process of recurring to an admissible set in a manner similar to that for a second-order approximation becomes necessary due to numerical errors. Therefore, in practice, calculations based on second-order approximation are not inferior to calculations using an exact solution: rather, they are considered to be advantageous by an amount corresponding to the smaller number of calculations.
Furthermore, in the algorithm described in the above-described embodiment, second-order function approximation is performed in the calculation of a geodesic line and in the calculation of the parallel translation of a vector. Even if one of them or both are substituted into a first-order approximation, the same algorithm can be constructed, and the same calculation results can be obtained.
The present invention may be applied to a system composed of a plurality of devices (for example, a computer main unit, an interface device, a display, etc.) and also to an apparatus composed of a single device as long as in which the functions of the above-described embodiment can be realized.
Furthermore, with a view to operating various devices so as to realize the functions of the above-described embodiment, an implementation in which program code of software which realizes the functions of the above-described embodiment is supplied to a computer within an apparatus or a system, which is connected to the various devices, the computer (or a CPU or an MPU) of that system or apparatus causes the various devices to operate in accordance with the supplied program is also within the scope of the present invention. In this case, the program code itself which is read from a storage medium realizes the functions of the above-described embodiment. The program code itself, and means for supplying the program code to the computer, for example, a storage medium having such program code stored therein, constitute the present invention.
For storage media for supplying such program code, for example, a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, magnetic tape, a non-volatile memory card, a ROM, etc., can be used.
Furthermore, when not only are the functions of the above-described embodiment realized by executing the program code read by the computer, but also the functions of the above-described embodiment are realized in cooperation with the OS (Operating System) running on the computer or other application software on the basis of the instructions of the program code, such program code is within the scope of the present invention.
In addition, a case in which program code read from a storage medium is written into a memory provided in a function expansion board inserted into a computer or a function expansion unit connected to a computer, after which a CPU provided in that function expansion board or in that function expansion unit executes a part or the entirety of the actual processing, and the functions of the above-described embodiment are realized by that processing is within the scope of the present invention.
When the present invention is to be applied to the storage medium, the storage medium may store program code corresponding to the flowchart described above.
Use of the optimization method which has thus been described makes it possible to determine a solution of an optimization problem with constraints with a simple calculation procedure.
Although the present invention has been described in its preferred form with a certain degree of particularity, many apparently widely different embodiments of the invention can be made without departing from the spirit and the scope thereof. It is to be understood that the invention is not limited to the specific embodiments thereof except as defined in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2001-210796 | Jul 2001 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
5377307 | Hoskins et al. | Dec 1994 | A |
5590063 | Golio et al. | Dec 1996 | A |
Number | Date | Country | |
---|---|---|---|
20030065689 A1 | Apr 2003 | US |