The most straightforward way to handle inequality constraints in an optimization problem is to transform them into equality constraints by subtracting a vector of slack variables. The main benefit of this transformation is the ability to find an easy feasible starting point for interior-point based approaches. A second approach for handling inequality constraints is to place them into a barrier function. The two most common examples of barrier functions are the log-barrier function and the inverse barrier function. The log-barrier is usually preferred because the logarithmic term grows more slowly than the inverse barrier as the boundary is approached. The classical log-barrier (CLB) method transforms the optimization problem with inequality constraints into an unconstrained subproblem. The main difficulty associated with the CLB method is finding a strictly feasible starting point, because doing so has the same complexity as solving the original optimization problem with inequality constraints. Furthermore, it has been shown that the Hessian matrix of the log-barrier function becomes increasingly ill-conditioned as the barrier parameter converges to zero. As a result, CLB methods were quickly abandoned by practitioners.
As an alternative to CLB methods, a method based on a modified log-barrier (MLB) function was introduced that eliminates most of the deficiencies associated with the CLB method while retaining its best features. However, though MLB methods may exhibit strong convergence properties, they require shifting the starting point via an auxiliary problem after each update of the barrier parameter. The MLB function further has singularities at the boundary of the shifted feasible region, leading to numerical difficulties that prevent the algorithm from performing well when approaching the solution.
In an example embodiment, a computer-readable medium is provided having stored thereon computer-readable instructions that, when executed by a computing device, cause the computing device to determine a solution to a nonlinear optimization problem. (A) An optimality check is performed for a current solution to a predefined objective function. The predefined objective function is a nonlinear equation with a plurality of decision variables. (B) When the performed optimality check indicates that the current solution is an optimal solution, the optimal solution to the predefined objective function is output. The optimal solution includes a value for each decision variable of the plurality of decision variables that optimizes a result of the predefined objective function.
When the performed optimality check indicates that the current solution is not the optimal solution, (C) a complementarity slackness test is performed with the current solution; (D) based on a result of the performed complementarity slackness test, a barrier parameter value is updated and a Lagrange multiplier value is updated for each constraint function of a plurality of constraint functions defined for objective function optimization; (E) a search direction vector is determined by solving a primal-dual linear system that includes a dual variable defined for each constraint function of the plurality of constraint functions; (F) a step length value is determined for each decision variable of the plurality of decision variables and for each dual variable of the plurality of dual variables based on the updated barrier parameter value; (G) the current solution to the predefined objective function is updated using the determined search direction vector and the determined step length value for each decision variable of the plurality of decision variables and for each dual variable of the plurality of dual variables; and (H) (A) to (G) are repeated until the performed optimality check indicates that the current solution is the optimal solution or a predefined number of iterations of (D) has been performed.
Each constraint function applies one or more of the plurality of decision variables. The search direction vector includes a direction value for each decision variable of the plurality of decision variables and for each dual variable of the plurality of dual variables. The search direction vector is determined based on the updated barrier parameter value and the updated Lagrange multiplier value for each constraint function of the plurality of constraint functions.
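For illustration only, operations (A) through (H) can be sketched as a generic loop. The helper callables below are hypothetical placeholders standing in for the detailed computations described later in this description, not part of the disclosed method itself:

```python
def optimize(w0, optimality_check, complementarity_test, update_parameters,
             solve_direction, step_lengths, apply_step, max_iters=250):
    """Skeleton of operations (A)-(H); the callables are placeholders for
    the computations described in the detailed operations."""
    w = w0
    for _ in range(max_iters):
        if optimality_check(w):                 # (A)/(B): stop when optimal
            return w, True
        ok = complementarity_test(w)            # (C): slackness progress?
        mu, lam = update_parameters(w, ok)      # (D): update mu and lambda
        d = solve_direction(w, mu, lam)         # (E): primal-dual system
        ax, az = step_lengths(w, d, mu)         # (F): primal/dual step lengths
        w = apply_step(w, d, ax, az)            # (G): update the iterate
    return w, False                             # (H): iteration limit reached
```

The callables would be supplied by the operations described below; any convergence behavior depends entirely on those implementations.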
In another example embodiment, a computing device is provided. The computing device includes, but is not limited to, a processor and a computer-readable medium operably coupled to the processor. The computer-readable medium has instructions stored thereon that, when executed by the processor, cause the computing device to determine a solution to a nonlinear optimization problem.
In yet another example embodiment, a method of determining a solution to a nonlinear optimization problem is provided.
Other principal features of the disclosed subject matter will become apparent to those skilled in the art upon review of the following drawings, the detailed description, and the appended claims.
Illustrative embodiments of the disclosed subject matter will hereafter be described referring to the accompanying drawings, wherein like numerals denote like elements.
Referring to
Input interface 102 provides an interface for receiving information from the user or another device for entry into nonlinear optimization device 100 as understood by those skilled in the art. Input interface 102 may interface with various input technologies including, but not limited to, a keyboard 112, a microphone 113, a mouse 114, a display 116, a track ball, a keypad, one or more buttons, etc. to allow the user to enter information into nonlinear optimization device 100 or to make selections presented in a user interface displayed on display 116.
The same interface may support both input interface 102 and output interface 104. For example, display 116 comprising a touch screen provides a mechanism for user input and for presentation of output to the user. Nonlinear optimization device 100 may have one or more input interfaces that use the same or a different input interface technology. The input interface technology further may be accessible by nonlinear optimization device 100 through communication interface 106.
Output interface 104 provides an interface for outputting information for review by a user of nonlinear optimization device 100 and/or for use by another application or device. For example, output interface 104 may interface with various output technologies including, but not limited to, display 116, a speaker 118, a printer 120, etc. Nonlinear optimization device 100 may have one or more output interfaces that use the same or a different output interface technology. The output interface technology further may be accessible by nonlinear optimization device 100 through communication interface 106.
Communication interface 106 provides an interface for receiving and transmitting data between devices using various protocols, transmission technologies, and media as understood by those skilled in the art. Communication interface 106 may support communication using various transmission media that may be wired and/or wireless. Nonlinear optimization device 100 may have one or more communication interfaces that use the same or a different communication interface technology. For example, nonlinear optimization device 100 may support communication using an Ethernet port, a Bluetooth antenna, a telephone jack, a USB port, etc. Data and/or messages may be transferred between nonlinear optimization device 100 and another computing device of a distributed computing system 130 using communication interface 106.
Computer-readable medium 108 is an electronic holding place or storage for information so the information can be accessed by processor 110 as understood by those skilled in the art. Computer-readable medium 108 can include, but is not limited to, any type of random access memory (RAM), any type of read only memory (ROM), any type of flash memory, etc. such as magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips, . . . ), optical disks (e.g., compact disc (CD), digital versatile disc (DVD), . . . ), smart cards, flash memory devices, etc. Nonlinear optimization device 100 may have one or more computer-readable media that use the same or a different memory media technology. For example, computer-readable medium 108 may include different types of computer-readable media that may be organized hierarchically to provide efficient access to the data stored therein as understood by a person of skill in the art. As an example, a cache may be implemented in a smaller, faster memory that stores copies of data from the most frequently/recently accessed main memory locations to reduce an access latency. Nonlinear optimization device 100 also may have one or more drives that support the loading of a memory media such as a CD, DVD, an external hard drive, etc. One or more external hard drives further may be connected to nonlinear optimization device 100 using communication interface 106.
Processor 110 executes instructions as understood by those skilled in the art. The instructions may be carried out by a special purpose computer, logic circuits, or hardware circuits. Processor 110 may be implemented in hardware and/or firmware. Processor 110 executes an instruction, meaning it performs/controls the operations called for by that instruction. The term “execution” is the process of running an application or the carrying out of the operation called for by an instruction. The instructions may be written using one or more programming languages, scripting languages, assembly languages, etc. Processor 110 operably couples with input interface 102, with output interface 104, with communication interface 106, and with computer-readable medium 108 to receive, to send, and to process information. Processor 110 may retrieve a set of instructions from a permanent memory device and copy the instructions in an executable form to a temporary memory device that is generally some form of RAM. Nonlinear optimization device 100 may include a plurality of processors that use the same or a different processing technology.
Some machine-learning approaches may be more efficiently and speedily executed and processed with machine-learning specific processors (e.g., not a generic central processing unit (CPU)). Such processors may also provide additional energy savings when compared to generic CPUs. For example, some of these processors can include a graphical processing unit (GPU), an application-specific integrated circuit, a field-programmable gate array, an artificial intelligence accelerator, a purpose-built chip architecture for machine learning, and/or some other machine-learning specific processor that implements a machine learning approach using semiconductor (e.g., silicon, gallium arsenide) devices. These processors may also be employed in heterogeneous computing architectures with a number of and a variety of different types of cores, engines, nodes, and/or layers to achieve additional various energy efficiencies, processing speed improvements, data communication speed improvements, and/or data efficiency targets and improvements throughout various parts of the system.
Nonlinear optimization application 122 performs operations associated with defining solution description 124 that describes a solution to a nonlinear optimization problem with inequality constraints. Some or all of the operations described herein may be embodied in nonlinear optimization application 122. The operations may be implemented using hardware, firmware, software, or any combination of these methods.
Referring to the example embodiment of
Nonlinear optimization application 122 may be implemented as a Web application. For example, nonlinear optimization application 122 may be configured to receive hypertext transport protocol (HTTP) responses and to send HTTP requests. The HTTP responses may include web pages such as hypertext markup language (HTML) documents and linked objects generated in response to the HTTP requests. Each web page may be identified by a uniform resource locator (URL) that includes the location or address of the computing device that contains the resource to be accessed in addition to the location of the resource on that computing device. The type of file or resource depends on the Internet application protocol such as the file transfer protocol, HTTP, H.323, etc. The file accessed may be a simple text file, an image file, an audio file, a video file, an executable, a common gateway interface application, a Java applet, an extensible markup language (XML) file, or any other type of file supported by HTTP.
Nonlinear optimization application 122 provides an optimization that is a search for a maximum or a minimum of an objective function (also called a cost function), where search variables are restricted by particular constraints, and where the constraints define a feasible region for the solution. A Lagrange function for the problem minimize ƒ(x) subject to c(x)≥0 is defined by ℒ(x, z)=ƒ(x)−c(x)Tz, where ℒ(x,z) indicates the Lagrange function, x indicates one or more decision variables, ƒ(x) indicates the objective function, c(x) indicates a set of one or more constraint functions applied to the optimization of the objective function, T indicates a transpose, and z indicates one or more dual variables defined for each constraint function of the one or more constraint functions. The maximum of ƒ(x) is simply a negation of the minimum of −ƒ(x). The objective function is a mathematical formula defined using a set of one or more variables x called optimization variables or decision variables. The one or more constraint functions are mathematical formulas that may include equalities (=) and inequalities (<, >, ≤, ≥) required to be simultaneously satisfied based on values for the one or more variables x. A problem is said to be linearly constrained if the functions in the constraints are linear. A linear programming problem is a linearly constrained problem with a linear objective function. A nonlinear programming problem occurs where some function in the objective function or constraint functions is nonlinear. The objective function and/or constraint functions may include one or more polynomial functions. For illustration, the optimization problem may be defined as minimize ƒ(x1,x2)=100(x2−x1²)²+(1−x1)² subject to x1≥5 and x1+5x2≥76, where x1 and x2 are decision variables, and x1≥5 and x1+5x2≥76 are two constraint functions.
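For illustration, the example problem can be evaluated numerically. The sketch below assumes the intended objective is the Rosenbrock-type function 100(x2−x1²)²+(1−x1)² with constraint functions written in c(x)≥0 form as c1(x)=x1−5 and c2(x)=x1+5x2−76; these coefficients are assumptions for illustration:

```python
def objective(x1, x2):
    # Illustrative objective: 100*(x2 - x1**2)**2 + (1 - x1)**2
    return 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2

def constraints(x1, x2):
    # c(x) >= 0 form: c1 = x1 - 5, c2 = x1 + 5*x2 - 76
    return [x1 - 5.0, x1 + 5.0 * x2 - 76.0]

def is_feasible(x1, x2):
    # Feasible region: every constraint function is nonnegative
    return all(ci >= 0.0 for ci in constraints(x1, x2))
```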
A feasible solution for the one or more decision variables x that minimizes the objective function value is called an optimal solution to the optimization problem, and the corresponding objective value is called the optimal value. Nonlinear optimization application 122 varies values for the optimization variables when searching for the optimal value for the objective function that is output to solution description 124. The solution may further be sent to a calling function, for example, of a model training process such as for training a neural network model, a support vector machine model, etc.
For two real vectors x and y of the same length, a Euclidean scalar product can be denoted by xTy. A Euclidean norm can be defined as ∥x∥=(xTx)1/2. A positive part of a real number t is a function defined by t+=max{t, 0}. ∇ƒ(x)ϵℝn indicates a gradient of ƒ at x. ∇c(x)T indicates a Jacobian matrix of c at x, and is an m×n matrix whose ith row is a vector ∇ci(x)T, where m is a number of the one or more constraint functions, and n is a number of the one or more decision variables. C(x) and Z are diagonal matrices whose diagonal entries are given by the components of c(x) and z, respectively, and e is a vector of all ones. A gradient and a Hessian of ℒ(x,z) with respect to the primal-dual variables (x,z) can be defined as ∇xℒ(x, z)=∇ƒ(x)−∇c(x)z and
respectively, where ∇x2 indicates a Hessian computation with respect to x, and ∇x indicates a derivative computation with respect to x.
The first order optimality conditions, also called Karush-Kuhn-Tucker (KKT) conditions, for the problem minimize ƒ(x) subject to c(x)≥0 can be defined by
A paper by R. Polyak, Modified barrier functions (theory and methods), Mathematical Programming, Volume 54, Issue 2, Series A, pp. 177-222 (Feb. 1992) proposed the following MLB function associated with solving the problem minimize ƒ(x) subject to c(x)≥0
where the constraint index set is {1, . . . , m}, λ is a vector of positive estimates of m Lagrange multipliers, μ>0 is a barrier parameter value, λi is an ith Lagrange multiplier, ci(x) is an ith constraint function, and m is a number of constraint functions included in c(x). The MLB method solves the problem minimize ƒ(x) subject to c(x)≥0 by solving a sequence of unconstrained minimization problems
where n is a number of decision variables included in ƒ(x). The KKT conditions for solving the sequence of unconstrained minimization problems are
In the present application,
i=1, . . . m is introduced where zi is an ith dual variable value paired with an ith constraint function ci(x),
These conditions appear as a perturbation of the KKT conditions of the initial problem. Nonlinear optimization application 122 applies a Newton-like method to equation (2) while updating λ and μ to guarantee a global convergence. Nonlinear optimization application 122 is more flexible than the CLB method and offers two possibilities to ensure convergence to a point at which the KKT conditions are satisfied: 1) reduce the barrier parameter, or 2) update the Lagrange multiplier estimate. Using the CLB method, only the barrier parameter is driven to zero.
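For illustration, the Polyak MLB function has the form ƒ(x)−μ Σi λi log(1+ci(x)/μ). A minimal sketch of its evaluation follows, where the argument names are illustrative and the guard enforces the domain requirement ci(x)+μ>0:

```python
import math

def modified_log_barrier(f_val, c_vals, lam, mu):
    """Evaluate f(x) - mu * sum_i lam_i * log(1 + c_i(x)/mu).

    f_val  : objective value f(x)
    c_vals : constraint values c_i(x)
    lam    : positive Lagrange multiplier estimates
    mu     : barrier parameter (> 0)
    Requires c_i(x) + mu > 0 for every constraint so the log is defined.
    """
    total = f_val
    for ci, li in zip(c_vals, lam):
        if ci + mu <= 0.0:
            raise ValueError("barrier undefined: c_i(x) + mu must be positive")
        total -= mu * li * math.log(1.0 + ci / mu)
    return total
```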
Nonlinear optimization application 122 executes an outer iteration and an inner iteration. A solution vector wk=(xk,zk) is associated with a kth outer iteration, where xk is a kth one or more decision variables, and zk is a kth one or more dual variables paired with a kth one or more constraint functions. A minimization process starts by updating λk and μk depending on a progress toward complementarity slackness, where λk is a kth estimate of each Lagrange multiplier of the plurality of Lagrange multipliers, μk is a kth barrier parameter value, and ∥C(xk)zk∥≤α∥C(xk−1)zk−1∥ is a complementarity slackness test used to determine how λk and μk are updated, where αϵ(0,1). When ∥C(xk)zk∥≤α∥C(xk−1)zk−1∥, the Lagrange multiplier estimate is updated by setting λk+1=zk, and the barrier parameter is updated by setting μk+1≤μk. When ∥C(xk)zk∥>α∥C(xk−1)zk−1∥, the barrier parameter is updated by setting μk+1=ρμk, and the Lagrange multiplier estimate is updated by setting λk+1=λk to ensure that it does not behave badly far from a solution. The complementarity slackness test used to measure the progress achieved by the kth outer iteration may be too demanding, especially for nonlinear optimization. As a result, a condition based on a non-monotonic decrease of the complementarity slackness may be used instead, as defined by
∥C(xk)zk∥≤α max{∥C(xij)zij∥:(k−ℓ)+≤j≤k}+ζk,  (3)
where ℓ≥1, {ik} is a sequence whose elements are the iteration indices at which λk+1=zk (or ∥C(xk)zk∥≤α∥C(xk−1)zk−1∥), and {ζk} is a first sequence that converges to zero.
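For illustration, the update of λ and μ driven by the complementarity slackness test may be sketched as follows. The non-monotone test is simplified by omitting the vanishing sequence {ζk}, the history list is assumed to hold the recent complementarity norms, and the barrier parameter is simply kept in the progress case (the description permits any μk+1≤μk):

```python
def update_parameters(comp_now, comp_hist, lam, z, mu, alpha=0.9, rho=0.2):
    """Return the next (lam, mu) from complementarity progress.

    comp_now  : ||C(x_k) z_k|| at the current iterate
    comp_hist : recent complementarity norms used by the non-monotone test
                (most recent last, nonempty)
    """
    if comp_now <= alpha * max(comp_hist):
        # Progress made: take the dual variables as the new multiplier
        # estimate; the barrier parameter may stay or be mildly reduced.
        return list(z), mu
    # No progress: keep the multipliers and shrink the barrier parameter.
    return list(lam), rho * mu
```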
After updating μk and λk, a search direction d={dx, dz} is computed by solving the following primal-dual linear system
where T indicates a transpose. This corresponds to the result of applying a Newton-like method to equation (2). Setting λk+1=zk, the linear system appears as a regularized Newton method applied to the KKT conditions of the initial problem. Hence, a high rate of convergence is expected provided that the parameters are correctly updated and the constraint Jacobian is of full rank. The complementarity slackness test is used to ensure that the update λk+1=zk occurs as often as possible, where the constraint Jacobian is defined using
The linear system defined by equation (4) can be written as J(wk,μk+1)d=−r(wk,λk+1,μk+1). After the step computation, a new iterate is defined by x̂k=xk+αkxdkx and ẑk=zk+αkzdkz, where the step lengths αkx and αkz are computed such that
and
zk+αkzdkz>0 (6)
To decide whether the candidate iterate ŵk is selected, the following criterion is tested: ∥r(ŵk,λk+1,μk+1)∥≤εk, where {εk} is a second sequence that converges to zero. If ∥r(ŵk,λk+1,μk+1)∥>εk, a sequence of inner iterations is performed to find wk+1 satisfying ∥r(wk+1,λk+1,μk+1)∥≤εk. The sequence of inner iterations minimizes a new primal-dual merit function Φλ,μ(w), defined as the sum of the MLB function evaluated at x and the scaled term ηΨλ,μ(w), with fixed values of the Lagrange multiplier estimate and the barrier parameter, where η≥0 is a first scaling parameter and
The control of iterates is performed in both primal and dual spaces during the minimization process including the inner iterations.
For the logarithmic term to be well defined, the following inequality should be satisfied
Due to non-convexity, the inequality
can restrict how quickly the barrier parameter μ can be reduced. This situation mainly occurs when the complementarity slackness test is not satisfied and the current values of the constraints do not allow the barrier parameter μ to be reduced by a large factor. To respond to this occurrence, a quadratic extrapolation is applied by modifying a definition of the function to distinguish two cases based on the infeasibility of the current iterate:
where 0≤β≤1 and the coefficients qa, qb, qc are given by
In an illustrative embodiment, β=0.9. Whenever a “large” infeasibility is detected using ci(x)<−βμ, the quadratic extrapolation term for the corresponding constraint may be used. In this manner, the quadratic approximations may be seen as a penalty rather than a barrier. In other words, if the ith constraint function becomes increasingly infeasible, the constraint is penalized using a quadratic term. Let B and Q be two non-intersecting index sets denoting the set of constraints that satisfy ci(x)≥−βμ and ci(x)<−βμ, respectively. The linear system J(wk,μk)d=−r(wk,λk,μk) has the following form
where ΛB=diag(λB). During the inner iterations, the following merit function is applied
where ν is a second scaling parameter.
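For illustration, the per-constraint barrier term with quadratic extrapolation may be sketched as follows. Because the coefficients qa, qb, and qc are not reproduced here, the sketch realizes them as the second-order Taylor continuation that matches the value, slope, and curvature of the barrier term at the switch point ci(x)=−βμ; this is one plausible choice, not the disclosed definition:

```python
import math

def barrier_term(ci, lam_i, mu, beta=0.9):
    """Per-constraint MLB term with an assumed quadratic extrapolation.

    For c_i(x) >= -beta*mu the term -mu*lam_i*log(1 + c_i/mu) is used;
    below the switch point the term is continued by the quadratic matching
    value, slope, and curvature at c_i = -beta*mu, acting as a penalty.
    """
    c0 = -beta * mu
    if ci >= c0:
        return -mu * lam_i * math.log(1.0 + ci / mu)
    # Value and first two derivatives of the barrier term at c0.
    v = -mu * lam_i * math.log(1.0 - beta)
    d1 = -lam_i / (1.0 - beta)
    d2 = lam_i / (mu * (1.0 - beta) ** 2)
    h = ci - c0
    return v + d1 * h + 0.5 * d2 * h * h   # quadratic continuation
```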
Methods for solving nonlinear optimization problems may be classified into two classes: primal and primal-dual. Using the substitution
is a starting point for introducing a primal-dual framework in the context of the MLB method. Nonlinear optimization application 122 implements the MLB method in the primal-dual space. The control of the iterates is done in both primal and dual spaces during the entire minimization process, including the globalization phase corresponding to an inner algorithm. A new primal-dual merit function is used, and an estimate of the Lagrange multipliers is updated depending on a progress made by the iterates towards a complementarity slackness.
Referring to
In an operation 200, a first indicator of an objective function ƒ(x) may be received, where x includes decision variables xi, i=1, . . . , n, and n is a number of decision variables whose values are to be computed by optimization based on minimization or maximization. In an alternative embodiment, the first indicator may not be received. For example, the objective function may be stored, for example, in computer-readable medium 108 and used automatically. The objective function is defined by a nonlinear equation, meaning that its graph is not a line.
In an operation 202, a second indicator may be received that indicates a set of constraint functions ci(x), i=1, . . . , m, where m is a number of constraint functions. Each constraint function may be an equation that defines an equality (=) or an inequality (<, >, ≤, ≥) with constant values and one or more of the decision variables. At least one of the constraint functions includes an inequality. In an alternative embodiment, the second indicator may not be received. For example, the set of constraint functions may be stored, for example, in computer-readable medium 108 and used automatically.
In an operation 204, a third indicator may be received that indicates a solution starting point w0=(x0,z0) for searching for the optimum solution using the primal x and dual z, where x0 is a primal vector x0,i, i=1, . . . , n, and z0 is a dual vector z0,i, i=1, . . . , m, where n≥1 and m≥1. In an alternative embodiment, the third indicator may not be received or selectable. For example, the starting point w0=(x0,z0) may be stored, for example, in computer-readable medium 108 and used automatically. For illustration, the primal vector is initialized to all zeroes and the dual vector is initialized to all ones.
In an operation 206, a fourth indicator may be received that indicates an initial barrier parameter μ0>0. In an alternative embodiment, the fourth indicator may not be received. For example, the initial barrier parameter μ0 may be stored, for example, in computer-readable medium 108 and used automatically. For illustration, μ0=max(−ci(x0)+0.1, 0.1) computed over i=1, . . . , m, which guarantees that c(x0)+μ0e>0 to ensure that the barrier term is well defined, where e is a vector of all ones.
In an operation 208, a fifth indicator may be received that indicates an initial Lagrange multiplier value for each constraint function λ0,i>0, i=1, . . . , m, where λ0 is a Lagrange multiplier vector. In an alternative embodiment, the fifth indicator may not be received. For example, the initial Lagrange multiplier value for each constraint λ0,i, i=1, . . . , m may be stored, for example, in computer-readable medium 108 and used automatically. For illustration, λ0,i=1.
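For illustration, operations 204 through 208 may be initialized as sketched below, where the formula for μ0 is assumed to take the maximum over all constraints so that every ci(x0)+μ0>0:

```python
def initialize(c_at_x0, m):
    """Illustrative starting values from operations 204-208.

    c_at_x0 : constraint values c_i(x0) at the starting point
    m       : number of constraint functions
    mu0 is chosen so that every c_i(x0) + mu0 > 0, keeping the barrier
    term defined at the starting point; multipliers and duals start at 1.
    """
    mu0 = max(max(-ci for ci in c_at_x0) + 0.1, 0.1)
    lam0 = [1.0] * m           # initial Lagrange multiplier estimates
    z0 = [1.0] * m             # initial dual vector
    return mu0, lam0, z0
```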
In an operation 210, a sixth indicator may be received that indicates a value for a convergence parameter α. In an alternative embodiment, the sixth indicator may not be received. For example, the value for the convergence parameter α may be stored, for example, in computer-readable medium 108 and used automatically. The convergence parameter α should be chosen such that αϵ(0,1), which guarantees that a right hand-side of equation (3) converges to zero. For illustration, α=0.9.
In an operation 212, a seventh indicator may be received that indicates a value for a barrier reduction parameter ρ, where ρϵ(0,1). In an alternative embodiment, the seventh indicator may not be received. For example, the value for the barrier reduction parameter ρ may be stored, for example, in computer-readable medium 108 and used automatically. For illustration, ρ=0.2.
In an operation 214, an eighth indicator may be received that indicates a predefined integer value ℓ, where ℓϵℕ. In an alternative embodiment, the eighth indicator may not be received. For example, the predefined integer value ℓ may be stored, for example, in computer-readable medium 108 and used automatically. The predefined integer value satisfies ℓ≥1 and is used to define a number of last iterations used to test a condition for a non-monotonic decrease of the complementarity slackness to update the barrier parameter μ and the estimate of the Lagrange multipliers λ. For illustration, ℓ=2.
In an operation 216, a ninth indicator of a maximum number of iterations Imax may be received. In an alternative embodiment, the ninth indicator may not be received. For example, a default value may be stored, for example, in computer-readable medium 108 and used automatically. In another alternative embodiment, the value of the maximum number of iterations Imax may not be selectable. Instead, a fixed, predefined value may be used. For illustration, a default value of the maximum number of iterations Imax may be 250 though other values may be used. The maximum number of iterations Imax indicates how many iterations are performed as part of solving the optimization problem. In an alternative embodiment, a maximum computing time may be specified in addition to or instead of the maximum number of iterations Imax and used in a similar manner to stop the optimization process when the maximum computing time is reached.
In an operation 220, a tenth indicator may be received that indicates a value for a first scaling parameter η, where ηϵ(0,1). In an alternative embodiment, the tenth indicator may not be received. For example, the value for the first scaling parameter η may be stored, for example, in computer-readable medium 108 and used automatically. For illustration, η=10−4.
In an operation 222, an eleventh indicator may be received that indicates a value for a second scaling parameter ν, where ν≥0. In an alternative embodiment, the eleventh indicator may not be received. For example, the value for the second scaling parameter ν may be stored, for example, in computer-readable medium 108 and used automatically. For illustration, ν=10−4.
In an operation 224, a twelfth indicator may be received that indicates a value for whether the quadratic penalty is to be applied. For example, a quadratic penalty application may be set to zero or one, or alternatively, true or false, to indicate whether the quadratic penalty is to be applied. In an alternative embodiment, the twelfth indicator may not be received. For example, the quadratic penalty application may be stored, for example, in computer-readable medium 108 and used automatically. For illustration, the quadratic penalty application may be defined as one or true by default.
In an operation 226, a thirteenth indicator may be received that indicates a value for a feasibility parameter β, where 0≤β≤1, when the quadratic penalty application is selected, for example, indicates true. In an alternative embodiment, the thirteenth indicator may not be received. For example, the value for feasibility parameter β may be stored, for example, in computer-readable medium 108 and used automatically. For illustration, β=0.9. The feasibility parameter β defines how close to the singularities the MLB function is extrapolated.
In an operation 228, an index i0 and an index j are initialized, for example, as i0=0 and j=0.
In an operation 230, an iteration index k is initialized, for example, as k=0.
In an operation 232, an optimality check is performed using a current solution wk=(xk,zk), where the current decision variable vector x and dual variable vector z are indicated by the iteration index k. For example, the optimality check is performed using
as defined above.
In operation 234, a determination is made concerning whether the current solution wk is an optimum solution based on the optimality check or whether k≥Imax. When wk is an optimum solution or k≥Imax, processing continues in an operation 236. When wk is not an optimum solution and k<Imax, processing continues in an operation 240 to perform another iteration.
In operation 236, the current solution vector xk is output as the optimized solution that includes a value for each decision variable xk,i, i=1, . . . n. For example, the current solution vector xk may be output to solution description 124 and/or to display 116. Additional information regarding the optimization process further may be output such as the current objective function value ƒ(xk).
Referring to
In an operation 242, a complementarity slackness test is performed using the current solution wk=(xk,zk). For example, the complementarity slackness test is performed using ∥C(xk)zk∥≤α∥C(xk−1)zk−1∥ or the non-monotonic condition defined in equation (3).
In operation 244, a determination is made concerning whether the complementarity slackness test performed in operation 242 is satisfied. When the test is satisfied, processing continues in an operation 246. When the test is not satisfied, processing continues in an operation 252.
In operation 246, the next barrier parameter value is defined or updated, for example, using μk+1=0.5μk such that μk+1≤μk. As another example, μk+1=qμk, where 0<q≤1 and q may be defined by a user or using a default value. An illustrative value for q may be q=0.5 or q=0.2 though other values may be used.
In an operation 248, the next Lagrange multiplier values are defined or updated, for example, using λk+1,h=zk,h, h=1, . . . , m.
In an operation 250, the index ik and the index j are updated, for example, using ik+1=k and j=k, and processing continues in an operation 258.
In operation 252, the next barrier parameter value is defined or updated, for example, using μk+1=ρμk.
In an operation 254, the next Lagrange multiplier values are defined or updated, for example, using λk+1,h=λk,h, h=1, . . . , m.
In an operation 256, the index ik and the index j are updated, for example, using ik+1=ik and j=k, and processing continues in an operation 258.
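The branching among operations 242-256 can be sketched in a few lines. This is a hedged illustration, not the exact method: the function name, the reference complementarity value, and the constants α=0.9 and ρ=0.9 are assumptions; q=0.5 matches the example update μk+1=0.5μk in the text.

```python
import numpy as np

def update_mu_lambda(mu, lam, z, comp_now, comp_ref,
                     alpha=0.9, q=0.5, rho=0.9):
    # comp_now stands in for ||C(x_k) z_k|| and comp_ref for the reference
    # complementarity value used by the test in operation 244.
    if comp_now <= alpha * comp_ref:
        # operations 246-248: cut the barrier parameter and accept the
        # current dual estimate as the next Lagrange multipliers
        return q * mu, z.copy()
    # operations 252-254: milder barrier update, multipliers unchanged
    return rho * mu, lam
```

The test in operation 244 thus decides between an aggressive barrier reduction with a multiplier update and a conservative reduction that keeps the multipliers.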
In operation 258, a search direction vector d is computed as
for each next solution value by solving the primal-dual linear system J(wk,μk+1)d=−r(wk,λk+1,μk+1), where
and search direction vector d is computed using a regularized Newton method applied to the KKT conditions of the initial problem. For example, the linear system is solved by applying an LDL factorization as described in Chapter 4 of a book by Golub, G. H., and C. F. Van Loan, Matrix Computations, 3rd ed. (Johns Hopkins University Press 1996).
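As a minimal sketch of the linear-system solve, an LDL factorization of a symmetric indefinite matrix can be applied as follows; the 3-by-3 matrix and right-hand side below are invented stand-ins for J(wk,μk+1) and r(wk,λk+1,μk+1), not values from the method.

```python
import numpy as np
from scipy.linalg import ldl, solve_triangular

# Invented symmetric indefinite (KKT-type) system J d = -r
J = np.array([[4.0, 1.0, 1.0],
              [1.0, 2.0, 0.0],
              [1.0, 0.0, -1.0]])
r = np.array([1.0, 2.0, 3.0])

L, D, p = ldl(J, lower=True)   # J = L D L^T; L[p] is unit lower triangular
b = -r
y = solve_triangular(L[p], b[p], lower=True, unit_diagonal=True)
w = np.linalg.solve(D, y)      # D is block diagonal (1x1 and 2x2 blocks)
v = solve_triangular(L[p].T, w, lower=False, unit_diagonal=True)
d = np.empty_like(v)
d[p] = v                       # undo the symmetric row/column permutation
```

An LDL factorization is preferred over Cholesky here because the primal-dual matrix is symmetric but generally indefinite.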
In an operation 260, step lengths are computed as αk,gx
as discussed above relative to equations (5) and (6). A fraction-to-the-boundary rule is applied to compute αkx and αkz that satisfy equations (5) and (6). For illustration,
The constraints are first linearized to compute an initial estimate of a primal step length αkx. For example, an initial estimate is computed using
If the initial estimate is not acceptable because
a backtracking method is applied to iterate to a solution for the primal step length αkx such that
For the dual step length αkz, a unit step length is tried as an initial estimate of the dual step length αkz=1. If the initial estimate of the dual step length αkz is not acceptable because zk+αkzdkz<0.005zk, a backtracking method is applied to iterate to a solution for the dual step length αkz such that zk+αkzdkz≥0.005zk.
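The dual step-length rule above can be sketched directly; the 0.005 floor comes from the text, while the function name and the halving factor used in backtracking are assumptions.

```python
import numpy as np

def dual_step_length(z, dz, shrink=0.5, floor=0.005):
    # Try a unit step first; backtrack until z + alpha*dz >= floor*z holds
    # componentwise.  z is assumed elementwise positive, so the loop
    # terminates (alpha -> 0 always satisfies the condition).
    alpha = 1.0
    while np.any(z + alpha * dz < floor * z):
        alpha *= shrink
    return alpha
```

For example, with z=(1,1) and dz=(−2,0.5), the unit and half steps violate the floor and the rule settles on α=0.25.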
In an operation 262, a next solution ŵk=(x̂k,ẑk) is determined using x̂k=xk,g+αk,gx
In an operation 264, a value εk is determined, for example, using εk=0.9 max{∥r(wj,λj,μj)∥:(k−4)+≤j≤k}+10μk+1, where
A maximum of the residual norm ∥r(wj,λj,μj)∥ over the last five iterations (iterations k, k−1, k−2, k−3, k−4) is chosen, where r(wj,λj,μj) was already computed and stored above.
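The sliding-window computation of εk can be written compactly; the function name and the list-of-stored-norms interface are assumptions, while the constants 0.9, the five-iteration window, and the 10μk+1 term come from the text.

```python
def epsilon_k(residual_norms, k, mu_next):
    # eps_k = 0.9 * max{ ||r(w_j, lambda_j, mu_j)|| : max(k-4, 0) <= j <= k }
    #         + 10 * mu_{k+1}, using the stored norms of the last five iterates
    lo = max(k - 4, 0)
    return 0.9 * max(residual_norms[lo:k + 1]) + 10.0 * mu_next
```

The (k−4)+ notation in the text corresponds to the max(k−4, 0) clamp, so the window is simply shorter during the first four iterations.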
In an operation 266, an iteration check is performed using the next solution ŵk=(x̂k,ẑk), and processing continues in operation 268. For example, the iteration check is performed using ∥r(ŵk, λk+1, μk+1)∥≤εk as described relative to equation (4) above, where
Referring to
In operation 270, the next solution is updated using wk+1=ŵk.
In an operation 272, the iteration index k is incremented, for example, using k=k+1, and processing continues in operation 232 to determine if wk+1 is the optimal solution.
In operation 274, an inner iteration index h is initialized using h=0, wk,h=ŵk, λ=λk+1, and μ=μk+1.
Similar to operation 258, in an operation 276, a search direction vector d is computed as d=(dx
In an operation 278, α is defined using
In an operation 280, step lengths are computed based on the value of the quadratic penalty application. For example, referring to
In an operation 300, a determination is made concerning the value of the quadratic penalty application. When the quadratic penalty application is zero or false, processing continues in an operation 302. When the quadratic penalty application is one or true, processing continues in an operation 304.
Similar to operation 260, in operation 302, step lengths are computed as α̃k,hw
as discussed above relative to equations (5) and (6), and processing is done for the quadratic penalty application and continues in an operation 282. A fraction-to-the-boundary rule is again applied to compute α̃kx and α̃kz that satisfy equations (5) and (6). For illustration,
In operation 304, a determination is made concerning whether ci(xk,h)<−βμ. When ci(xk,h)<−βμ, processing continues in an operation 306. When ci(xk,h)≥−βμ, processing continues in operation 302.
In operation 306, quadratic penalty coefficient values are computed, for example, using coefficients qa, qb, qc defined by
In an operation 308, step lengths are computed as α̃k,hw
and processing is done for the quadratic penalty application and continues in operation 282.
is applied to make sure that the dual variable z remains positive. Once the dual step length α̃k,hz
Referring again to
In an operation 284, a determination is made concerning whether another backtracking step is needed. When another backtracking step is needed,
When another backtracking step is not needed,
When another backtracking step is needed, processing continues in an operation 286. When another backtracking step is not needed, processing continues in an operation 288.
In operation 286,
In operation 288, αk,h=α for each next solution value.
In an operation 290, a next solution wk,h+1=(xk,h+1,zk,h+1) is determined using wk,h+1=wk,h+αk,hdw
In an operation 292, a determination is made concerning whether ∥r(wk,h+1,λ, μ)∥≤εk. When ∥r(wk,h+1,λ, μ)∥≤εk, processing continues in an operation 296 because additional inner iterations are not needed. When ∥r(wk,h+1,λ, μ)∥>εk, processing continues in an operation 294 to perform additional inner iterations.
In operation 294, the inner iteration index h is incremented, for example, using h=h+1, and processing continues in operation 276 to perform another inner iteration.
In operation 296, the next solution is updated using wk+1=wk,h+1.
In an operation 298, the iteration index k is incremented, for example, using k=k+1, and processing continues in operation 232 to determine if wk+1 is the optimal solution.
Nonlinear optimization application 122 was developed and executed in the C programming language without quadratic penalty application (indicated as 122) and with quadratic penalty application (indicated as 122 with Q). The models were written using the OPTMODEL Procedure developed and provided by SAS Institute Inc. of Cary, N.C., USA as part of SAS/OR®. The numerical solutions were obtained using a 64-bit, 2.50 gigahertz Intel® Xeon® CPU E7-4880 machine with 8 processors and 128 GB RAM.
A first set of problems includes two problems that naturally arise in astrophysics, engineering, and biology. The Lane-Emden problem is defined as Δu+u3=0. The variational problem associated with this problem is
where D is a unit square. The singularly perturbed Dirichlet problem is defined as ε2Δu+u3−u=0. The variational problem associated with this problem is
where D is a unit circle. A solution for the Lane-Emden and Dirichlet problems can be obtained by discretizing the corresponding variational problems using difference approximations based on a triangularization of the corresponding domain D. Using this, an approximate solution of the two problems is found by solving the following inequality optimization problem:
Minimize w, subject to g(uk)≤h, 1≤k≤m and |uk−uk+1|≤h, 0≤k≤m, where h>0 is a fixed tolerance, and m is a number of breakpoints used by an elastic string algorithm. Note that for the first set of inequality constraints, the function g is replaced with gLE or gD. For each problem, the results were computed using m=10 and m=40.
Table 1 below summarizes the performance comparison between nonlinear optimization application 122 without quadratic penalty application and with quadratic penalty application in terms of computing time on the first set of problems with m=10 (_10) and m=40 (_40).
The results show that nonlinear optimization application 122 with quadratic penalty application provides improved efficiency relative to nonlinear optimization application 122 without quadratic penalty application. This improvement results from the reduced number of function evaluations required. For the first set of problems, the number of function evaluations required by nonlinear optimization application 122 with quadratic penalty application was much less than that required by nonlinear optimization application 122 without quadratic penalty application because nonlinear optimization application 122 without quadratic penalty application has to perform backtracking to satisfy the inequality
whereas, when the inequality
is not satisfied using nonlinear optimization application 122 with quadratic penalty application, the quadratic term is used instead. As a result, there is no backtracking required, and a unit step-length could be chosen.
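The reason a unit step remains admissible can be illustrated with a generic quadratic extension of a modified log-barrier term. This is a hedged sketch, not the patent's qa, qb, qc coefficients (which are not reproduced here): the quadratic branch matches the value and first two derivatives of f(s) = −μ log(s/μ + 1) at the switch point s = −βμ, so the extended function is defined and smooth for any step length.

```python
import numpy as np

def mlb_term(s, mu, beta=0.9):
    # Modified log-barrier term with a C^2 quadratic extension below the
    # switch point s = -beta*mu (illustrative names and constants).
    t = -beta * mu
    if s > t:
        return -mu * np.log(s / mu + 1.0)
    # Taylor data of f(s) = -mu*log(s/mu + 1) at s = t
    f0 = -mu * np.log(t / mu + 1.0)
    f1 = -1.0 / (t / mu + 1.0)
    f2 = (1.0 / mu) / (t / mu + 1.0) ** 2
    ds = s - t
    return f0 + f1 * ds + 0.5 * f2 * ds * ds
```

Unlike the pure logarithm, which is undefined for s ≤ −μ and forces backtracking near the boundary, the extended term returns a finite value for any s, which is what allows the unit step-length.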
A primal-dual CLB method was also developed and executed in the C programming language, for example, based on methods described in a paper by Silva, R. et al., Local analysis of the feasible primal-dual interior-point method, Computational Optimization and Applications, Volume 40, Issue 1, pp. 41-57 (May 2008). Again, the model was written using the OPTMODEL Procedure. Using the primal-dual CLB method, a strictly feasible initial point was assumed with respect to the inequality constraint. The computation time results are summarized in Table 2 below.
Table 2 confirms the superiority of nonlinear optimization application 122 with quadratic penalty application over the classical primal-dual CLB method. Again, the improvement results from the reduced number of function evaluations required. For example, for the problem Lane_Emden_10, the primal-dual CLB method required 32,761 function evaluations while nonlinear optimization application 122 with quadratic penalty application required only 24, which had a significant impact on the computing time and made nonlinear optimization application 122 with quadratic penalty application 84 times faster when solving problem Lane_Emden_10.
A first general-purpose nonlinear optimization solver described in a paper by A. Wachter and L. T. Biegler, On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming, Mathematical Programming, Volume 106, Issue 1, Series A, pp. 25-57 (Mar. 2006) (GPNO1) was developed and executed in the C programming language. A second general-purpose nonlinear optimization solver described in a paper by P. Armand and R. Omheni, A globally and quadratically convergent primal-dual augmented Lagrangian algorithm for equality constrained optimization, Optimization Methods and Software, Volume 32, Issue 1, pp. 1-21 (2017) (GPNO2) was developed and executed in the C programming language. Again, the models were written using the OPTMODEL Procedure. Both GPNO1 and GPNO2 are well-established nonlinear optimization problem solvers. Both GPNO1 and GPNO2 convert the inequality constraints to equality constraints using slack variables. GPNO1 is understood to be the best open-source general-purpose nonlinear optimization solver. GPNO2 applies a primal-dual augmented Lagrangian interior-point algorithm while GPNO1 implements a line-search filter method. Both GPNO1 and GPNO2 were run using their default options except for the scaling procedure. All executions were done without scaling to have a similar execution stop condition.
The performance was compared using a class of degenerate problems created by manipulating the OPTMODEL models and adding the constraint −c1(x)2≤0. As a result, the problems are appended with “_d” to indicate this addition. This transformation does not have any impact on the feasible region but makes every problem degenerate since the Mangasarian-Fromovitz Constraint Qualification is not satisfied at all solution points. This transformation makes the problem harder to solve. The computation time results are summarized in Table 3 below.
For the four different problems, nonlinear optimization application 122 with quadratic penalty application clearly outperformed GPNO1 and GPNO2 in terms of computation time. Nonlinear optimization application 122 with quadratic penalty application provides the improvement by extrapolating the modified logarithmic term with quadratic approximations.
A second set of computational experiments explores the behavior when solving a product portfolio problem that arises in industry. The goal of this problem is to define a portfolio strategy for a laundry powder business. More precisely, the portfolio variables (additive levels, intermediate compositions, intermediate levels, intermediate assignments) that minimize a total cost are determined. Each product uses one intermediate composition and a plurality of additive compositions.
The problem can be reformulated as a nonlinear optimization with equality and inequality constraints. The objective function consists of minimizing the portfolio cost, including raw material and some manufacturing and shipment costs across all products. Equality constraints are added to make sure that a sum of a product's ingredient fractions is equal to one. The inequality constraints are added to ensure that product quality predictions and product composition variables remain within specified ranges. Two sets of data were used as summarized in Table 4 below.
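The structure just described (cost objective, fractions summing to one, range constraints) can be mimicked with a toy model. Everything below is invented for illustration: the three "ingredients", the cost vector, and the bounds are not the actual portfolio data.

```python
import numpy as np
from scipy.optimize import LinearConstraint, minimize

cost = np.array([3.0, 1.0, 2.0])                     # hypothetical unit costs
sum_to_one = LinearConstraint(np.ones(3), 1.0, 1.0)  # fractions sum to one
res = minimize(lambda x: cost @ x,                   # minimize portfolio cost
               np.full(3, 1.0 / 3.0),                # start from an even mix
               method="SLSQP",
               bounds=[(0.1, 0.8)] * 3,              # composition ranges
               constraints=[sum_to_one])
# Cheapest feasible mix: x = (0.1, 0.8, 0.1) with cost 1.3
```

The real problem adds nonlinear quality-prediction constraints and intermediate-assignment variables, which is what makes an interior-point method of the kind described here appropriate.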
The computation time results are summarized in Table 5 below.
From the results presented in Table 5, nonlinear optimization application 122 with quadratic penalty application is faster than nonlinear optimization application 122 without quadratic penalty application. For both datasets, nonlinear optimization application 122 with quadratic extrapolation is over twice as fast as nonlinear optimization application 122 without quadratic penalty application. Table 5 also shows that nonlinear optimization application 122 with quadratic extrapolation is faster in comparison to the nonlinear optimization problem solver GPNO1.
The word “illustrative” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “illustrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Further, for the purposes of this disclosure and unless otherwise specified, “a” or “an” means “one or more”. Still further, using “and” or “or” in the detailed description is intended to include “and/or” unless specifically indicated otherwise. The illustrative embodiments may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed embodiments.
The foregoing description of illustrative embodiments of the disclosed subject matter has been presented for purposes of illustration and of description. It is not intended to be exhaustive or to limit the disclosed subject matter to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the disclosed subject matter. The embodiments were chosen and described in order to explain the principles of the disclosed subject matter and as practical applications of the disclosed subject matter to enable one skilled in the art to utilize the disclosed subject matter in various embodiments and with various modifications as suited to the particular use contemplated.
The present application claims the benefit of and priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 63/002,315, filed Mar. 30, 2020, the entire contents of which are hereby incorporated by reference.
Yamashita, Hiroshi. "A globally convergent primal-dual interior point method for constrained optimization." Optimization Methods and Software 10.2 (1998): 443-469.
Gill, Philip E., and Daniel P. Robinson. "A primal-dual augmented Lagrangian." Computational Optimization and Applications 51.1 (2012): 1-25.
Lei, Jinlong, Han-Fu Chen, and Hai-Tao Fang. "Primal-dual algorithm for distributed constrained optimization." Systems & Control Letters 96 (2016): 110-117.
R. Polyak. "Modified barrier functions (theory and methods)." Mathematical Programming 54.2, Ser. A (1992): 177-222.
P. Armand and R. Omheni. "A globally and quadratically convergent primal-dual augmented Lagrangian algorithm for equality constrained optimization." Optimization Methods and Software 32.1 (2017): 1-21.
P. Armand and R. Omheni. "A mixed logarithmic barrier-augmented Lagrangian method for nonlinear optimization." Journal of Optimization Theory and Applications 173.2 (May 2017): 523-547.
P. Armand, J. Benoist, R. Omheni, and Vincent Pateloup. "Study of a primal-dual algorithm for equality constrained minimization." Computational Optimization and Applications 59.3 (2014): 405-433.
A. Ben-Tal and M. Zibulevsky. "Penalty/barrier multiplier methods for convex programming problems." SIAM Journal on Optimization 7.2 (1997): 347-366.
M. G. Breitfeld and D. F. Shanno. "Computational experience with penalty-barrier methods for nonlinear programming." Annals of Operations Research 62 (1996): 439-463. Interior point methods in mathematical programming.
G. Nash, R. Polyak, and A. Sofer. "A numerical comparison of barrier and modified barrier methods for large-scale bound-constrained optimization." In Large Scale Optimization (Gainesville, FL, 1993), pp. 319-338. Kluwer Academic Publishers, Dordrecht, 1994.
R. Silva, J. Soares, and L. N. Vicente. "Local analysis of the feasible primal-dual interior-point method." Computational Optimization and Applications 40.1 (2008): 41-57.
A. R. Conn, Nick Gould, and Ph. L. Toint. "A globally convergent Lagrangian barrier algorithm for optimization with general inequality constraints and simple bounds." Mathematics of Computation 66 (1992): 261-288.
A. Wachter and L. T. Biegler. "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming." Mathematical Programming 106.1, Ser. A (2006): 25-57.
Golub, G. H., and C. F. Van Loan. Matrix Computations, 3rd ed., Chapter 4. Johns Hopkins University Press, 1996.
G. Breitfeld and D. F. Shanno. "Preliminary Computational Experience with Modified Log-Barrier Functions for Large-Scale Nonlinear Programming." Rutcor Research Report, Jul. 1993.
Byrd et al. "A trust region method based on interior point techniques for nonlinear programming." Mathematical Programming, Ser. A 89 (2000): 149-185.
Dolan et al. "Benchmarking optimization software with performance profiles." Mathematical Programming, Ser. A 91 (2002): 201-213.
Goldfarb et al. "A Modified Barrier-Augmented Lagrangian Method for Constrained Minimization." Computational Optimization and Applications 14 (1999): 55-74.
Griffin et al. "A primal-dual modified log-barrier method for inequality constrained nonlinear optimization." Optimization Letters, published Mar. 10, 2020.
W. Murray. "Analytical Expressions for the Eigenvalues and Eigenvectors of the Hessian Matrices of Barrier and Penalty Functions." Journal of Optimization Theory and Applications 7.3 (1971).
Shanno et al. "Interior-point methods for nonconvex nonlinear programming: orderings and higher-order methods." Mathematical Programming, Ser. B 87 (2000): 303-316.
"Backtracking line search." Wikipedia, retrieved from https://en.wikipedia.org/w/index.php?title=Backtracking_line_search&oldid=960063023, last edited May 31, 2020.
SAS/OR 15.1 User's Guide: Mathematical Programming, The OPTMODEL Procedure. SAS Institute Inc., 2018.
Griffin et al. "An Efficient Primal-Dual Modified Barrier Method for Nonlinear Optimization with Inequality Constraints." SAS Institute, Mar. 26, 2020.