Fast adaptation in real-time systems

Description

BACKGROUND

The present invention relates to optimization problems, and more specifically, to techniques for minimum regret learning in online convex optimization.

In real-time systems, the costs and other conditions are always changing. The operator of the system has to make decisions continually and the utility from making a decision depends not only on the decision itself but also on the conditions of the system or the environment. For example, the operator's task may be to track a “moving target” in the sense that the target may jump from one point to another and the operator has to aim without knowing exactly where the target is, but only where it previously was. This happens, for example, in inventory systems, where there is an optimal level of inventory in hindsight, but the decision about the inventory level has to be made before the actual demand for the item is known. The “regret” of the operator is the difference between the cost that is incurred as a result of his decision and the optimal cost that could have been incurred using another decision if the conditions had been known. In the prior art, methods have been known which minimize the total regret so that it is proportional to the square root of the total amount of time.

SUMMARY

According to one embodiment of the present invention, a method comprises: performing a step that relies on the selection of x at a time t (x_t), where x is a variable involved with the step; calculating a resulting cost (ƒ_t(x_t)) that results from selecting x_t when performing the step, where ƒ_t is a cost function; finding a minimum possible cost (ƒ_t(x*_t)) associated with the selection of x*; determining the difference between the resulting cost (ƒ_t(x_t)) and the minimum possible cost (ƒ_t(x*_t)); selecting a direction of movement from x_t to x_{t+1}; and performing a subsequent step that relies on the selection of x_{t+1}.

According to another embodiment of the present invention, a system is provided for iteratively improving a chosen solution to an online convex optimization problem. The system executes procedures for: selecting x at a time t (x_t), where x is a quantity; calculating a resulting cost (ƒ_t(x_t)) that results from selecting x_t, where ƒ_t is a cost function; finding a minimum possible cost (ƒ_t(x*_t)) associated with the selection of x*; determining the difference between the resulting cost (ƒ_t(x_t)) and the minimum possible cost (ƒ_t(x*_t)); and selecting a direction of movement from x_t to x_{t+1}.

According to another embodiment of the present invention, a computer program product for online convex optimization comprises: a computer usable medium having computer usable program code embodied therewith, the computer usable program code comprising: computer usable program code configured to: perform a step that relies on the selection of x at a time t (x_t), where x is a variable involved with the step; calculate a resulting cost (ƒ_t(x_t)) that results from selecting x_t when performing the step, where ƒ_t is a cost function; find a minimum possible cost (ƒ_t(x*_t)) associated with the selection of x*; determine the difference between the resulting cost (ƒ_t(x_t)) and the minimum possible cost (ƒ_t(x*_t)); select a direction of movement from x_t to x_{t+1}; and perform a subsequent step that relies on the selection of x_{t+1}.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 shows a diagram of a flow-chart for performing online convex optimization in accordance with an embodiment of the invention; and

FIG. 2 shows a high level block diagram of an information processing system useful for implementing one embodiment of the present invention.

DETAILED DESCRIPTION

Embodiments of the invention provide ways to improve the speed of computing online convex optimization problems.

As described above, prior-art methods exist that keep the total regret proportional to the square root of the total amount of time. However, these methods require elaborate computation before each decision; more precisely, they require the solution of an optimization problem. In some real-time systems such a computational task is not practical. Hence, there is a need for much faster computational methods. Also, prior-art methods that are relatively fast are limited to linear cost functions and are not “adaptive.”

In embodiments of the invention, the above-discussed problem is formulated so that at time t the operator picks a point x_t without knowing the current cost function ƒ_t. The resulting cost is ƒ_t(x_t). The regret after T time stages is equal to the difference between the sum of the incurred costs ƒ₁(x₁)+ . . . +ƒ_T(x_T) and the minimum possible cost with some fixed decision x*, i.e., the sum ƒ₁(x*)+ . . . +ƒ_T(x*). The domain K in n-dimensional space, from which the operator picks x_t, is given by a “barrier function” β. Embodiments of the invention choose the direction of movement from x_t to x_{t+1}. In particular, these embodiments pick this direction as the product of the inverse Hessian of the barrier function β and the gradient of ƒ_t. This yields an algorithm that in each step requires only solving a set of linear equations in dimension n rather than an optimization problem over K. The resulting regret is proportional to the square root of T log T, so it is almost optimal.

In particular, embodiments of the invention utilize a new method for regret minimization in online convex optimization. The regret of the algorithm after T time periods is almost the minimum possible. However, in n-dimensional space, during each iteration, the embodiments of the invention essentially solve a system of linear equations of order n, whereas previous techniques had to solve some constrained convex optimization problem in n dimensions with possibly many constraints. Thus, the embodiments of the invention improve running time by a factor of at least the square root of n, and much more for nontrivial domains. These embodiments are also adaptive, in the sense that the regret bounds hold not only for the time periods 1, . . . , T, but also for every sub-interval s, s+1, . . . , t.

Online Convex Optimization

In the Online Convex Optimization problem an adversary picks a sequence of convex functions ƒ_t: K→ℝ, t=1, 2, . . . , T, where K⊂ℝⁿ is convex and compact. At stage t, the player has to pick an x_t∈K without knowing the function ƒ_t. The player then incurs a cost of ƒ_t(x_t). The setting of this disclosure is that after choosing x_t, the player is informed of the entire function ƒ_t over K. The total cost to the player is Σ_{t=1}^T ƒ_t(x_t). Online Convex Optimization encompasses, for example, expert algorithms with arbitrary convex loss functions and the problem of universal portfolio optimization.

Regret Minimization

Suppose the minimum cost over all possible single choices is attained at some x*=argmin_{x∈K} Σ_{t=1}^T ƒ_t(x). In this case the regret resulting from the choices (x₁; ƒ₁, . . . , x_T; ƒ_T) is defined as

R=R(x₁;ƒ₁, . . . ,x_T;ƒ_T)≡Σ_{t=1}^T [ƒ_t(x_t)−ƒ_t(x*)]

The problem of regret minimization calls for choosing the points x₁, . . . , x_T so as to minimize R, subject to the condition that, when x_{t+1} is chosen, only x₁, ƒ₁, . . . , x_t, ƒ_t are known. It is known that, in the worst case, the minimum possible regret is Ω(√T).
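By way of illustration only, the regret defined above may be computed as in the following minimal Python sketch. The function name `regret`, the quadratic "moving target" costs, and the replacement of the domain K by a finite grid of candidate points are assumptions made for the example and are not part of the disclosure.

```python
from typing import Callable, Sequence

Cost = Callable[[float], float]


def regret(costs: Sequence[Cost], choices: Sequence[float],
           candidates: Sequence[float]) -> float:
    """Incurred cost minus the cost of the best fixed decision in hindsight.

    The minimum over K is approximated by a minimum over `candidates`
    (an assumption that keeps the sketch short).
    """
    incurred = sum(f(x) for f, x in zip(costs, choices))
    best_fixed = min(sum(f(x) for f in costs) for x in candidates)
    return incurred - best_fixed


# Moving target on K = [0, 1]: f_t(x) = (x - target_t)^2.
targets = [0.0, 1.0, 0.0, 1.0]
costs = [lambda x, c=c: (x - c) ** 2 for c in targets]
# The operator aims at where the target previously was (0.5 is an initial guess).
choices = [0.5, 0.0, 1.0, 0.0]
grid = [i / 100 for i in range(101)]  # finite stand-in for K
total_regret = regret(costs, choices, grid)
print(total_regret)
```

In this example the operator always aims at the previous target position, while the best fixed decision in hindsight is the midpoint of the two target positions.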

Computational Efficiency

In all the previously known algorithms that attain minimum possible regret, in each stage the algorithm must solve some constrained convex optimization problem over K, which can be prohibitive in some practical applications. In particular, if K is a convex polyhedron, the best known worst-case bound on the number of iterations of an optimization algorithm is O(√n·L), where L is the number of bits in the description of K, and each iteration requires solving a linear system of order n. Motivated by this shortcoming of the previous algorithms, embodiments of the present invention utilize a new method for constructing an almost-minimum-regret algorithm, which requires in each stage only solving a system of linear equations of order n, rather than solving an optimization problem over K. Thus, embodiments of the invention improve the running time by at least a factor of n, and much more than that when K is more complicated, for example, a convex polyhedron with many facets. In addition, embodiments of the invention are “adaptive” in the sense that their regret is almost the minimum possible not only over the stages 1, . . . , T but also over every sub-interval of stages s, s+1, . . . , t.

Previous Approaches

There are numerous algorithms for Online Convex Optimization, some of which attain the minimum possible regret of O(√T). Most of these algorithms can be classified into the following two classes: (i) link-function algorithms, and (ii) regularized follow-the-leader algorithms.

Follow-the-Regularized-Leader Algorithms

The intuitive “Follow-The-Leader” (FTL) algorithm picks for x_{t+1} a minimizer of the function F_t(x)≡Σ_{s=1}^t ƒ_s(x) over K. It is known that the regret of FTL is not optimal. This fact suggested the more general “Follow-The-Regularized-Leader” (FTRL) algorithm that picks x_{t+1} as a minimizer of the function F_t(x)+ρ(x) over K, where ρ(x) is a certain function that serves as a “regularizer”. Different variants of the method correspond to different choices of ρ(x). The FTRL approach led to the resolution of some prediction problems, notably the resolution of the value of bandit information. One advantage of the FTRL approach is its relatively intuitive analysis. On the negative side, FTRL algorithms are known to be “non-adaptive”, in the sense that the regret over a general sub-interval s, s+1, . . . , t may be linear in t−s rather than O(√(t−s)). Furthermore, the running time of the algorithm in each stage may not be practical because the algorithm has to solve some optimization problem over K.

The “Link Function” Methodology

In contrast to the intuitive FTRL methodology, which relies on the entire past history of the play, link-function algorithms use less information and proceed “incrementally.” Perhaps the easiest algorithm to describe is Zinkevich's online gradient descent, which picks x_{t+1} to be the orthogonal projection of the point y_{t+1}≡x_t−η∇ƒ_t(x_t) onto K. Since x_{t+1} is the point in K nearest to y_{t+1}, its computation can be costly, for example, if K has many facets. On the other hand, link-function algorithms are adaptive (in the sense explained above) and are usually more efficiently computable than FTRL algorithms in case projections turn out to be easy. However, link-function algorithms tend to be harder to analyze.
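For illustration, a single stage of projected online gradient descent may be sketched in Python as follows. The sketch uses the usual descent convention (a step against the gradient) and assumes that K is an axis-aligned box, for which the orthogonal projection reduces to coordinate-wise clipping; for a general K with many facets the projection would itself be an optimization problem. The names `project_onto_box` and `ogd_step` are hypothetical.

```python
from typing import List, Sequence


def project_onto_box(y: Sequence[float], lo: Sequence[float],
                     hi: Sequence[float]) -> List[float]:
    """Orthogonal projection onto the box {x : lo <= x <= hi} (clipping)."""
    return [min(max(v, l), h) for v, l, h in zip(y, lo, hi)]


def ogd_step(x: Sequence[float], grad: Sequence[float], eta: float,
             lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """One stage: gradient step from x, then projection back onto the box."""
    y = [xi - eta * gi for xi, gi in zip(x, grad)]
    return project_onto_box(y, lo, hi)


# The first coordinate leaves the unit box and is clipped back to 1.0.
x_next = ogd_step([0.9, 0.5], [-1.0, 1.0], 0.2, [0.0, 0.0], [1.0, 1.0])
print(x_next)
```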

Merging the Two Approaches

An important aspect of the embodiments of the invention is to follow the incremental-update approach, but make sure it never requires projections from the exterior of K into K (hence the name “interior point”). This is accomplished by moving from x_t to x_{t+1} in a direction that is obtained from the gradient ∇ƒ_t(x_t) by a linear transformation (like in Newton's method), which depends on K and x_t. The assumption is that K is specified by means of a self-concordant barrier function (see below). This concept was previously introduced to learning theory, where such barriers were used as regularizers. Embodiments of the invention can be interpreted as using the barrier function as a link function rather than a regularizer.

Embodiments of the invention teach the design and analysis of a new method for online convex optimization. The regret of the algorithm is almost the minimum possible. It is adaptive and requires solving only one system of linear equations of order n per stage. By comparison, prior minimum-regret algorithms require, in the worst case, solving a complete optimization problem in each iteration. Also, they are generally not adaptive, they work only on linear cost functions rather than in the general setting, and they require the computation of the so-called analytic center of K for the starting point x₁, which requires solving a nontrivial optimization problem.

Preliminaries—Self-Concordant Barrier

We assume that K is given by means of a barrier function β: int K→ℝ, i.e., for every sequence {x^k}_{k=1}^∞ ⊂ int K that converges to the boundary of K, the sequence {β(x^k)} tends to infinity. We further assume that for some ϑ>0, β(x) is a ϑ-self-concordant barrier, i.e., it is thrice differentiable and for every x∈int K and every h∈ℝⁿ, the function f̃(t)≡β(x+th) satisfies (i) |f̃‴(0)|≦2[f̃″(0)]^{3/2} (i.e., β is a self-concordant function), and also (ii) [f̃′(0)]²≦ϑ·f̃″(0). It follows that β(x) is strictly convex. For example, for A∈ℝ^{m×n} and b∈ℝ^m, the function β(x)=−Σ_{i=1}^m ln[(Ax)_i−b_i] (defined for x such that Ax>b) is an m-self-concordant barrier for the polyhedron {x∈ℝⁿ | Ax≧b}.
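As a non-limiting illustration of the polyhedral example, the logarithmic barrier may be evaluated as in the following Python sketch. The function name `log_barrier` and the choice of the unit square as the polyhedron are assumptions made for the example; returning infinity outside the open polyhedron is a convention of the sketch.

```python
import math
from typing import Sequence


def log_barrier(A: Sequence[Sequence[float]], b: Sequence[float],
                x: Sequence[float]) -> float:
    """beta(x) = -sum_i ln((Ax)_i - b_i); +inf unless Ax > b holds strictly."""
    total = 0.0
    for row, bi in zip(A, b):
        slack = sum(a * xj for a, xj in zip(row, x)) - bi
        if slack <= 0.0:
            return math.inf
        total -= math.log(slack)
    return total


# Unit square {0 <= x1 <= 1, 0 <= x2 <= 1} written as Ax >= b with m = 4 rows.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [0.0, 0.0, -1.0, -1.0]
for point in ([0.5, 0.5], [0.9, 0.5], [0.999, 0.5]):
    print(point, log_barrier(A, b, point))  # grows as the boundary is approached
```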

The Dikin Ellipsoid

For every v∈ℝⁿ and A∈ℝ^{n×n}, denote ∥v∥_A≡√(v^T A v). For every h∈ℝⁿ, denote ∥h∥_x≡√(h^T[∇²β(x)]h). The open Dikin ellipsoid of radius r centered at x, denoted by W_r(x), is the set of all y=x+h∈K such that ∥h∥_x²≡h^T[∇²β(x)]h<r².

Below we use the following known facts about the Dikin ellipsoid and self-concordant functions:

Proposition 1. For every x∈int K, W₁(x)⊂int K.
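The containment stated in Proposition 1 can be checked numerically in a simple case. The following Python sketch assumes, for illustration only, that K is the unit box with barrier β(x)=−Σ_i ln xⁱ−Σ_i ln(1−xⁱ), whose Hessian is diagonal; the names `local_norm_sq` and `in_dikin` are hypothetical.

```python
from typing import Sequence


def local_norm_sq(x: Sequence[float], h: Sequence[float]) -> float:
    """h^T [Hessian of beta at x] h for the unit-box barrier (diagonal Hessian)."""
    return sum((1.0 / xi ** 2 + 1.0 / (1.0 - xi) ** 2) * hi ** 2
               for xi, hi in zip(x, h))


def in_dikin(x: Sequence[float], y: Sequence[float], r: float = 1.0) -> bool:
    """True if y lies in the open Dikin ellipsoid W_r(x)."""
    h = [yi - xi for xi, yi in zip(x, y)]
    return local_norm_sq(x, h) < r * r


x = [0.2, 0.5]
print(in_dikin(x, [0.05, 0.5]))  # a point close to, but inside, the boundary
print(in_dikin(x, [0.0, 0.5]))   # a boundary point of the box
```

Because the ellipsoid is shaped by the Hessian at x, it shrinks in the direction of the nearby facet x¹=0 and remains wider in the second coordinate.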

The next proposition provides “bounds” on the Hessian ∇²β(x+h) of β(x+h) within Dikin's ellipsoid. For A, B∈ℝ^{n×n}, the notation A⪰B means that A−B is positive semi-definite.

Proposition 2. For every h such that ∥h∥_x<1,

(1−∥h∥_x)²∇²β(x) ⪯ ∇²β(x+h) ⪯ (1−∥h∥_x)⁻²∇²β(x) (1)

We denote the diameter of K by Δ.

Proposition 3. If (i) β(x) is a barrier function for K, and (ii) β(x) is self-concordant, then for every x∈int K all the eigenvalues of ∇²β(x) are greater than or equal to

$\frac{1}{Δ^{2}} .$

Corollary 4. For every x∈int K, all the eigenvalues of [∇²β(x)]⁻¹ are less than or equal to Δ².

Method of the Embodiments and Regret Bounds

We assume in this section that when the player has to pick the next point x_{t+1}, the player recalls x_t and knows ∇ƒ_t(x_t) and ∇²β(x_t). Interior-point algorithms for optimization typically utilize the Newton direction, which in the case of minimizing a function of the form F_μ(x)≡ƒ(x)−μ·β(x), while at a point x, would be n=−[∇²F_μ(x)]⁻¹∇(F_μ)(x). However, for minimum regret online optimization, it turns out that the following direction is useful:

n_t=−[∇²β(x_t)]⁻¹∇(ƒ_t)(x_t)

i.e., the gradient factor is determined by the previous objective function ƒ_t, while the Hessian factor is determined by the barrier function β. Thus, when the method of the invention is used, the player picks x_{t+1}=x_t+η n_t, where 0<η<1 is a scalar whose value depends on T; it tends to zero as T tends to infinity. Denote g_t=∇(ƒ_t)(x_t) and H_t=∇²β(x_t). Thus, n_t=−H_t⁻¹g_t.
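A minimal Python sketch of one such update is given below, under the assumption that the gradient g_t and the Hessian H_t are supplied as plain lists. The direction n_t is obtained by solving the linear system H_t n_t=−g_t with Gaussian elimination; the helper names `solve` and `barrier_step` are hypothetical, and any other linear solver could be substituted.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(H: Matrix, rhs: Vector) -> Vector:
    """Solve H z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(H)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def barrier_step(x: Vector, g: Vector, H: Matrix, eta: float) -> Vector:
    """Return x_{t+1} = x_t + eta * n_t, where H n_t = -g."""
    n_t = solve(H, [-gi for gi in g])
    return [xi + eta * ni for xi, ni in zip(x, n_t)]


x_next = barrier_step([0.5, 0.5], [1.0, 2.0], [[2.0, 1.0], [1.0, 3.0]], 0.5)
print(x_next)
```

The per-stage work is a single linear solve of order n; no optimization problem over K and no projection is involved.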

Validity. It can be proven that the algorithm generates only points in K.

Proposition 5. For every t, if x_t∈int K and η<(g_t^T H_t⁻¹ g_t)^{−1/2}, then x_{t+1}∈int K.

By Corollary 4,

g_t^T H_t⁻¹ g_t ≦ Δ²·∥g_t∥² (2)

Thus, we also have

Corollary 6. If

$η < \frac{1}{Δ \cdot \| g_{t} \|},$

then x_{t+1}∈int K.
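One way to see why these step-size conditions suffice is the following short derivation, offered as an explanatory sketch and relying only on Proposition 1, Corollary 4 and the symmetry of H_t. The step h=η n_t=−η H_t⁻¹ g_t has local norm

$\| η n_{t} \|_{x_{t}}^{2} = η^{2} g_{t}^{T} H_{t}^{-1} H_{t} H_{t}^{-1} g_{t} = η^{2} g_{t}^{T} H_{t}^{-1} g_{t} ,$

so the condition of Proposition 5 is exactly the requirement that this quantity be less than 1, i.e., that x_{t+1} lie in the Dikin ellipsoid W₁(x_t), which is contained in int K. Moreover, by the eigenvalue bound of Corollary 4,

$η < \frac{1}{Δ \cdot \| g_{t} \|} \;\Longrightarrow\; η^{2} g_{t}^{T} H_{t}^{-1} g_{t} \leq η^{2} Δ^{2} \| g_{t} \|^{2} < 1 ,$

which gives Corollary 6.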

A Bound on the Gradients

We wish to express our bound on the regret with respect to bounds on the gradients of the functions selected by the adversary. Thus, we denote

G=max{∥∇ƒ_t(x)∥ : x∈K, t=1, . . . ,T}

Since the player does not know the function ƒ_t at the time of picking x_t, and that choice depends on G, we simply assume that the adversary is restricted to choosing only functions ƒ_t such that ∥∇ƒ_t(x)∥≦G for every x∈K. We note that standard techniques can be used, without harming our asymptotic regret bounds, to eliminate the requirement that the algorithm knows an upper bound G a priori.

Proposition 7. For every t, t=1, . . . , T,

η g_t^T(x_t−x*) ≦ [∇β(x_{t+1})−∇β(x_t)]^T(x_t−x*) + GΔ·(3G²+4GΔ+3Δ²)·η². (3)

A Bound Dependent on Bregman Divergence

Bregman divergence. Let x₁, . . . , x_T denote the sequence that is generated by the algorithm of this section. Recall that for x, y∈int K, the Bregman divergence B_β(x, y) with respect to the barrier β(x) is

B_β(x,y)=β(x)−β(y)−[∇β(y)]^T(x−y).
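For illustration, the Bregman divergence may be evaluated as in the following Python sketch. The barrier of the unit box, β(x)=−Σ_i ln xⁱ−Σ_i ln(1−xⁱ), is an assumption chosen for the example, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bregman(beta: Callable[[Vector], float],
            grad_beta: Callable[[Vector], List[float]],
            x: Vector, y: Vector) -> float:
    """B_beta(x, y) = beta(x) - beta(y) - grad_beta(y)^T (x - y)."""
    linear = sum(gi * (xi - yi) for gi, xi, yi in zip(grad_beta(y), x, y))
    return beta(x) - beta(y) - linear


def beta_box(x: Vector) -> float:
    """Barrier of the open unit box (assumed example domain)."""
    return -sum(math.log(xi) + math.log(1.0 - xi) for xi in x)


def grad_beta_box(x: Vector) -> List[float]:
    return [-1.0 / xi + 1.0 / (1.0 - xi) for xi in x]


center = [0.5, 0.5]
for x_star in ([0.5, 0.5], [0.9, 0.5], [0.999, 0.5]):
    print(x_star, bregman(beta_box, grad_beta_box, x_star, center))
```

The printed values grow as the first argument approaches the boundary of the box, which is the behavior that motivates the separate treatment of boundary points in the regret analysis.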

Regret. Given the functions ƒ₁, . . . , ƒ_T and the choices x₁, . . . , x_T, for any x*∈K the regret with respect to x* is defined by

R(x*)≡Σ_{t=1}^T ƒ_t(x_t)−Σ_{t=1}^T ƒ_t(x*).

Denote

C(G,Δ)=√(3GΔ)·(G+Δ).

Theorem 1. For every x*∈K,

R(x*)≦2C(G,Δ)·√(B_β(x*,x₁))·√T.

Note that as x* tends to the boundary of K, B_β(x*,x₁) tends to infinity, and hence so does the bound of Theorem 1. Thus, the regret bound for x* on the boundary of K requires further analysis, which is described below.

The Final Regret Bound

Proposition 8. Let β(·) be a ϑ-self-concordant barrier for K, and let x∈int K and a unit vector u∈ℝⁿ be such that u^T∇β(x)>0. Let t_max=t_max(x,u) be defined as

t_max=t_max(x,u)≡max{t | x+tu∈K}≦Δ

Under these conditions,

$u^{T} \nabla β (x) \leq \frac{ϑ}{t_{\max} (x, u)} .$

For distinct vectors x, y∈K, denote

$τ_{\max} (x, y) = t_{\max} \left( x, \frac{y - x}{\| y - x \|} \right) .$

Proposition 9. If x, y∈int K are distinct, then

$β (y) - β (x) \leq - \ln \left( 1 - \frac{\| y - x \|}{τ_{\max} (x, y)} \right) \cdot ϑ .$

Definition 1. Given the initial point x₁∈int K and a real δ>0, the inner subset K(δ; x₁) is defined by

$K (δ; x_{1}) = \left\{ y \in K : \| y - x_{1} \| \leq \frac{1}{1 + δ} \cdot τ_{\max} (x_{1}, y) \right\} .$

Corollary 10. If y∈K(δ; x₁), then β(y)−β(x₁)≦ln(1+1/δ)·ϑ.
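The step from Proposition 9 to Corollary 10 can be spelled out as follows; this is an explanatory sketch in which ϑ is written for the parameter of the self-concordant barrier. For y∈K(δ; x₁) distinct from x₁, Definition 1 gives

$\frac{\| y - x_{1} \|}{τ_{\max} (x_{1}, y)} \leq \frac{1}{1 + δ} \quad\Longrightarrow\quad 1 - \frac{\| y - x_{1} \|}{τ_{\max} (x_{1}, y)} \geq \frac{δ}{1 + δ} ,$

and since −ln is decreasing, applying Proposition 9 with x=x₁ yields

$β (y) - β (x_{1}) \leq - \ln \left( \frac{δ}{1 + δ} \right) \cdot ϑ = \ln \left( 1 + \frac{1}{δ} \right) \cdot ϑ .$

For y=x₁ the left-hand side is zero and the bound holds trivially.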

Proposition 11. There exists a constant c such that for every x*∈K,

R(x*)≦c·√(GΔ)·(G+Δ)·√(T log T).

The bound of the latter proposition can be improved by a suitable choice of units as follows.

Theorem 2. There exists a constant c such that for every x*∈K,

R(x*)≦c·√ϑ·GΔ·√(T log T).

Generalized News Vendor Problem

In one embodiment of the invention, the above-described methods are applied to the news vendor problem (NVP), which is a classic problem in operations research. In the NVP a seller of newspapers has to order a certain number of copies of the next day's newspaper without knowing exactly how many copies he could sell. The paper becomes worthless if not sold. If the vendor orders too many copies, he loses on the unsold copies. If he orders too few copies, he loses the opportunity to sell more. In the prior art, the problem is solved under an assumed probability distribution of the next day's demand. In this embodiment of the invention a generalization of this problem is employed without any assumptions on the distribution of the demands.

In the present embodiment the NVP may be applied to a situation involving an arbitrary number of perishable commodities. The vendor has to determine at each time t (t=1, 2, . . . ) the order quantities x_tⁱ of commodities i=1, . . . , n. The (nonrefundable) total cost of the orders is c¹x_t¹+ . . . +cⁿx_tⁿ. If the vendor later sells the quantities s_t¹, . . . , s_tⁿ, respectively, then he realizes a revenue of r¹s_t¹+ . . . +rⁿs_tⁿ. However, the vendor does not know the amounts s_t¹, . . . , s_tⁿ in advance, except that, necessarily,

0≦s_tⁱ≦x_tⁱ (i=1, . . . ,n)

The vendor has to make these decisions at the end of every time period, after having observed the demands d_t¹, . . . , d_tⁿ for the respective commodities during that period.

The amounts x_t¹, . . . , x_tⁿmust also satisfy some constraints. First, x_tⁱ≧0 (i=1, . . . , n). Second, there is a budget constraint

b¹x_t¹+ . . . +bⁿx_tⁿ≦B

Finally, there are also availability constraints x_tⁱ≦aⁱ(i=1, . . . , n).

Loss Functions

We denote x=(x¹, . . . , xⁿ). Given actual demands d_t¹, . . . , d_tⁿ, if the orders are x, then the cost to the vendor is equal to

$f_{t} (x) = \sum_{i} c^{i} x^{i} - \sum_{i} r^{i} \min \{ x^{i}, d_{t}^{i} \} .$

If xⁱ≠d_tⁱ, then

$\frac{\partial f_{t} (x)}{\partial x^{i}} = \begin{cases} c^{i} - r^{i} & \text{if } x^{i} < d_{t}^{i} \\ c^{i} & \text{if } x^{i} > d_{t}^{i} \end{cases} \qquad (3)$

If xⁱ=d_tⁱ, we define

$\frac{\partial f_{t} (x)}{\partial x^{i}} = c^{i} - r^{i} .$

Denote g_t=(g_t¹, . . . , g_tⁿ) where

$g_{t}^{i} = \frac{\partial f_{t} (x)}{\partial x^{i}} .$
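The cost function and the gradient defined above may be sketched in Python as follows. The function names `nv_cost` and `nv_gradient` and the numerical values are hypothetical and are chosen for illustration only.

```python
from typing import List, Sequence


def nv_cost(x: Sequence[float], d: Sequence[float],
            c: Sequence[float], r: Sequence[float]) -> float:
    """f_t(x) = sum_i c^i x^i - sum_i r^i min(x^i, d_t^i)."""
    return sum(ci * xi - ri * min(xi, di)
               for xi, di, ci, ri in zip(x, d, c, r))


def nv_gradient(x: Sequence[float], d: Sequence[float],
                c: Sequence[float], r: Sequence[float]) -> List[float]:
    """g_t^i = c^i - r^i if x^i <= d_t^i (including the tie), else c^i."""
    return [ci - ri if xi <= di else ci
            for xi, di, ci, ri in zip(x, d, c, r)]


x = [3.0, 5.0]   # order quantities
d = [4.0, 2.0]   # observed demands
c = [1.0, 2.0]   # unit costs
r = [3.0, 5.0]   # unit revenues
print(nv_cost(x, d, c, r), nv_gradient(x, d, c, r))
```

In the example the first commodity is under-ordered, so its gradient component is negative (ordering more would have reduced the cost), while the second is over-ordered and its component equals the unit cost.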

Constraints. The domain of decisions is the set P of all x=(x¹, . . . , xⁿ) in n space such that

$\sum_{i} b^{i} x^{i} \leq B$

0≦xⁱ≦aⁱ(i=1, . . . ,n).

We define the following “barrier function”

β(x)=−log(B−Σ_i bⁱxⁱ)−Σ_i log xⁱ−Σ_i log(aⁱ−xⁱ)

for all x that satisfy all the constraints strictly, i.e.,

Σ_i bⁱxⁱ<B
0<xⁱ<aⁱ (i=1, . . . ,n).

We have

$\begin{matrix} \frac{\partial β (x_{t})}{\partial x^{i}} = \frac{b^{i}}{B - \sum_{j} b^{j} x_{t}^{j}} - \frac{1}{x_{t}^{i}} + \frac{1}{a^{i} - x_{t}^{i}}, \frac{\partial^{2} β (x_{t})}{{(\partial x^{i})}^{2}} = \frac{{(b^{i})}^{2}}{{(B - \sum_{j} b^{j} x_{t}^{j})}^{2}} + \frac{1}{{(x_{t}^{i})}^{2}} + \frac{1}{{(a^{i} - x_{t}^{i})}^{2}}, & (4) \end{matrix}$

and for k≠i,

$\begin{matrix} \frac{\partial^{2} β (x_{t})}{\partial x^{k} \partial x^{i}} = \frac{b^{k} b^{i}}{{(B - \sum_{j} b^{j} x_{t}^{j})}^{2}} . & (5) \end{matrix}$

Denote by

H_t=((H_t)_ik)

the Hessian matrix, where

$\begin{matrix} {(H_{t})}_{ik} = \frac{\partial^{2} β (x_{t})}{\partial x^{k} \partial x^{i}} & (6) \end{matrix}$
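By way of example, the gradient of the barrier and the entries of H_t given by equations 4-6 may be computed as in the following Python sketch. The function names and the numerical data are hypothetical; the feasibility check is a convenience of the sketch.

```python
from typing import List, Sequence


def nv_slack(x: Sequence[float], b: Sequence[float],
             a: Sequence[float], B: float) -> float:
    """Budget slack B - sum_j b^j x^j; raises if x is not strictly feasible."""
    s = B - sum(bi * xi for bi, xi in zip(b, x))
    if s <= 0.0 or any(not 0.0 < xi < ai for xi, ai in zip(x, a)):
        raise ValueError("x must satisfy all constraints strictly")
    return s


def nv_barrier_gradient(x, b, a, B) -> List[float]:
    """First formula of equation 4."""
    s = nv_slack(x, b, a, B)
    return [bi / s - 1.0 / xi + 1.0 / (ai - xi)
            for xi, bi, ai in zip(x, b, a)]


def nv_barrier_hessian(x, b, a, B) -> List[List[float]]:
    """Equations 4-6: rank-one budget term plus diagonal bound terms."""
    s = nv_slack(x, b, a, B)
    n = len(x)
    H = [[b[i] * b[k] / s ** 2 for k in range(n)] for i in range(n)]
    for i in range(n):
        H[i][i] += 1.0 / x[i] ** 2 + 1.0 / (a[i] - x[i]) ** 2
    return H


x, b, a, B = [1.0, 1.0], [1.0, 2.0], [2.0, 4.0], 5.0
print(nv_barrier_gradient(x, b, a, B))
print(nv_barrier_hessian(x, b, a, B))
```

The Hessian is a diagonal matrix plus a rank-one matrix, a structure that can be exploited when solving the linear system for the direction of movement.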

In an embodiment of the invention, a constant η is used such that 0<η<1.

1. At time t, when it is time to choose the vector x_{t+1}, first calculate the entries of the matrix H_t according to equations 4-6 above and the vector g_t, the gradient of ƒ_t at x_t, as defined in equation 3 above.

2. Let n_t be the solution of the following system of linear equations:

H_t n_t=−g_t.

3. The choice of x_{t+1} is

x_{t+1}=x_t+η n_t.
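The three steps above can be sketched in Python as follows. Two choices in the sketch are assumptions and are not taken from the disclosure: first, the linear system is solved with the Sherman-Morrison formula, which applies because H_t is a diagonal matrix plus the rank-one term contributed by the budget constraint (any general linear solver would serve equally); second, the step size is capped at one half of the bound of Proposition 5 as a safeguard that keeps the next point strictly feasible. All function names and numerical data are hypothetical.

```python
import math
from typing import List, Sequence


def nv_direction(x: Sequence[float], g: Sequence[float], b: Sequence[float],
                 a: Sequence[float], B: float) -> List[float]:
    """Solve H_t n_t = -g_t, where H_t = diag(D) + v v^T and v = b / slack."""
    s = B - sum(bi * xi for bi, xi in zip(b, x))
    D = [1.0 / xi ** 2 + 1.0 / (ai - xi) ** 2 for xi, ai in zip(x, a)]
    v = [bi / s for bi in b]
    Dinv_g = [gi / Di for gi, Di in zip(g, D)]
    Dinv_v = [vi / Di for vi, Di in zip(v, D)]
    # Sherman-Morrison: H^{-1} g = D^{-1} g - D^{-1} v (v^T D^{-1} g) / (1 + v^T D^{-1} v)
    coef = (sum(vi * w for vi, w in zip(v, Dinv_g))
            / (1.0 + sum(vi * w for vi, w in zip(v, Dinv_v))))
    return [-(w - coef * u) for w, u in zip(Dinv_g, Dinv_v)]


def nv_update(x, d, c, r, b, a, B, eta: float) -> List[float]:
    """Steps 1-3: gradient of f_t, direction n_t, then x_{t+1} = x_t + eta n_t."""
    g = [ci - ri if xi <= di else ci for xi, di, ci, ri in zip(x, d, c, r)]
    n_t = nv_direction(x, g, b, a, B)
    q = -sum(gi * ni for gi, ni in zip(g, n_t))  # equals g^T H^{-1} g >= 0
    if q > 0.0:
        eta = min(eta, 0.5 / math.sqrt(q))  # safeguard (assumption of this sketch)
    return [xi + eta * ni for xi, ni in zip(x, n_t)]


b, a, B = [1.0, 2.0], [2.0, 4.0], 5.0
c, r = [1.0, 2.0], [3.0, 5.0]
x = [1.0, 1.0]
for d in [[4.0, 2.0], [0.5, 3.0], [2.0, 4.0]] * 10:
    x = nv_update(x, d, c, r, b, a, B, eta=0.5)
print(x)
```

With the rank-one structure, the direction is obtained in time linear in n; the order quantities drift toward commodities whose demand has recently exceeded the order, while the barrier keeps every iterate inside the budget and availability constraints.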

Referring now to FIG. 1, there is shown a flowchart of an online convex optimization method in accordance with an embodiment of the invention. The method 10 includes step 12, in which an action that relies on x_t is taken at a time t. In step 14, a resulting cost (ƒ_t(x_t)) is calculated for the selection of x_t, where ƒ_t is a cost function. The process then finds a minimum possible cost (ƒ_t(x*_t)) associated with the selection of x*, in step 16. In step 18, the difference between the resulting cost (ƒ_t(x_t)) and the minimum possible cost (ƒ_t(x*_t)) is determined. A direction of movement from x_t to x_{t+1} is selected in step 20. In step 22, an action that relies on x_{t+1} is taken.

As can be seen from the above disclosure, embodiments of the invention provide techniques for online convex optimization. As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction running system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction running system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may run entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which run via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which run on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be run substantially concurrently, or the blocks may sometimes be run in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

FIG. 2 is a high level block diagram showing an information processing system useful for implementing one embodiment of the present invention. The computer system includes one or more processors, such as processor 102. The processor 102 is connected to a communication infrastructure 104 (e.g., a communications bus, cross-over bar, or network). Various software embodiments are described in terms of this exemplary computer system. After reading this description, it will become apparent to a person of ordinary skill in the relevant art(s) how to implement the invention using other computer systems and/or computer architectures.

The computer system can include a display interface 106 that forwards graphics, text, and other data from the communication infrastructure 104 (or from a frame buffer not shown) for display on a display unit 108. The computer system also includes a main memory 110, preferably random access memory (RAM), and may also include a secondary memory 112. The secondary memory 112 may include, for example, a hard disk drive 114 and/or a removable storage drive 116, representing, for example, a floppy disk drive, a magnetic tape drive, or an optical disk drive. The removable storage drive 116 reads from and/or writes to a removable storage unit 118 in a manner well known to those having ordinary skill in the art. Removable storage unit 118 represents, for example, a floppy disk, a compact disc, a magnetic tape, or an optical disk, etc. which is read by and written to by removable storage drive 116. As will be appreciated, the removable storage unit 118 includes a computer readable medium having stored therein computer software and/or data.

In alternative embodiments, the secondary memory 112 may include other similar means for allowing computer programs or other instructions to be loaded into the computer system. Such means may include, for example, a removable storage unit 120 and an interface 122. Examples of such means may include a program package and package interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 120 and interfaces 122 which allow software and data to be transferred from the removable storage unit 120 to the computer system.

The computer system may also include a communications interface 124. Communications interface 124 allows software and data to be transferred between the computer system and external devices. Examples of communications interface 124 may include a modem, a network interface (such as an Ethernet card), a communications port, or a PCMCIA slot and card, etc. Software and data transferred via communications interface 124 are in the form of signals which may be, for example, electronic, electromagnetic, optical, or other signals capable of being received by communications interface 124. These signals are provided to communications interface 124 via a communications path (i.e., channel) 126. This communications path 126 carries signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link, and/or other communications channels.

In this document, the terms “computer program medium,” “computer usable medium,” and “computer readable medium” are used to generally refer to media such as main memory 110 and secondary memory 112, removable storage drive 116, and a hard disk installed in hard disk drive 114.

Computer programs (also called computer control logic) are stored in main memory 110 and/or secondary memory 112. Computer programs may also be received via communications interface 124. Such computer programs, when run, enable the computer system to perform the features of the present invention as discussed herein. In particular, the computer programs, when run, enable the processor 102 to perform the features of the computer system. Accordingly, such computer programs represent controllers of the computer system.

From the above description, it can be seen that the present invention provides a system, computer program product, and method for implementing the embodiments of the invention. A reference in the claims to an element in the singular is not intended to mean “one and only” unless explicitly so stated, but rather “one or more.” All structural and functional equivalents to the elements of the above-described exemplary embodiment that are currently known or later come to be known to those of ordinary skill in the art are intended to be encompassed by the present claims. No claim element herein is to be construed under the provisions of 35 U.S.C. section 112, sixth paragraph, unless the element is expressly recited using the phrase “means for” or “step for.”

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A method comprising: performing a step that relies on the selection of x at a time t (x_t), where x is a time dependent variable involved with said step; calculating a resulting cost (ƒ_t(x_t)) that results from selecting x_t when performing said step, where ƒ_t is a cost function; finding a minimum possible cost associated with said selection of x*_t, wherein x*_t is a variable that results in the minimum possible cost ƒ_t(x*_t); determining the difference between the resulting cost (ƒ_t(x_t)) and said minimum possible cost (ƒ_t(x*_t)); selecting a direction of movement from x_t to x_t+1; and performing a subsequent step that relies on said selection of x_t+1, wherein said selecting a direction of movement further comprises selecting a direction that is a function of a product of the inverse Hessian of B and the gradient of said cost ƒ_t.
2. The method of claim 1 wherein said performing a step that relies on the selection of x at a time t (x_t) further comprises selecting x_t from a barrier function B.
3. The method of claim 2 wherein said barrier function B defines a domain K in n-dimensional space.
4. The method of claim 3 wherein said selecting a direction of movement further comprises solving a system of linear equations of order n, where n is the dimensionality of domain K from which x is selected.
5. The method of claim 1 further comprising performing a series of T stages, each stage comprising the performance of each of said above steps at different times using a different x.
6. The method of claim 5 wherein after T stages, said difference between the sum of the incurred costs (ƒ_1(x_1)+ . . . +ƒ_T(x_T)) and the minimum possible cost with some fixed decision x*_t (ƒ_t(x*_t)+ . . . +ƒ_T(x*_t)) is the regret R, wherein R is proportional to √(T log T).
7. The method of claim 1 wherein said performing a step that results from the selection of x at a time t (x_t) is performed without advance knowledge of said resulting cost (ƒ_t(x_t)).
8. The method of claim 1 wherein x is a quantity of a product.
9. The method of claim 8 wherein said performing a step further comprises ordering a quantity x of said product.
10. A computer system having a processor for iteratively improving a chosen solution to an online convex optimization problem, said computer system executing procedures for: selecting x at a time t (x_t), where x is a quantity; calculating, by the processor, a resulting cost (ƒ_t(x_t)) that results from selecting x_t when performing said step, where ƒ_t is a cost function; finding a minimum possible cost (ƒ_t(x*_t)) associated with said selection of x*_t; determining the difference between said resulting cost (ƒ_t(x_t)) and said minimum possible cost (ƒ_t(x*_t)); selecting a direction of movement from x_t to x_t+1, wherein said selecting a direction of movement further comprises selecting a direction that is a function of a product of the inverse Hessian of B and the gradient of said cost ƒ_t.
11. The system of claim 10 wherein said selecting x at a time t (x_t) further comprises selecting x_t from a barrier function B.
12. The system of claim 11 wherein said barrier function B defines a domain K in n-dimensional space.
13. The system of claim 12 wherein said selecting a direction of movement further comprises solving a system of linear equations of order n, where n is the dimensionality of domain K from which x is selected.
14. A computer program product for online convex optimization, said computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, said computer readable program code comprising: computer readable program code configured to: performing a step that relies on the selection of x at a time t (x_t), where x is a variable involved with said step; calculating a resulting cost (ƒ_t(x_t)) that results from selecting x_t when performing said step, where ƒ_t is a cost function; finding a minimum possible cost (ƒ_t(x*_t)) associated with said selection of x*_t; determining the difference between said resulting cost (ƒ_t(x_t)) and said minimum possible cost (ƒ_t(x*_t)); selecting a direction of movement from x_t to x_t+1; and performing a subsequent step that relies on the selection of x_t+1, wherein said selecting a direction of movement further comprises selecting a direction that is a function of a product of the inverse Hessian of B and the gradient of said cost ƒ_t.
15. The computer program product of claim 14 further comprising performing a series of T stages, each stage comprising the performance of each of said above steps at different times using a different x.
16. The computer program product of claim 15 wherein after T stages, said difference between the sum of the incurred costs (ƒ_1(x_1)+ . . . +ƒ_T(x_T)) and the minimum possible cost with some fixed decision x*_t (ƒ_t(x*_t)+ . . . +ƒ_T(x*_t)) is the regret R, wherein R is proportional to √(T log T).
17. The computer program product of claim 14 wherein said performing a step that results from the selection of x at a time t (x_t) is performed without advance knowledge of said resulting cost (ƒ_t(x_t)).
18. The computer program product of claim 14 wherein x is a quantity of a product.
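
The direction selection recited in the claims above (a function of the product of the inverse Hessian of B and the gradient of the cost, obtained by solving a linear system of order n) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the domain K is assumed to be a box lo < x < hi, B is assumed to be the logarithmic barrier for that box, the cost is assumed to be the squared distance to a demand vector revealed after each decision, and the step size eta, the damping factor, and all function names are hypothetical.

```python
import numpy as np

def barrier_hessian(x, lo, hi):
    # Hessian of the assumed log-barrier B(x) = -sum(log(x - lo) + log(hi - x)).
    # B is finite only strictly inside the box lo < x < hi (the domain K).
    return np.diag(1.0 / (x - lo) ** 2 + 1.0 / (hi - x) ** 2)

def direction(x, grad, lo, hi):
    # Direction d = -(Hessian of B)^-1 * grad, computed by solving the
    # order-n linear system H d = -grad instead of forming the inverse.
    return np.linalg.solve(barrier_hessian(x, lo, hi), -grad)

def step(x, grad, lo, hi, eta):
    # Move from x_t to x_t+1 along the barrier-scaled direction.
    d = direction(x, grad, lo, hi)
    local_norm = np.sqrt(d @ barrier_hessian(x, lo, hi) @ d)
    # Damping is an assumption of this sketch: it keeps the move strictly
    # shorter than 1 in the local (Hessian) norm, so x_t+1 stays inside K.
    return x + (eta / (1.0 + eta * local_norm)) * d

def run(demands, lo, hi, eta):
    # demands has shape (T, n); row t is revealed only after x_t is chosen.
    x = (lo + hi) / 2.0
    incurred = 0.0
    for d_t in demands:
        incurred += float(np.sum((x - d_t) ** 2))  # assumed cost f_t(x_t)
        grad = 2.0 * (x - d_t)                     # gradient of f_t at x_t
        x = step(x, grad, lo, hi, eta)
    # For this squared cost the best fixed decision in hindsight is the mean.
    best_fixed = demands.mean(axis=0)
    best_cost = float(np.sum((demands - best_fixed) ** 2))
    return x, incurred - best_cost                 # final decision, regret R

lo, hi = np.zeros(2), np.ones(2)
demands = np.random.default_rng(0).uniform(0.2, 0.8, size=(200, 2))
x_final, regret = run(demands, lo, hi, eta=0.1)
print(x_final, regret)
```

Solving the linear system with `np.linalg.solve` avoids explicitly inverting the Hessian, which mirrors the "system of linear equations of order n" language of claims 4 and 13. The toy run makes no claim about the regret rate; it only shows how the regret R of claims 6 and 16 would be tallied against a fixed decision.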

US Referenced Citations (10)

Number	Name	Date	Kind
7184992	Polyak et al.	Feb 2007	B1
7216004	Kohn et al.	May 2007	B2
20050102044	Kohn et al.	May 2005	A1
20050257178	Daems et al.	Nov 2005	A1
20060112049	Mehrotra et al.	May 2006	A1
20080134193	Corley et al.	Jun 2008	A1
20080279434	Cassill	Nov 2008	A1
20080304516	Feng et al.	Dec 2008	A1
20080306891	Hazan et al.	Dec 2008	A1
20090280856	Ohwatari et al.	Nov 2009	A1

Non-Patent Literature Citations (15)

Entry
Flaxman et al., “Online convex optimization in the bandit setting: Gradient descent without a gradient,” Symposium on Discrete Algorithms, Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, Vancouver, British Columbia, SESSION: Session 4C pp. 385-394, 2005, ISBN:0-89871-585.
Hazan et al., “Logarithmic Regret Algorithms for Online Convex Optimization,” Machine Learning Journal vol. 69, Issue 2-3 pp. 169-192 (Dec. 2007).
Zhang et al., “Single and Multi-Period Optimal Inventory Control Models with Risk-Averse Constraints”, European Journal of Operational Research 199, 2009, pp. 420-434.
J. Abernethy, E. Hazan, and A. Rakhlin, Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, Proceedings of the 21st Annual Conference on Learning Theory (COLT) (R. Servedio and T. Zhang, eds.), Springer, Berlin, 2008, http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-18.html, pp. 264-274.
N. Cesa-Bianchi and G. Lugosi, Prediction, learning, and games, Cambridge University Press, 2006.
T. Cover, Universal portfolios, Math. Finance 1 (1991), 1-19.
A. J. Grove, N. Littlestone, and D. Schuurmans, General convergence results for linear discriminant updates, Machine Learning 43 (2001), No. 3, 173-210.
E. Hazan, A. Kalai, S. Kale, and A. Agarwal, Logarithmic Regret Algorithms for Online Convex Optimization, Learning Theory: 19th Annual Conference on Learning Theory (COLT) (H. U. Simon and G. Lugosi, eds.), Springer, Berlin, 2006, http://www.springerlink.com/content/m1h022028472281v/, pp. 499-513.
A. T. Kalai and S. Vempala, Efficient algorithms for online decision problems, Journal of Computer and System Sciences 71(3) (2005), 291-307.
J. Kivinen and M. K. Warmuth, Relative loss bounds for multidimensional regression problems, Machine Learning 45 (2001), No. 3, 301-329.
N. Littlestone and M. K. Warmuth, The weighted majority algorithm, Proceedings of the 30th Annual Symposium on the Foundations of Computer Science, 1989, pp. 256-266.
A. Nemirovski, Interior point polynomial time methods in convex programming, Tech. Report 8813, Georgia Institute of Technology, 2004, http://www2.isye.gatech.edu/~nemirovs/Lect_IPM.pdf.
M. A. Zinkevich, Online Convex Programming and Generalized Infinitesimal Gradient Ascent, Proceedings of the 20th International Conference on Machine Learning (ICML) (T. Fawcett and N. Mishra, eds.), 2003, pp. 928-936.
Hazan et al., “Logarithmic Regret Algorithms for Online Convex Optimization,” Machine Learning Journal vol. 69, Issue 2-3 pp. 169-192 (Dec. 2007).
Flaxman et al., “Online convex optimization in the bandit setting: Gradient descent without a gradient,” Symposium on Discrete Algorithms, Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, Vancouver, British Columbia, SESSION: Session 4C pp. 385-394, 2005, ISBN:0-89871-585-7.

Related Publications (1)

Number	Date	Country
20120005142 A1	Jan 2012	US
