The present invention relates to optimization problems, and more specifically, to techniques for minimum regret learning in online convex optimization.
In real-time systems, the costs and other conditions are always changing. The operator of the system has to make decisions continually and the utility from making a decision depends not only on the decision itself but also on the conditions of the system or the environment. For example, the operator's task may be to track a “moving target” in the sense that the target may jump from one point to another and the operator has to aim without knowing exactly where the target is, but only where it previously was. This happens, for example, in inventory systems, where there is an optimal level of inventory in hindsight, but the decision about the inventory level has to be made before the actual demand for the item is known. The “regret” of the operator is the difference between the cost that is incurred as a result of his decision and the optimal cost that could have been incurred using another decision if the conditions had been known. In the prior art, methods have been known which minimize the total regret so that it is proportional to the square root of the total amount of time.
According to one embodiment of the present invention, a method comprises: performing a step that relies on the selection of x at a time t (xt), where x is a variable involved with the step; calculating a resulting cost (ƒt(xt)) that results from selecting xt when performing the step, where ƒt is a cost function; finding a minimum possible cost (ƒt(x*t)) associated with the selection of x*; determining the difference between the resulting cost (ƒt(xt)) and the minimum possible cost (ƒt(x*t)); selecting a direction of movement from xt to xt+1; and performing a subsequent step that relies on the selection of xt+1.
According to another embodiment of the present invention, a system is provided for iteratively improving a chosen solution to an online convex optimization problem. The system executes procedures for: selecting x at a time t (xt), where x is a quantity; calculating a resulting cost (ƒt(xt)) that results from selecting xt, where ƒt is a cost function; finding a minimum possible cost (ƒt(x*t)) associated with the selection of x*; determining the difference between the resulting cost (ƒt(xt)) and the minimum possible cost (ƒt(x*t)); and selecting a direction of movement from xt to xt+1.
According to another embodiment of the present invention, a computer program product for online convex optimization comprises: a computer usable medium having computer usable program code embodied therewith, the computer usable program code comprising: computer usable program code configured to: perform a step that relies on the selection of x at a time t (xt), where x is a variable involved with the step; calculate a resulting cost (ƒt(xt)) that results from selecting xt when performing the step, where ƒt is a cost function; find a minimum possible cost (ƒt(x*t)) associated with the selection of x*; determine the difference between the resulting cost (ƒt(xt)) and the minimum possible cost (ƒt(x*t)); select a direction of movement from xt to xt+1; and perform a subsequent step that relies on the selection of xt+1.
Embodiments of the invention provide ways to improve the speed of computing online convex optimization problems.
As described above, prior-art methods exist that keep the total regret proportional to the square root of the total amount of time. However, these methods require elaborate computation before each decision. More precisely, they require the solution of an optimization problem. In some real-time systems such a computational task is not practical. Hence, there is a need for much faster computational methods. Also, prior-art methods that were relatively fast were limited to linear cost functions and were not “adaptive.”
In embodiments of the invention, the above-discussed problem is formulated so that at time t the operator picks a point xt without knowing the current cost function ƒt. The resulting cost is ƒt(xt). The regret after T time stages is equal to the difference between the sum of the incurred costs ƒ1(x1)+ . . . +ƒT(xT) and the minimum possible cost with some fixed decision x*, i.e., the sum ƒ1(x*)+ . . . +ƒT(x*). The domain K in n-dimensional space, from which the operator picks xt, is given by a “barrier function” β. Embodiments of the invention choose the direction of movement from xt to xt+1. In particular, these embodiments pick this direction as the negated product of the inverse Hessian of the barrier function β and the gradient of ƒt. This yields an algorithm that in each step requires only solving a set of linear equations in dimension n rather than an optimization problem over K. The resulting regret is proportional to the square root of T log T, so it is almost optimal.
In particular, embodiments of the invention utilize a new method for regret minimization in online convex optimization. The regret of the algorithm after T time periods is almost the minimum possible. However, in n-dimensional space, during each iteration, the embodiments of the invention essentially solve a system of linear equations of order n, whereas previous techniques had to solve some constrained convex optimization problem in n dimensions with possibly many constraints. Thus, the embodiments of the invention improve running time by a factor of at least the square root of n, and much more for nontrivial domains. These embodiments are also adaptive, in the sense that the regret bounds hold not only for the time periods 1, . . . , T, but also for every sub-interval s, s+1, . . . , t.
Online Convex Optimization
In the Online Convex Optimization problem an adversary picks a sequence of convex functions $f_t: K \to \mathbb{R}$, $t = 1, 2, \ldots, T$, where $K \subset \mathbb{R}^n$ is convex and compact. At stage $t$, the player has to pick an $x_t \in K$ without knowing the function $f_t$. The player then incurs a cost of $f_t(x_t)$. The setting of this disclosure is that after choosing $x_t$, the player is informed of the entire function $f_t$ over $K$. The total cost to the player is $\sum_{t=1}^{T} f_t(x_t)$. Online Convex Optimization encompasses, for example, expert algorithms with arbitrary convex loss functions and the problem of universal portfolio optimization.
Regret Minimization
Suppose the minimum cost over all possible single choices is attained at some $x^* = \arg\min_{x \in K} \sum_{t=1}^{T} f_t(x)$. In this case the regret resulting from the choices $(x_1; f_1, \ldots, x_T; f_T)$ is defined as
$$R = R(x_1; f_1, \ldots, x_T; f_T) \equiv \sum_{t=1}^{T} \left[ f_t(x_t) - f_t(x^*) \right].$$
The problem of regret minimization calls for choosing the points $x_1, \ldots, x_T$ so as to minimize $R$, subject to the condition that, when $x_{t+1}$ is chosen, only $x_1, f_1, \ldots, x_t, f_t$ are known. It is known that, in the worst case, the minimum possible regret is $\Omega(\sqrt{T})$.
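As a minimal illustration of the regret definition, the following Python sketch compares the cost incurred by a sequence of choices with the cost of the best fixed choice in hindsight. The one-dimensional domain K = [0, 1], the quadratic costs ƒt(x) = (x − dt)², and the helper name `regret` are assumptions made for this example only and are not part of the disclosure.

```python
def regret(choices, targets):
    """Regret of `choices` against the best fixed point in K = [0, 1].

    Assumes (for illustration only) costs f_t(x) = (x - d_t)**2.
    """
    if len(choices) != len(targets):
        raise ValueError("choices and targets must have equal length")
    # For a sum of these quadratics the minimizer is the mean of the
    # targets, clipped to the interval [0, 1].
    x_star = min(1.0, max(0.0, sum(targets) / len(targets)))
    incurred = sum((x - d) ** 2 for x, d in zip(choices, targets))
    best_fixed = sum((x_star - d) ** 2 for d in targets)
    return incurred - best_fixed


# A player who always repeats the previous target (starting from 0.5)
# does poorly when the target alternates between the two endpoints.
targets = [0.0, 1.0, 0.0, 1.0]
choices = [0.5, 0.0, 1.0, 0.0]
print(regret(choices, targets))
```

Note that each choice in `choices` uses only targets from earlier stages, which mirrors the information constraint stated above.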
Computational Efficiency
In all the previously known algorithms that attain minimum possible regret, in each stage the algorithm must solve some constrained convex optimization problem over K, which can be prohibitive in some practical applications. In particular, if K is a convex polyhedron, the best known worst-case bound on the number of iterations of an optimization algorithm is $O(\sqrt{n}\,L)$, where L is the number of bits in the description of K, and each iteration requires solving a linear system of order n. Motivated by this shortcoming of the previous algorithms, embodiments of the present invention utilize a new method for constructing an almost-minimum-regret algorithm, which requires in each stage only solving a system of linear equations of order n, rather than solving an optimization problem over K. Thus, embodiments of the invention improve the running time by a factor of at least $\sqrt{n}$, and much more than that when K is more complicated, for example, a convex polyhedron with many facets. In addition, embodiments of the invention are “adaptive” in the sense that their regret is almost the minimum possible not only over the stages 1, . . . , T but also over every sub-interval of stages s, s+1, . . . , t.
Previous Approaches
There are numerous algorithms for Online Convex Optimization, some of which attain the minimum possible regret of $O(\sqrt{T})$. Most of these algorithms can be classified into the following two classes: (i) link-function algorithms, and (ii) regularized follow-the-leader algorithms.
Follow-the-Regularized-Leader Algorithms
The intuitive “Follow-The-Leader” (FTL) algorithm picks for $x_{t+1}$ a minimizer of the function $F_t(x) \equiv \sum_{s=1}^{t} f_s(x)$ over K. It is known that the regret of FTL is not optimal. This fact suggested the more general “Follow-The-Regularized-Leader” (FTRL) algorithm that picks $x_{t+1}$ as a minimizer of the function $F_t(x) + \rho(x)$ over K, where $\rho(x)$ is a certain function that serves as a “regularizer”. Different variants of the method correspond to different choices of $\rho(x)$. The FTRL approach led to the resolution of some prediction problems, notably the resolution of the value of bandit information. One advantage of the FTRL approach is its relatively intuitive analysis. On the negative side, FTRL algorithms are known to be “non-adaptive”, in the sense that the regret over a general sub-interval s, s+1, . . . , t may be linear in $t-s$ rather than $O(\sqrt{t-s})$. Furthermore, the running time of the algorithm in each stage may not be practical because the algorithm has to solve some optimization problem over K.
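The difference between FTL and FTRL can be seen on a small, well-known example. The sketch below is an illustration under stated assumptions, not the method of the disclosure: the domain is K = [−1, 1], the losses are linear, ƒt(x) = zt·x, with a sign pattern chosen so that the leader flips every round, and the regularizer ρ(x) = x²/(2η) and the function name `run_on_alternating_losses` are hypothetical choices for the example.

```python
def run_on_alternating_losses(T, eta=None):
    """Play K = [-1, 1] against linear losses f_t(x) = z_t * x.

    eta=None  -> Follow-The-Leader (no regularizer).
    eta=float -> FTRL with the quadratic regularizer rho(x) = x**2 / (2*eta),
                 an assumption made for this example.
    Returns the regret against the best fixed point in hindsight.
    """
    # z_1 = 0.5, then -1, +1, -1, ... so the leader flips every round.
    zs = [0.5] + [(-1.0 if t % 2 == 0 else 1.0) for t in range(T - 1)]
    x, cum, incurred = 0.0, 0.0, 0.0
    for z in zs:
        incurred += z * x          # cost of the point chosen before seeing z
        cum += z
        if eta is None:
            # Minimizer of cum * x over [-1, 1]: the endpoint opposite to cum.
            x = -1.0 if cum > 0 else 1.0
        else:
            # Minimizer of cum * x + x**2 / (2*eta), clipped to [-1, 1].
            x = max(-1.0, min(1.0, -eta * cum))
    best_fixed = -abs(cum)  # best of x = -1 and x = +1 for a linear total
    return incurred - best_fixed


T = 100
print(run_on_alternating_losses(T))                 # FTL: grows linearly in T
print(run_on_alternating_losses(T, eta=T ** -0.5))  # FTRL: much smaller
```

On this sequence FTL pays a cost of 1 in every round after the first, while the regularized variant moves only a small distance from the origin and so loses little per round.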
The “Link Function” Methodology
In contrast to the intuitive FTRL methodology, which relies on the entire past history of the play, link-function algorithms use less information and proceed “incrementally.” Perhaps the easiest algorithm to describe is Zinkevich's online gradient descent, which picks $x_{t+1}$ to be the orthogonal projection of the point $y_{t+1} \equiv x_t - \eta \nabla f_t(x_t)$ onto K. Of course, $x_{t+1}$ is the point in K nearest $y_{t+1}$, hence its computation can be costly, for example, if K has many facets. On the other hand, link-function algorithms are adaptive (in the sense explained above) and are usually more efficiently computable than FTRL algorithms in case projections turn out to be easy. However, link-function algorithms tend to be harder to analyze.
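A minimal sketch of one online-gradient-descent step is given below for a case in which the projection is easy: K is assumed to be an axis-aligned box, so the nearest point in K is obtained by clipping each coordinate. The step moves against the gradient and then projects. The function names, the box domain, and the quadratic example cost are assumptions made for illustration.

```python
def project_onto_box(y, lower, upper):
    """Euclidean projection onto the box {x : lower <= x_i <= upper}."""
    return [min(upper, max(lower, yi)) for yi in y]


def ogd_step(x, grad, eta, lower=0.0, upper=1.0):
    """One online-gradient-descent step followed by projection back into K."""
    y = [xi - eta * gi for xi, gi in zip(x, grad)]
    return project_onto_box(y, lower, upper)


# Example with the (assumed) cost f_t(x) = sum((x_i - d_i)**2).
x = [0.9, 0.2]
d = [2.0, 0.0]
grad = [2.0 * (xi - di) for xi, di in zip(x, d)]  # gradient of f_t at x
print(ogd_step(x, grad, eta=0.25))
```

For a general polyhedron with many facets the call to `project_onto_box` would have to be replaced by the solution of a quadratic program, which is the cost the text refers to.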
Merging the Two Approaches
An important aspect of the embodiments of the invention is to follow the incremental-update approach, but make sure it never requires projections from the exterior of K into K (hence the name “interior point”). This is accomplished by moving from xt to xt+1 in a direction that is obtained from the gradient ∇ƒt(xt) by a linear transformation (as in Newton's method), which depends on K and xt. The assumption is that K is specified by means of a self-concordant barrier function (see below). Self-concordant barriers were previously introduced to learning theory, where they were used as regularizers. Embodiments of the invention can be interpreted as using the barrier function as a link function rather than a regularizer.
Embodiments of the invention teach the design and analysis of a new method for online convex optimization. The regret of the algorithm is almost the minimum possible. The algorithm is adaptive and requires solving only one system of linear equations of order n per stage. In comparison, prior minimum-regret algorithms require, in the worst case, solving a complete optimization problem in each iteration. Also, they are generally not adaptive, they work only on linear cost functions rather than in the general setting, and they require the computation of the so-called analytic center of K for the starting point x1, which requires solving a nontrivial optimization problem.
Preliminaries—Self-Concordant Barrier
We assume that $K$ is given by means of a barrier function $\beta: \operatorname{int} K \to \mathbb{R}$, i.e., for every sequence $\{x^k\}_{k=1}^{\infty} \subset \operatorname{int} K$ that converges to the boundary of $K$, the sequence $\{\beta(x^k)\}$ tends to infinity. We further assume that for some $\vartheta > 0$, $\beta(x)$ is a $\vartheta$-self-concordant barrier, i.e., it is thrice differentiable and for every $x \in \operatorname{int} K$ and every $h \in \mathbb{R}^n$, the function $\tilde{f}(t) \equiv \beta(x + th)$ satisfies (i) $|\tilde{f}'''(0)| \le 2[\tilde{f}''(0)]^{3/2}$ (i.e., $\tilde{f}$ is a self-concordant function), and also (ii) $[\tilde{f}'(0)]^2 \le \vartheta \cdot \tilde{f}''(0)$. It follows that $\beta(x)$ is strictly convex. For example, for $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, the function $\beta(x) = -\sum_{i=1}^{m} \ln[(Ax)_i - b_i]$ (defined for $x$ such that $Ax > b$) is an $m$-self-concordant barrier for the polyhedron $\{x \in \mathbb{R}^n \mid Ax \ge b\}$.
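As a brief worked check of conditions (i) and (ii), consider the simplest case, which is an illustration added here rather than part of the disclosure: the one-dimensional barrier $\beta(x) = -\ln x$ for the half-line $x \ge 0$, with the barrier parameter written as $\vartheta$.

```latex
\tilde{f}(t) = -\ln(x + th), \qquad
\tilde{f}'(0) = -\frac{h}{x}, \qquad
\tilde{f}''(0) = \frac{h^2}{x^2}, \qquad
\tilde{f}'''(0) = -\frac{2h^3}{x^3}.

% Condition (i) holds with equality:
|\tilde{f}'''(0)| = \frac{2|h|^3}{x^3} = 2\left[\tilde{f}''(0)\right]^{3/2}.

% Condition (ii) holds with \vartheta = 1:
\left[\tilde{f}'(0)\right]^2 = \frac{h^2}{x^2} = 1 \cdot \tilde{f}''(0).
```

Each logarithmic term of the polyhedral barrier has this form after an affine change of variable, and the parameters of summed barriers add, which is consistent with the parameter m obtained for m constraints.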
The Dikin Ellipsoid
For every $v \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$, denote $\|v\|_A \equiv \sqrt{v^T A v}$. For every $h \in \mathbb{R}^n$, denote $\|h\|_x \equiv \sqrt{h^T [\nabla^2 \beta(x)] h}$. The open Dikin ellipsoid of radius $r$ centered at $x$, denoted by $W_r(x)$, is the set of all $y = x + h \in K$ such that $\|h\|_x^2 \equiv h^T [\nabla^2 \beta(x)] h < r^2$.
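For the logarithmic barrier of a polyhedron given in the preceding section, the local norm can be evaluated without forming the Hessian explicitly. The following sketch assumes the polyhedron is written as Ax ≥ b with slacks (Ax)i − bi; the function names `dikin_norm` and `in_dikin_ellipsoid` and the unit-square example are hypothetical and serve only to illustrate the definition.

```python
import math


def dikin_norm(A, b, x, h):
    """Local norm ||h||_x for the log barrier beta(x) = -sum(ln((Ax)_i - b_i)).

    The Hessian is sum_i a_i a_i^T / s_i**2 with slacks s_i = (Ax)_i - b_i,
    so h^T H h = sum_i ((a_i . h) / s_i)**2 and H need not be formed.
    """
    total = 0.0
    for row, bi in zip(A, b):
        slack = sum(aij * xj for aij, xj in zip(row, x)) - bi
        if slack <= 0.0:
            raise ValueError("x must be strictly inside the polyhedron")
        total += (sum(aij * hj for aij, hj in zip(row, h)) / slack) ** 2
    return math.sqrt(total)


def in_dikin_ellipsoid(A, b, x, y, radius=1.0):
    """True if y lies in the open Dikin ellipsoid W_radius(x)."""
    h = [yj - xj for yj, xj in zip(y, x)]
    return dikin_norm(A, b, x, h) < radius


# Unit square 0 <= x_1, x_2 <= 1 written as Ax >= b.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [0.0, -1.0, 0.0, -1.0]
print(in_dikin_ellipsoid(A, b, [0.5, 0.5], [0.8, 0.5]))
print(in_dikin_ellipsoid(A, b, [0.5, 0.5], [0.95, 0.5]))
```

The second point lies inside the square but outside the unit Dikin ellipsoid around the center, which shows that the ellipsoid is a conservative inner approximation of K.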
Below we use the following known facts about the Dikin ellipsoid and self-concordant functions:
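Proposition 1. For every $x \in \operatorname{int} K$, $W_1(x) \subset \operatorname{int} K$.
The next proposition provides “bounds” on the Hessian $\nabla^2 \beta(x+h)$ of $\beta(x+h)$ within Dikin's ellipsoid. For $A, B \in \mathbb{R}^{n \times n}$, the notation $A \succeq B$ (equivalently, $B \preceq A$) means that $A - B$ is positive semi-definite.
Proposition 2. For every $h$ such that $\|h\|_x < 1$,
$$(1 - \|h\|_x)^2 \, \nabla^2 \beta(x) \preceq \nabla^2 \beta(x+h) \preceq (1 - \|h\|_x)^{-2} \, \nabla^2 \beta(x). \qquad (1)$$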
We denote the diameter of K by Δ.
Proposition 3. If (i) $\beta(x)$ is a barrier function for $K$, and (ii) $\beta(x)$ is self-concordant, then for every $x \in \operatorname{int} K$ all the eigenvalues of $\nabla^2 \beta(x)$ are greater than or equal to $1/\Delta^2$.
Corollary 4. For every $x \in \operatorname{int} K$, all the eigenvalues of $[\nabla^2 \beta(x)]^{-1}$ are less than or equal to $\Delta^2$.
Method of the Embodiments and Regret Bounds
We assume in this section that when the player has to pick the next point $x_{t+1}$, the player recalls $x_t$ and knows $\nabla f_t(x_t)$ and $\nabla^2 \beta(x_t)$. Interior-point algorithms for optimization typically utilize the Newton direction, which in the case of minimizing a function of the form $F_\mu(x) \equiv f(x) - \mu \cdot \beta(x)$, while at a point $x$, would be $n = -[\nabla^2 F_\mu(x)]^{-1} \nabla F_\mu(x)$. However, for minimum regret online optimization, it turns out that the following direction is useful:
$$n_t = -[\nabla^2 \beta(x_t)]^{-1} \nabla f_t(x_t),$$
i.e., the gradient factor is determined by the previous objective function $f_t$, while the Hessian factor is determined by the barrier function $\beta$. Thus, when the method of the invention is used, the player picks $x_{t+1} = x_t + \eta\, n_t$, where $0 < \eta < 1$ is a scalar whose value depends on $T$; it tends to zero as $T$ tends to infinity. Denote $g_t = \nabla f_t(x_t)$ and $H_t = \nabla^2 \beta(x_t)$. Thus, $n_t = -H_t^{-1} g_t$.
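The update amounts to one linear solve per stage. The sketch below shows this for a generic Hessian supplied by the caller; the small Gaussian-elimination helper, the function names, and the box example are assumptions made so that the example is self-contained, and any linear solver could be used instead.

```python
def solve_linear_system(H, rhs):
    """Solve H n = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [H | rhs]; copies keep the caller's data intact.
    M = [list(map(float, H[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (M[i][n] - tail) / M[i][i]
    return sol


def barrier_step(x, grad, hessian, eta):
    """Return x + eta * n_t, where H_t n_t = -g_t."""
    direction = solve_linear_system(hessian, [-gi for gi in grad])
    return [xi + eta * di for xi, di in zip(x, direction)]


# Example: K = [0, 1]^2 with barrier -sum(ln x_i) - sum(ln(1 - x_i)),
# whose Hessian is diagonal with entries 1/x_i**2 + 1/(1 - x_i)**2.
x = [0.5, 0.9]
H = [[1 / 0.5**2 + 1 / 0.5**2, 0.0],
     [0.0, 1 / 0.9**2 + 1 / 0.1**2]]
g = [1.0, 1.0]
print(barrier_step(x, g, H, eta=0.5))
```

In the example the second coordinate is close to the boundary, so its Hessian entry is large and the step in that coordinate is correspondingly short; this is how the barrier keeps the iterate inside K without a projection.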
Validity. It can be proven that the algorithm generates only points in K.
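Proposition 5. For every $t$, if $x_t \in \operatorname{int} K$ and $\eta < (g_t^T H_t^{-1} g_t)^{-1/2}$, then $x_{t+1} \in \operatorname{int} K$.
By Corollary 4,
$$g_t^T H_t^{-1} g_t \le \Delta^2 \cdot \|g_t\|^2. \qquad (2)$$
Thus, we also have
Corollary 6. If
$$\eta < \frac{1}{\Delta \cdot \|g_t\|},$$
then $x_{t+1} \in \operatorname{int} K$.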
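The reasoning that leads from Proposition 5 and inequality (2) to a sufficient step size can be spelled out as follows. This is a sketch that uses only Proposition 1, Corollary 4, and the definitions of $n_t$, $g_t$ and $H_t$.

```latex
\|x_{t+1} - x_t\|_{x_t}^2
  = \eta^2 \, n_t^T H_t \, n_t
  = \eta^2 \, g_t^T H_t^{-1} g_t
  \le \eta^2 \Delta^2 \|g_t\|^2 .

% Hence \eta < 1 / (\Delta \|g_t\|) gives \|x_{t+1} - x_t\|_{x_t} < 1,
% so x_{t+1} \in W_1(x_t) \subset \operatorname{int} K by Proposition 1.
```

In particular, when the gradient norms are bounded by a constant G, as assumed in the next subsection, any fixed $\eta < 1/(\Delta G)$ keeps every iterate in the interior of K.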
A Bound on the Gradients
We wish to express our bound on the regret with respect to bounds on the gradients of the functions selected by the adversary. Thus, we denote
$$G = \max\{\|\nabla f_t(x)\| : x \in K,\ t = 1, \ldots, T\}.$$
Since the player does not know the function $f_t$ at the time of picking $x_t$, and that choice depends on $G$, we simply assume that the adversary is restricted to choosing only functions $f_t$ such that $\|\nabla f_t(x)\| \le G$ for every $x \in K$. We note that standard techniques can be used, without harming our asymptotic regret bounds, to eliminate the requirement that the algorithm knows an upper bound $G$ a priori.
Proposition 7. For every $t$, $t = 1, \ldots, T$,
$$\eta\, g_t^T (x_t - x^*) \le [\nabla\beta(x_{t+1}) - \nabla\beta(x_t)]^T (x_t - x^*) + G\Delta \cdot (3G^2 + 4G\Delta + 3\Delta^2) \cdot \eta^2. \qquad (3)$$
A Bound Dependent on Bregman Divergence
Bregman divergence. Let $x_1, \ldots, x_T$ denote the sequence that is generated by the algorithm of this section. Recall that for $x, y \in \operatorname{int} K$, the Bregman divergence $B_\beta(x, y)$ with respect to the barrier $\beta(x)$ is
$$B_\beta(x, y) = \beta(x) - \beta(y) - [\nabla\beta(y)]^T (x - y).$$
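The definition can be evaluated directly once the barrier and its gradient are available. The sketch below does so for an assumed example barrier on the open box (0, 1)^n; the function names are hypothetical, and the box barrier is chosen only because its gradient has a simple closed form.

```python
import math


def bregman_divergence(beta, grad_beta, x, y):
    """B_beta(x, y) = beta(x) - beta(y) - grad_beta(y) . (x - y)."""
    linear = sum(gi * (xi - yi) for gi, xi, yi in zip(grad_beta(y), x, y))
    return beta(x) - beta(y) - linear


# Assumed example barrier for the open box (0, 1)^n.
def box_barrier(x):
    return -sum(math.log(xi) + math.log(1.0 - xi) for xi in x)


def box_barrier_grad(x):
    return [-1.0 / xi + 1.0 / (1.0 - xi) for xi in x]


center = [0.5, 0.5]
for point in ([0.5, 0.5], [0.9, 0.5], [0.999, 0.5]):
    print(point, bregman_divergence(box_barrier, box_barrier_grad, point, center))
```

The printed values grow as the first argument approaches the boundary of the box, which is the behavior that makes a bound in terms of the divergence uninformative for comparison points on the boundary.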
Regret. Given the functions $f_1, \ldots, f_T$ and the choices $x_1, \ldots, x_T$, for any $x^* \in K$ the regret with respect to $x^*$ is defined by
$$R(x^*) \equiv \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^*).$$
Denote
$$C(G, \Delta) = \sqrt{3G\Delta} \cdot (G + \Delta).$$
Theorem 1. For every $x^* \in K$,
$$R(x^*) \le 2\,C(G, \Delta)\,\sqrt{B_\beta(x^*, x_1)} \cdot \sqrt{T}.$$
Note that as $x^*$ tends to the boundary of $K$, $B_\beta(x^*, x_1)$ tends to infinity, and hence so does the bound of Theorem 1. Thus, the regret bound for $x^*$ on the boundary of $K$ requires further analysis. This is what we describe below.
The Final Regret Bound
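Proposition 8. Let $\beta(\cdot)$ be a $\vartheta$-self-concordant barrier for $K$, and let $x \in \operatorname{int} K$ and a unit vector $u \in \mathbb{R}^n$ be such that $u^T \nabla\beta(x) > 0$. Let $t_{\max} = t_{\max}(x, u)$ be defined as
$$t_{\max} = t_{\max}(x, u) \equiv \max\{t \mid x + tu \in K\} \le \Delta.$$
Under these conditions,
For distinct vectors $x, y \in K$, denote
Proposition 9. If $x, y \in \operatorname{int} K$ are distinct, then
Definition 1. Given the initial point $x_1 \in \operatorname{int} K$ and a real $\delta > 0$, the inner subset $K(\delta; x_1)$ is defined by
Corollary 10. If $y \in K(\delta; x_1)$, then $\beta(y) - \beta(x_1) \le \vartheta \cdot \ln(1 + 1/\delta)$.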
Proposition 11. There exists a constant $c$ such that for every $x^* \in K$,
$$R(x^*) \le c \cdot \sqrt{G\Delta} \cdot (G + \Delta)\,\sqrt{T \log T}.$$
The bound of the latter proposition can be improved by a suitable choice of units as follows.
Theorem 2. There exists a constant $c$ such that for every $x^* \in K$,
$$R(x^*) \le c \cdot \sqrt{\vartheta} \cdot G\Delta\,\sqrt{T \log T}.$$
Generalized News Vendor Problem
In one embodiment of the invention, the above-described methods are applied to the news vendor problem (NVP), which is a classic problem in operations research. In the NVP a seller of newspapers has to order a certain number of copies of the next day's newspaper without knowing exactly how many copies he could sell. The paper becomes worthless if not sold. If the vendor orders too many copies, he loses on the unsold copies. If he orders too few copies, he loses the opportunity to sell more. In the prior art, the problem is solved under an assumed probability distribution of the next day's demand. In this embodiment of the invention a generalization of this problem is employed without any assumptions on the distribution of the demands.
In the present embodiment the NVP may be applied to a situation involving an arbitrary number of perishable commodities. The vendor has to determine at each time $t$ ($t = 1, 2, \ldots$) the order quantities $x_{ti}$ of commodities $i = 1, \ldots, n$. The (nonrefundable) total cost of the orders is $c_1 x_{t1} + \cdots + c_n x_{tn}$. If the vendor later sells the quantities $s_{t1}, \ldots, s_{tn}$, respectively, then he realizes a revenue of $r_1 s_{t1} + \cdots + r_n s_{tn}$. However, the vendor does not know the amounts $s_{t1}, \ldots, s_{tn}$ in advance, except that, necessarily,
$$0 \le s_{ti} \le x_{ti} \quad (i = 1, \ldots, n).$$
The vendor has to make these decisions every time period at the end of the period, after having observed the demands dt1, . . . , dtn for the respective commodities during that period.
The amounts xt1, . . . , xtn must also satisfy some constraints. First, xti≧0 (i=1, . . . , n). Second, there is a budget constraint
b1xt1+ . . . +bnxtn≦B
Finally, there are also availability constraints xti≦ai (i=1, . . . , n).
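The cost structure and constraints described above can be sketched in Python as follows. The sketch rests on an assumption that is not stated in the text: the quantity sold of each commodity is taken to be min(order, demand). The function names and the choice of subgradient at the point where the order equals the demand are likewise hypothetical.

```python
def vendor_cost(x, demand, unit_cost, unit_revenue):
    """Net cost of ordering x: purchase cost minus sales revenue.

    Assumption for this sketch: the quantity sold of each commodity is
    min(order, demand).
    """
    return sum(c * xi - r * min(xi, di)
               for xi, di, c, r in zip(x, demand, unit_cost, unit_revenue))


def vendor_cost_subgradient(x, demand, unit_cost, unit_revenue):
    """A subgradient of vendor_cost with respect to x.

    Below the demand an extra unit is sold (slope c_i - r_i); above it the
    unit is wasted (slope c_i).  At the kink x_i == d_i this sketch
    arbitrarily uses the slope c_i.
    """
    return [c - r if xi < di else c
            for xi, di, c, r in zip(x, demand, unit_cost, unit_revenue)]


def is_feasible(x, budget_coeffs, budget, availability):
    """Check nonnegativity, the budget constraint and availability limits."""
    within_bounds = all(0.0 <= xi <= ai for xi, ai in zip(x, availability))
    spent = sum(bi * xi for bi, xi in zip(budget_coeffs, x))
    return within_bounds and spent <= budget


orders, demands = [10.0, 5.0], [8.0, 7.0]
costs, revenues = [1.0, 2.0], [3.0, 5.0]
print(vendor_cost(orders, demands, costs, revenues))
print(vendor_cost_subgradient(orders, demands, costs, revenues))
```

Under this assumption the cost is convex and piecewise linear in each order quantity, so a subgradient can play the role of the gradient in the update.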
Loss Functions
We denote x=(x1, . . . , xn). Given actual demands dt1, . . . , dtn, if the orders are x, then the cost to the vendor is equal to
If xi≠dti, then
If xi=dti, we define
Denote gt=(gt1, . . . , gtn) where
Constraints. The domain of decisions is the set $P$ of all $x = (x_1, \ldots, x_n)$ in $n$-space such that
$$\sum_i b_i x_i \le B,$$
$$0 \le x_i \le a_i \quad (i = 1, \ldots, n).$$
We define the following “barrier function”
β(x)=−log(B−Σibixi)−Σi log xi−Σi log(ai−xi)
for all x that satisfies all the constraints strictly, i.e.,
Σibixi<B
0<xi<ai(i=1, . . . ,n).
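We have
$$\frac{\partial^2 \beta(x)}{\partial x_i^2} = \frac{b_i^2}{\left(B - \sum_j b_j x_j\right)^2} + \frac{1}{x_i^2} + \frac{1}{(a_i - x_i)^2},$$
and for $k \ne i$,
$$\frac{\partial^2 \beta(x)}{\partial x_i \, \partial x_k} = \frac{b_i b_k}{\left(B - \sum_j b_j x_j\right)^2}.$$
Denote by
$$H_t = ((H_t)_{ik})$$
the Hessian matrix, where
$$(H_t)_{ik} = \frac{\partial^2 \beta(x_t)}{\partial x_i \, \partial x_k}.$$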
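Differentiating the barrier function β defined above twice gives a Hessian with a rank-one budget term plus a diagonal term from the bound constraints. The following sketch computes both the barrier and this Hessian; the function names and argument order are assumptions made for the example.

```python
import math


def nvp_barrier(x, b, B, a):
    """beta(x) = -log(B - sum b_i x_i) - sum log x_i - sum log(a_i - x_i)."""
    slack = B - sum(bi * xi for bi, xi in zip(b, x))
    return (-math.log(slack)
            - sum(math.log(xi) for xi in x)
            - sum(math.log(ai - xi) for ai, xi in zip(a, x)))


def nvp_barrier_hessian(x, b, B, a):
    """Matrix of second partial derivatives of nvp_barrier at x."""
    n = len(x)
    slack = B - sum(bi * xi for bi, xi in zip(b, x))
    if slack <= 0 or any(not 0 < xi < ai for xi, ai in zip(x, a)):
        raise ValueError("x must satisfy all constraints strictly")
    # The budget term contributes b_i * b_k / slack**2 to every entry.
    H = [[b[i] * b[k] / slack ** 2 for k in range(n)] for i in range(n)]
    # The bound terms contribute only to the diagonal.
    for i in range(n):
        H[i][i] += 1.0 / x[i] ** 2 + 1.0 / (a[i] - x[i]) ** 2
    return H


print(nvp_barrier_hessian([1.0, 1.0], [1.0, 2.0], 5.0, [2.0, 4.0]))
```

The entries can be checked against finite differences of `nvp_barrier`, which is a convenient safeguard when the constraint set is changed.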
In an embodiment of the invention, a constant η is used such that 0<η<1.
1. At time t, when it is time to choose the vector xt+1, first calculate the entries of the matrix Ht according to equations 4-6 above and the vector gt, the gradient of ƒt, as defined in equation 3 above.
2. Let $n_t$ be the solution of the following system of linear equations:
$$H_t n_t = -g_t.$$
3. The choice of $x_{t+1}$ is
$$x_{t+1} = x_t + \eta\, n_t.$$
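Steps 1 to 3 can be sketched in Python as follows. The gradient is taken as an input because it depends on the loss function of the period. Two choices in this sketch are assumptions rather than part of the disclosure: the linear system is solved with the Sherman-Morrison formula, which applies because the Hessian of the barrier above is a diagonal matrix plus a rank-one budget term, and a guard based on Proposition 5 rejects step sizes that could leave the interior. The function name `nvp_update` is hypothetical.

```python
def nvp_update(x, g, eta, b, B, a):
    """One update x_{t+1} = x_t + eta * n_t with H_t n_t = -g_t.

    H_t = D + u u^T, where D is diagonal with entries
    1/x_i**2 + 1/(a_i - x_i)**2 and u = b / (B - sum b_i x_i), so the
    system is solved here with the Sherman-Morrison formula (a shortcut
    chosen for this sketch; any linear solver would do).
    """
    slack = B - sum(bi * xi for bi, xi in zip(b, x))
    d_inv = [1.0 / (1.0 / xi ** 2 + 1.0 / (ai - xi) ** 2)
             for xi, ai in zip(x, a)]
    u = [bi / slack for bi in b]
    rhs = [-gi for gi in g]
    d_inv_rhs = [di * ri for di, ri in zip(d_inv, rhs)]
    d_inv_u = [di * ui for di, ui in zip(d_inv, u)]
    scale = (sum(ui * v for ui, v in zip(u, d_inv_rhs))
             / (1.0 + sum(ui * v for ui, v in zip(u, d_inv_u))))
    n_t = [v - scale * w for v, w in zip(d_inv_rhs, d_inv_u)]
    # Validity guard (Proposition 5): require eta**2 * g^T H^{-1} g < 1.
    quad = -sum(gi * ni for gi, ni in zip(g, n_t))  # equals g^T H^{-1} g
    if eta * eta * quad >= 1.0:
        raise ValueError("eta too large: step may leave the interior")
    return [xi + eta * ni for xi, ni in zip(x, n_t)]


x = [1.0, 1.0]
print(nvp_update(x, g=[1.0, -1.0], eta=0.1, b=[1.0, 2.0], B=5.0, a=[2.0, 4.0]))
```

With this structure each update costs a number of operations proportional to n, and no projection onto the constraint set is needed.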
As can be seen from the above disclosure, embodiments of the invention provide techniques for online convex optimization. As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction running system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction running system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may run entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which run via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which run on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be run substantially concurrently, or the blocks may sometimes be run in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer system can include a display interface 106 that forwards graphics, text, and other data from the communication infrastructure 104 (or from a frame buffer not shown) for display on a display unit 108. The computer system also includes a main memory 110, preferably random access memory (RAM), and may also include a secondary memory 112. The secondary memory 112 may include, for example, a hard disk drive 114 and/or a removable storage drive 116, representing, for example, a floppy disk drive, a magnetic tape drive, or an optical disk drive. The removable storage drive 116 reads from and/or writes to a removable storage unit 118 in a manner well known to those having ordinary skill in the art. Removable storage unit 118 represents, for example, a floppy disk, a compact disc, a magnetic tape, or an optical disk, etc. which is read by and written to by removable storage drive 116. As will be appreciated, the removable storage unit 118 includes a computer readable medium having stored therein computer software and/or data.
In alternative embodiments, the secondary memory 112 may include other similar means for allowing computer programs or other instructions to be loaded into the computer system. Such means may include, for example, a removable storage unit 120 and an interface 122. Examples of such means may include a program package and package interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 120 and interfaces 122 which allow software and data to be transferred from the removable storage unit 120 to the computer system.
The computer system may also include a communications interface 124. Communications interface 124 allows software and data to be transferred between the computer system and external devices. Examples of communications interface 124 may include a modem, a network interface (such as an Ethernet card), a communications port, or a PCMCIA slot and card, etc. Software and data transferred via communications interface 124 are in the form of signals which may be, for example, electronic, electromagnetic, optical, or other signals capable of being received by communications interface 124. These signals are provided to communications interface 124 via a communications path (i.e., channel) 126. This communications path 126 carries signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link, and/or other communications channels.
In this document, the terms “computer program medium,” “computer usable medium,” and “computer readable medium” are used to generally refer to media such as main memory 110 and secondary memory 112, removable storage drive 116, and a hard disk installed in hard disk drive 114.
Computer programs (also called computer control logic) are stored in main memory 110 and/or secondary memory 112. Computer programs may also be received via communications interface 124. Such computer programs, when run, enable the computer system to perform the features of the present invention as discussed herein. In particular, the computer programs, when run, enable the processor 102 to perform the features of the computer system. Accordingly, such computer programs represent controllers of the computer system.
From the above description, it can be seen that the present invention provides a system, computer program product, and method for implementing the embodiments of the invention. Reference in the claims to an element in the singular is not intended to mean “one and only” unless explicitly so stated, but rather “one or more.” All structural and functional equivalents to the elements of the above-described exemplary embodiment that are currently known or later come to be known to those of ordinary skill in the art are intended to be encompassed by the present claims. No claim element herein is to be construed under the provisions of 35 U.S.C. section 112, sixth paragraph, unless the element is expressly recited using the phrase “means for” or “step for.”
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Number | Name | Date | Kind |
---|---|---|---|
7184992 | Polyak et al. | Feb 2007 | B1 |
7216004 | Kohn et al. | May 2007 | B2 |
20050102044 | Kohn et al. | May 2005 | A1 |
20050257178 | Daems et al. | Nov 2005 | A1 |
20060112049 | Mehrotra et al. | May 2006 | A1 |
20080134193 | Corley et al. | Jun 2008 | A1 |
20080279434 | Cassill | Nov 2008 | A1 |
20080304516 | Feng et al. | Dec 2008 | A1 |
20080306891 | Hazan et al. | Dec 2008 | A1 |
20090280856 | Ohwatari et al. | Nov 2009 | A1 |
Entry |
---|
Flaxman et al., “Online convex optimization in the bandit setting: Gradient descent without a gradient,” Symposium on Discrete Algorithms, Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, Vancouver, British Columbia, SESSION: Session 4C pp. 385-394, 2005, ISBN:0-89871-585-7. |
Hazan et al., “Logarithmic Regret Algorithms for Online Convex Optimization,” Machine Learning Journal vol. 69, Issue 2-3 pp. 169-192 (Dec. 2007). |
Zhang et al., “Single and Multi-Period Optimal Inventory Control Models with Risk-Averse Constraints”, European Journal of Operational Research 199, 2009, pp. 420-434. |
J. Abernethy, E. Hazan, and A. Rakhlin, Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, Proceedings of the 21st Annual Conference on Learning Theory (COLT) (R. Servedio and T. Zhang, eds.), Springer, Berlin, 2008, http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-18.html, pp. 264-274. |
N. Cesa-Bianchi and G. Lugosi, Prediction, learning, and games, Cambridge University Press, 2006. |
T. Cover, Universal portfolios, Math. Finance 1 (1991), 1-19. |
A. J. Grove, N. Littlestone, and D. Schuurmans, General convergence results for linear discriminant updates, Machine Learning 43 (2001), No. 3, 173-210. |
E. Hazan, A. Kalai, S. Kale, and A. Agarwal, Logarithmic Regret Algorithms for Online Convex Optimization, Learning Theory: 19th Annual Conference on Learning Theory (COLT) (H. U. Simon and G. Lugosi, eds.), Springer, Berlin, 2006, http://www.springerlink.com/content/m1h022028472281v/, pp. 499-513. |
A. T. Kalai and S. Vempala, Efficient algorithms for online decision problems, Journal of Computer and System Sciences 71(3) (2005), 291-307. |
J. Kivinen and M. K. Warmuth, Relative loss bounds for multidimensional regression problems, Machine Learning 45 (2001), No. 3, 301-329. |
N. Littlestone and M. K. Warmuth, The weighted majority algorithm, Proceedings of the 30th Annual Symposium on the Foundations of Computer Science, 1989, pp. 256-266. |
A. Nemirovski, Interior point polynomial time methods in convex programming, Tech. Report 8813, Georgia Institute of Technology, 2004, http://www2.isye.gatech.edu/~nemirovs/Lect_IPM.pdf. |
M. A. Zinkevich, Online Convex Programming and Generalized Infinitesimal Gradient Ascent, Proceedings of the 20th International Conference on Machine Learning (ICML) (T. Fawcett and N. Mishra, eds.), 2003, pp. 928-936. |
Number | Date | Country | |
---|---|---|---|
20120005142 A1 | Jan 2012 | US |