The Figures depict embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
It is to be understood that
Turning now to
As described in greater detail below, the order determination manager 112 processes a set 201 of items 111 (e.g., the results of a search query 105 as illustrated in
The order determination manager 112 determines state information 203 for each item 111 based on certain properties (determined, e.g., by user activity), as described in greater detail below. The order determination manager 112 measures the transition rates 205 at which the items 111 change state 203. The order determination manager 112 updates the ranking at discrete times based on the state 203 of the items 111, the state transition rates 205 of the items 111 and a discount rate 209, which is a function of how far into the future to look when ranking the items 111.
More specifically, consider that the order determination manager 112 orders n different items 111 for a plurality of users 101, each of whom can only display up to k items 111 at any given time, where k<n. Since an item 111 displayed to a user 101 has a higher probability of being chosen than when it is not displayed, these k items 111 can be thought of as the “top list 211.” The order determination manager 112 can update its top list 211 at discrete times t=0, 1, 2, . . . .
By tracking properties for each item 111, such as its reputation, history, age, etc., the order determination manager 112 can determine that the item 111 is in a “state” 203 defined by those properties. Let E be the set of all possible states 203, i.e., all possible combinations of those trackable properties. In general, the state 203 of an item 111 may change as time goes on. As an example, on a software download site the number of downloads, or the average rating of a particular package, may vary from week to week.
It is to be understood that in various embodiments of the present invention, the order determination manager 112 uses various heuristics to determine the order of the items 111. Each such heuristic takes into account the state 203 of each item 111, the transition rates 205 of items 111 between states 203 and a discount rate 209. It will be readily understood by those of ordinary skill in the relevant art in light of this specification that the properties used to determine states 203, as well as the discount rate 209 to apply, are variable design parameters, which can be set as desired in different embodiments of the present invention.
It can be assumed within the context of one embodiment of the present invention that the state 203 of each item 111 changes according to a Markov process independent of the state 203 of other items 111, with transition probabilities $\{P^1_{ij} : i, j \in E\}$ if the item 111 is on the top list 211, and $\{P^0_{ij} : i, j \in E\}$ if it is not. It can also be assumed that an item 111 being on the top list 211 encourages more users 101 to select it, and consequently accelerates its transition from one state 203 to another. Conversely, when an item 111 in state i is off the top list 211, its rate of change slows down by a factor $\epsilon_i$ which is less than one. This dual speed assumption can be stated as
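As a hedged illustration of the dual-speed idea, the following Python sketch derives an off-list transition matrix from an on-list one. It assumes that each off-diagonal probability is scaled by the slowdown factor of the current state and that the removed probability mass is added to the probability of staying put. That rescaling rule, the function name `off_list_matrix` and the numbers used are assumptions of this sketch, not values taken from the specification.

```python
def off_list_matrix(p_on, eps):
    """Build an off-list transition matrix from an on-list one.

    Assumed rule (illustrative only): for j != i the off-list probability
    is eps[i] * p_on[i][j]; the diagonal absorbs the remaining mass.
    """
    n = len(p_on)
    p_off = []
    for i in range(n):
        # Scale every move out of state i by the slowdown factor eps[i].
        row = [eps[i] * p_on[i][j] for j in range(n)]
        # Whatever probability was removed from leaving goes to staying.
        row[i] = 1.0 - eps[i] * (1.0 - p_on[i][i])
        p_off.append(row)
    return p_off


# Hypothetical two-state example with a tenfold slowdown off the list.
p_on = [[0.5, 0.5], [0.2, 0.8]]
p_off = off_list_matrix(p_on, [0.1, 0.1])
```

Each row of the result still sums to one, so the off-list matrix remains a valid set of transition probabilities.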
Consider the total expected utility $r_i$ obtained in one time step by those users 101 who decide to access an item 111 on the top list 211 which has state i. This utility may depend on many factors, such as the total expected number of users 101 choosing the item 111 at a given time step, or the expected quality of the item 111. Since the definition of “state” 203 can be expanded to include these factors, the utility $r_i$ is uniquely determined by the item state i. In other words, we can assume that $r = (r_i)_{i \in E}$ is an |E|-dimensional constant vector known by the order determination manager 112.
The order determination manager 112 can maximize the total expected utility of all users 101:
where $i_m(t)$ is the state 203 of item m at time t, and
where 0<β≤1 is the future discount factor 209. The task is thus to find the optimal strategy υ in the space of stationary strategies (strategies that depend on current item states only). This strategy can then be translated into the set of offerings that are to appear in the top list 211.
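To make the role of the discount factor concrete, the sketch below accumulates discounted utility over a finite history of top lists. It assumes that the utility collected at time t is the sum of the per-state utilities of the displayed items, weighted by β raised to the power t, and it truncates the horizon for illustration. The function name `discounted_utility` and the sample data are hypothetical.

```python
def discounted_utility(reward, displayed_states, beta):
    """Sum beta**t times the utility of the states displayed at step t.

    reward:            mapping from state to its one-step utility
    displayed_states:  list over time steps; each entry lists the states
                       of the items on the top list at that step
    beta:              discount factor applied per time step
    """
    total = 0.0
    for t, states in enumerate(displayed_states):
        total += (beta ** t) * sum(reward[s] for s in states)
    return total


# Hypothetical data: two states, two time steps.
reward = {"a": 1.0, "b": 2.0}
history = [["a", "b"], ["b"]]
value = discounted_utility(reward, history, 0.5)
```

A smaller β makes later time steps count less, which corresponds to looking less far into the future when ranking.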
The model described above is essentially a dual-speed restless bandit problem. Dual-speed restless bandit problems are discussed, for example, in P. Whittle (1988) Restless bandits: activity allocation in a changing world, J. Appl. Prob., 25A, pp 287-298 and K. D. Glazebrook, J. Niño-Mora and P. S. Ansell (2002) Index policies for a class of discounted restless bandits. Adv. Appl. Prob., 34, 754-774.
The model described above is restless because changes of state can also occur when the items are not displayed in the top list 211, and dual speed because those changes do happen at a different speed than those on the top list 211. As is known by those of ordinary skill in the relevant art, Bertsimas and Niño-Mora have demonstrated that an optimal solution is available for the dual-speed restless bandit problem. This solution is discussed in, e.g., J. Niño-Mora (2001) Restless bandits, partial conservation laws and indexability. Adv. Appl. Prob., 33, 76-98, as well as the Glazebrook et al. document cited above.
Specifically, it is possible to attach an index 213 to each item state 203, so that the top list 211 is the ordering including those items 111 with the largest indices 213. This way the user value gets maximized. It is worth remarking that it is not obvious why the relative importance of the states 203 can be measured by one independent index 213. In fact, for a general restless bandit problem without the dual-speed assumption, such a set of indices 213 may not exist.
Nevertheless, Bertsimas and Niño-Mora have shown that a relaxed version of the dual-speed problem is always indexable (i.e., such indices 213 always exist) and also proposed an efficient adaptive greedy heuristic to compute these indices 213. By relaxed we mean that instead of displaying exactly k items 111 at each time, k items 111 on average are displayed. For this relaxed problem, it can be shown that there exists a set of indices $\{G_i\}_{i \in E}$ and a Lagrange multiplier γ such that the optimal strategy is to always display those items 111 whose G-index is greater than γ. Note that in situations where the top list 211 can have variations in the number of items 111, the relaxed situation is the one that applies. In embodiments that apply the limit of no variations, while the solution is known to be suboptimal, the solution is a good approximation to the optimal one, and thus still has great utility.
In order to apply the Bertsimas and Niño-Mora heuristic in this specific context, the order determination manager 112 first calculates a set of constants $A_i^S$, which are defined as follows. Assume that E is finite. For any subset $S \subseteq E$, we define the S-active policy $\upsilon_S$ to be the strategy that recommends all items 111 whose state 203 is in S. Now consider an item 111 that starts from an initial state X(0)=i. Under the action implied by strategy $\upsilon_S$, its total occupancy time in S is given by
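A hedged numerical sketch of such occupancy times follows. It assumes that occupancy is discounted by β<1 per time step, so that the occupancy time of state i equals one if i is in S (zero otherwise) plus β times the expected occupancy time of the next state, using on-list probabilities when i is in S and off-list probabilities otherwise. That recursion, the fixed-point iteration used to solve it, and the name `occupancy_times` are assumptions of this sketch.

```python
def occupancy_times(p_on, p_off, active, beta, tol=1e-12, max_iter=100000):
    """Discounted time spent in the active set S under the S-active policy.

    Iterates the assumed recursion
        V[i] = 1{i in S} + beta * sum_j P[i][j] * V[j]
    where row i of P comes from p_on if i is in S, else from p_off.
    The iteration is a contraction for beta < 1, so it converges.
    """
    n = len(p_on)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = []
        for i in range(n):
            row = p_on[i] if i in active else p_off[i]
            visit = 1.0 if i in active else 0.0
            new_v.append(visit + beta * sum(row[j] * v[j] for j in range(n)))
        # Stop once successive iterates agree to within the tolerance.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical two-state chain; off-list moves are ten times slower.
p_on = [[0.5, 0.5], [0.2, 0.8]]
p_off = [[0.95, 0.05], [0.02, 0.98]]
v_all = occupancy_times(p_on, p_off, {0, 1}, 0.5)
v_none = occupancy_times(p_on, p_off, set(), 0.5)
```

When every state is active the item is always counted, so each entry approaches 1/(1-β); when no state is active every entry is zero. These two limits give a quick sanity check on any implementation.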
The variables $\{V_i^S\}_{i \in E}$ can be solved from the set of linear equations above. A matrix of constants $\{A_i^S\}_{i \in E,\, S \subset E}$ is defined by means of $V_i^S$ as follows:
Once the order determination manager 112 computes the G-index for each state using this heuristic, the strategy is to display the k items 111 whose states 203 have the largest G-indices. For our dual-speed restless bandit problem, it follows that $A_i^S > 0$ for all $i \in E$ and $S \subset E$, so that the relaxed version of the problem is indexable. The table above also provides a good heuristic for the unrelaxed problem.
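The selection step can be sketched in a few lines of Python. The sketch assumes the G-indices have already been computed and stored per state; the index values, item names and the function name `top_list` are hypothetical.

```python
import heapq


def top_list(item_states, g_index, k):
    """Return the k items whose current states have the largest G-index.

    item_states: mapping from item identifier to its current state
    g_index:     mapping from state to its precomputed G-index
    k:           number of slots available on the top list
    """
    return heapq.nlargest(
        k, item_states, key=lambda item: g_index[item_states[item]]
    )


# Hypothetical items, states and index values.
item_states = {"x": 0, "y": 1, "z": 2}
g_index = {0: 0.3, 1: 0.9, 2: 0.1}
chosen = top_list(item_states, g_index, 2)
```

Because the index depends only on the state, the per-state table can be computed once and reused at every update time; only the lookup and the selection of the k largest entries are repeated.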
Turning to
In addition to those 25 states 203 there is one more state, 0, which we call the “unknown” state. Each item 111 initially starts in this state 203, as it has never been either accessed or rated. We assume that occasionally an item 111 will “die,” and if that happens it is immediately replaced by a new item 111. This is equivalent to assuming that there is a small transition probability from each of the 25 states to the unknown state, the entering of which implies starting over. State 0 thus serves as both the sink and the source.
The transition probabilities are assumed to be as follows:
which expresses the fact that displaying an item 111 on the top list 211 accelerates its transitions by a factor of ten. Note the assumption that an item's access level tends to increase more than to decrease. The states 203 and the transition probabilities are illustrated in
The order determination manager 112 sets the reward of each state 203 to be
The G-index rankings of the 26 states 203 are calculated using the above described Bertsimas-Niño-Mora heuristic. The result is shown in
The result of this example is by no means trivial. For example, it is not obvious that the unknown state 203 which gives no reward should have higher display priority than state (2,2), but lower priority than (3,1). This effect is due to the fact that the heuristic gives high index values to potentially valuable states 203. The mechanics of this example can be extended to larger systems and used to compute the transition probabilities from actual data from a portal.
This solution relies on the computation of a set of indices 213, one allocated to each item state 203 in a list, which can be computed by accessing the rates at which items 111 are visited and the rankings they receive from users 101. These rates determine the transition probabilities that are then used as inputs into the actual computation of the index 213 for each state 203. The actual computation of these indices 213 can be performed by mapping the problem of optimizing the information received from any other digital content to that of the optimal allocation of effort to a number of competing projects. Thus, as noted above, the problem to solve can be formulated as a dual-speed restless bandit problem, which is a special case of the restless multi-arm bandit problem. By specially applying in this context the computationally efficient heuristic developed by Bertsimas and Niño-Mora, it is possible to calculate an index 213 for each item state 203.
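One simple way to turn observed activity into transition probabilities is to count state-to-state moves and normalize each row. The sketch below assumes that each item's history is available as a sequence of states sampled at the discrete update times; the counting estimator, the function name `estimate_transitions` and the sample sequences are assumptions of this sketch rather than a procedure prescribed by the specification.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate transition probabilities by counting observed moves.

    sequences: iterable of per-item state histories, one state per step
    Returns a nested mapping probs[state][next_state].
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each state with the state observed one step later.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[state] = {nxt: c / total for nxt, c in next_counts.items()}
    return probs


# Hypothetical histories for two items.
observed = [[0, 0, 1, 1], [0, 1, 2]]
probs = estimate_transitions(observed)
```

In a dual-speed setting the estimator would be run twice, once on steps observed while items were on the top list 211 and once on steps observed while they were off it, yielding the two sets of transition probabilities.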
This mechanism can be used to solve a multiplicity of problems, ranging from the decision of which search results to display on the top list page 211 of a search engine, to the menu of items that a portal decides to present to users or the order in which a journal presents its content to users. Other applications include determining what to display in devices with a small visual real estate (e.g., cell phones, personal digital assistants), the relevant information that should be presented to analysts confronted with mountains of data, how to sort through blogs and other forms of user generated media, and the determination of how to best optimize movie and video directories. Another area of application is that of instrumentation, the purpose of which is to inform the user of the state of the world in which it is embedded. Furthermore, advertising is another potential beneficiary of this technology, for it could use click patterns from visitors to given portals to decide on which ads to present at given times. It is to be further understood that embodiments of the present invention can not only determine ordering of items to display in a limited space, but can also determine items to present in a limited time, for example which television or radio commercials to broadcast.
As will be understood by those familiar with the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Likewise, the particular naming and division of the modules, agents, managers, functions, procedures, actions, layers, features, attributes, methodologies and other aspects are not mandatory or significant, and the mechanisms that implement the invention or its features may have different names, divisions and/or formats. Furthermore, as will be apparent to one of ordinary skill in the relevant art, the modules, agents, managers, functions, procedures, actions, layers, features, attributes, methodologies and other aspects of the invention can be implemented as software, hardware, firmware or any combination of the three. Of course, wherever a component of the present invention is implemented as software, the component can be implemented as a script, as a standalone program, as part of a larger program, as a plurality of separate scripts and/or programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future to those of skill in the art of computer programming. Additionally, the present invention is in no way limited to implementation in any specific programming language, or for any specific operating system or environment. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
This patent application claims priority under 35 U.S.C. §119 from U.S. provisional patent application No. 60/801,911 filed May 19, 2006 entitled “A System And Method For Selecting And Displaying Most Valuable Information,” with inventors Bernardo Huberman and Fang Wu, and which is hereby incorporated by reference.
Number | Date | Country
---|---|---
60801911 | May 2006 | US