Companies can own thousands (and in some cases millions) of related web pages in connection with advertisement of goods and/or services. Web pages that belong to various departments or divisions within a given company can potentially offer different products or services, but these web pages are generally part of a larger web page structure that constitutes the website, which belongs to the company as a whole. As a result, the individual web pages are linked together by hyperlinks, which must be chosen to meet both the needs of the organization and those of the individual departments or divisions.
One problem that arises when attempting to create a hyperlink structure among large numbers of pages is optimization. Hyperlinks on a web page allow a user to navigate to different pages within the website in order to locate content of interest. Accordingly, it is beneficial for the owner of a website to select the hyperlinks displayed on each page so that users find them useful while the owner earns as much revenue as possible. Selecting, largely by guesswork, the hyperlinks most likely to be followed and to maximize revenue is difficult and typically suboptimal, yet that is how many sites proceed.
The claimed subject matter generally relates to optimizing website design through automated selection and placement of hyperlinks to maximize revenue generation for the website. More specifically, described herein are systems/methods that maximize the revenue a website generates, whether from advertisements or from sales of products listed on its pages, by choosing the hyperlinks placed on the respective web pages. Conventional systems rely on manually updating the hyperlinks on a web page according to current judgments about which hyperlinks would be most beneficial, which is a time-consuming and imperfect task. As a result, such conventional systems incur significant opportunity costs from lost potential revenue (and lost man-hours).
Typically, web pages generate varying amounts of revenue, for example, through advertisements and/or product sales. Additionally, web pages often display hyperlinks to other pages on the website. Each possible hyperlink has a transition probability representing the probability that a surfer clicks on it, conditional on the other links on the page. A web designer seeks to select the sub-graph that maximizes the expected revenue of a random walk. Although the problem appears complex, even in a very general setting it can be formulated as computing a fixed point of a function, which allows an optimal solution to be approximated to within an arbitrary degree of precision in polynomial time. The problem can also be formulated as a mathematical program that reduces to a linear program. The solution of the linear program can be rounded so that a subset of the variables of the mathematical program (those representing link existence) is integral; this solution then describes the optimal website design.
To aid in maximizing revenue for a website, a graph optimization system is provided that can be integrated within a revenue maximization system or communicatively coupled to it as a non-native tool. The graph optimization system can receive a representative graph whose nodes and edges correspond to web pages and hyperlinks, respectively, and can compute the expected revenue of random walks through the graph. The graph optimization system can further select the sub-graph of the graph that yields the maximum expected revenue. Once a revenue-maximizing sub-graph has been selected, it can be provided to the revenue maximization system (e.g., as data representative of a graph) for website design.
A computation component can compute the expected revenue of a random walk within a graph to aid in determining the sub-graph(s) expected to yield maximum revenue for the website. This can be accomplished by iterating through the graph and adding edges until the random walk reaches a fixed length. By computing the expected revenue of a random walk originating at each node of the graph, the computation component develops a sub-graph that can be used to determine the maximum expected revenue sub-graph within the original graph. Moreover, a selection component can be employed to determine the maximum expected revenue of a random walk originating from each node of the graph by extending the walk received from the computation component by one additional edge, such that the new random walk maximizes the expected revenue from a specified node. Additionally, a validation component can be used to constrain variables associated with each node and edge of the graph (e.g., the expected revenue of an edge). By constraining these variables while maximizing the expected revenue of the walk through the graph, the sub-graph yielding the maximum expected revenue can be identified.
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings. These aspects are indicative of various ways in which the claimed subject matter may be practiced, all of which are intended to be within the scope of the claimed subject matter. Other advantages and novel features may become apparent from the following detailed description when considered in conjunction with the drawings.
The various aspects of the claimed subject matter are now described with reference to the annexed drawings, wherein like numerals refer to like or corresponding elements throughout. It should be understood, however, that the drawings and detailed description relating thereto are not intended to limit the claimed subject matter to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the claimed subject matter.
As used in this application, the terms “component” and “system” and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an instance, an executable, a thread of execution, a program and/or a computer. By way of illustration, both an application running on a computer and the computer can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. The word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over the other aspects or designs.
Furthermore, all or portions of the subject innovation may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed innovation. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. For example, computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ). Additionally it should be appreciated that a carrier wave can be employed to carry computer-readable electronic data such as those used in transmitting and receiving electronic mail or in accessing a network such as the Internet or a local area network (LAN). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.
It should also be noted and appreciated that although various aspects of the claimed subject matter are described with respect to revenue generation through an optimization of the hyperlink structure to other web pages within the same web site, the claimed subject matter is not limited thereto. Disclosed aspects can also be employed with other types of systems that have a structure that can be expressed as a graph of nodes and edges.
Further yet, various aspects are described solely with respect to revenue generation through web pages and hyperlinks thereto for purposes of brevity. However, it should be noted that other revenue generation schemes are also contemplated and are to be considered within the scope of claimed subject matter including but not limited to revenue generated through the placement of advertisements on web pages.
The claimed subject matter generally addresses a difficulty of hyperlink placement on web pages within the larger structure of an entire website, and can eliminate the onerous and inefficient task of manually selecting and placing said hyperlinks. Moreover, when selecting hyperlinks to place on a website/web page, one does not often consider that different hyperlinks can have different potential for revenue generation. By modeling these aspects with an approximation algorithm or linear program, an efficient solution that uses the disparate revenue values associated with each web page and hyperlink to make determinations regarding the placement of hyperlinks can be achieved.
Prior to discussing various high-level embodiments of the invention in connection with the accompanying figures, a model, algorithms, and corresponding theorems and techniques are described in order to provide context for better appreciating and understanding the invention.
Referring initially to
The computation component 110 can store data related to the website and its organization in the website data store 130 that is communicatively coupled to the computation component 110. The system 100 can also include a selection component 120 that is communicatively coupled to the computation component 110 and the website data store 130, wherein the selection component 120 can identify an optimized graph 140. The optimized graph 140 can also be a directed graph and is typically representative of a website design that will facilitate maximizing revenue. For example, the revenue generated by a website can be maximized by optimizing the hyperlink structure between individual web pages. The optimized graph 140 can denote the revenue maximizing sub-graph within the graph 105.
Revenue generation through a website can be accomplished through product purchases or advertisements, and in either case a quantifiable expected revenue value is associated with the web page. Such values related to the graph 105, expressed as variables, can be generated by the computation component 110 or obtained from, e.g., empirical data input to the data store 130. The expected revenue values can be retrieved from the website data store 130 by the computation component 110 or the selection component 120. These variables can include a probability $p_{ij,S}$ that a particular edge $ij$ of the graph 105 is followed by the user when the set of pages linked from page $i$ is $S$, a variable $t$ corresponding to the number of steps taken in each random walk, and a revenue variable $r_{ij}$ associated with that edge. More specifically, $r_{ij}$ can represent the expected revenue generated when a user browsing the website visits page $j$ via a hyperlink contained on page $i$.
By computing the expected revenue over random walks through the graph 105, the sub-graph expected to maximize the revenue of the website can be identified. The selection component 120 can receive or retrieve data corresponding to random walk(s) through the graph 105 from the computation component 110, including the node from which each random walk originates and the revenue generated along it. Since each node within the graph 105 represents a web page, the selection component 120 can successively iterate through the potential maximum-length random walks from a given node and select the sub-graph composed of the random walks that yield the maximum revenue according to the variables associated with the graph 105. Based on this and other data, including any data retrieved from the website data store 130, the selection component 120 can identify the revenue-maximizing sub-graph within the graph 105 and output it as the optimized graph 140.
Thus, the system 100 can receive a directed graph 105 corresponding to a website and analyze the nodes and edges of the directed graph 105, where the nodes represent web pages and the edges represent links between the respective web pages, each with a quantifiable expected revenue value. The analysis can involve identifying revenue-maximizing random walks associated with the respective nodes and edges. Once these walks are identified, a sub-graph (e.g., optimized directed graph 140) is generated that comprises the revenue-maximizing random walks over the directed graph 105.
In accordance with one aspect of the claimed subject matter, a random walk through the graph 105 can represent a web surfer traversing hyperlinks on the website. For each page $j$, there is a probability $p_j$ that the surfer starts surfing from page $j$. For each page $i$, set $S \subseteq N \setminus \{i\}$ of other pages, and page $j \in S$, there is a probability $p_{ij,S}$ that a surfer on page $i$ follows a hyperlink to page $j$, assuming that the set of pages linked from page $i$ is $S$. It is assumed that for all $i$ and $S \subseteq N \setminus \{i\}$, $\sum_{j \in S} p_{ij,S} \le 1 - \delta$ for some constant $\delta > 0$; i.e., in each step there is a probability of at least $\delta$ that the surfer exits the website. This is a reasonable assumption, and it is used in the analysis of the iterative algorithm described infra in connection with selection component 120.
An expected revenue for a random walk on the website can be defined by assigning a revenue $r_j$ to each page $j$ (this corresponds to the expected revenue that a surfer visiting page $j$ generates for the website owner, e.g., from an advertisement on the page or by buying a product on the page). The expected revenue of a random walk is then the sum, over all $j$, of $r_j$ times the expected number of times that the random walk visits $j$.
It should be appreciated that in one aspect, revenues are assigned to edges instead of vertices. For example, for each hyperlink $ij$, there is a value $r_{ij}$ representing the expected revenue generated at page $j$ by a web surfer who has followed link $ij$. The total revenue is defined as the sum, over all edges $ij$ in the graph 105, of $r_{ij}$ times the expected number of times the random walk traverses the edge $ij$. Assigning revenues to edges rather than vertices yields a strictly more general model, since setting $r_{ij} = r_j$ for all $i$ is equivalent to assigning revenues to vertices (after adding the value $\sum_j p_j r_j$ for the revenue of the first page the surfer visits). Assigning revenues to edges also enables modeling situations where the conversion rate of a user depends on the web page she is coming from, which can be useful for content-related websites.
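The edge-revenue model can be made concrete with a short sketch. It assumes a fixed hyperlink structure given as a hypothetical dictionary `P[i][j]` ($= p_{ij,S}$ for the links actually present), start probabilities `start[j]` ($= p_j$), and edge revenues `r[i][j]` ($= r_{ij}$); all names are illustrative. The expected visit counts satisfy $x_j = p_j + \sum_i x_i p_{ij,S}$, and the total revenue is $\sum_{ij} x_i p_{ij,S} r_{ij}$, i.e., $r_{ij}$ times the expected number of traversals of edge $ij$.

```python
from typing import Dict, List

Links = Dict[int, Dict[int, float]]  # Links[i][j] = p_ij for each existing link ij


def expected_visits(start: List[float], P: Links,
                    tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Expected visits x solving x_j = start_j + sum_i x_i * P[i][j].

    Fixed-point iteration converges because every page keeps an exit
    probability of at least delta, so the error shrinks by a factor of at
    most (1 - delta) per step in the l1 norm.
    """
    x = list(start)
    for _ in range(max_iter):
        new = list(start)
        for i, out in P.items():
            for j, pij in out.items():
                new[j] += x[i] * pij
        converged = max(abs(a - b) for a, b in zip(new, x)) < tol
        x = new
        if converged:
            break
    return x


def expected_revenue(start: List[float], P: Links, r: Dict[int, Dict[int, float]]) -> float:
    """Sum over links ij of r_ij times the expected number of traversals x_i * p_ij."""
    x = expected_visits(start, P)
    return sum(x[i] * pij * r[i][j] for i, out in P.items() for j, pij in out.items())


# Two pages linking to each other; the surfer always starts on page 0.
start = [1.0, 0.0]
P = {0: {1: 0.5}, 1: {0: 0.5}}
r = {0: {1: 2.0}, 1: {0: 4.0}}
print(expected_visits(start, P))      # approximately [4/3, 2/3]
print(expected_revenue(start, P, r))  # approximately 8/3
```

Fixed-point iteration is used instead of a matrix solve only to keep the sketch dependency-free. Vertex revenues $r_j$ can be emulated by setting $r_{ij} = r_j$ and adding $\sum_j p_j r_j$.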
It should also be noted that total revenue is defined by multiplying the $r_{ij}$'s by the expected number of times the random walk takes the corresponding edge, as opposed to the probability that the random walk takes that edge. This means that if the random walk visits a vertex twice, it benefits the website owner twice. This is a realistic assumption in many situations, e.g., where the revenue is generated from "per-impression" advertisements. The above model for representing a website as a directed graph 105 allows for situations where the probability that a surfer clicks on a link to page $j$ placed on page $i$ depends not only on $i$ and $j$, but also on the set of other links on page $i$. In economic terminology, this means that the graph 105 can model externalities among the links placed on a page $i$.
An interesting and important special case is that of no externalities. In accordance with another aspect of the claimed subject matter, each page has limited real estate in which it can display links, so each node $i$ can have out-degree at most $k_i$ (a parameter). For each $i, j \in N$, there is a probability $p_{ij}$ that a surfer on page $i$ follows a hyperlink to page $j$, if such a link exists. It is assumed that for all $i$ and for any set $S$ of $k_i$ pages, $\sum_{j \in S} p_{ij} \le 1 - \delta$, so these probabilities define a random walk with exit probability at least $\delta$ in each step. In this model there is still an externality among the links, since placing each link further limits the number of other links that can be placed on the page; however, this is the only form of externality allowed in this case.
Turning now to
In another aspect of the claimed subject matter, the expected revenue value $r_{ij}$ could be replaced with a cost $c_{ij}$ associated with an edge of the graph 105. Accordingly, the system could employ a graph (e.g., graph 105) associated with, for example, an advertising system that uses a "per click" or "per view" cost structure, in which traversing a link between two web pages incurs a cost rather than generating revenue. Adjusting the objective to represent the cost of edges rather than the generated revenue adapts the system to this alternate embodiment.
Still referring to
Expressed alternatively: for $t := 1$ to $T$, for every $i$, let
$$R_i^t := \max_{S \subseteq N \setminus \{i\}} \Big\{ \sum_{j \in S} p_{ij,S}\,\big(R_j^{t-1} + r_{ij}\big) \Big\}.$$
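A brute-force sketch of this recurrence for the general model with externalities follows, assuming a small site: every subset $S \subseteq N \setminus \{i\}$ is enumerated, so the cost grows as $2^{n}$ per page. It starts from $R^0 = 0$. The probability model `split_attention`, in which a fixed 60% click-through rate is split evenly among a page's links, is a hypothetical example of $p_{ij,S}$, not one taken from the text. After $T$ rounds, each page's link set is read off as the maximizing $S$ against $R^T$, in the manner of the argmax step described below for the comparison component 320.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator, List, Sequence, Tuple

# p(i, j, S): probability that a surfer on page i follows the link to j when
# page i links exactly to the pages in S (p_{ij,S} in the text).
ProbFn = Callable[[int, int, FrozenSet[int]], float]


def all_subsets(items: Sequence[int]) -> Iterator[FrozenSet[int]]:
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def phi(R: List[float], p: ProbFn, r: List[List[float]]) -> Tuple[List[float], List[FrozenSet[int]]]:
    """One update: best value and best link set S for every page i."""
    n = len(R)
    values, sets = [], []
    for i in range(n):
        best_val, best_S = 0.0, frozenset()  # linking to nothing is always allowed
        for S in all_subsets([j for j in range(n) if j != i]):
            val = sum(p(i, j, S) * (R[j] + r[i][j]) for j in S)
            if val > best_val:
                best_val, best_S = val, S
        values.append(best_val)
        sets.append(best_S)
    return values, sets


def iterate(p: ProbFn, r: List[List[float]], T: int) -> Tuple[List[float], List[FrozenSet[int]]]:
    """Compute R^T from R^0 = 0, then S_i = argmax of the update evaluated at R^T."""
    R = [0.0] * len(r)
    for _ in range(T):
        R, _ = phi(R, p, r)
    _, S = phi(R, p, r)
    return R, S


# Hypothetical externality model: a 60% click-through split evenly among the links.
def split_attention(i: int, j: int, S: FrozenSet[int]) -> float:
    return 0.6 / len(S)


r = [[0.0, 1.0, 2.0],
     [3.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
R2, links = iterate(split_attention, r, T=2)
print(R2)     # approximately [1.68, 2.52, 1.68]
print(links)  # [frozenset({2}), frozenset({0}), frozenset({1})]
```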
The aggregation component 230 can compute the revenue along random walks of length $T$ from each node $i$ of the graph 105 through the other nodes in $S$. After the set of random walks from node $i$ has been computed, the sub-graph composed of the random walks with the maximum expected revenue can be identified and transmitted to the selection component 120. It should be noted that certain hyperlinks might be required always or never to be present on a website, regardless of their expected revenue. By adjusting the transition probabilities of such hyperlinks, the optimized sub-graph of the graph 105 can be made to always or never include them, based on preferences and adjustments to the system. For example, a given website might always contain a link to another website, or always exclude links to another website, based on content or some other consideration. By fixing the transition probability of the link between web pages represented by nodes within the graph 105, certain links will always (e.g., by setting the probability to 1) or never (e.g., by setting the probability to 0) be included in the graph 105. In view of the so-called PageRank system for ranking web search results, which attempts to ascertain the probability of an individual web page in the stationary distribution of a random walk on the web, it is contemplated that each of the web pages within a larger website should carry a fixed link to the web page with the highest entrance probability.
With reference now to
For instance, for every $i$, let $S_i := \arg\max_{S \subseteq N \setminus \{i\}} \big\{ \sum_{j \in S} p_{ij,S}\,(R_j^T + r_{ij}) \big\}$. By iterating through the possible nodes $j$, the comparison component 320 can generate the set of possible random walks from $i$ of length $T+1$, and the argmax function selects the maximal expected revenue random walk from that set. Thus, the revenue generated along the selected random walk is maximal over all choices of $S$, and the comparison component 320 selects the maximum revenue generating walk originating from $i$. This procedure for determining the random walk that generates the maximum expected revenue can be repeated for each node $i$, such that the set of such random walks is computed for the graph 105. Such data can be stored in the website data store 130 and output in the form of the optimized sub-graph 140 that maximizes revenue within the original graph 105.
In accordance with one aspect of the claimed subject matter, an efficient iterative algorithm to compute the revenue-maximizing hyperlink structure can be employed. The analysis of the iterative algorithm begins with the following lemma (the first lemma), which computes the revenue of a given graph (e.g., graph 105): Let $G(N,E)$ be a directed graph and let $\delta^+(i)$ denote the set of vertices that have an edge from $i$ in $G$. Also, let $R_i$ denote the expected revenue of a random walk in $G$ that starts from node $i$. Then $\{R_i\}_{i \in N}$ is the unique solution of the system of equations
$$R_i = \sum_{j \in \delta^+(i)} p_{ij,\delta^+(i)}\,\big(R_j + r_{ij}\big) \qquad \text{for all } i \in N.$$
It is readily apparent that $R$ is a solution of this system of equations. Therefore, to prove the lemma it is enough to show that this solution is unique. This follows from the fact that the matrix of coefficients of this system has $-1$ along the main diagonal, and on each row $i$ the sum of the off-diagonal entries is $\sum_{j \in \delta^+(i)} p_{ij,\delta^+(i)} \le 1 - \delta < 1$; the matrix is therefore strictly diagonally dominant and hence nonsingular.
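For a fixed graph, the first lemma reduces revenue computation to a linear solve. The sketch below is an illustration rather than a prescribed method: it rewrites the system as $(I - P)R = b$ with $b_i = \sum_j p_{ij} r_{ij}$ and applies Gaussian elimination; `revenue_from_each_node` and its dictionary inputs are hypothetical names. Strict diagonal dominance guarantees that the elimination succeeds.

```python
from typing import Dict, List


def revenue_from_each_node(n: int, P: Dict[int, Dict[int, float]],
                           r: Dict[int, Dict[int, float]]) -> List[float]:
    """Solve R_i = sum_{j in out(i)} p_ij * (R_j + r_ij) for a fixed graph.

    Rewritten as (I - P) R = b with b_i = sum_j p_ij * r_ij; the matrix is
    strictly diagonally dominant when each row of P sums to at most 1 - delta.
    """
    A = [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]
    b = [0.0] * n
    for i, out in P.items():
        for j, pij in out.items():
            A[i][j] -= pij
            b[i] += pij * r[i][j]
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    R = [0.0] * n
    for i in range(n - 1, -1, -1):
        R[i] = (b[i] - sum(A[i][k] * R[k] for k in range(i + 1, n))) / A[i][i]
    return R


R = revenue_from_each_node(2, {0: {1: 0.5}, 1: {0: 0.5}}, {0: {1: 2.0}, 1: {0: 4.0}})
print(R)  # approximately [8/3, 10/3]
```

For this two-page example, with start probabilities $(1, 0)$, the total revenue $\sum_i p_i R_i$ is $8/3$.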
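Given the values of the $p_{ij,S}$'s and $r_{ij}$'s, define a function $\varphi: \mathbb{R}^n \to \mathbb{R}^n$ as follows: for a vector $R = (R_1, R_2, \ldots, R_n)$, $\varphi(R)$ is the vector whose $i$th component is
$$\varphi_i(R) = \max_{S \subseteq N \setminus \{i\}} \Big\{ \sum_{j \in S} p_{ij,S}\,\big(R_j + r_{ij}\big) \Big\}.$$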
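In the no-externalities case ($p_{ij,S} = p_{ij}$, with at most $k_i$ links on page $i$), evaluating $\varphi_i$ no longer requires enumerating subsets: the objective is a sum of independent per-link gains $p_{ij}(R_j + r_{ij})$, so the best set consists of the (at most) $k_i$ largest positive gains. The following sketch assumes that specialization; the function name and inputs are illustrative.

```python
from typing import List, Tuple


def phi_no_externalities(R: List[float], p: List[List[float]], r: List[List[float]],
                         k: List[int]) -> Tuple[List[float], List[List[int]]]:
    """phi_i(R) when p_{ij,S} = p_ij and page i may show at most k[i] links.

    The objective is additive over links, so the best S is simply the (at most)
    k[i] links with the largest positive gain p_ij * (R_j + r_ij).
    """
    n = len(R)
    values: List[float] = []
    choices: List[List[int]] = []
    for i in range(n):
        gains = sorted(((p[i][j] * (R[j] + r[i][j]), j) for j in range(n) if j != i),
                       reverse=True)
        chosen = [(g, j) for g, j in gains[:k[i]] if g > 0]
        values.append(sum(g for g, _ in chosen))
        choices.append(sorted(j for _, j in chosen))
    return values, choices


p = [[0.0, 0.3, 0.3], [0.3, 0.0, 0.3], [0.3, 0.3, 0.0]]
r = [[0.0, 1.0, 2.0], [3.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
values, choices = phi_no_externalities([0.0, 0.0, 0.0], p, r, k=[1, 1, 2])
print(choices)  # [[2], [0], [1]]
```

Sorting costs $O(n \log n)$ per page, versus $2^{n-1}$ candidate sets in the general model.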
In accordance with another aspect, a second lemma can be provided. The following lemma assumes that the starting probabilities $p_i$ are all nonzero. It will later be seen that there is a graph (e.g., graph 140) that is optimal with respect to any set of starting probabilities, so this assumption serves only to remove degenerate cases.
Assume that $p_i > 0$ for each $i$. Let $G^*$ be the revenue-maximizing graph 140, and let $R_i^*$ be the expected revenue of a random walk in $G^*$ that starts from node $i$. Then $R^*$ is the unique fixed point of the function $\varphi$. The proof of the second lemma is based on the contraction principle, which states that every contraction of a complete metric space into itself has a unique fixed point; a strengthened version for increasing functions is given below. Accordingly, the proof proceeds by showing that $\varphi$ is an increasing contraction under the $\ell_\infty$ norm. First, the definitions of an increasing function and of a contraction are given:
Definition of an increasing function: For two vectors $x, x' \in \mathbb{R}^n$, we say $x \le x'$ if $x_i \le x'_i$ for all $i$. A function $f: \mathbb{R}^n \to \mathbb{R}^n$ is increasing if for every $x, x' \in \mathbb{R}^n$ with $x \le x'$, we have $f(x) \le f(x')$.
Definition of a contraction: Let $X$ be a metric space with metric $d$. If $f$ maps $X$ into $X$ and there is a constant $c < 1$ such that $d(f(x), f(y)) \le c\,d(x, y)$ for all $x, y \in X$, then $f$ is said to be a contraction of $X$ into $X$.
In accordance with yet another aspect, a third lemma can be provided. The following lemma strengthens the contraction principle in the case of increasing functions. Let $f: \mathbb{R}^n \to \mathbb{R}^n$ be an increasing function, and assume $f$ is a contraction of $\mathbb{R}^n$ under some metric $d$ (in the application below, the $\ell_\infty$ metric). Then there exists one and only one $x^* \in \mathbb{R}^n$ such that $f(x^*) = x^*$. Furthermore, for every vector $x \in \mathbb{R}^n$ satisfying $x \ge f(x)$, we have $x \ge x^*$. Similarly, for every vector $x \in \mathbb{R}^n$ satisfying $x \le f(x)$, we have $x \le x^*$. To prove the third lemma, fix $x$ with $x \ge f(x)$ and define a sequence $x_1, x_2, \ldots$ by $x_1 = x$ and $x_{i+1} = f(x_i)$ for every $i \ge 1$. Since $x_2 = f(x_1) \le x_1$ and $f$ is increasing, induction gives $x_{i+1} \le x_i$, and hence $x_i \le x$, for every $i$. Since $f$ is a contraction with constant $c$, $d(x_{i+1}, x_{i+2}) \le c^{i}\,d(x_1, x_2)$, so the sequence is Cauchy and converges to a limit $x^*$. Since $x_i \le x$ for all $i$, we have $x^* \le x$. Also, since $f$ is a contraction, it is continuous, and therefore the sequence $f(x_1), f(x_2), \ldots$ converges to $f(x^*)$; but this sequence is $x_2, x_3, \ldots$, whose limit is $x^*$. Therefore, $f(x^*) = x^*$. Furthermore, if there were another $x' \in \mathbb{R}^n$ with $x' \ne x^*$ and $f(x') = x'$, then $d(x^*, x') = d(f(x^*), f(x')) \le c\,d(x^*, x')$, which is a contradiction since $c < 1$. Hence, $f$ has a unique fixed point $x^*$, and $x^* \le x$. The other part can be proved similarly.
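It remains to show that $\varphi$ satisfies the conditions of the above lemma. First, $\varphi$ is increasing: if $x \le y$, then for every $i$ and $S$, $\sum_{j \in S} p_{ij,S}(x_j + r_{ij}) \le \sum_{j \in S} p_{ij,S}(y_j + r_{ij})$, and taking the maximum over $S$ gives $\varphi_i(x) \le \varphi_i(y)$. Second, let $D = \|x - y\|_\infty$. For every $i$ and $S$,
$$\sum_{j \in S} p_{ij,S}(x_j + r_{ij}) \le \sum_{j \in S} p_{ij,S}(y_j + r_{ij}) + D \sum_{j \in S} p_{ij,S} \le \sum_{j \in S} p_{ij,S}(y_j + r_{ij}) + (1 - \delta) D,$$
so $\varphi_i(x) \le \varphi_i(y) + (1 - \delta) D$, and by symmetry $|\varphi_i(x) - \varphi_i(y)| \le (1 - \delta) D$.
Therefore, $\|\varphi(x) - \varphi(y)\|_\infty = \max_i |\varphi_i(x) - \varphi_i(y)| \le (1 - \delta) D$. Hence $\varphi$ is a contraction.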
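In accordance with another aspect, a fourth lemma can be employed. The fourth lemma provides that the function $\varphi$ defined supra is increasing and is a contraction of $\mathbb{R}^n$ with respect to the $\ell_\infty$ metric. Accordingly, the proof of the second lemma can now be supplied. Since the third and fourth lemmas imply that $\varphi$ has a unique fixed point $x^*$, it remains to show that this fixed point is $R^*$. First, $R^* \le \varphi(R^*)$, because the first lemma provides that for every $i$, $R_i^* = \sum_{j \in \delta^+(i)} p_{ij,\delta^+(i)}(R_j^* + r_{ij}) \le \varphi_i(R^*)$, where $\delta^+(i)$ now denotes the out-neighbors of $i$ in $G^*$; hence, by the third lemma, $R^* \le x^*$. Conversely, consider the graph in which each node $i$ links to a set $S_i$ attaining the maximum in $\varphi_i(x^*)$. By the first lemma, its revenue vector is $x^*$, so its total revenue $\sum_i p_i x_i^*$ is at least $\sum_i p_i R_i^*$, with equality only if $x^* = R^*$ (since every $p_i > 0$). The optimality of $G^*$ therefore forces $R^* = x^*$.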
In accordance with yet another aspect, the iterative algorithm can now be provided. The idea of this algorithm is to start from the vector $0$ and apply the function $\varphi$ iteratively. Since $\varphi$ is a contraction whose unique fixed point is $R^*$, this gives a sequence that converges to $R^*$. It is shown that if this process stops after $T$ steps, the resulting vector gives a graph (e.g., graph 140) whose revenue is close to $R^*$. The algorithm is presented in detail below.
In accordance with still another aspect of the claimed subject matter, a first theorem can be provided. Let $\Delta_{\max} := \max_{i,j} r_{ij}$ and $\Delta_{\min} := \min_{i,j,S} p_{ij,S}\,r_{ij}$, and let $\varepsilon > 0$ be given. Then the solution provided by the iterative algorithm after
iterations is within a $1 + \varepsilon$ factor of the optimal revenue. The proof of the first theorem is as follows. According to the fourth lemma above, the function $\varphi$ contracts the $\ell_\infty$ distance by a factor of $1 - \delta$. Therefore, by induction on $t$, $\|R^t - R^{t-1}\|_\infty \le (1 - \delta)^{t-1}\|R^1\|_\infty \le (1 - \delta)^t \Delta_{\max}$. Let $R^*$ be the limit of $R^t$ (even though the algorithm only defines $R^t$ for $t \le T$, the sequence can be defined beyond $T$), which by the second lemma gives the optimal revenue starting from each node. Summing the above inequality over all subsequent steps gives $\|R^t - R^*\|_\infty \le (1 - \delta)^{t+1}\delta^{-1}\Delta_{\max}$.
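The geometric bounds in this proof can be checked numerically on a small no-externalities instance. The sketch below uses hypothetical helper names; its instance satisfies $\sum_{j \in S} p_{ij} \le 0.6$ for every allowed $S$, so $\delta = 0.4$. It records each step size $\|R^t - R^{t-1}\|_\infty$ next to the bound $(1-\delta)^t \Delta_{\max}$. The helper `iterations_for` picks the smallest $T$ for which the quantity $\varepsilon' = (1-\delta)^{T+1}\delta^{-1}\Delta_{\max}/\Delta_{\min}$ used later in the proof falls below $\varepsilon$. This is an illustrative choice, not the iteration count stated in the theorem.

```python
from typing import List, Tuple


def phi(R: List[float], p: List[List[float]], r: List[List[float]], k: List[int]) -> List[float]:
    """No-externality update: sum of the k[i] largest positive gains p_ij * (R_j + r_ij)."""
    n = len(R)
    out = []
    for i in range(n):
        gains = sorted((p[i][j] * (R[j] + r[i][j]) for j in range(n) if j != i), reverse=True)
        out.append(sum(g for g in gains[:k[i]] if g > 0))
    return out


def iterate_with_bounds(p: List[List[float]], r: List[List[float]], k: List[int],
                        delta: float, T: int) -> Tuple[List[float], List[Tuple[float, float]]]:
    """Run R^t = phi(R^{t-1}) from R^0 = 0; pair each step size with (1-delta)^t * D_max."""
    n = len(p)
    d_max = max(r[i][j] for i in range(n) for j in range(n) if i != j)
    R = [0.0] * n
    steps = []
    for t in range(1, T + 1):
        new_R = phi(R, p, r, k)
        step = max(abs(a - b) for a, b in zip(new_R, R))
        steps.append((step, (1 - delta) ** t * d_max))
        R = new_R
    return R, steps


def iterations_for(eps: float, delta: float, d_max: float, d_min: float) -> int:
    """Smallest T with eps' = (1-delta)^(T+1) * d_max / (delta * d_min) below eps."""
    T = 0
    while (1 - delta) ** (T + 1) * d_max / (delta * d_min) >= eps:
        T += 1
    return T


p = [[0.0, 0.3, 0.3], [0.3, 0.0, 0.3], [0.3, 0.3, 0.0]]
r = [[0.0, 1.0, 2.0], [3.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
R, steps = iterate_with_bounds(p, r, k=[1, 1, 2], delta=0.4, T=20)
print(all(step <= bound + 1e-12 for step, bound in steps))  # True
print(iterations_for(eps=0.01, delta=0.5, d_max=1.0, d_min=1.0))  # 7
```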
It can also be shown that the graph
Thus:
Setting $\varepsilon' = (1 - \delta)^{T+1}\delta^{-1}\Delta_{\max}/\Delta_{\min}$, the above inequality implies that
for all i. Therefore, by the third lemma, the fixed point of Ψ, which is
we obtain $\varepsilon' < \varepsilon$, and the first theorem provided supra follows. It is to be appreciated that in some cases $\Delta_{\min}$ can be replaced at runtime of the algorithm by $\min_i R_i^*$. In addition to, or as an alternative to, the iterative algorithm described supra, an alternative algorithm (e.g., a linear programming algorithm) is presented for exactly computing the revenue-maximizing hyperlink structure. For simplicity of presentation, the techniques are described for the case of no externalities; however, it is to be appreciated that this need not be the case. The linear programming algorithm can first solve a linear program describing the optimal structure and then round its solution. Since no approximation factor is lost in the rounding, the algorithm computes an exact optimal solution.
One optimization question facing, e.g., a web designer in this setting is to find a sub-graph (e.g., graph 140) of the complete graph (e.g., graph 105) in which each node $i$ has out-degree at most $k_i$ and the total revenue is maximized. This can be formulated as a mathematical program as follows. Let $x_i$ be a variable representing the expected number of times a web surfer encounters node $i$, and let $y_{ij}$ be an indicator variable for the existence of hyperlink $ij$. The expected number of times a web surfer traverses link $ij$ is then $x_i p_{ij} y_{ij}$. Relaxing the integrality constraint on $y_{ij}$, the problem becomes:
Constraint 3 encodes the "conservation of flow": the expected number of times $x_j$ that a surfer visits node $j$ cannot be more than the expected number of times $p_j$ that he starts surfing from $j$ plus the expected number of times $\sum_{i \in N} x_i p_{ij} y_{ij}$ that he enters $j$ from a neighboring node. Constraint 4 encodes the out-degree constraint on a node $i$.
This mathematical program can be transformed into a linear program by the change of variables $z_{ij} = x_i y_{ij}$. This provides the program
which is linear in the variables $x_i$ and $z_{ij}$. In the next section, it is shown how to round an optimal fractional solution $(x_i, z_{ij})$ of linear program (5) to a solution in which $z_{ij}/x_i \in \{0, 1\}$ for all $i, j \in N$.
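The change of variables can be illustrated in the forward direction: any integral hyperlink structure yields a feasible point of the linear program with the same revenue. The sketch assumes the linear program consists of the objective $\sum_{i,j} p_{ij} r_{ij} z_{ij}$ (the revenue $\sum x_i p_{ij} y_{ij} r_{ij}$ after substitution), the flow constraint $x_j \le p_j + \sum_i p_{ij} z_{ij}$, the degree constraint $\sum_j z_{ij} \le k_i x_i$, and $0 \le z_{ij} \le x_i$ (from $0 \le y_{ij} \le 1$). This reading of the constraints described above is an assumption, and all names are hypothetical. Solving the linear program itself would require an LP solver and is not shown.

```python
from typing import List, Set, Tuple


def lp_point_from_links(start: List[float], p: List[List[float]], links: List[Set[int]],
                        tol: float = 1e-13) -> Tuple[List[float], List[List[float]]]:
    """Map an integral link structure to (x, z): x_i = expected visits, z_ij = x_i * y_ij."""
    n = len(start)
    x = list(start)
    for _ in range(100_000):  # fixed-point iteration for x_j = start_j + sum_i x_i p_ij y_ij
        new = list(start)
        for i in range(n):
            for j in links[i]:
                new[j] += x[i] * p[i][j]
        converged = max(abs(a - b) for a, b in zip(new, x)) < tol
        x = new
        if converged:
            break
    z = [[x[i] if j in links[i] else 0.0 for j in range(n)] for i in range(n)]
    return x, z


def check_lp(start, p, r, k, x, z, tol: float = 1e-9) -> Tuple[bool, float]:
    """Check the assumed constraints and return (feasible, objective)."""
    n = len(start)
    flow = all(x[j] <= start[j] + sum(p[i][j] * z[i][j] for i in range(n)) + tol for j in range(n))
    degree = all(sum(z[i]) <= k[i] * x[i] + tol for i in range(n))
    bounds = all(-tol <= z[i][j] <= x[i] + tol for i in range(n) for j in range(n))
    objective = sum(p[i][j] * r[i][j] * z[i][j] for i in range(n) for j in range(n))
    return flow and degree and bounds, objective


def links_from_lp(x: List[float], z: List[List[float]]) -> List[List[float]]:
    """Recover y_ij = z_ij / x_i for every page with x_i > 0 (0 otherwise)."""
    n = len(x)
    return [[z[i][j] / x[i] if x[i] > 0 else 0.0 for j in range(n)] for i in range(n)]


start = [1.0, 0.0]
p = [[0.0, 0.5], [0.5, 0.0]]
r = [[0.0, 2.0], [4.0, 0.0]]
x, z = lp_point_from_links(start, p, [{1}, {0}])
print(check_lp(start, p, r, [1, 1], x, z))  # (True, objective approximately 8/3)
print(links_from_lp(x, z))  # [[0.0, 1.0], [1.0, 0.0]]
```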
Consider an optimal fractional solution to linear program (5). For all $i \in N$ such that $x_i > 0$ and all $j \in N$, define $y_{ij} = z_{ij}/x_i$. Notice that if $y_{ij} \in \{0, 1\}$ for all $i, j \in N$, then these $y_{ij}$ can be used to define a feasible hyperlink structure with optimal revenue.
Otherwise, let $G = (N, E)$ be the graph in which edge $ij$ exists if $y_{ij} > 0$ and has transition probability $p_{ij} y_{ij}$. Consider an arbitrary node $i_0 \in N$ with at least one fractional outgoing edge, i.e., for at least one $j$, $0 < y_{i_0 j} < 1$.
Accordingly, a fifth lemma can be provided. For example, there is a graph G′ with total expected revenue equal to G in which i0 has exactly ki
Consider the graph Gl=(N, El) where i0 only has links in Fl. In other words, El=E−{yi
To prove that for some $l$ the revenue $R_l$ of $G_l$ is at least the total revenue of $G$, the revenue $R$ of $G$ can be written in terms of the $R'_l$ as follows. By linearity of expectation, the expected revenue that a random walk in $G$ starting at $i_0$ collects before returning to $i_0$ is simply $\sum_l \lambda_l R'_l$. Also, the probability of returning to $i_0$ is $\sum_l \lambda_l p_l$. Therefore, $R = \sum_l \lambda_l R'_l + \sum_l \lambda_l p_l R$, and so:
$$R = \frac{\sum_l \lambda_l R'_l}{1 - \sum_l \lambda_l p_l}.$$
Using the fact that $\sum_l \lambda_l = 1$, $R$ can be rewritten as
$$R = \frac{\sum_l \lambda_l R'_l}{\sum_l \lambda_l (1 - p_l)},$$
where the summation is restricted to the sets $F_l$ with $\lambda_l > 0$. The fifth lemma then follows from the fact that $(\sum_l a_l)/(\sum_l b_l) \le \max_l (a_l/b_l)$ for any two sequences of positive real numbers $\{a_l\}$ and $\{b_l\}$, applied with $a_l = \lambda_l R'_l$ and $b_l = \lambda_l (1 - p_l)$. One can then proceed to "fix" iteratively all nodes $i$ with fractional out-links, obtaining an integral graph $G$ with optimal revenue (e.g., graph 140).
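The fifth lemma relies on writing page $i_0$'s fractional out-link vector as a convex combination $\sum_l \lambda_l \mathbf{1}_{F_l}$ of integral link sets $F_l$ with $\sum_l \lambda_l = 1$. The text does not specify how the $F_l$ and $\lambda_l$ are obtained. The sketch below swaps in systematic sampling, a standard construction, for illustration. It assumes $0 \le y_{i_0 j} \le 1$ and $\sum_j y_{i_0 j} \le k_{i_0}$, so every $F_l$ respects the degree bound. Exact rational arithmetic via `fractions.Fraction` avoids rounding at the breakpoints; `decompose` is a hypothetical name.

```python
import math
from fractions import Fraction
from typing import List, Sequence, Tuple


def decompose(y: Sequence[Fraction]) -> List[Tuple[Fraction, frozenset]]:
    """Write y (entries in [0, 1]) as sum_l lam_l * indicator(F_l), with sum_l lam_l = 1.

    Systematic sampling: lay the y_j end to end on [0, sum(y)); for an offset u
    in [0, 1), j belongs to F(u) when some point u + m (m an integer) falls in
    j's interval. Each F(u) has at most ceil(sum(y)) elements.
    """
    cum = [Fraction(0)]
    for v in y:
        cum.append(cum[-1] + v)
    # F(u) can only change where u crosses the fractional part of a cumulative sum.
    cuts = sorted({c - math.floor(c) for c in cum} | {Fraction(0), Fraction(1)})
    parts = []
    for a, b in zip(cuts, cuts[1:]):
        u = (a + b) / 2
        # Number of integers m with cum[j] <= u + m < cum[j+1]; it is 0 or 1 since y_j <= 1.
        F = frozenset(j for j in range(len(y))
                      if math.ceil(cum[j + 1] - u) - math.ceil(cum[j] - u) == 1)
        parts.append((b - a, F))
    return parts


y = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 3), Fraction(2, 3)]
for lam, F in decompose(y):
    print(lam, sorted(F))
# 1/3 [0, 2]
# 1/6 [0, 3]
# 1/2 [1, 3]
```

Each $j$ lands in $F(u)$ for a fraction $y_j$ of the offsets $u$, which is exactly the convex-combination property the lemma needs.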
It is to be understood and appreciated that the results provided above for the case of no externalities can be extended to the general case with externalities by using the following mathematical programming formulation. Let $y_{i,S}$ be an indicator variable for the event that page $i$ chooses to link to the pages in $S$. As before, $x_i$ represents the expected number of times a surfer visits page $i$. By convention, we define $p_{ij,S} = 0$ for $j \notin S$.
Game Theoretic Questions
As detailed supra, graph 105 can represent a model of an entire website. In many situations, especially for large companies, subsets of the web pages constituting the entire website are controlled by distinct (and sometimes even competing) profit centers, each responsible for its own profit and loss account. Accordingly, it may not be reasonable to expect that a particular profit center, or group of profit centers, will comply with the optimal website design (e.g., optimized graph 140) at its own expense. That is, while an optimized graph 140 may yield higher revenue for the entire website, it may omit hyperlinks (edges) of a particular profit center, thereby precluding potential revenue for that profit center. One approach to alleviating the discord brought about by competing interests is to divide the total revenue of the website among the profit centers so as to ensure stability. As discussed below, there is always a way to divide revenue among profit centers such that the optimal website design (e.g., optimal graph 140) is stable, in that each coalition of profit centers receives a total revenue at least as large as the revenue it could extract by acting on its own.
Since cooperative game theory studies games in which the primitives are actions taken by coalitions of players, such a setting can be interpreted as a cooperative game in which the nodes of the graph 105 are the players. Each web page is owned by an individual self-motivated agent, such as a profit center within a company. This agent seeks hyperlinks that maximize its revenue, but may cooperate with other agents in doing so and thereby capitalize on the externalities induced between links. The game can be considered in both transferable and non-transferable utility settings. In a transferable utility setting, the value generated by a coalition may be distributed in an arbitrary manner among the members of the coalition, whereas in a non-transferable utility setting, each node in a coalition receives only the revenue it generates.
Cooperative Game with Transferable Utility (TU)
In a TU game, one underlying assumption is that the revenue generated by a coalition may be shared among its members in any manner. A TU game is defined by a value function $v$, which assigns to every possible coalition of players the value that coalition can achieve. The value $v(S)$ of a subset $S$ of nodes can be the value of the corresponding linear program, equation (5) detailed above, with variables restricted to the set $S$. The stable solutions of interest are those in the core. A set of payoffs $\xi_i$ is in the core of a coalitional game with TU if it distributes the total value exactly, $\sum_{i \in N} \xi_i = v(N)$, and no coalition can do better on its own, i.e., $\sum_{i \in S} \xi_i \ge v(S)$ for all $S \subset N$. Thus, the core is described by a set of linear inequalities. That this game has a nonempty core is already known; however, a standard proof based on linear programming duality is provided below. In order to write the dual of equation (5), variables $\alpha_i$, $\beta_i$, and $\gamma_{ij}$ are introduced, corresponding to the first, second, and third inequality, respectively. The dual, equation (7), is then:
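Hence, the payoffs $\xi_i = \alpha_i p_i$ are in the core. By linear programming duality, $\sum_{i \in N} \xi_i = \sum_{i \in N} \alpha_i p_i = v(N)$. Moreover, to prove that $\sum_{i \in S} \xi_i \ge v(S)$ for all $S \subset N$, it is only necessary to show that the optimal solution $(\alpha_i, \beta_i, \gamma_{ij})$ to equation (7), restricted to the players in $S$, is a feasible solution to equation (7) restricted to the players in $S$. This follows because the inequalities of equation (7) restricted to the players in $S$ are a subset of those in equation (7). The objective of this restricted dual at that solution is $\sum_{i \in S} \alpha_i p_i = \sum_{i \in S} \xi_i$, and weak duality for this minimization then gives $\sum_{i \in S} \xi_i \ge v(S)$. Therefore, the game has a nonempty core, and a core solution can be found in polynomial time.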
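For a small game, the two core conditions above can be checked by brute force. The Python sketch below is illustrative only. It enumerates all coalitions, so its cost is exponential in the number of players, and it is not the polynomial-time linear-programming approach described above. The function `in_tu_core` and the example values are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Mapping, Sequence


def in_tu_core(players: Sequence[Hashable],
               value: Callable[[FrozenSet[Hashable]], float],
               payoff: Mapping[Hashable, float],
               tol: float = 1e-9) -> bool:
    """Return True if `payoff` lies in the core of the TU game given by `value`.

    Enumerates all 2^|N| - 1 coalitions, so it is only practical for small games.
    """
    grand = frozenset(players)
    # Efficiency: the payoffs must distribute exactly v(N).
    if abs(sum(payoff[i] for i in grand) - value(grand)) > tol:
        return False
    # Coalitional rationality: every proper coalition S must receive at least v(S).
    for size in range(1, len(players)):
        for coalition in combinations(players, size):
            s = frozenset(coalition)
            if sum(payoff[i] for i in s) < value(s) - tol:
                return False  # coalition s would rather withdraw
    return True


# Hypothetical three-profit-center game; the values are illustrative only.
v_table = {
    frozenset("a"): 1.0, frozenset("b"): 1.0, frozenset("c"): 1.0,
    frozenset("ab"): 3.0, frozenset("ac"): 3.0, frozenset("bc"): 3.0,
    frozenset("abc"): 6.0,
}
print(in_tu_core("abc", v_table.__getitem__, {"a": 2, "b": 2, "c": 2}))  # True
print(in_tu_core("abc", v_table.__getitem__, {"a": 4, "b": 1, "c": 1}))  # False: {b, c} gets 2 < 3
```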
Cooperative Game with Non-Transferable Utility (NTU)
TU games assume that the players are able to distribute the total revenue in any manner, but such an assumption is not always reasonable. For example, the performance of a profit center is often measured in terms of the amount of revenue it generates for the company, and there is no mechanism through which profit centers may share revenue prior to review. An NTU game generalizes a TU game to situations such as these, in which not all payoff vectors are feasible for a coalition.
An NTU game consists of a set $N$ of players and, for each coalition $S \subseteq N$, a set $\mathcal{V}(S) \subseteq \mathbb{R}^{|S|}$ of feasible payoff vectors for that coalition. The sets $\mathcal{V}(S)$ are assumed to satisfy some mild assumptions, namely: 1) $\mathcal{V}(S)$ is closed; 2) if $v \in \mathcal{V}(S)$, then every $v' \in \mathbb{R}^{|S|}$ with $v' \le v$ (coordinate-wise) also satisfies $v' \in \mathcal{V}(S)$; and 3) the set of vectors in $\mathcal{V}(S)$ in which each player receives at least the utility that player can achieve individually is a nonempty, bounded set. Intuitively, a solution to an NTU game with payoffs $v \in \mathcal{V}(N)$ is stable (i.e., in the core) if no coalition $S$ can withdraw and achieve a payoff vector $v' \in \mathcal{V}(S)$ such that each member of $S$ improves his payoff. For notational convenience, $v|_S$ denotes the vector in $\mathbb{R}^{|S|}$ whose coordinates are the coordinates of $v$ restricted to the players in $S$. A vector $v \in \mathcal{V}(N)$ is in the core of the NTU game if there is no coalition $S$ and vector $v' \in \mathcal{V}(S)$ such that $v' > v|_S$ (coordinate-wise). To consider the conditions under which an NTU game has a nonempty core, let $(\lambda_S)$ be a fractional partition of the players. That is, $(\lambda_S)$ is a set of coefficients $0 \le \lambda_S \le 1$ on subsets $S$ of $N$ such that $\sum_{S : i \in S} \lambda_S = 1$ for every player $i$. An NTU game is called balanced if, for every fractional partition $(\lambda_S)$, a vector $v \in \mathbb{R}^{|N|}$ must be in $\mathcal{V}(N)$ whenever $v|_S \in \mathcal{V}(S)$ for all $S$ with $\lambda_S > 0$.
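The definitions of a fractional partition and of balancedness can be made concrete with a short Python sketch. It assumes a hypothetical membership oracle `in_V(S, u)` for the feasible sets. It checks the balancedness implication only for one given fractional partition and one vector. Balancedness requires the implication for every such pair, so a single `False` shows that a game is not balanced, while a `True` proves nothing in general. The function names and example numbers are illustrative.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Mapping

Coalition = FrozenSet[Hashable]
Payoffs = Mapping[Hashable, float]


def is_fractional_partition(players: Iterable[Hashable],
                            weights: Mapping[Coalition, float],
                            tol: float = 1e-9) -> bool:
    """True if 0 <= lambda_S <= 1 and sum_{S : i in S} lambda_S = 1 for every player i."""
    if any(lam < -tol or lam > 1 + tol for lam in weights.values()):
        return False
    return all(abs(sum(lam for s, lam in weights.items() if i in s) - 1.0) <= tol
               for i in players)


def balance_condition_holds(players: Iterable[Hashable],
                            weights: Mapping[Coalition, float],
                            v: Payoffs,
                            in_V: Callable[[Coalition, Payoffs], bool]) -> bool:
    """Balancedness implication for ONE fractional partition and ONE vector v:
    if v|_S is in V(S) for every S with lambda_S > 0, then v must be in V(N)."""
    premise = all(in_V(s, {i: v[i] for i in s})
                  for s, lam in weights.items() if lam > 0)
    return (not premise) or in_V(frozenset(players), dict(v))


# Hypothetical three-player game with feasible sets V(S) = {u : sum(u) <= worth[S]}.
worth = {frozenset("ab"): 3.0, frozenset("bc"): 3.0, frozenset("ac"): 3.0,
         frozenset("abc"): 4.0}


def in_V(s: Coalition, u: Payoffs) -> bool:
    return sum(u.values()) <= worth[s]


lam = {frozenset("ab"): 0.5, frozenset("bc"): 0.5, frozenset("ac"): 0.5}
v = {"a": 1.5, "b": 1.5, "c": 1.5}
print(is_fractional_partition("abc", lam))           # True
print(balance_condition_holds("abc", lam, v, in_V))  # False, so this game is not balanced
```

In this example, each pair can afford 1.5 per member, but the grand coalition cannot pay 1.5 to all three players, so the implication fails.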
Accordingly, a second theorem can be provided, which states that a cooperative game with NTU has a nonempty core if it is balanced. In the situation described above with competing profit centers, the set $\mathcal{V}(S)$ consists of the payoff vectors $v$ where $v_i$ is (at most) the revenue of $i$ in some hyperlink structure on $S$. More formally, $v \in \mathcal{V}(S)$ if and only if there is a (fractional) graph $G$ on nodes $S$ such that for each player $i \in S$, $v_i$ is at most the expected revenue of $i$ in $G$. Alternatively, this condition can be stated using program 2: $v \in \mathcal{V}(S)$ if and only if there is a feasible solution $(x_i, y_{ij})$ to program 2 such that for each player $i \in S$, $v_i \le \sum_j x_j p_{ji} y_{ji}$ (the expected revenue of $i$). These sets $\mathcal{V}(S)$ satisfy the assumptions stated above, and so the game is an NTU game.
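The membership condition above can be illustrated with a candidate witness $(x, y)$. The Python sketch below assumes that the witness is already known to be feasible for program 2 restricted to $S$. It only tests whether that particular witness certifies $v \in \mathcal{V}(S)$. A `False` result does not rule out some other witness. All names and numbers are hypothetical.

```python
from typing import Hashable, Iterable, Mapping, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def node_revenue(i: Node, coalition: Iterable[Node], x: Mapping[Node, float],
                 p: Mapping[Edge, float], y: Mapping[Edge, float]) -> float:
    """Expected revenue of node i, computed as sum_j x_j * p_ji * y_ji over the coalition.
    Links missing from p or y are treated as 0."""
    return sum(x[j] * p.get((j, i), 0.0) * y.get((j, i), 0.0) for j in coalition)


def certifies_membership(v: Mapping[Node, float], coalition: Iterable[Node],
                         x: Mapping[Node, float], p: Mapping[Edge, float],
                         y: Mapping[Edge, float], tol: float = 1e-9) -> bool:
    """True if v_i <= expected revenue of i for every i in the coalition.
    Assumes (x, y) is already feasible for program 2 restricted to the coalition."""
    members = list(coalition)
    return all(v[i] <= node_revenue(i, members, x, p, y) + tol for i in members)


# Hypothetical two-page coalition {a, b} that link to each other.
x = {"a": 1.0, "b": 0.5}
p = {("a", "b"): 0.5, ("b", "a"): 0.4}
y = {("a", "b"): 1.0, ("b", "a"): 1.0}
print(certifies_membership({"a": 0.2, "b": 0.5}, "ab", x, p, y))  # True
print(certifies_membership({"a": 0.3, "b": 0.5}, "ab", x, p, y))  # False
```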
In addition, a third theorem can be set forth, which states that there is a fractional graph in the core of the website game. Fractional graphs can be thought of as the result of mixed strategies in hyperlink selection. In other words, if a node $i$ is allowed to have fractional out-links of total weight at most $k_i$ (or to probabilistically select $k_i$ links according to their fractional weights), then the core is nonempty. It should be appreciated that, although the efficient (i.e., revenue-maximizing) graph, together with a suitable division of its revenue, is in the TU core, this may not be the case for the NTU core. In fact, the solutions in the NTU core may be arbitrarily inefficient.
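The probabilistic reading of a fractional graph requires rounding each page's fractional out-link weights to an actual set of links. The source does not specify a rounding procedure. The Python sketch below assumes one standard choice, systematic (Madow) sampling, which keeps each link with probability equal to its weight. When the weights sum to an integer $k_i$, it returns exactly $k_i$ links. The function name and page names are hypothetical.

```python
import math
import random
from typing import Dict, Hashable, List, Optional


def sample_links(weights: Dict[Hashable, float],
                 rng: Optional[random.Random] = None) -> List[Hashable]:
    """Round fractional out-link weights y_ij (each in [0, 1]) to a set of links.

    Systematic (Madow) sampling: lay the weights end to end on [0, sum), draw a
    single offset u ~ U[0, 1), and keep a link when its interval contains one of
    the points u, u + 1, u + 2, ...  Each link is kept with probability equal to
    its weight; if the weights sum to an integer k, exactly k links are kept.
    """
    if any(not 0.0 <= w <= 1.0 for w in weights.values()):
        raise ValueError("fractional link weights must lie in [0, 1]")
    rng = rng if rng is not None else random.Random()
    u = rng.random()
    chosen: List[Hashable] = []
    start = 0.0
    for link, w in weights.items():
        end = start + w
        # Count of points u + m (m = 0, 1, ...) in [start, end); at most 1 since w <= 1.
        if math.ceil(end - u) - math.ceil(start - u) > 0:
            chosen.append(link)
        start = end
    return chosen


# Hypothetical page whose fractional out-links have total weight k_i = 2.
weights = {"page_b": 0.5, "page_c": 0.5, "page_d": 1.0}
print(sample_links(weights, random.Random(7)))  # two links, always including "page_d"
```

This rounding preserves each link's marginal probability but not any particular joint distribution over link sets, which is a design choice of the sketch.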
Turning to
where $x_j$, the expected number of times web page $j$ is accessed, is at most $p_j$ (the expected number of times the user starts from node $j$) plus the expected number of times $\sum_{i \in N} x_i p_{ij} y_{ij}$ that the user visits node $j$ from a neighboring node; $x_i p_{ij} y_{ij}$ is the expected number of times a web surfer traverses link $ij$,
$x_i$ represents the expected number of times a web surfer encounters node $i$,
$p_{ij}$ represents the probability that a surfer on page $i$ follows a hyperlink to page $j$, and
$y_{ij}$ expresses the existence of an edge (hyperlink) between nodes $i$ and $j$.
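For a fixed (possibly fractional) link structure $y$, taking the visit inequality with equality gives the expected visit counts as the fixed point of $x_j = p_j + \sum_{i} x_i p_{ij} y_{ij}$. The Python sketch below computes that fixed point by simple iteration. It assumes that, on every page, the follow probabilities sum to less than 1, so the surfer eventually leaves and the iteration converges. The names `expected_visits` and `start` (standing for $p_j$) are hypothetical.

```python
from typing import Dict, Hashable, Mapping, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def expected_visits(start: Mapping[Node, float], p: Mapping[Edge, float],
                    y: Mapping[Edge, float], tol: float = 1e-12,
                    max_iter: int = 10_000) -> Dict[Node, float]:
    """Solve x_j = start_j + sum_i x_i * p_ij * y_ij by fixed-point iteration.

    start[j] plays the role of p_j (expected number of walks starting at j),
    p[(i, j)] is p_ij and y[(i, j)] is y_ij. Every node must appear in `start`.
    Converges when sum_j p_ij * y_ij < 1 for every page i.
    """
    x = dict(start)
    for _ in range(max_iter):
        new = dict(start)
        for (i, j), p_ij in p.items():
            new[j] += x[i] * p_ij * y.get((i, j), 0.0)  # expected arrivals at j via link ij
        if max(abs(new[n] - x[n]) for n in new) < tol:
            return new
        x = new
    raise RuntimeError("no convergence; check that sum_j p_ij * y_ij < 1 for each page i")


# Two hypothetical pages: every walk starts at "a", and each page links to the other.
x = expected_visits({"a": 1.0, "b": 0.0},
                    {("a", "b"): 0.5, ("b", "a"): 0.5},
                    {("a", "b"): 1.0, ("b", "a"): 1.0})
print(x)  # approximately {'a': 4/3, 'b': 2/3}
```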
The verification component 410 can include a degree constraint component 430 that applies a constraint to the number of outgoing edges of a node $i$, which is to say that there is a limit on the number of hyperlinks on a given page. Specifically, the component 430 can constrain the variables $y_{ij}$ of node $i$ to sum to no more than $k_i$, the maximum number of hyperlinks permitted on page $i$.
For example, the functionality of component 430 can be expressed as:
$$\forall i \in N:\ \sum_{j \in N} y_{ij} \le k_i$$
The verification component 410 can further include an edge constraint component 440, which constrains the variable $y_{ij}$. Because $y_{ij}$ expresses the existence of an edge between nodes $i$ and $j$, the constraint $\forall i, j \in N: 0 \le y_{ij} \le 1$ should hold when determining the revenue-maximizing random walk through the graph 105. Relaxing the constraint on $y_{ij}$, such that its value is not limited to $\{0, 1\}$, allows the selection component 110 to generate the optimal sub-graph (i.e., the random walk that generates the maximum revenue) through the graph 105 received by the computation component 110. The relaxation of this constraint allows $0 < y_{ij} < 1$.
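The checks performed by the verification component 410 can be sketched as a feasibility test on a candidate $(x, y)$. The Python function below is hypothetical. It applies the visit constraint $x_j \le p_j + \sum_i x_i p_{ij} y_{ij}$, the degree constraint $\sum_j y_{ij} \le k_i$, and the relaxed edge constraint $0 \le y_{ij} \le 1$, and reports each violation as a string. It is not presented as the structure of components 420, 430, and 440.

```python
from typing import Hashable, List, Mapping, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def check_constraints(x: Mapping[Node, float], y: Mapping[Edge, float],
                      start: Mapping[Node, float], p: Mapping[Edge, float],
                      k: Mapping[Node, float], tol: float = 1e-9) -> List[str]:
    """Return human-readable violations for a candidate (x, y); an empty list means feasible.

    start[j] ~ p_j, p[(i, j)] ~ p_ij, y[(i, j)] ~ y_ij, k[i] ~ k_i, x[j] ~ x_j.
    """
    problems: List[str] = []
    # Edge constraint: 0 <= y_ij <= 1 (the relaxed form of y_ij in {0, 1}).
    for (i, j), y_ij in y.items():
        if y_ij < -tol or y_ij > 1 + tol:
            problems.append(f"link {i}->{j}: y = {y_ij} is outside [0, 1]")
    # Degree constraint: sum_j y_ij <= k_i.
    for i, k_i in k.items():
        degree = sum(y_ij for (src, _), y_ij in y.items() if src == i)
        if degree > k_i + tol:
            problems.append(f"page {i}: total link weight {degree} exceeds k = {k_i}")
    # Visit constraint: x_j <= p_j + sum_i x_i * p_ij * y_ij.
    for j, x_j in x.items():
        inflow = sum(x[i] * p_ij * y.get((i, j), 0.0)
                     for (i, dst), p_ij in p.items() if dst == j)
        bound = start.get(j, 0.0) + inflow
        if x_j > bound + tol:
            problems.append(f"page {j}: x = {x_j} exceeds start + inflow = {bound}")
    return problems


# Hypothetical two-page example; x is the exact visit count for the feasible y.
start = {"a": 1.0, "b": 0.0}
p = {("a", "b"): 0.5, ("b", "a"): 0.5}
x = {"a": 4 / 3, "b": 2 / 3}
k = {"a": 1, "b": 1}
print(check_constraints(x, {("a", "b"): 1.0, ("b", "a"): 1.0}, start, p, k))  # []
print(check_constraints(x, {("a", "b"): 1.5, ("b", "a"): 1.0}, start, p, k))  # edge + degree violations
```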
It should be appreciated that the constraint values applied can either be generated by the components 420, 430, and 440 according to inputs or retrieved from the data store 130, which is coupled to the components 420, 430, and 440. Additionally, it is contemplated that in an embodiment of the present invention, the systems presented supra can be applied to subsets of the larger graph 105 so that the maximum revenue sub-graph can be solved for subsets of the links. Such an approach would be advantageous if, for example, the system were to dynamically generate links for individual web pages based on the demographics of a user browsing the web page. As a result, the maximum revenue sub-graph for a particular user could be determined and used to display links between web pages in order to provide the most relevant and useful information to the user. By operating on a subset of the links, the aforementioned architecture can employ those links that are considered relevant to a particular user based on known or inferred characteristics or preferences.
The aforementioned systems have been described with respect to interaction between several components. It should be appreciated that such systems and components can include those components or sub-components specified therein, some of the specified components or sub-components, and/or additional components. Sub-components could also be implemented as components communicatively coupled to other components rather than included within parent components. Further yet, one or more components and/or sub-components may be combined into a single component providing aggregate functionality. The components may also interact with one or more other components not specifically described herein for the sake of brevity, but known by those of skill in the art.
In view of the exemplary systems described supra, methodologies that may be implemented in accordance with the disclosed subject matter will be better appreciated with reference to the flow charts of
Turning to
At 530, maximum revenue random walks originating from nodes of the directed graph are determined. This determination is a maximization problem. The probability that an edge exists in the graph, together with the expected revenue along a pre-existing walk, allows the walk to be extended to create a new maximum expected revenue walk originating from a specified node. This problem applies to each of the nodes within the graph, and the maximum expected revenue random walk can be determined iteratively for each node. At 540, the maximum expected revenue random walks through the graph, which represent a sub-graph of the original graph, are output such that nodes and edges of the sub-graph correspond to the revenue-maximizing random walk through the original graph.
At 620, the variables corresponding to the expected revenue, number of times a node is visited along a random walk, the existence of an edge between two nodes, and the probability associated with a given edge are verified to ensure that they are within certain values. Expressed alternatively, the variables are subject to constraints that ensure that the values used to maximize the expected revenue along a random walk through the graph are feasible given the structure of the original graph.
At 630, the revenue of a random walk through the graph is computed, such that the summation of the expected revenues associated with the edges along the random walk represents the maximum expected revenue within the graph. The expected revenue associated with the identified sub-graph is computed using the expected revenue of a hyperlink, the number of times a node is visited, and the existence and probability of a given edge within the graph.
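The computation at 630 can be written as $\sum_{i,j} x_i p_{ij} y_{ij} r_{ij}$, where $r_{ij}$ is the expected revenue of hyperlink $ij$. The symbol $r_{ij}$ and the per-traversal interpretation of link revenue are assumptions made for this sketch. The Python function below, whose name is hypothetical, returns both the total and each link's contribution.

```python
from typing import Dict, Hashable, Mapping, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def expected_revenue(x: Mapping[Node, float], p: Mapping[Edge, float],
                     y: Mapping[Edge, float], r: Mapping[Edge, float]
                     ) -> Tuple[float, Dict[Edge, float]]:
    """Total expected revenue sum_{i,j} x_i * p_ij * y_ij * r_ij and per-link contributions.

    r[(i, j)] is an assumed revenue earned each time link ij is traversed.
    """
    per_link: Dict[Edge, float] = {}
    for (i, j), p_ij in p.items():
        traversals = x[i] * p_ij * y.get((i, j), 0.0)  # expected traversals of link ij
        per_link[(i, j)] = traversals * r.get((i, j), 0.0)
    return sum(per_link.values()), per_link


# Hypothetical two-page example with known visit counts x.
x = {"a": 4 / 3, "b": 2 / 3}
p = {("a", "b"): 0.5, ("b", "a"): 0.5}
y = {("a", "b"): 1.0, ("b", "a"): 1.0}
r = {("a", "b"): 3.0, ("b", "a"): 1.5}
total, per_link = expected_revenue(x, p, y, r)
print(total)  # approximately 2.5 (2.0 from a->b plus 0.5 from b->a)
```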
Turning to
At 730, probability and revenue values are assigned to corresponding nodes and edges within the graph. The values assigned to individual edges and nodes result from the analysis conducted on the stored data and any data contained in the graph itself. At 740, probability and revenue values assigned to individual nodes and edges of the graph are used to calculate revenue over random walks through the graph. An expected revenue value for a random walk originating from each node is computed by iterating through all the nodes of the graph. At 750, the random walk from each node is extended by one edge, which increases the expected revenue from each node of the graph along that random walk, and using the probability associated with each edge, the new expected revenue for a random walk from a specified node can be computed. At 760, the maximum expected revenue from each node along a given random walk can be selected, and the graph containing the random walks from each of the nodes of the original graph can be output.
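Steps 740 through 760 can be sketched as a finite-horizon dynamic program, under assumptions that the source does not state. Revenue $r_i$ is assumed to be earned on each visit to page $i$. The follow probability $p_{ij}$ of a candidate link is assumed not to depend on which other links are shown. At each one-edge extension, each page is assumed to keep the $k_i$ candidate links with the largest gain $p_{ij} R(j)$. Because of this per-step choice, the links selected at intermediate horizons may differ from any single static link set. The function name and example values are hypothetical.

```python
import heapq
from typing import Dict, Hashable, List, Mapping, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def max_revenue_walks(r: Mapping[Node, float], p: Mapping[Edge, float],
                      k: Mapping[Node, int], steps: int
                      ) -> Tuple[Dict[Node, float], Dict[Node, List[Node]]]:
    """Finite-horizon sketch of extending maximum expected revenue walks one edge at a time.

    r[i]: revenue per visit to page i (assumed >= 0); p[(i, j)]: probability that a
    surfer on i follows a candidate link to j; k[i]: maximum links on page i.
    Returns the best expected revenue of walks with up to `steps` traversals from
    each page, plus the links each page chose in the last extension step.
    """
    value: Dict[Node, float] = dict(r)  # walks with zero traversals earn only r[i]
    best: Dict[Node, List[Node]] = {i: [] for i in r}
    for _ in range(steps):
        new_value: Dict[Node, float] = {}
        for i in r:
            # Gain of linking i -> j: follow probability times value of continuing from j.
            gains = [(p_ij * value[j], j) for (src, j), p_ij in p.items() if src == i]
            top = [(g, j) for g, j in heapq.nlargest(k[i], gains, key=lambda t: t[0]) if g > 0]
            new_value[i] = r[i] + sum(g for g, _ in top)
            best[i] = [j for _, j in top]
        value = new_value
    return value, best


# Hypothetical three-page site; every page may show one link.
r = {"a": 0.0, "b": 1.0, "c": 2.0}
p = {("a", "b"): 0.5, ("a", "c"): 0.4, ("b", "c"): 0.5, ("c", "a"): 0.1}
value, best = max_revenue_walks(r, p, {"a": 1, "b": 1, "c": 1}, steps=2)
print(best["a"])  # ['b']: with two steps, a -> b (1.0) beats a -> c (0.8)
```

With `steps=1`, page `a` would instead choose `c`. This illustrates how the selected links depend on how far the walks are extended.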
Additionally, it should be further appreciated that the methodologies disclosed hereinafter and throughout this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methodologies to computers. The term article of manufacture, as used herein, is intended to encompass a computer program accessible from any computer-readable device, carrier, or media.
In order to provide a context for the various aspects of the disclosed subject matter,
With reference to
The system bus 918 can be any of several types of bus structure(s) including the memory bus or memory controller, a peripheral bus or external bus, and/or a local bus using any variety of available bus architectures including, but not limited to, Industry Standard Architecture (ISA), Micro Channel Architecture (MCA), Extended ISA (EISA), Integrated Drive Electronics (IDE), VESA Local Bus (VLB), Peripheral Component Interconnect (PCI), CardBus, Universal Serial Bus (USB), Accelerated Graphics Port (AGP), Personal Computer Memory Card International Association bus (PCMCIA), FireWire (IEEE 1394), and Small Computer System Interface (SCSI).
The system memory 916 includes volatile memory 920 and nonvolatile memory 922. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 912, such as during start-up, is stored in nonvolatile memory 922. By way of illustration, and not limitation, nonvolatile memory 922 can include read only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory 920 includes random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Computer 912 also includes removable/non-removable, volatile/non-volatile computer storage media.
It is to be appreciated that
Computer 912 can operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 944. The remote computer(s) 944 can be a personal computer, a server, a router, a network PC, a workstation, a microprocessor based appliance, a peer device or other common network node and the like, and typically includes many or all of the elements described relative to computer 912. For purposes of brevity, only a memory storage device 946 is illustrated with remote computer(s) 944. Remote computer(s) 944 is logically connected to computer 912 through a network interface 948 and then physically connected via communication connection 950. Network interface 948 encompasses wire and/or wireless communication networks such as local-area networks (LAN) and wide-area networks (WAN). LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet, Token Ring and the like. WAN technologies include, but are not limited to, point-to-point links, circuit switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL).
Communication connection(s) 950 refers to the hardware/software employed to connect the network interface 948 to the bus 918. While communication connection 950 is shown for illustrative clarity inside computer 912, it can also be external to computer 912. The hardware/software necessary for connection to the network interface 948 includes, for exemplary purposes only, internal and external technologies such as modems (including regular telephone grade modems, cable modems and DSL modems), ISDN adapters, and Ethernet cards.
The system 1000 includes a communication framework 1050 that can be employed to facilitate communications between the client(s) 1010 and the server(s) 1030. The client(s) 1010 are operatively connected to one or more client data store(s) 1060 that can be employed to store information local to the client(s) 1010. Similarly, the server(s) 1030 are operatively connected to one or more server data store(s) 1040 that can be employed to store information local to the servers 1030.
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the disclosed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the terms “includes,” “has” or “having” or variations thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.
This application claims the benefit of Provisional U.S. Patent Application Ser. No. 60/776,978, filed Feb. 27, 2006, entitled “DESIGNING HYPERLINK STRUCTURES”, the entirety of which is incorporated herein by reference.
Number | Date | Country
---|---|---
60776978 | Feb 2006 | US