Claims
- 1. On a computer network having a set of Web pages V and a set of links E between those Web pages represented as an undirected neighborhood graph GN(VN,EN), the computer network further including seed Web pages va and vb in V, a method executable on the computer network for estimating associations between va and vb and other Web pages vi∈V, the method comprising the steps of: constructing a directed random walk graph by creating a new vertex vi′ in V for each vi∈VN, and creating two directed edges e′2×k=<vi′,vj′> and e′2×k−1=<vj′,vi′> in E for each ek=<vi,vj>∈EN wherein both vi and vj are in VN; computing a penalty value penalty(vi′) for all vertices vi′∈V; constructing a |V|×|V| transition matrix T, where T[j,i] represents a transition value for each directed edge in E denoting a likelihood of moving to vertex vi from vertex vj; and calculating a steady state distribution (convergence vector) t of T, wherein for each vi∈V, t[i] represents the association between the seed Web pages and vi.
- 2. A method as recited in claim 1, the step of computing a penalty value penalty(vi′) for all vertices vi′∈V comprising the steps of: computing sdist(vi′,va′) as a shortest distance in GN between vi′ and the vertex va′ corresponding to va; computing sdist(vi′,vb′) as the shortest distance in GN between vi′ and the vertex vb′ corresponding to vb; and computing the penalty value as penalty(vi′)=sdist(vi′,va′)+sdist(vi′,vb′).
- 3. A method as recited in claim 2, the step of constructing a |V|×|V| transition matrix T comprising the steps of: resetting T[j,i]=0.0 for all vertices vi′∈V and for all (vi′,vj′)∉E; and solving the following set of linear equations for all vertices vi′∈V: $L(v_i') = \{\sum_{(v_i',v_j')\in E} T[j,i] = 1.0\} \cup \{T[j,i]\times\mathrm{penalty}(v_j') = T[k,i]\times\mathrm{penalty}(v_k') \mid (v_i',v_j')\in E \text{ and } (v_i',v_k')\in E\}$.
- 4. A method as recited in claim 3, the step of calculating a steady state distribution (convergence vector) t of T comprising solving the linear equation $(I-T)\,t = 0$, where I is a unit matrix, subject to $\sum_{1\le i\le |V_w|} t[i] = 1.0$.
- 5. A method as recited in claim 2, the computer network further including a set S of seed Web pages in V with |S|≥2, the step of computing a penalty value penalty(vi′) for all vertices vi′∈V comprising computing the penalty value as $\mathrm{penalty}(v_i') = \sum_{v_j\in S} \mathrm{sdist}(v_i', v_j')$.
- 6. A method as recited in claim 2, the computer network further including a set S of seed Web pages in V with |S|≥2, the step of computing a penalty value penalty(vi′) for all vertices vi′∈V comprising computing the penalty value as penalty(vi′)=length(minimum_steiner_tree(S∪{vi′})).
- 7. A method as recited in claim 6, further including the step of defining a relevant neighborhood of GN(VN,EN) for constructing the random walk graph as the set of vertices VN=VGU(S,d) that are reachable from the vertices in S in d edge traversals, such that $\forall v_i \in V_{G_U}(S,d):\ \bigvee_{v_j\in S} \mathrm{reachable}_{G_U}(v_j,v_i,d)$.
- 8. A method as recited in claim 1, further including the step of defining a relevant neighborhood of GN(VN,EN) for constructing the random walk graph as the set of vertices VN=VGU(va,vb,d) that are reachable from either va or vb in d edge traversals, such that $\forall v_i \in V_{G_U}(v_a,v_b,d):\ \mathrm{reachable}_{G_U}(v_a,v_i,d) \lor \mathrm{reachable}_{G_U}(v_b,v_i,d)$.
- 9. A method as recited in claim 1, each Web page vi∈V having a known relevance value for a particular topic relevance(v,topic), the method further including the step of adjusting the penalty value penalty(vi′) for all vertices vi′∈V by dividing penalty(vi′) by relevance(v,topic).
- 10. A method as recited in claim 1, further including the step of pre-fetching Web pages into a memory in decreasing order of t[i].
- 11. On a computer network having a set of Web pages V and a set of links E between those Web pages modeled as a directed graph G(V,E), each Web page vi∈V comprising a pair <Ov,av>, where Ov is a set of media objects including a main HTML file and av is a page author, and where each object o∈Ov has a known end-user preference upref(u) for an end-user u and a page author preference apref(av) for a page author av, an end-user u accessing a seed Web page vc, a method executable on the computer network for estimating an association between the media objects and the seed Web page, the method comprising the steps of: calculating a page preference weight pref(u,v) for each Web page vi by applying preference rules defined by upref(u) and apref(av) to the contents of Ov; calculating an object preference weight pref(u,o,v) for each object o∈Ov by applying the preference rules defined by upref(u) and apref(av) to the contents of Ov; generating a random walk graph Gw having a set of vertices Vw and a set of edges Ew; calculating a page gain gain(u,v) by finding a steady state distribution (convergence vector) of the random walk graph; and calculating an object gain gain(u,o) for each object as $\mathrm{gain}(u,o) = \sum_{v\in V:\, o\in O_v} \mathrm{gain}(u,v)\times\mathrm{pref}(u,o,v)$, wherein the object gain represents an association between the object and the seed Web page.
- 12. A method as recited in claim 11, the step of generating a random walk graph Gw having a set of vertices Vw and a set of edges Ew comprising the steps of: creating a vertex vi in Vw for each Web page in V; creating two edges e′j=<v′a,v′b> and e″j=<v′b,v′a> in Ew for each edge ej=<va,vb> in E; and assigning an edge weight w(e)=s(u,vj) to each edge ej=<va,vb> in E.
- 13. A method as recited in claim 12, wherein s(u,vj) is a known stickiness value for each Web page vj∈V.
- 14. A method as recited in claim 12, wherein s(u,vj) is assigned a unit value for each edge ej=<va,vb> in E.
- 15. A method as recited in claim 12, wherein s(u,vj) is assigned a larger value than the unit value for each edge ej=<va,vb> in E that crosses a domain boundary.
- 16. A method as recited in claim 12, the step of calculating a page gain gain(u,v) by finding a steady state distribution (convergence vector) of the random walk graph comprising the steps of: finding a shortest distance shortest(vc,vi) from vertex vc to all vertices vi∈V, taking into account the edge weights, using a shortest path algorithm; for each vertex v∈Vw, calculating a penalty penalty(u,v)=shortest(vc,v)/(pref(u,v)+1), and calculating a unit probability unit(u,v) by solving $\sum_{\langle v,v_i\rangle\in E_w} \mathrm{unit}(u,v)/\mathrm{penalty}(u,v_i) = 1$; calculating $\mathrm{prob}_u(v_j \mid v_i) = \mathrm{unit}(u,v_i)/\mathrm{penalty}(u,v_j)$ for each edge e=<vi,vj>∈Ew; and calculating gain(u,v) by finding a steady state distribution (convergence vector) t of T, where T is a matrix of transition values $\mathrm{prob}_u(v_j \mid v_i)$, wherein for each vi∈V, t[i] represents the association between the seed Web page and vi.
- 17. A method as recited in claim 16, the step of calculating gain(u,v) by finding a steady state distribution (convergence vector) t of T comprising solving the linear equation $(I-T)\,t = 0$, where I is a unit matrix, subject to $\sum_{1\le i\le |V_w|} \mathrm{gain}(u,v_i) = 1.0$.
- 18. A method as recited in claim 17, wherein each object o∈Ov has a known size size(o), and, given an available pre-fetch bandwidth Pu and a pre-fetch duration δt in which a server may pre-fetch objects into a memory, the method further includes the step of identifying a set of objects Os highly associated with the end-user or seed Web page, the step comprising: defining a cost of each object as cost(o)=size(o); and identifying a subset Os of Ov such that $\sum_{o\in O_s} \mathrm{cost}(o) \le P_u\times\delta t$ and $\sum_{o\in O_s} \mathrm{gain}(u,o)$ is maximized.
- 19. A method as recited in claim 18, further including the step of pre-fetching or refreshing objects from Os into the memory.
- 20. A method as recited in claim 18, wherein each object o∈Ov has a known expiration time expire(o), the method further including the step of refining the set of objects Os by removing those objects that will expire before their earliest time to view, the step comprising: finding a shortest path shortest_path(vc,vi) from vertex vc to all vertices vi∈V, taking into account the edge weights, using a shortest path algorithm; calculating a measure of an earliest time that a page p may be accessed as earliest(u,p)=shortest_path(vc,vp); calculating a measure of an earliest time that an object o may be needed as earliest(u,o)=min{earliest(u,p) | p∈pages(o)}; and eliminating those objects from Ov for which expire(o)<earliest(u,o) before identifying the set of objects Os.
- 21. A method as recited in claim 17, wherein each object o∈Ov has a known size size(o), and, given an available pre-fetch bandwidth Pu and a pre-fetch duration δt in which a server may pre-fetch objects into a memory, and given a set of users U or seed Web pages, the method further includes the step of identifying a set of objects Os highly associated with the set of users or seed Web pages, the step comprising: calculating an object gain gain(o) for each object as $\mathrm{gain}(o) = \sum_{u\in U} \sum_{v\in V:\, o\in O_v} \mathrm{gain}(u,v)\times\mathrm{pref}(u,o,v)$, wherein the object gain represents an association between the object and the set of end-users or seed Web pages; defining a cost of each object as cost(o)=size(o); and identifying a subset Os of Ov such that $\sum_{o\in O_s} \mathrm{cost}(o) \le P_u\times\delta t$ and $\sum_{o\in O_s} \mathrm{gain}(o)$ is maximized.
- 22. On a computer network having a set of Web pages V and a set of links E between those Web pages, a system for estimating associations between seed Web pages va and vb and other Web pages vi∈V, comprising: memory for storing a location of the seed Web pages va and vb and the other Web pages vi in V; and a processor programmed for modeling the computer network as an undirected neighborhood graph GN(VN,EN), and programmed for constructing a directed random walk graph by creating a new vertex vi′ in V for each vi∈VN, and creating two directed edges e′2×k=<vi′,vj′> and e′2×k+1=<vj′,vi′> in E for each ek=<vi,vj>∈EN wherein both vi and vj are in VN, computing a penalty value penalty(vi′) for all vertices vi′∈V, constructing a |V|×|V| transition matrix T, where T[j,i] represents a transition value for each directed edge in E denoting a likelihood of moving to vertex vi from vertex vj, and calculating a steady state distribution (convergence vector) t of T, wherein for each vi∈V, t[i] represents the association between the seed Web pages and vi.
- 23. A system as recited in claim 22, the processor further programmed for computing a penalty value penalty(vi′) for all vertices vi′∈V by: computing sdist(vi′,va′) as a shortest distance in GN between vi′ and the vertex va′ corresponding to va; computing sdist(vi′,vb′) as the shortest distance in GN between vi′ and the vertex vb′ corresponding to vb; and computing the penalty value as penalty(vi′)=sdist(vi′,va′)+sdist(vi′,vb′).
- 24. A system as recited in claim 23, the processor further programmed for constructing a |V|×|V| transition matrix T by: resetting T[j,i]=0.0 for all vertices vi′∈V and for all (vi′,vj′)∉E; and solving the following set of linear equations for all vertices vi′∈V: $L(v_i') = \{\sum_{(v_i',v_j')\in E} T[j,i] = 1.0\} \cup \{T[j,i]\times\mathrm{penalty}(v_j') = T[k,i]\times\mathrm{penalty}(v_k') \mid (v_i',v_j')\in E \text{ and } (v_i',v_k')\in E\}$.
- 25. A system as recited in claim 24, the processor further programmed for calculating a steady state distribution (convergence vector) t of T by solving the linear equation $(I-T)\,t = 0$, where I is a unit matrix, subject to $\sum_{1\le i\le |V_w|} t[i] = 1.0$.
- 26. A system as recited in claim 23, the computer network further including a set S of seed Web pages in V with |S|≥2, the processor further programmed for computing a penalty value penalty(vi′) for all vertices vi′∈V as $\mathrm{penalty}(v_i') = \sum_{v_j\in S} \mathrm{sdist}(v_i', v_j')$.
- 27. A system as recited in claim 23, the computer network further including a set S of seed Web pages in V with |S|≥2, the processor further programmed for computing a penalty value penalty(vi′) for all vertices vi′∈V as penalty(vi′)=length(minimum_steiner_tree(S∪{vi′})).
- 28. A system as recited in claim 27, the processor further programmed for defining a relevant neighborhood of GN(VN,EN) for constructing the random walk graph as the set of vertices VN=VGU(S,d) that are reachable from the vertices in S in d edge traversals, such that $\forall v_i \in V_{G_U}(S,d):\ \bigvee_{v_j\in S} \mathrm{reachable}_{G_U}(v_j,v_i,d)$.
- 29. A system as recited in claim 22, the processor further programmed for defining a relevant neighborhood of GN(VN,EN) for constructing the random walk graph as the set of vertices VN=VGU(va,vb,d) that are reachable from either va or vb in d edge traversals, such that $\forall v_i \in V_{G_U}(v_a,v_b,d):\ \mathrm{reachable}_{G_U}(v_a,v_i,d) \lor \mathrm{reachable}_{G_U}(v_b,v_i,d)$.
- 30. A system as recited in claim 22, each Web page vi∈V having a known relevance value for a particular topic relevance(v,topic), the processor further programmed for adjusting the penalty value penalty(vi′) for all vertices vi′∈V by dividing penalty(vi′) by relevance(v,topic).
- 31. A system as recited in claim 22, the processor further programmed for pre-fetching Web pages into the memory in decreasing order of t[i].
- 32. On a computer network having a set of Web pages V and a set of links E between those Web pages, each Web page vi∈V comprising a pair <Ov,av>, where Ov is a set of media objects including a main HTML file and av is a page author, a system for estimating an association between the media objects and a seed Web page vc corresponding to a current location of an end-user u, comprising: memory for storing, for each object o∈Ov, a known end-user preference upref(u) for the end-user u, a page author preference apref(av) for the page author av, and a location of the seed Web page vc; and a processor programmed for modeling the computer network as a directed graph G(V,E), and programmed for calculating a page preference weight pref(u,v) for the end-user u for each Web page vi by applying preference rules defined by upref(u) and apref(av) to the contents of Ov, calculating an object preference weight pref(u,o,v) for each object o∈Ov by applying the preference rules defined by upref(u) and apref(av) to the contents of Ov, generating a random walk graph Gw having a set of vertices Vw and a set of edges Ew, calculating a page gain gain(u,v) by finding a steady state distribution (convergence vector) of the random walk graph, and calculating an object gain gain(u,o) for each object as $\mathrm{gain}(u,o) = \sum_{v\in V:\, o\in O_v} \mathrm{gain}(u,v)\times\mathrm{pref}(u,o,v)$, wherein the object gain represents an association between the object and the end-user or seed Web page.
- 33. A system as recited in claim 32, the processor further programmed for generating a random walk graph Gw having a set of vertices Vw and a set of edges Ew by: creating a vertex vi in Vw for each Web page in V; creating two edges e′j=<v′a,v′b> and e″j=<v′b,v′a> in Ew for each edge ej=<va,vb> in E; and assigning an edge weight w(e)=s(u,vj) to each edge ej=<va,vb> in E.
- 34. A system as recited in claim 33, wherein s(u,vj) is a known stickiness value for each Web page vj∈V.
- 35. A system as recited in claim 33, wherein s(u,vj) is assigned a unit value for each edge ej=<va,vb> in E.
- 36. A system as recited in claim 33, wherein s(u,vj) is assigned a larger value than the unit value for each edge ej=<va, vb> in E that crosses a domain boundary.
- 37. A system as recited in claim 33, the processor further programmed for calculating a page gain gain(u,v) by finding a steady state distribution (convergence vector) of the random walk graph by: finding a shortest distance shortest(vc,vi) from vertex vc to all vertices vi∈V, taking into account the edge weights, using a shortest path algorithm; for each vertex v∈Vw, calculating a penalty penalty(u,v)=shortest(vc,v)/(pref(u,v)+1), and calculating a unit probability unit(u,v) by solving $\sum_{\langle v,v_i\rangle\in E_w} \mathrm{unit}(u,v)/\mathrm{penalty}(u,v_i) = 1$; calculating $\mathrm{prob}_u(v_j \mid v_i) = \mathrm{unit}(u,v_i)/\mathrm{penalty}(u,v_j)$ for each edge e=<vi,vj>∈Ew; and calculating gain(u,v) by finding a steady state distribution (convergence vector) of T, where T is a matrix of transition values $\mathrm{prob}_u(v_j \mid v_i)$, subject to $\sum_{1\le i\le |V_w|} \mathrm{gain}(u,v_i) = 1.0$.
- 38. A system as recited in claim 37, the processor further programmed for calculating gain(u,v) by finding a steady state distribution (convergence vector) t of T by solving the linear equation $(I-T)\,t = 0$, where I is a unit matrix, subject to $\sum_{1\le i\le |V_w|} \mathrm{gain}(u,v_i) = 1.0$.
- 39. A system as recited in claim 38: the memory for storing a known size size(o) for each object o∈Ov; and the processor having an available pre-fetch bandwidth Pu and a pre-fetch duration δt for pre-fetching objects into a memory, and further programmed for identifying a set of objects Os highly associated with the end-user or seed Web page by defining a cost of each object as cost(o)=size(o), and identifying a subset Os of Ov such that $\sum_{o\in O_s} \mathrm{cost}(o) \le P_u\times\delta t$ and $\sum_{o\in O_s} \mathrm{gain}(u,o)$ is maximized.
- 40. A system as recited in claim 39, the processor further programmed for pre-fetching or refreshing objects from Os into the memory.
- 41. A system as recited in claim 39: the memory for storing a known expiration time expire(o) for each object o∈Ov; and the processor further programmed for refining the set of objects Os by removing those objects that will expire before their earliest time to view by finding a shortest path shortest_path(vc,vi) from vertex vc to all vertices vi∈V, taking into account the edge weights, using a shortest path algorithm, calculating a measure of an earliest time that a page p may be accessed as earliest(u,p)=shortest_path(vc,vp), calculating a measure of an earliest time that an object o may be needed as earliest(u,o)=min{earliest(u,p) | p∈pages(o)}, and eliminating those objects from Ov for which expire(o)<earliest(u,o) before identifying the set of objects Os.
- 42. A system as recited in claim 38: the memory for storing a known size size(o) for each object o∈Ov; and the processor having an available pre-fetch bandwidth PU and a pre-fetch duration δt in which a server may pre-fetch objects into a memory, and further programmed for identifying a set of objects Os highly associated with a set of users U or seed Web pages by calculating an object gain gain(o) for each object as $\mathrm{gain}(o) = \sum_{u\in U} \sum_{v\in V:\, o\in O_v} \mathrm{gain}(u,v)\times\mathrm{pref}(u,o,v)$, wherein the object gain represents an association between the object and the set of end-users or seed Web pages, defining a cost of each object as cost(o)=size(o), and identifying a subset Os of Ov such that $\sum_{o\in O_s} \mathrm{cost}(o) \le P_U\times\delta t$ and $\sum_{o\in O_s} \mathrm{gain}(o)$ is maximized.
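The two-seed random walk of claims 1 to 4 can be illustrated with a minimal Python sketch. It assumes a connected, unweighted neighborhood graph given as a list of undirected edges, two distinct seeds va and vb, and hop counts as sdist. Following the normalization of claim 3 (the entries T[j,i] over the out-edges of vi′ sum to 1.0), the sketch reads T[j,i] as the probability of stepping from vi′ to vj′; the equal-product constraint then makes that probability inversely proportional to penalty(vj′). The function names bfs_distances, solve_linear and seed_associations are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop-count shortest distances from source in an unweighted, undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def seed_associations(edges, va, vb):
    """Association score t[v] between the seeds va, vb and every vertex v."""
    adj = {}
    for u, w in edges:  # each undirected edge becomes two directed edges
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, []).append(u)
    nodes = sorted(adj)
    index = {v: k for k, v in enumerate(nodes)}
    da, db = bfs_distances(adj, va), bfs_distances(adj, vb)
    penalty = {v: da[v] + db[v] for v in nodes}  # claim 2
    n = len(nodes)
    T = [[0.0] * n for _ in range(n)]  # T[j][i]: probability of stepping i -> j
    for vi in nodes:
        inv = {vj: 1.0 / penalty[vj] for vj in adj[vi]}
        total = sum(inv.values())
        for vj, w in inv.items():  # T[j,i] * penalty(vj) is equal for all out-edges of vi
            T[index[vj]][index[vi]] = w / total
    # (I - T) t = 0, with the last (redundant) row replaced by sum(t) = 1
    a = [[(1.0 if r == c else 0.0) - T[r][c] for c in range(n)] for r in range(n)]
    a[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    t = solve_linear(a, b)
    return {v: t[index[v]] for v in nodes}
```

For edges a-x, x-b and x-y with seeds a and b, the sketch assigns 0.5 to the connecting page x, 0.2 to each seed and 0.1 to the dangling page y, so the page that links the two seeds receives the strongest association. Replacing one row of (I−T) by the normalization works because the rows of (I−T) sum to zero, so one of them is always redundant.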
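The relevant neighborhood of claims 7 and 8 and the summed-distance penalty of claim 5 both reduce to breadth-first searches. The sketch below assumes the undirected graph is an adjacency-list dictionary, uses hop counts as distances, and assumes every vertex of VN is reachable from every seed inside VN (otherwise sdist is undefined). The helper names reachable_within, relevant_neighborhood and summed_penalty are hypothetical.

```python
from collections import deque


def reachable_within(adj, source, d):
    """Vertices reachable from source in at most d edge traversals, with hop distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not expand beyond d hops
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def relevant_neighborhood(adj, seeds, d):
    """VN: every vertex reachable from at least one seed within d hops."""
    vn = set()
    for s in seeds:
        vn |= set(reachable_within(adj, s, d))
    return vn


def summed_penalty(adj, seeds, vertices):
    """penalty(v) = sum over seeds s of sdist(v, s), measured inside the neighborhood."""
    sub = {v: [w for w in adj.get(v, ()) if w in vertices] for v in vertices}
    dists = [reachable_within(sub, s, len(vertices)) for s in seeds]  # effectively unbounded
    return {v: sum(dist[v] for dist in dists) for v in vertices}
```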
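Claims 12, 16 and 17 personalize the walk for a single end-user u starting at vc. The sketch below is an illustration under stated assumptions, not the patented implementation: links are undirected pairs over a connected graph, the weight of each directed edge is taken to be the stickiness of its target page (claim 12 does not say which endpoint vj denotes), and shortest distances come from Dijkstra's algorithm. Because shortest(vc,vc)=0 makes penalty(u,vc) zero, the sketch applies a hypothetical floor min_penalty before taking reciprocals. Instead of solving (I−T)t=0 directly, it iterates the lazy chain (I+T)/2, which has the same stationary vector but also converges when the walk is periodic. The names dijkstra and page_gains are hypothetical.

```python
import heapq


def dijkstra(adj, source):
    """Weighted shortest distances from source; adj[v] maps neighbor -> edge weight."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w, weight in adj[v].items():
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist


def page_gains(links, stickiness, page_pref, vc, min_penalty=0.5, iters=10_000, tol=1e-12):
    """Sketch of gain(u, v) for one end-user u."""
    adj = {}
    for va, vb in links:  # one undirected link -> two directed edges
        # Assumption: w(e) is the stickiness of the page the edge points to.
        adj.setdefault(va, {})[vb] = stickiness[vb]
        adj.setdefault(vb, {})[va] = stickiness[va]
    dist = dijkstra(adj, vc)
    # penalty(u, v) = shortest(vc, v) / (pref(u, v) + 1), floored (hypothetical) to avoid 1/0
    penalty = {v: max(dist[v] / (page_pref[v] + 1.0), min_penalty) for v in adj}
    # unit(u, v) solves: sum over out-edges <v, w> of unit(u, v) / penalty(u, w) = 1
    unit = {v: 1.0 / sum(1.0 / penalty[w] for w in adj[v]) for v in adj}
    # prob_u(w | v) = unit(u, v) / penalty(u, w); each row sums to 1
    prob = {v: {w: unit[v] / penalty[w] for w in adj[v]} for v in adj}
    gain = {v: 1.0 / len(adj) for v in adj}  # start from the uniform distribution
    for _ in range(iters):
        nxt = {v: 0.5 * gain[v] for v in adj}  # lazy step: stay put with probability 1/2
        for v in adj:
            for w, p in prob[v].items():
                nxt[w] += 0.5 * gain[v] * p
        converged = max(abs(nxt[v] - gain[v]) for v in adj) < tol
        gain = nxt
        if converged:
            break
    return gain
```

For the path a-b-c with unit stickiness, zero preferences and vc = a, the walk moves from b back toward a with probability 0.8, and the sketch returns gains of 0.4, 0.5 and 0.1 for a, b and c.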
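The object gain of claims 11 and 21 accumulates, for each object o, the page gain gain(u,v) of every page v whose object set Ov contains o, weighted by pref(u,o,v). A minimal sketch, assuming page gains and preference weights have already been computed and are passed in as plain dictionaries (a hypothetical data layout, with obj_pref keyed by (object, page)); summing the per-user results gives the multi-user gain(o) of claim 21.

```python
from collections import defaultdict


def object_gains(page_gain, page_objects, obj_pref):
    """gain(u, o) = sum over pages v containing o of gain(u, v) * pref(u, o, v)."""
    gains = defaultdict(float)
    for v, objects in page_objects.items():
        for o in objects:
            gains[o] += page_gain[v] * obj_pref[(o, v)]
    return dict(gains)


def multi_user_object_gains(page_objects, per_user):
    """gain(o) = sum over users u of gain(u, o); per_user maps u -> (page_gain, obj_pref)."""
    total = defaultdict(float)
    for page_gain, obj_pref in per_user.values():
        for o, g in object_gains(page_gain, page_objects, obj_pref).items():
            total[o] += g
    return dict(total)
```

An object shared by several likely pages, such as a site logo, accumulates gain from each of them, which is what makes it a strong pre-fetch candidate.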
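The selection step of claim 18 is a 0/1 knapsack problem: the budget is Pu×δt, each object costs size(o) and is worth gain(u,o). The dynamic-programming sketch below is one standard exact method; it assumes integer sizes (for example in kilobytes, with the budget in the same unit), and select_prefetch is a hypothetical name. Its running time grows with the budget, so a greedy gain-per-byte heuristic may be preferable when the budget is large.

```python
def select_prefetch(objects, bandwidth, duration):
    """Choose objects maximizing total gain subject to total size <= bandwidth * duration.

    objects maps name -> (size, gain), with integer sizes.
    """
    capacity = int(bandwidth * duration)
    items = list(objects.items())
    best = [0.0] * (capacity + 1)  # best[c]: max gain with total cost <= c
    keep = [[False] * (capacity + 1) for _ in items]  # keep[k][c]: item k improved best[c]
    for k, (_, (size, gain)) in enumerate(items):
        for c in range(capacity, size - 1, -1):  # descending, so each object is used at most once
            if best[c - size] + gain > best[c]:
                best[c] = best[c - size] + gain
                keep[k][c] = True
    chosen, c = set(), capacity
    for k in range(len(items) - 1, -1, -1):  # backtrack through the recorded decisions
        if keep[k][c]:
            name, (size, _) = items[k]
            chosen.add(name)
            c -= size
    return chosen
```

With objects a (size 3, gain 4.0), b (size 4, gain 5.0) and c (size 2, gain 3.0) and a budget of 5, the sketch picks {a, c} for a total gain of 7.0, beating the single largest-gain object b.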
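The expiration filter of claim 20 compares each object's expiration time with the earliest time it may be needed, defined through the shortest path from vc to the pages that contain it. The sketch assumes those shortest-path distances (for example from a Dijkstra run) are already available and are expressed in the same units as expire(o), which the claim leaves implicit; drop_expiring is a hypothetical name.

```python
def drop_expiring(object_pages, page_distance, expire):
    """Keep only objects that are still valid when first needed.

    object_pages maps o -> pages(o); page_distance maps p -> shortest_path(vc, p),
    used as earliest(u, p); expire maps o -> expire(o) in the same units.
    """
    kept = set()
    for o, pages in object_pages.items():
        earliest = min(page_distance[p] for p in pages)  # earliest(u, o)
        if expire[o] >= earliest:  # eliminate when expire(o) < earliest(u, o)
            kept.add(o)
    return kept
```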
CROSS-REFERENCE TO RELATED APPLICATIONS
Embodiments of the present invention claim priority from U.S. Provisional Application Serial No. 60/195,640 entitled “Random Walks for Mining the Web Page Associations and Usage in User-Oriented Web Page Refresh and Pre-Fetch Scheduling,” filed Apr. 7, 2000. The content of this application is incorporated by reference herein.
Provisional Applications (1)
Number | Date | Country |
60/195640 | Apr 2000 | US |