The slots in any of the topics section 16, the contents section 18, and the advertisements section 20 may be filled with different user-selectable objects over time. For example, the slots of the topics section 16 may be populated with various topical user-selectable contents that relate to different topics (e.g., entertainment, politics, finance, nature); the slots of the contents section 18 may be filled with various content-based objects (e.g., stories, articles, and other information available on the World Wide Web); and the slots of the advertisements section 20 may be filled with various advertisements. Although a variety of different methods may be used to populate the variable content sections of the web page 10 with different user-selectable contents over time, both the owner and the users of the web site typically benefit by prioritizing these user-selectable contents in a way that increases the number of times the contents are selected (or clicked on) by the users: the owner typically benefits by increasing the revenues and the popularity of the web site; and the users benefit by being able to quickly access information that is most likely to be relevant to the users' interests.
For this reason, content providers vie for users' limited attention by resorting to a number of strategies aimed at maximizing the number of clicks devoted to their web sites. These strategies range from data personalization and short videos to the dynamic rearrangement of items in a given page, to name a few. In all these cases, the ultimate goal is the same: to draw the attention of the visitor to a website before she proceeds to the next one. A variety of different factors, such as the location and size of the user-selectable content on a web page, affect the amount of attention that a particular user-selectable content will receive. For example, user-selectable contents appearing at the top of a web page typically will generate more page clicks than user-selectable contents appearing at the bottom of the web page. The goal for many content providers is to optimize these factors so as to maximize the number of clicks on the web page. Most solutions to the problem of website relevance are based on either page rank (like the Google algorithm) or heuristics used by the editors of the page. Neither of these strategies, however, can guarantee a maximum number of clicks per interval of time.
What are needed are improved systems and methods of populating variable content slots on a web page.
In one aspect, the invention features a method in accordance with which a respective novelty value is ascertained for each of multiple user-selectable contents. Each of the novelty values represents a level of newness of the respective user-selectable content in relation to the other user-selectable contents. A respective novelty decay value is calculated for each of the user-selectable contents as a decreasing function of the respective novelty value. A prioritization order of the user-selectable contents in respective prioritized positions on a web page is determined based on the novelty decay values.
The invention also features apparatus operable to implement the inventive methods described above and computer-readable media storing computer-readable instructions causing a computer to implement the inventive methods described above.
Other features and advantages of the invention will become apparent from the following description, including the drawings and the claims.
In the following description, like reference numbers are used to identify like elements. Furthermore, the drawings are intended to illustrate major features of exemplary embodiments in a diagrammatic manner. The drawings are not intended to depict every feature of actual embodiments nor relative dimensions of the depicted elements, and are not drawn to scale.
The term “user-selectable content” refers broadly to any visually perceptible element (e.g., images and text) of a web page that is associated with a respective interface object (e.g., a link to a network resource or other control that is detectable by a web server) that is responsive to a user's execution command (e.g., click) with respect to the user-selectable content. The term “click” refers to the act or operation of entering or inputting an execution command (e.g., clicking the left computer mouse button).
A “link” refers to an object (e.g., a piece of text, an image or an area of an image) that loads a hypertext link reference into a target window when selected. A link typically includes an identifier or connection handle (e.g., a uniform resource identifier (URI)) that can be used to establish a network connection with a communicant, resource, or service on a network node.
As used herein, the term “web page” refers to any type of resource of information (e.g., a document, such as an HTML or XHTML document) that is suitable for the World Wide Web and can be accessed through a web browser. A web page typically contains information, graphics, and hyperlinks to other web pages and files. A “web site” includes one or more web pages that are made available through what appears to users as a single web server.
A “slot” refers to a position on a web page that contains user-selectable content that can be changed dynamically (e.g., each time the web page is refreshed).
A “computer” is a machine that processes data according to machine-readable instructions (e.g., software) that are stored on a machine-readable medium either temporarily or permanently. A set of such instructions that performs a particular task is referred to as a program or software program. A “server” is a host computer on a network that responds to requests for information or service. A “client” is a computer on a network that requests information or service from a server.
The term “machine-readable medium” refers to any medium capable of carrying information that is readable by a machine (e.g., a computer). Storage devices suitable for tangibly embodying these instructions and data include, but are not limited to, all forms of non-volatile computer-readable memory, including, for example, semiconductor memory devices, such as EPROM, EEPROM, and Flash memory devices, magnetic disks such as internal hard disks and removable hard disks, magneto-optical disks, DVD-ROM/RAM, and CD-ROM/RAM.
A “network node” is a junction or connection point in a communications network. Exemplary network nodes include, but are not limited to, a terminal, a computer, and a network switch. A “network connection” is a communication channel between two communicating network nodes.
A “resource” is a network data object or service that can be identified by a link. A resource may have multiple representations (e.g., multiple languages, data formats, sizes, and resolutions).
A “predicate” is a conditional part of a rule. An “access control predicate” is a predicate that conditions access (typically to a resource) on satisfaction of one or more criteria.
As used herein, the term “includes” means includes but is not limited to, and the term “including” means including but not limited to. The term “based on” means based at least in part on.
The embodiments that are described herein provide methods and apparatus for populating variable content slots on web pages with user-selectable contents (e.g., advertisements, topic files, and other variable contents) in a way that increases the attention that is drawn to the web page. These embodiments provide a principled way of prioritizing user-selectable contents when designing dynamic websites. In some embodiments, the rates with which novelty and popularity evolve within the website are translated into a prioritization ordering of the user-selectable contents. Some embodiments are designed to guarantee a maximal level of attention (e.g., a maximum number of clicks per interval of time) when deciding between strategies (or procedures) for ordering user-selectable contents on a web page.
The web site 32 typically is hosted by a web server. In some embodiments, the content prioritization system 30 is implemented on the web server that hosts the web site 34. In other embodiments, the content prioritization system 30 is implemented on another server that responds to requests from the web server for a prioritized ordering of the selected ones of the user-selectable contents 34 on the one or more pages of the web site 34. In these embodiments, the user-selectable contents 34 may be selected by the web server, the content prioritization system 30, or another server (e.g., an advertisement server).
A user 38 interacts with the web site 34 by sending a request 40 to the web server for a page of the web site 34. In response, the web server returns the requested page 42 to the user 38. Historical data characterizing the user's interactions with the web site, including user selections of user-selectable contents on the one or more web pages, are collected and analyzed using analytical methods (e.g., the methods provided by Google® analytics software). This data may be collected and analyzed by the web server or by another server. The results 39 of the analysis of the relevant historical data typically are transmitted to the content prioritization system 30 for use in determining the prioritization ordering of the user-selectable contents 34.
The web server typically refreshes the web page 42 on a regular cycle (e.g., every five minutes). In some embodiments, the content prioritization system 30 determines a prioritization order of the selected user-selectable contents during each refresh period. On each web page, the variable content slots typically are prioritized by the likely amounts of attention that user-selectable contents are expected to receive from users when they are placed in those slots. In some embodiments, the variable content slots are prioritized by their respective positions on the web page. For example, a user-selectable content in a variable content slot at the top of a web page typically draws more attention than a similar user-selectable content in a variable content slot lower on the web page. If the prioritization ordering of the contents changes, the user-selectable contents in the variable content slots of the web page are changed as needed in the following refresh of the page to reflect the changed prioritization order.
The elements of the method of
A. Ascertaining Novelty Values and Popularity Values
The content prioritization system 30 ascertains for each of the user-selectable contents a respective novelty value representing a level of newness of the respective user-selectable content in relation to the other user-selectable contents (
In some embodiments, the content prioritization system 30 additionally ascertains a respective popularity value for each of the user-selectable contents. Each of the popularity values represents a level of popularity of the respective user-selectable content in relation to the other user-selectable contents. The process of ascertaining the respective popularity values typically is based on respective counts of user selections of the link associated with the user-selectable content. For example, in the illustrated embodiments, the popularity values are given by the total numbers of clicks (Nt) generated from the respective user-selectable contents in each period t.
B. Calculating Novelty Decay Values
1. Introduction
The content prioritization system 30 calculates for each of the user-selectable contents a respective novelty decay value as a decreasing function of the respective novelty value (
In some embodiments, the content prioritization system 30 calculates the respective novelty decay values by calculating each of the respective novelty decay values as a decreasing exponential function of the respective novelty value. In some of these embodiments, this process involves, for each of the user-selectable contents (j) calculating the respective novelty decay value (rj(tj)) in accordance with equation (1):
$$r_j(t_j) = a \cdot e^{-d(t_j)}, \tag{1}$$
where $t_j$ is the respective novelty value, $d(t_j) = \alpha\,(t_j)^{\beta}$, $a$ is a weighting factor, and $\alpha$ and $\beta$ are parameters that have respective values. In some embodiments, the values of the parameters $\alpha$ and $\beta$ are determined based on a statistical evaluation of historical data characterizing user selections of user-selectable contents on the web page.
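The calculation of a novelty decay value can be illustrated with a minimal sketch. It assumes the stretched-exponential form d(t) = α·t^β described above; the function name and the parameter values (α = 0.4, β = 0.4, a = 1) are illustrative assumptions only, and in practice the parameters would be estimated from historical click data.

```python
import math


def novelty_decay(t, alpha, beta, a=1.0):
    """Return a * exp(-alpha * t**beta) for a novelty value (lifetime) t >= 0."""
    if t < 0:
        raise ValueError("the novelty value t must be non-negative")
    return a * math.exp(-alpha * t ** beta)


# Illustrative parameter values (assumptions, not fitted values).
ALPHA, BETA = 0.4, 0.4

# Decay values for contents that are 0, 5, 60 and 600 minutes old.
decays = [novelty_decay(t, ALPHA, BETA) for t in (0, 5, 60, 600)]
```

With a = 1 the function returns 1 for a brand-new content (t = 0) and decreases monotonically as the content ages.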
2. Location Matters
The location of a link in a page determines the overall number of clicks in a given time interval. In particular, the order in which user-selectable contents are placed within a web page (e.g., the news stories of digg.com) determines the number of clicks within a certain time frame. Assume that time flows discretely as t=1, 2, . . . minutes. Let Nt denote the number of clicks (or, for example, the digg number in the case of digg.com) received by a story that appeared on the website t minutes ago (in this case we say that the story has lifetime t). The growth of Nt satisfies the following stochastic equation:
$$N_{t+1} = N_t\,(1 + a\,r_t X_t), \tag{2}$$
where $r_t$ is a novelty factor that decays with time and satisfies $r_0 = 1$, $X_t$ is a random variable with mean 1, and $a$ is a positive constant.
This equation takes into account two factors that together influence the growth of collective attention: popularity and novelty. The popularity effect is captured by the multiplicative form of equation (2), and the novelty effect is described by rt. All other factors are contained in the noise term Xt.
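The multiplicative growth process of equation (2) can be simulated in a few lines. The following is a minimal sketch under stated assumptions: the noise term is drawn uniformly from [0, 2] (one of many distributions with mean 1), the novelty factor is a stretched exponential with illustrative parameters, and a = 0.1. None of these choices is prescribed by the description.

```python
import math
import random


def simulate_clicks(n0, steps, a, novelty, rng):
    """Iterate N_{t+1} = N_t * (1 + a * r_t * X_t) and return the whole trajectory."""
    counts = [float(n0)]
    for t in range(steps):
        x_t = rng.uniform(0.0, 2.0)  # assumed noise model: non-negative, mean 1
        counts.append(counts[-1] * (1.0 + a * novelty(t) * x_t))
    return counts


rng = random.Random(0)  # fixed seed so the run is repeatable
trajectory = simulate_clicks(
    n0=1,
    steps=120,
    a=0.1,
    novelty=lambda t: math.exp(-0.4 * t ** 0.4),  # illustrative r_t with r_0 = 1
    rng=rng,
)
```

Because every growth factor is at least 1, the trajectory never decreases; growth slows as the novelty factor decays.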
In addition to popularity and novelty, there also is a third factor: position. A user-selectable content displayed at a top position on the front page easily draws more attention than a similar user-selectable content placed on later pages. Hence the growth rate a·rt should depend on the physical position at which the user-selectable content is presented.
In the specific case of digg.com, the front page is divided into 15 slots and can display 15 stories at a time. The user-selectable contents are always sorted chronologically, with the latest user-selectable content at the top. If the positions are labeled from top to bottom by i=1, 2, . . . , 15, we can modify equation (2) to allow for an explicit dependency of a on i:
$$N_{t+1} = N_t\,(1 + a_i\,r_t X_t), \tag{3}$$
where ai is a position factor that decreases with i.
The assumption that the novelty effect and the position effect can be separated into two factors rt and ai was tested empirically. To this end the growth rate was tracked for each slot, rather than for each story. For multiplicative models it is convenient to define the logarithmic growth rate
$$s_t = \log N_{t+1} - \log N_t. \tag{4}$$
When a is small (which is always true for short time periods) we have from Equation (3)
$$s_t^i \approx a_i\,r_t X_t \tag{5}$$
for a story placed at position i at time t. Taking the expected value of both sides, we have
$$E\,s_t^i \approx a_i\,r_t, \tag{6}$$
since $E X_t = 1$.
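The small-a approximation can be made explicit. The following worked step is offered as an illustration and is not part of the original derivation; it expands the logarithm of the growth factor in equation (3) to first order:

```latex
\begin{align*}
s_t^i &= \log N_{t+1} - \log N_t = \log\bigl(1 + a_i r_t X_t\bigr) \\
      &= a_i r_t X_t - \tfrac{1}{2}\,(a_i r_t X_t)^2 + \cdots \approx a_i r_t X_t, \\
E\,s_t^i &\approx a_i r_t \, E X_t = a_i r_t .
\end{align*}
```

The neglected terms are of second order in the position factor, which is why the approximation holds for short time periods.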
The logarithmic growth rate sti can be measured as follows. For each fixed position i, if a digg story appears on that position at both times t and t+5 (the front page is refreshed every 5 minutes), then the observed quantity
counts as one sample point of sti.
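The expression for the observed quantity is not reproduced in this text. As a hedged sketch only, one plausible form consistent with the per-minute definition in equation (4) divides the change in log-clicks over a refresh interval by the length of that interval. The function name and the division by five minutes are assumptions.

```python
import math


def growth_rate_sample(n_t, n_t_plus_5, refresh_minutes=5):
    """Assumed per-minute logarithmic growth rate observed over one refresh interval."""
    if n_t <= 0 or n_t_plus_5 <= 0:
        raise ValueError("click counts must be positive")
    return (math.log(n_t_plus_5) - math.log(n_t)) / refresh_minutes


# A story that goes from 100 to 110 diggs while it stays in the same slot.
sample = growth_rate_sample(100, 110)
```

Samples collected in this way for a fixed position could then be grouped by lifetime to estimate the position factor for that slot.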
From the historical data shown in
where tj is the lifetime of the j′th data point. The estimator for the 1,220 data points obtained from the top position is calculated to be â1=0.120. The fitted curve
is shown as a solid curve in
C. Determining a Prioritization Order of the User-Selectable Contents
The content prioritization system 30 determines a prioritization order of the user-selectable contents in respective prioritized positions on the web page based on the novelty decay values (
Some embodiments are modeled in an infinite-horizon framework in which future clicks are discounted with a discount parameter δ, so that one click at time t counts as $\delta^t$ clicks at time 0. In these embodiments, the objective is to maximize
where Nt is the total number of clicks generated from the user-selectable contents on the web page in period t.
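The expression for the infinite-horizon objective is not reproduced in this text. As a hedged illustration only, the sketch below assumes the conventional discounted sum, in which the clicks Nt of period t are weighted by δ raised to the power t. The function name and the sample values are hypothetical, and a finite list of periods stands in for the infinite horizon.

```python
def discounted_clicks(clicks_per_period, delta):
    """Sum of delta**t * N_t over the supplied periods t = 0, 1, 2, ..."""
    if not 0.0 < delta < 1.0:
        raise ValueError("the discount parameter must lie strictly between 0 and 1")
    return sum((delta ** t) * n for t, n in enumerate(clicks_per_period))


# Ten clicks in each of three periods, discounted by one half per period.
value = discounted_clicks([10, 10, 10], delta=0.5)  # 10 + 5 + 2.5
```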
Other embodiments are modeled with the finite-horizon objective. In these embodiments, the variable content slots of a web page are populated with user-selectable contents in a way that generates the largest number of clicks within a certain finite time period T. Some of these embodiments employ ordering strategies called indexing strategies, which are defined as follows. Given the state of each user-selectable content (which in the illustrated embodiments is a two-vector (Nt, t)), an index O is calculated for each user-selectable content using a predefined index function O(Nt, t), and the user-selectable contents are then sorted based on their respective indices. In some embodiments, the slots on the web page are populated in descending order, with the user-selectable content with the largest index displayed at the top, the user-selectable content with the second largest index displayed next, and so on.
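An indexing strategy of this kind reduces to a sort. The following minimal sketch assumes that each content is identified by a string key and carries a state (Nt, t); the names `populate_slots`, `contents`, and the sample data are hypothetical.

```python
def populate_slots(contents, index_fn, num_slots):
    """Return the content ids to display, largest index first.

    contents maps a content id to its state (n_t, t);
    index_fn(n_t, t) is any predefined index function.
    """
    ranked = sorted(contents, key=lambda cid: index_fn(*contents[cid]), reverse=True)
    return ranked[:num_slots]


# Hypothetical states: (clicks so far, lifetime in minutes).
contents = {"a": (50.0, 30), "b": (5.0, 1), "c": (200.0, 400)}

# Example index function: rank by clicks alone.
top_two = populate_slots(contents, lambda n_t, t: n_t, num_slots=2)
```

Keeping the index function as a parameter lets the same routine serve any of the strategies discussed in this description.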
Some of these embodiments employ an indexing strategy that prioritizes user-selectable contents that are predicted to receive the most attention in the next time period in accordance with equation (8):
$$O_1(N_t, t) = N_t\,r_t. \tag{8}$$
In these embodiments, the process of determining the prioritization order for each of the user-selectable contents involves determining the respective index value from a respective multiplication together of the respective popularity value (Nt) and the respective novelty decay value (rt). This is a “one-step-greedy” strategy. Ignoring the position effect (i.e., assume a=1), a user-selectable content in state (Nt, t) generates on average Ntrt more clicks (or “diggs” in the case of the digg.com web site) in the next period. This strategy thus places the most “replicated” story at the top of a web page.
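The one-step-greedy index can be sketched as the product of popularity and novelty decay. The stretched-exponential novelty function, its parameter values, and the three sample stories below are illustrative assumptions.

```python
import math


def novelty(t, alpha=0.4, beta=0.4):
    """Assumed novelty decay r_t = exp(-alpha * t**beta), with r_0 = 1."""
    return math.exp(-alpha * t ** beta)


def greedy_index(n_t, t):
    """One-step-greedy index: expected additional clicks N_t * r_t (position effect ignored)."""
    return n_t * novelty(t)


# Hypothetical states: (clicks so far, lifetime in minutes).
stories = {"old_hit": (500.0, 600), "fresh": (20.0, 2), "mid": (120.0, 60)}
order = sorted(stories, key=lambda s: greedy_index(*stories[s]), reverse=True)
```

With these sample values the moderately popular, moderately fresh story ranks first: the old story has the most clicks but almost no novelty left, and the fresh story has too few clicks to multiply.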
In accordance with the method of
In this embodiment, the process of determining the prioritization order involves selecting one of multiple different prioritization procedures based on the one or more ascertained parameter values and determining the prioritization order in accordance with the selected prioritization strategy. In particular, if the one or more parameter values satisfy a predicate for a first prioritization procedure (
In some embodiments, the selection process involves selecting between (i) a first prioritization procedure that assigns ones of the user-selectable contents determined to be higher in novelty to higher priority ones of the locations on the web page than ones of the user-selectable contents determined to be lower in novelty and (ii) a second prioritization procedure that assigns ones of the user-selectable contents determined to be higher in popularity to higher priority ones of the locations on the web page than ones of the user-selectable contents determined to be lower in popularity.
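The selection between the two procedures can be sketched as a predicate on an ascertained parameter value. The predicate used below is hypothetical: it compares a decay exponent β with a threshold of 0.335, the value at which the simulations reported later in this description change their preferred ordering for one particular parameter set. A deployed system would derive its own predicate.

```python
def select_procedure(beta, critical_beta=0.335):
    """Hypothetical predicate: fast novelty decay favors novelty ordering."""
    return "novelty" if beta > critical_beta else "popularity"


def prioritize(contents, beta):
    """Order content ids using the procedure selected for the given beta.

    contents maps a content id to its state (n_t, t).
    """
    if select_procedure(beta) == "novelty":
        key = lambda cid: -contents[cid][1]  # newest first
    else:
        key = lambda cid: contents[cid][0]   # most clicked first
    return sorted(contents, key=key, reverse=True)


contents = {"a": (400.0, 300), "c": (3.0, 1)}
fast_decay_order = prioritize(contents, beta=0.5)
slow_decay_order = prioritize(contents, beta=0.2)
```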
In some of these embodiments, the first prioritization procedure involves sorting the user-selectable contents by their novelty, with the newest user-selectable contents at the top, in accordance with equation (9):
$$O_2(N_t, t) = -t \tag{9}$$
The second prioritization procedure involves sorting the user-selectable contents by their popularity, with the most popular user-selectable contents at the top, in accordance with equation (10):
$$O_3(N_t, t) = N_t \tag{10}$$
Notice that because Nt grows with time, the effect of sorting by O2 is almost the opposite of sorting according to O3.
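The near-opposite effect of the two orderings can be seen on a small example. The sample stories are hypothetical and are chosen so that older stories have accumulated more clicks, which is the typical situation the text describes.

```python
def index_novelty(n_t, t):
    """Index of equation (9): newer contents (smaller t) receive a larger index."""
    return -t


def index_popularity(n_t, t):
    """Index of equation (10): contents with more clicks receive a larger index."""
    return n_t


# Hypothetical states: (clicks so far, lifetime in minutes).
stories = {"a": (400.0, 300), "b": (90.0, 45), "c": (3.0, 1)}

by_novelty = sorted(stories, key=lambda s: index_novelty(*stories[s]), reverse=True)
by_popularity = sorted(stories, key=lambda s: index_popularity(*stories[s]), reverse=True)
```

For this data the two orderings are exact reverses of each other; with real data the reversal is only approximate because click counts do not grow at the same rate for every story.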
A rough estimate of the performance of the prioritization strategies O2 and O3 can be obtained as follows. For the sake of generality, assume that there are m positions on the front page. New stories arrive at a rate λ>0. Novelty decays as $r_t = e^{-\alpha t^{\beta}}$. Let
$$\bar{a} = \frac{1}{m}\sum_{i=1}^{m} a_i$$
be the average position factor, which equals 0.08 for digg.com. Let Δt be the refresh time step, which is five minutes for digg.com.
Consider strategy O3 first. According to the index rule, new user-selectable contents never appear on the front page. In the case of digg.com, all diggs are generated by the initial m stories. After time T we have from equation (4) that
Hence, on average each story's log-performance is
When T is large, we have
Next consider O2, which orders the user-selectable contents by their respective lifetimes (t). On average every s=1/λ minutes a new user-selectable content replaces an old user-selectable content, and each old story moves down one position on the web page. Hence, on average each user-selectable content stays on the front page for ms minutes, where m is the number of positions. The quantity ms is referred to as one page cycle, which is the average time it takes to refresh the whole page. Before a story disappears from the front page, it generates
clicks, where i(t) is the position of the user-selectable content at time t. When a user-selectable content gets replaced by a new user-selectable content, they are counted as one user-selectable content restarting from the state Nt=1 and t=0. The multiplicative process starts over, and another Nms clicks are generated in the next ms minutes, on average. Thus, in a total time period T the process is repeated T/(ms) times, and a total number of NmsT/(ms) clicks are generated per user-selectable content. The log-performance of O2 is approximately
where ai(t) is replaced by ā since on average each user-selectable content stays in position 1, . . . , m for equal times. Taking the expected value of both sides yields:
The critical point can be determined by equating Equations (12) and (15):
which holds for any functional form of rt. The left side of equation (17) can be interpreted as the total novelty left after a time ms, or the total log-performance that can be gained from one user-selectable content after one page cycle. The right hand side of equation (17) is the total log-time left after one page cycle. Thus, equations (17) and (19) say that, after one page cycle, if there is more novelty left than the log-time remaining, the user-selectable contents should be ordered by decreasing popularity rather than by decreasing novelty (O3 is better than O2). Conversely, if novelty decays too fast (not enough novelty left after one page cycle), then the user-selectable contents should be ordered by decreasing novelty rather than decreasing popularity (O2 is better than O3).
When $r_t = e^{-\alpha t^{\beta}}$
is the incomplete Gamma function. In this case the critical equation can also be written as
For the parameters of digg.com (ā=0.08, m=15, s=20) and horizon T=50,000 one can solve for the critical curve (α,β) on which O2 and O3 have the same performance.
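The quantities that enter the critical equation follow directly from the quoted parameters. The short calculation below is only an arithmetic illustration of the page cycle and of the number of page cycles within the horizon; the variable names are hypothetical, and the critical curve itself is not computed here.

```python
m = 15       # positions on the front page
s = 20       # average minutes between new arrivals (s = 1 / lambda)
T = 50_000   # time horizon in minutes

arrival_rate = 1 / s          # lambda, in stories per minute
page_cycle = m * s            # average minutes to refresh the whole page
cycles_in_horizon = T / page_cycle
```

For these values one page cycle lasts 300 minutes, so the horizon spans roughly 167 page cycles.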
A simulator was built to test the prioritization strategies O1, O2, and O3. The simulator closely resembles the functioning of digg.com in that it incorporates the following rules:
$$\Delta N_{t+5} = N_{t+5} - N_t = 5\,a_i\,r_t X_t N_t. \tag{22}$$
The performance of all three index functions O1, O2, and O3 was tested in the simulator. For each index function, Steps 2 to 5 were repeated 100,000 times (or equivalently 500,000 minutes). Strategy O2 (sort by novelty) achieved a total number of 514,314.8 diggs. Strategy O3 (sort by popularity) only generated 354.6 diggs. Strategy O1 (one-step-greedy) generated 452,402.3 diggs. Thus for these parameter values O2 turns out to be the best strategy, since it is 13.7% better than O1 and tremendously better than O3.
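The stated comparison can be checked from the reported totals. The snippet below only reproduces the arithmetic; the dictionary and variable names are hypothetical.

```python
# Totals reported above for 100,000 rounds of simulation.
diggs = {"O1": 452_402.3, "O2": 514_314.8, "O3": 354.6}

# Relative gain of sorting by novelty over the one-step-greedy strategy, in percent.
gain_over_o1 = (diggs["O2"] / diggs["O1"] - 1.0) * 100.0

# Ratio of sorting by novelty to sorting by popularity.
ratio_over_o3 = diggs["O2"] / diggs["O3"]
```

The gain over O1 rounds to 13.7%, and O2 generates more than a thousand times as many diggs as O3.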
The reason for the relatively poor performance of the index O3 is easy to understand. Strategy O3 gives higher priority to stories that have been dugg many times. According to the indexing rule, after one period new stories can never find their way to the front page since all the old stories have more than 1 digg! When novelty decays fast, the old stories remaining on the front page soon lose their freshness and cease to generate any new diggs. The system thus gets frozen in an unfruitful state.
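The frozen state can be reproduced with a toy model. The sketch below is not the simulator described above, whose rules are not reproduced in this text. It assumes one new arrival per round, a noise term fixed at its mean of 1, the same position factor for every slot, and an illustrative stretched-exponential novelty function.

```python
import math


def novelty(t, alpha=0.4, beta=0.4):
    """Assumed novelty decay r_t = exp(-alpha * t**beta)."""
    return math.exp(-alpha * t ** beta)


def run_popularity_sort(num_slots=3, rounds=50, a=0.1):
    """Toy front page ordered by popularity; returns the set of ids shown in each round."""
    stories = {i: [1.0, 0] for i in range(num_slots)}  # id -> [N_t, t]
    next_id = num_slots
    shown_history = []
    for _ in range(rounds):
        # Display the stories with the largest click counts.
        front = sorted(stories, key=lambda sid: stories[sid][0], reverse=True)[:num_slots]
        shown_history.append(set(front))
        # Only displayed stories grow: N <- N * (1 + a * r_t), with X_t fixed at 1.
        for sid in front:
            n_t, t = stories[sid]
            stories[sid][0] = n_t * (1.0 + a * novelty(t))
        # Every story ages by one time step, then a new story arrives with N = 1.
        for sid in stories:
            stories[sid][1] += 1
        stories[next_id] = [1.0, 0]
        next_id += 1
    return shown_history


history = run_popularity_sort()
```

In this toy model every new arrival starts with a single click and therefore never outranks the initial stories, so the same stories occupy the front page in every round.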
The fact that O2 outperforms O1 is a bit harder to understand. Some intuition can be gained by considering an extreme case. Suppose each user-selectable content completely loses its novelty after one second (r0=1, rt=0 for all t>0). Then only “new arrivals” should be displayed since they are the only ones that can generate new diggs. Sorting stories by their lifetime is a good idea when novelty decays fast. On the other hand, if novelty never decays (rt=1), the lifetime factor becomes irrelevant. Thus in this case, strategy O1, which prioritizes popular stories, will win over O2. Hence, the fact that O2 works better than O1 in the simulations shows that novelty decays relatively fast for digg.com. Should it decay at a slower rate, O1 would be a better choice.
Note that the simulation only showed that the ordering implied by O2 works better than O1 for a particular choice of T. In general this may not be true for other values of T. In fact, for a time interval of T=5 minutes (one time step) O1 is by definition the best strategy. Hence, comparing the performance of two or more index functions only makes sense after one has specified a time horizon (or how much the future should be discounted if an infinite horizon is assumed).
In order to quantitatively test the limiting behavior of the three indexing strategies, the simulations were repeated for a range of different values of the decay parameter rt. In the illustrated embodiments, the decay parameter rt is modeled by a function that decays as a stretched exponential function, whose general form can be written as $r_t = e^{-\alpha t^{\beta}}$.
The performance of each indexing strategy is measured by the logarithm of the total number of diggs generated in 10,000 rounds. As β increases (faster decay), the number of diggs decreases for all three indexing strategies. When β>0.34, O2 performs slightly better than O1 and much better than O3. When β<0.33, however, O1 and O3 perform significantly better than O2. In other words, on the two sides of the value of β=0.335, the stories should be displayed in completely reversed order. This phenomenon is referred to as a phase transition that takes place at the value of β=0.335 (see
$$O_1'(N_t, t) = \log O_1(N_t, t) = \log N_t + \log r_t. \tag{23}$$
Clearly, O1′ linearly trades off between log Nt and log rt, assigning identical weight to the two effects. This is by no means the best tradeoff. For example, the index function
$$O_4(N_t, t) = 0.6\,\log N_t + \log r_t \tag{24}$$
achieves 556,444.1 diggs after 100,000 rounds of simulation, which is 8.2% more than O2 and 23.0% more than O1.
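The effect of reweighting the popularity term can be seen by comparing the two index functions on the same pair of contents. In the sketch below the stretched-exponential novelty function, its parameter values, and the two sample states are illustrative assumptions; only the weight of 0.6 comes from the description.

```python
import math


def log_novelty(t, alpha=0.4, beta=0.4):
    """log r_t for the assumed decay r_t = exp(-alpha * t**beta)."""
    return -alpha * t ** beta


def index_equal_weight(n_t, t):
    """Logarithm of the one-step-greedy index: log N_t + log r_t."""
    return math.log(n_t) + log_novelty(t)


def index_weighted(n_t, t, weight=0.6):
    """Reweighted index: weight * log N_t + log r_t."""
    return weight * math.log(n_t) + log_novelty(t)


# Hypothetical states: (clicks so far, lifetime in minutes).
veteran = (2000.0, 200)
newcomer = (100.0, 10)

equal_prefers_veteran = index_equal_weight(*veteran) > index_equal_weight(*newcomer)
weighted_prefers_newcomer = index_weighted(*newcomer) > index_weighted(*veteran)
```

For this pair the equal-weight index places the older, more popular content first, whereas the reweighted index places the newer content first, which shows how a lower weight on popularity shifts priority toward novelty.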
In general, the content prioritization system 30 typically includes one or more discrete data processing components, each of which may be in the form of any one of various commercially available data processing chips. In some implementations, the content prioritization system 30 is embedded in the hardware of any one of a wide variety of digital and analog electronic devices, including desktop and workstation computers, digital still image cameras, digital video cameras, printers, scanners, and portable electronic devices (e.g., mobile phones, laptop and notebook computers, and personal digital assistants). In some embodiments, the content prioritization system 30 executes process instructions (e.g., machine-readable code, such as computer software) in the process of implementing the methods that are described herein. These process instructions, as well as the data generated in the course of their execution, are stored in one or more computer-readable media. Storage devices suitable for tangibly embodying these instructions and data include all forms of non-volatile computer-readable memory, including, for example, semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices, magnetic disks such as internal hard disks and removable hard disks, magneto-optical disks, DVD-ROM/RAM, and CD-ROM/RAM.
Embodiments of the content prioritization system 30 may be implemented by one or more discrete modules (or data processing components) that are not limited to any particular hardware or software configuration, but rather may be implemented in any computing or processing environment, including in digital electronic circuitry or in computer hardware, firmware, device driver, or software. In some embodiments, the functionalities of the modules are combined into a single data processing component. In some embodiments, the respective functionalities of each of one or more of the modules are performed by a respective set of multiple data processing components. The various modules of the content prioritization system 30 may be co-located on a single apparatus or they may be distributed across multiple apparatus; if distributed across multiple apparatus, the modules may communicate with each other over local wired or wireless connections, or they may communicate over global network connections (e.g., communications over the internet).
A user may interact (e.g., enter commands or data) with the computer 120 using one or more input devices 130 (e.g., a keyboard, a computer mouse, a microphone, joystick, and touch pad). Information may be presented through a user interface that is displayed to the user on a display monitor 160, which is controlled by a display controller 150 (implemented by, e.g., a video graphics card). The computer system 120 also typically includes peripheral output devices, such as speakers and a printer. One or more remote computers may be connected to the computer system 120 through a network interface card (NIC) 136.
As shown in
The embodiments that are described herein provide methods and apparatus for populating variable content slots on web pages with user-selectable contents (e.g., advertisements, topic tiles, and other variable contents) in a way that increases the attention that is drawn to the web page. These embodiments provide a principled way of prioritizing user-selectable contents when designing dynamic websites. In some embodiments, the rates with which novelty and popularity evolve within the website are translated into a prioritization ordering of the user-selectable contents. Some embodiments are designed to guarantee a maximal level of attention (e.g., a maximum number of clicks per interval of time) when deciding between strategies (or procedures) for ordering user-selectable contents on a web page.
Other embodiments are within the scope of the claims.