1. Technical Field
The present invention generally relates to predicting traffic volume on the Internet, and more specifically to predicting traffic volume to assist in marketing, planning, execution, and evaluation of advertising campaigns for the Internet.
2. Related Art
The number of users on the Internet continues to grow at an astounding rate while businesses continue to rapidly commercialize its use. As they surf through websites, users generate a high volume of traffic over the Internet. Increasingly, businesses take advantage of this traffic by advertising their products or services on the Internet. These advertisements may appear in the form of leased advertising space on websites, which is similar to rented billboard space along highways and in cities or to commercials broadcast during television/radio programs. Experience has shown that it can be difficult to plan, execute, and/or evaluate an advertising campaign conducted over the Internet. Unlike for billboards and commercials, there are very few tools (e.g., Nielsen ratings, etc.) to accurately measure or predict user traffic on the Internet.
One method for measuring exposure of advertisements posted on a website may be based on daily traffic estimates. This method allows one to control the exposure of an ad and predict the traffic volume (i.e., number of impressions, viewers, actions, website hits, mouse clicks, etc.) on a given site at daily intervals. However, there is no control over how this exposure occurs within the day itself because the method assumes a constant rate of traffic throughout the day. Experience has shown that website traffic typically exhibits strong hourly patterns. Traffic may accelerate at peak-hours, and hence, so does ad exposure. Conversely, at low traffic times, ads may be viewed at a lower rate. These daily (as opposed to hourly) estimates exhibit high intra-day errors, which result in irregular or uneven ad campaigns that are not always favored by advertisers.
This situation is illustrated in
Campaign unevenness is a symptom of prediction errors (positive or negative). As illustrated in
Because of the dynamic nature of the Internet, it is difficult to predict the amount of time it will take before advertising goals for a particular advertisement are met. Therefore, it would be beneficial to provide a mechanism to better estimate traffic volume.
Methods, systems, and articles of manufacture of the present invention may assist in planning, execution, and evaluation of advertising campaigns on the Internet. Particularly, methods, systems, and articles of manufacture of the present invention may help evaluate and/or predict traffic volume on the Internet.
One exemplary embodiment of the invention relates to a method for predicting traffic. The method may comprise receiving historical traffic data for a location, and computing a prediction of traffic volume for a particular time at the location using the historical traffic data and at least one prediction algorithm.
Additional embodiments and aspects of the invention are set forth in the detailed description which follows, and in part are obvious from the description, or may be learned by practice of methods, systems, and articles of manufacture consistent with the present invention. It is understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments of the invention and together with the description, serve to explain the principles of the invention. In the drawings:
Reference is now made in detail to exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used throughout the drawings to refer to the same or like parts.
As discussed above, one method for predicting traffic may estimate a daily traffic volume for a location and use the estimate to compute a constant traffic rate throughout the day. However, other methods described below (e.g., hour-of-day means method, previous-hour method, previous-hour-plus-drift method, point-slope method, etc.) may also be used to compute traffic predictions over different time intervals, such as hourly predictions.
One exemplary method for predicting traffic may compute traffic averages for each hour of a day. The hour-of-day means (HDM) method may assume that traffic depends only on the hour of the day regardless of an overall traffic trend at other times of the day. For example, let $x_{i,k}^{j}$ represent the measured traffic volume of location j during hour k of day i. Assuming
$$x_{i,k}^{j} = v_{k}^{j}$$
where $v_{k}^{j}$ is a random variable with mean $\mu_{k}^{j}$ and variance $(\sigma_{k}^{j})^{2}$ that describes the traffic volume at location j according to the kth hour (k = 0, ..., 23), the family of $x_{i,k}^{j}$ for i = 1, 2, ... is then a sequence of independent, identically distributed (i.i.d.) random variables. For illustrative purposes, the following example focuses on a single location. Hence, the superscript j may be dropped from the notation.
Letting $E_{i,k}[\cdot]$ denote an expectation operator conditioned on hour k of day i (i.e., the history of the traffic volume for the location is known up to hour k of day i), the HDM method may then use the expectation as a forecast of the traffic volume for the next hour, which yields
$$E_{i,k}[x_{i,k+1}] = E[v_{k+1}] = \mu_{k+1}$$
As one of ordinary skill in the art of traffic estimation can appreciate, for all l less than i, the HDM method may have
$$E_{l,k}[x_{i,k}] = \mu_{k}$$
A traffic volume predictor $\hat{v}_{k}$ may estimate the mean $\mu_{k}$ using
$$\hat{v}_{k} = \frac{1}{n}\sum_{i=1}^{n} x_{i,k}$$
for each k = 0, ..., 23. Therefore, in the HDM method, the traffic volume prediction $\hat{x}_{k}$ at an hour k for any day is given by
$$\hat{x}_{k} = \hat{v}_{k}$$
which is simply the mean of the measured traffic volume at hour k over a history of n days. The history of n days may be n consecutive or nonconsecutive days.
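As a minimal sketch, and not a definitive implementation, the HDM prediction described above may be computed as follows. The function name hdm_predict, the list-of-days data layout, and the sample history are assumptions made only for illustration.

```python
from statistics import mean


def hdm_predict(history):
    """Return 24 hour-of-day means.

    `history` is a hypothetical layout: a list of days, each day a list of
    24 measured hourly traffic volumes (hour 0 through hour 23).
    """
    if not history:
        raise ValueError("history must contain at least one day")
    if any(len(day) != 24 for day in history):
        raise ValueError("each day must contain 24 hourly values")
    # The prediction for hour k is the mean of hour k over all n days.
    return [mean(day[k] for day in history) for k in range(24)]


# Illustrative three-day history: a flat baseline with a peak at hour 20.
history = [
    [100] * 20 + [400, 100, 100, 100],
    [110] * 20 + [420, 110, 110, 110],
    [90] * 20 + [380, 90, 90, 90],
]
forecast = hdm_predict(history)
print(forecast[0], forecast[20])
```

Because the days need not be consecutive, the same function may be applied to any selection of n days from the historical traffic data.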
The variance of the predictor $\hat{v}_{k}$ is given by
where $\hat{\sigma}_{k}^{2}$ is the estimated variance of the measured traffic volume at hour k over n days and is given by
Hence, the rate of reduction of the variance of $\hat{v}_{k}$ (in percentage terms) as the history increases from n to n+1 is $n/(1+n^{2})$, or approximately $1/(1+n)$ as n becomes large. This result shows that gaining accuracy in traffic volume prediction may become increasingly difficult after the history grows beyond a certain number of days. Even assuming that hourly means of traffic volume are stationary (i.e., they do not change over time), accuracy in their estimation is limited by available computational resources. Because of the slowdown in the prediction's convergence and the estimated magnitude of the variance for typically measured traffic at a location, a three-month history provided to the predictor
Another exemplary method for predicting traffic may assume that traffic at a location follows a zero-mean random walk. That is, traffic at a given hour may be predicted by traffic at a previous hour plus a zero-mean, random disturbance. The previous-hour (PrevHr) method can capture the effect of “traffic momentum” (i.e., the momentum of traffic from the previous hour carries over to the next hour). For example, the PrevHr method may assume the following structure
$$x_{i,k+1}^{j} = x_{i,k}^{j} + \varepsilon_{k}^{j}$$
where $\varepsilon_{k}^{j}$ is a random variable with $E[\varepsilon_{k}^{j}] = 0$ and var($\varepsilon_{k}^{j}$)=σε
Limiting the analysis to a single location, superscript j may be dropped from the notation. Using expectation $E_{i,k}[\cdot]$ as a forecast of the traffic volume for $x_{i,k+1}$ and a history of measured traffic volume up to day i and hour k, the following equation is obtained:
$$E_{i,k}[x_{i,k+1}] = E_{i,k}[x_{i,k} + \varepsilon_{k}] = x_{i,k}$$
Therefore, the predicted traffic volume $\hat{x}_{i,k+1}$ at day i and hour k+1 is given by
$$\hat{x}_{i,k+1} = x_{i,k}$$
which is the measured traffic volume at day i and hour k. Note that for any hour m occurring after hour k, this method may predict the traffic volume at hour m to be the last measured traffic volume in the history.
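The PrevHr prediction may be sketched in a few lines. The function name prev_hour_predict, the flat chronological list, and the horizon parameter are hypothetical choices for illustration only.

```python
def prev_hour_predict(series, horizon=1):
    """Forecast the next `horizon` hours with the PrevHr method.

    `series` is a hypothetical flat, chronological list of measured hourly
    traffic volumes. Every future hour is predicted to equal the last
    measured volume in the history.
    """
    if not series:
        raise ValueError("series must contain at least one measurement")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return [series[-1]] * horizon


# Illustrative measurements for three consecutive hours.
prevhr_forecast = prev_hour_predict([120, 150, 180], horizon=3)
print(prevhr_forecast)
```

The forecast is flat over the horizon, which reflects the note above that any hour after hour k receives the last measured traffic volume.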
Another exemplary method for predicting traffic may combine recent traffic information (e.g., traffic information from the previous hour) and a history of changes (i.e., drift) in traffic. The previous-hour-plus-drift (PrevHr+) method assumes the changes are of an additive, incremental form and the increments are adjusted according to the hour of the day, which allows the method to accommodate daily patterns observed in historical traffic data. For example, the PrevHr+ method may assume the following structure:
$$x_{i,k+1}^{j} = \Delta_{k+1} + x_{i,k}^{j}$$
where $\Delta_{k}$ is a random variable describing the traffic increment for an hour k of the day. In this equation, the following convention is used: $x_{i,0}^{j} = \Delta_{0} + x_{i-1,23}^{j}$.
Again, dropping the superscript j and using the expectation as a forecast for the expected traffic volume, the following equation is obtained:
$$E_{i,k}[x_{i,k+1}] = E_{i,k}[\Delta_{k+1}] + x_{i,k}$$
As one of ordinary skill in the art can appreciate, traffic for m hours into the future may be forecasted in a recursive manner. That is, the above equation may be recursively applied to yield
$$E_{i,k}[x_{i,k+m}] = \sum_{s=1}^{m} E_{i,k}[\Delta_{k+s}] + x_{i,k}$$
using the following conventions: $x_{i-1,24} = x_{i,0}$ and $E_{i,k}[\Delta_{k+s}] = E_{i,k}[\Delta_{\operatorname{mod}(k+s,24)}]$. With a traffic history of n days, a traffic increment estimator may estimate the expectation $E_{i,k}[\Delta_{k}]$ using
Therefore, the forecast for the expected traffic volume may be rewritten as
$$E_{i,k}[x_{i,k+1}] = \hat{\Delta}_{k+1} + x_{i,k}$$
and the predicted traffic volume $\hat{x}_{i,k+1}$ at day i and hour k+1 is then given by
$$\hat{x}_{i,k+1} = \hat{\Delta}_{k+1} + x_{i,k}$$
which is the estimated traffic increment at hour k+1 plus the measured traffic volume in the previous hour.
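The PrevHr+ forecast may be sketched as follows. This is an illustration under stated assumptions: the increment for hour k is estimated here as the average hour-over-hour change observed at hour k in the history, which is one plausible estimator and is not necessarily the one intended above. The function names and the data layout are hypothetical.

```python
from statistics import mean


def estimate_increments(history):
    """Estimate one traffic increment per hour of day.

    `history` is a hypothetical list of days, each a list of hourly volumes;
    all days but the last must be complete. ASSUMPTION: the increment for
    hour k is the mean of x[i][k] - x[i][k-1], where hour -1 of a day is
    hour 23 of the previous day.
    """
    flat = [v for day in history for v in day]
    diffs = [[] for _ in range(24)]
    for t in range(1, len(flat)):
        diffs[t % 24].append(flat[t] - flat[t - 1])
    # Hours with no observed change default to a zero increment.
    return [mean(d) if d else 0.0 for d in diffs]


def prev_hour_plus_drift(history, m=1):
    """Forecast the next m hours by adding estimated increments recursively."""
    increments = estimate_increments(history)
    flat = [v for day in history for v in day]
    last_hour = (len(flat) - 1) % 24
    level = flat[-1]
    forecasts = []
    for s in range(1, m + 1):
        # The hour index wraps around midnight, i.e. mod(k + s, 24).
        level += increments[(last_hour + s) % 24]
        forecasts.append(level)
    return forecasts


# Illustrative history: one full day rising by 10 per hour, then the first
# six hours of the next day.
day = [10 * k for k in range(24)]
drift_forecast = prev_hour_plus_drift([day, day[:6]], m=2)
print(drift_forecast)
```

Each step adds the increment for the hour being predicted to the previous level, so the m-hour forecast is the last measured volume plus the sum of the m intervening increments.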
The increment estimator $\hat{\Delta}_{k}$ may only use the most recent three months of historical traffic data to generate the estimate because using more data may not significantly reduce the variance of the estimate. Using more data may also increasingly expose the estimate to incorrect modeling due to long-term, structural changes in traffic patterns. An increment variance estimator may approximate the variance of $\Delta_{k}$ using
The variance estimator may be useful when the historical traffic data contains extreme traffic volume values or outlying data, as defined below. It is not unusual to encounter extreme values arising from errors or omissions in historical traffic data. For instance, a chain of missing values in the historical traffic data at times where traffic is typically high for a certain location may indicate that there has been some historical data capture problem. Of course, it may also mean that the location became unpopular and that traffic for those times was indeed zero. This type of atypical data is referred to as outlying data. The criteria for deciding between what is legitimate data and what is outlying data are rather subjective. However, traffic volume prediction may be improved if these extreme values are removed or corrected.
In one exemplary embodiment of the present invention, a filter may be used to correct or remove outlying data from the historical data. The filter may employ a criterion that assumes a measured traffic volume at some time (e.g., at day i and at hour k) in the historical data is outlying data when the measured traffic volume at that time lies more than $N_{d}$ standard deviations from the mean of the measured traffic volume at hour k over a history of n days. For example, the filter may estimate $\hat{\Delta}_{k}$ and $\hat{\sigma}_{\Delta_{k}}$, and if
$$x_{i,k} > x_{i,k-1} + \hat{\Delta}_{k} + N_{d}\,\hat{\sigma}_{\Delta_{k}}$$
or
$$x_{i,k} < x_{i,k-1} + \hat{\Delta}_{k} - N_{d}\,\hat{\sigma}_{\Delta_{k}}$$
then the measured traffic volume $x_{i,k}$ may be classified as outlying data and the filter may substitute $x_{i,k-1} + \hat{\Delta}_{k}$ for $x_{i,k}$ in the historical traffic data. The predicted traffic volume may then be calculated using the corrected data as previously described.
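The filter described above may be sketched as follows. The function name filter_outliers and its parameters are hypothetical, and the sketch assumes the per-hour increments and standard deviations have already been estimated. It also assumes corrections are applied sequentially, so that a corrected value serves as the previous-hour value for the next test; the description above does not specify this.

```python
def filter_outliers(series, increments, sigmas, n_d=3.0):
    """Return a copy of `series` with outlying hourly volumes replaced.

    `series` is a hypothetical flat, chronological list of hourly volumes
    starting at hour 0 of a day. `increments[k]` and `sigmas[k]` are the
    estimated increment and its standard deviation for hour k (assumed
    precomputed). `n_d` is the number of standard deviations allowed.
    """
    cleaned = list(series)
    for t in range(1, len(cleaned)):
        k = t % 24
        # Expected volume: previous (possibly corrected) hour plus increment.
        expected = cleaned[t - 1] + increments[k]
        band = n_d * sigmas[k]
        if cleaned[t] > expected + band or cleaned[t] < expected - band:
            cleaned[t] = expected
    return cleaned


# Illustrative data: steady growth of 5 per hour with one spike of 500.
cleaned = filter_outliers(
    [100, 105, 110, 500, 120, 125],
    increments=[5.0] * 24,
    sigmas=[2.0] * 24,
)
print(cleaned)
```

In this example only the spike falls outside the band, so it is replaced by the previous hour's volume plus the estimated increment, and the remaining measurements are left unchanged.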
Another exemplary method for predicting traffic may add another degree of freedom to the PrevHr+ method because the explanatory impact of recent traffic may vary according to the time of day in addition to a time-of-day dependent, additive shock. This method may assume a linear relationship between $x_{i,k}^{j}$ and $x_{i,k+1}^{j}$, and hence, is called the point-slope method.
From the above observations, the point-slope method may assume the following structure:
$$x_{i,k+1}^{j} = a_{k+1}^{j} + b_{k+1}^{j}\,x_{i,k}^{j} + \varepsilon_{k+1}^{j}$$
where $a_{k}^{j}$ is a mean hour-of-day additive increment, $b_{k}^{j}$ is a constant or a loading for the hour prior to hour k, and $\varepsilon_{k}^{j}$ is a random variable (i.e., noise term) with zero mean (i.e., $E_{i,k}[\varepsilon_{k+1}] = 0$) at location j and hour k. Focusing on one location (i.e., dropping superscript j), using the expectation as a forecast for the expected traffic volume, and recognizing that $E_{i,k}[x_{i,k}] = x_{i,k}$, the following equation is obtained:
$$E_{i,k}[x_{i,k+1}] = a_{k+1} + b_{k+1}\,E_{i,k}[x_{i,k}] + E_{i,k}[\varepsilon_{k+1}] = a_{k+1} + b_{k+1}\,x_{i,k}$$
Traffic for more distant times in the future may be forecasted in a recursive manner. More specifically, a forecast for traffic volume m hours after the hour k may be given by
As one of ordinary skill in the art can appreciate, the point-slope method, discussed above, uses a linear regression with $x_{i,k}$ as regressand and $x_{i,k-1}$ as regressor. The coefficients $a_{k}$ and $b_{k}$ may not be directly observable from the historical traffic data, but they may be estimated using, for example, a least squares method. The least squares method may estimate $a_{k}$ and $b_{k}$ by minimizing a sum of squared errors
where $e_{i,k}$ is a prediction error between a predicted traffic volume at hour k of day i and the measured traffic volume at hour k of day i. Using first-order conditions to minimize
the point-slope method may solve for coefficients $\hat{a}_{k}$ and $\hat{b}_{k}$ to yield
where
with the convention $x_{i,-1} = x_{i-1,23}$. The coefficient estimates may be substituted for the coefficients $a_{k}$ and $b_{k}$ in the expected traffic volume forecast, and the predicted traffic volume $\hat{x}_{i,k+1}$ at day i and hour k+1 is then given by
$$\hat{x}_{i,k+1} = \hat{a}_{k+1} + \hat{b}_{k+1}\,x_{i,k}$$
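The point-slope estimation may be sketched with the textbook closed-form solution for simple least squares, which is used here as an assumption in place of the specific expressions above. The function names, the data layout, and the fallback for hours without enough variation are hypothetical.

```python
def fit_point_slope(history):
    """Fit one (a_k, b_k) pair per hour of day by ordinary least squares.

    `history` is a hypothetical list of days, each a list of hourly volumes.
    For hour k, the measured volume x[i][k] is regressed on the previous
    hour's volume, where hour -1 of a day is hour 23 of the previous day.
    """
    flat = [v for day in history for v in day]
    pairs = [[] for _ in range(24)]
    for t in range(1, len(flat)):
        pairs[t % 24].append((flat[t - 1], flat[t]))

    coeffs = []
    for hour_pairs in pairs:
        n = len(hour_pairs)
        if n == 0:
            coeffs.append((0.0, 1.0))  # no data: behave like PrevHr
            continue
        mean_x = sum(x for x, _ in hour_pairs) / n
        mean_y = sum(y for _, y in hour_pairs) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in hour_pairs)
        if sxx == 0:
            # ASSUMPTION: with no variation in the regressor the slope is
            # not identifiable, so fall back to unit slope plus mean drift.
            coeffs.append((mean_y - mean_x, 1.0))
            continue
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in hour_pairs)
        b = sxy / sxx
        coeffs.append((mean_y - b * mean_x, b))
    return coeffs


def point_slope_predict(coeffs, last_value, next_hour):
    """Predict the volume at `next_hour` from the last measured volume."""
    a, b = coeffs[next_hour]
    return a + b * last_value


# Illustrative history in which hour 1 is exactly 2 * (hour 0) + 10.
history = [[h0, 2 * h0 + 10] + [50] * 22 for h0 in (100, 200, 300)]
coeffs = fit_point_slope(history)
prediction = point_slope_predict(coeffs, last_value=150, next_hour=1)
print(coeffs[1], prediction)
```

Because the sample data for hour 1 is exactly linear, the fitted intercept and slope recover 10 and 2, and a measured volume of 150 at hour 0 yields a prediction of 310 for hour 1.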
In one exemplary embodiment, the hourly traffic predictions from any of the HDM, PrevHr, PrevHr+, and point-slope methods may be combined to predict the traffic volume for a location (e.g., a website) over a period of time comprising $m_{z}$ hours.
Using the point-slope method as an example, let $\hat{x}_{i,k+1,z}$ represent the predicted traffic volume for hour k+1 of day i in time niche z. Then, $\hat{x}_{i,k+1,z}$ may be calculated using
$$\hat{x}_{i,k+1,z} = \hat{a}_{k+1} + \hat{b}_{k+1}\,x_{i,k}$$
From the previous results for $E_{i,k}[x_{i,k+m}]$, the traffic volume m hours after hour k of day i at a location may be calculated using
If $H_{z}$ is a set of hours k+m, then the predicted traffic volume for a location during the $H_{z}$ hours may be calculated by
which is simply the sum of the individual hourly traffic volume predictions for the time defined by Hz.
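The recursive multi-hour forecast and the summation over a time niche may be sketched as follows. The function names are hypothetical, the coefficients are assumed to have been estimated already, and the time niche is represented, as an assumption, by a set of positive hour offsets m from the last measured hour.

```python
def forecast_hours(coeffs, last_value, last_hour, m):
    """Forecast the next m hours by applying the point-slope step recursively.

    `coeffs[k]` is an (a_k, b_k) pair for hour k of the day (assumed to be
    estimated beforehand). Each forecast feeds the next step in place of a
    measured volume.
    """
    forecasts = []
    level = last_value
    for s in range(1, m + 1):
        a, b = coeffs[(last_hour + s) % 24]
        level = a + b * level
        forecasts.append(level)
    return forecasts


def niche_total(coeffs, last_value, last_hour, offsets):
    """Sum the hourly forecasts over a time niche.

    `offsets` is a hypothetical representation of the niche: the set of
    positive offsets m such that hour k + m belongs to the niche.
    """
    if not offsets or min(offsets) < 1:
        raise ValueError("offsets must be positive hour offsets")
    forecasts = forecast_hours(coeffs, last_value, last_hour, max(offsets))
    return sum(forecasts[m - 1] for m in offsets)


# Illustrative coefficients: every hour adds 10 to the previous hour.
coeffs = [(10.0, 1.0)] * 24
total = niche_total(coeffs, last_value=100, last_hour=8, offsets={1, 2, 3})
print(total)
```

With these sample coefficients the three hourly forecasts are 110, 120, and 130, so the predicted traffic volume for the niche is their sum.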
In general, the point-slope method may provide consistently accurate traffic volume predictions, but when the measured traffic volume contains structural traffic changes (e.g., outlying data), the method may “blow up” (i.e., yield extraordinarily large predictions). The traffic volume predictions may be filtered to prevent the blow ups using mathematical functions, distributions, or other criteria. For example, one embodiment of the present invention may construct a test statistic filter $f(\hat{x}_{i,k})$, such that
where $t_{c}$ is a threshold estimate,
One exemplary embodiment of the present invention may use filter $f(\hat{x}_{i,k})$ to measure whether $\hat{x}_{i,k}$ is believable based on historical traffic data. A limitation of this approach is that if a permanent regime or behavioral change occurs in a traffic pattern, then past traffic data may become irrelevant. In spite of this, filter $f(\hat{x}_{i,k})$ may be used to indicate whether a location's traffic pattern is stable enough for the point-slope method to be effective. If the pattern is not stable enough (i.e., when $f(\hat{x}_{i,k})$ is zero), one embodiment may revert to other methods (e.g., HDM method, PrevHr method, etc.) that may not blow up in the face of pattern changes.
Table 3 uses various exemplary predictability scores to compare the performance of the HDM, PrevHr, PrevHr+, and point-slope methods in predicting traffic volume at a test location for a period from Feb. 1, 2001 to Feb. 28, 2001.
The predictions were computed using a 90-day sliding window of historical traffic data (i.e., when calculating the prediction for each hour of the day, only the most recent 90 days of traffic data were used). The comparison is made in terms of hourly prediction errors, where each method observed (i.e., recorded in the historical traffic data) the traffic volume for the last 90 days up to hour k of day i and computed a prediction $\hat{x}_{i,k+1}$ for the next hour's traffic based on the observation. Each method continued predicting the traffic volume for the subsequent hour as the previous hour of traffic volume was observed. Then, from the prediction and the measured traffic volumes, the prediction errors $e_{i,k}$ were computed, as defined by
$$e_{i,k} = x_{i,k} - \hat{x}_{i,k}$$
The predictability scores in Table 3 were calculated using
(mean error),
(standard deviation),
(maximum error),
(minimum error), and
(normalized L1 score)
Although the above lists the mean error, standard deviation, maximum error, minimum error, and normalized L1 score as possible predictability scores, other metrics (e.g., total traffic, etc.) may be used as a predictability score.
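The predictability scores named above may be sketched as follows. The exact formulas are assumptions: the standard deviation is computed here as a population standard deviation of the errors, and the normalized L1 score is assumed to be the sum of absolute errors divided by the total measured traffic. The function name and sample values are hypothetical.

```python
from statistics import mean, pstdev


def predictability_scores(actual, predicted):
    """Compute example predictability scores from hourly volumes.

    `actual` and `predicted` are equal-length lists of measured and
    predicted hourly traffic volumes. Errors follow the definition above:
    measured volume minus predicted volume.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [x - p for x, p in zip(actual, predicted)]
    return {
        "mean_error": mean(errors),
        "std_dev": pstdev(errors),  # ASSUMPTION: population form
        "max_error": max(errors),
        "min_error": min(errors),
        # ASSUMPTION: total absolute error relative to total traffic.
        "normalized_l1": sum(abs(e) for e in errors) / sum(actual),
    }


# Illustrative measured and predicted volumes for four hours.
scores = predictability_scores(
    actual=[100, 200, 300, 400],
    predicted=[90, 210, 300, 380],
)
print(scores)
```

Under these assumptions a normalized L1 score of 0.04 would mean that the absolute prediction errors total 4% of the measured traffic over the analyzed period.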
From Table 3, we can see that the PrevHr+ and the point-slope methods are among the best performers. The point-slope method in particular exhibits the lowest standard deviation and maximum error. The prediction method selected may depend on a user's objectives and willingness to trade off error mean and variance. Table 3 also shows that the point-slope model has the lowest normalized L1 score. This may come at the expense of a higher mean error. However, this mean error may be orders of magnitude below what a method using daily means (instead of hourly predictions) would yield.
Predictability scores may provide a good criterion for selecting a method of predicting traffic based on a desired smoothness in deployment of an ad campaign. A smoothly deployed ad campaign exposes users to advertisements at a predictable pace. Hence, a smooth ad campaign may use a method that accurately predicts traffic volume. In contrast, an unsmoothly deployed ad campaign exposes users to advertisements unpredictably or even haphazardly until the exposure reaches a predetermined level that signifies the end of the campaign.
A predictability score gives a measure of the size of a method's prediction error for an analyzed time period. That is, it may give a measure of a location's traffic predictability and may be used to compare the predictability of different locations. This is an important criterion when seeking smooth campaigns because it provides a comparison metric across different locations. The predictability score may be used for campaign decision-making. Campaigns with a high smoothness priority may deliver ads at locations based on the knowledge that the locations with a better predictability score may be more predictable and are likely to deliver smoother campaigns. Note that whether a better predictability score is a lower or a higher value depends on the metric used for the score.
For example, consider the normalized L1 score in Table 4 for a second location B during the month of February. Compared with the performance results in Table 3, the location for Table 4 may be deemed less predictable because its normalized L1 score using the point-slope model is 12%, which is higher than the score (6%) for Table 3's location. However, the second location has less total traffic (i.e., 8,962,345 impressions) than the first location (i.e., 92,407,331 impressions). In general, lower traffic locations may be less predictable, so a predictability score based on total traffic would be better if it is higher.
It may be better to direct smoothness-sensitive campaigns towards locations with a better predictability score. Generalizing this idea, we can form a predictability map that compares how safe (in terms of smoothness) a location is relative to other locations.
where G is a set of all locations j in the group, $T_{j}$ is location j's total traffic per unit of time (e.g., per day), and $PR_{j}$ is the predictability score of location j.
For example, using the map in
According to features and principles of the present invention and as illustrated in
According to features and principles of the present invention, system 600 may be configured to implement exemplary method 700, illustrated in
Particularly, the historical traffic data may include observations of the traffic volume $x_{i,k}$ at the website at each hour k of day i for any number of days. The observations may be made by processor 604, counters at the website, or any other mechanism. Besides websites, the location may be any other place where traffic passes through or attendance can be measured and/or observed. For example, a location may be a highway, a street, a television channel, a radio station, or any other place where traffic information is obtainable.
Consistent with features and principles of the present invention, processor 604 may identify one or more time-dependent parameters based on the historical traffic data (step 704). For example, processor 604 may estimate the parameters $\hat{a}_{k}$, $\hat{b}_{k}$, $\hat{x}_{k}$, $\hat{x}_{i,k}$, $\hat{x}_{i,k,z}$, $\hat{\sigma}_{k}$, $\hat{\sigma}_{k}^{2}$, $\hat{\Delta}_{k}$, $\hat{d}_{z}$,
Processor 604 may compute a traffic volume prediction (step 706), consistent with features and principles of the present invention. The prediction may be computed using any of the methods discussed herein and it may be the predicted traffic volume for the next hour, day, time niche, or other time period. Processor 604 may then compare the prediction against actual measured traffic volume data (step 708). The actual traffic volume data may reflect visits, hits, etc. by users at a location (e.g., website) via computers 608 or 610. In one embodiment, processor 604 may make the comparison by calculating $e_{i,k}$.
Consistent with features and principles of the present invention, processor 604 may then compute a predictability score for the location (step 710). The predictability score may be a normalized L1 score, a mean error, a maximum error, a minimum error, or any other metric. When $e_{i,k}$ is calculated, the computed predictability score may also be based on $e_{i,k}$.
Additionally, processor 604 may perform steps 702 to 710 to compute a predictability score of another location. System 600 may execute an ad campaign based on the predictability scores of the two locations using an exemplary method 800 illustrated in
According to features and principles of the present invention, during the life of the ad campaign, processor 604 may adjust an advertising schedule of the ad campaign (step 810) to compensate for differences or variances between predicted and actual traffic. The advertising schedule may include the planned times and locations where processor 604 intends to place ads, as determined in steps 802 to 806. As an ad campaign progresses, processor 604 may predict the traffic volume at various locations for a window of W days (e.g., processor 604 may predict the traffic volume for multiple hours at a website, as previously discussed). Processor 604 may then use the predictions to adjust the advertisement delivery schedule within the time window.
In the foregoing description, various features are grouped together in various embodiments for purposes of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects may lie in less than all features of a single foregoing disclosed embodiment. Thus, the following claims are hereby incorporated into this description, with each claim standing on its own as a separate embodiment of the invention. Furthermore, as used herein, the words “may” and “may be” are to be interpreted in an open-ended, non-restrictive manner.
This application is a continuation of application Ser. No. 10/231,025, filed on Aug. 30, 2002, now U.S. Pat. No. 7,668,946 and claims the benefit of U.S. Provisional Application No. 60/316,022, filed on Aug. 31, 2001, all of which are incorporated herein by reference.
Number | Date | Country |
---|---|---|
60316022 | Aug 2001 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 10231025 | Aug 2002 | US |
Child | 12650200 | | US |