An advertiser whose advertisement is displayed on a billboard may pay a price determined based on the number of impressions to an audience near the billboard. In some situations, the number of impressions or a count of a number of people who had a chance to view the content displayed may be estimated. For instance, the number of impressions of an advertisement displayed on a billboard may be estimated based on the number of people in vehicles passing nearby the location of the billboard. The number of people in vehicles near a billboard can be estimated from smartphone triangulation data.
The methods, systems and/or programming described herein are further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
In the following detailed description, numerous specific details are set forth by way of examples in order to facilitate a thorough understanding of the relevant teachings. However, it should be apparent to those skilled in the art that the present teachings may be practiced without such details. In other instances, well-known methods, procedures, components, and/or systems have been described at a relatively high level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings.
The present teaching is directed to improving time series forecasting to derive impression counts in a hybrid mode based on profile metrics obtained from measured impression counts. Details related to the hybrid mode in estimating impression counts will be disclosed below. In some embodiments, the measured impression counts are obtained with respect to a site, which can be a billboard or a wall space, where some content (such as advertisement) is displayed and viewable by others near the site. An impression count associated with a particular site may be measured with respect to a time unit such as an hour, over a period of time such as a day. The impression count may be measured by counting the number of people who are near the site within each time unit. For instance, the impression count in an hour with respect to a billboard along the east bound of a highway may be estimated based on the number of people in vehicles passing through the billboard viewshed within the hour when traveling east bound on the highway. In some embodiments, the billboard viewshed may be defined as a geographical region centered around the billboard within which a user is considered to have an unobstructed view of the billboard at an angle and distance that the content on the billboard is readable. An example includes a semi-circle area in front of a billboard that spans 60 degrees in each direction from the center, with a total span of 120 degrees without any obstructions within that space up to a radial distance of 600 m from the billboard. The number of people in vehicles within the viewshed of a billboard may be estimated from position (optionally also velocity) inferred from location-tracking techniques such as radio or GPS triangulation.
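The example viewshed above can be illustrated with a minimal sketch. It assumes planar coordinates in meters relative to the billboard and a compass bearing for the direction the billboard faces; the function name `in_viewshed` and its parameters are hypothetical, and obstructions are ignored.

```python
import math

def in_viewshed(dx_m, dy_m, facing_deg, half_angle_deg=60.0, max_range_m=600.0):
    """Return True if a point lies inside the example viewshed sector.

    dx_m / dy_m: offset of the point from the billboard (east / north, meters).
    facing_deg:  compass bearing the billboard faces (0 = north, 90 = east).
    The 60-degree half angle and 600 m radius follow the example in the text;
    line-of-sight obstructions are NOT modeled (assumption).
    """
    if math.hypot(dx_m, dy_m) > max_range_m:
        return False
    # Compass bearing from the billboard to the point.
    bearing = math.degrees(math.atan2(dx_m, dy_m)) % 360.0
    # Smallest signed angular difference, mapped into [-180, 180).
    diff = (bearing - facing_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= half_angle_deg
```

A position fix would then contribute to the hourly I-count only when `in_viewshed` returns True for it; a direction-of-travel filter (e.g., east bound only) could be layered on top.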
In estimating the number of impressions with respect to a site, a communication service provider may be able to do so based on logged locations of its customers, which may then be used to estimate the overall impression count (I-counts) of the entire population by extrapolating the impression count from the service provider according to its market share. For instance, for a particular billboard, a communications service provider A may estimate the I-counts of the site based on registered locations of its customers. If the local market share of service provider A is 30%, then the overall I-counts for the site from the entire population may be estimated by scaling up (e.g., I-counts/0.3) or extrapolating the I-counts based on the known market share. The market share distribution at each locale may differ based on the locale or the market share in the home location of the customer, so that the market I-counts determined this way are adaptive to the market share distribution in each locale and each home location of the customer.
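The scaling step described above can be sketched as follows. This is only an illustration of dividing provider-observed I-counts by a local market share; the function name `extrapolate_market_i_counts` is hypothetical.

```python
def extrapolate_market_i_counts(provider_counts, market_share):
    """Scale provider-observed I-counts up to an estimate for the whole population.

    provider_counts: I-counts per time unit observed by one service provider.
    market_share:    the provider's local market share as a fraction in (0, 1].
    """
    if not 0.0 < market_share <= 1.0:
        raise ValueError("market_share must be in (0, 1]")
    return [count / market_share for count in provider_counts]
```

For instance, with a 30% market share, a provider-observed count of 30 scales to roughly 100.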
Smartphone location data can sometimes be inconsistently captured and logged. Such inaccuracies may impact the ability to correctly estimate the I-counts. Thus, in some situations, corrections to the measured I-counts may be needed. In some respects, correction comprises: 1) recognizing a situation where a correction is needed, and 2) determining a specific correction scheme to be applied that is appropriate for the recognized situation. To recognize a situation where correction is needed, residuals between measured and corresponding estimated I-counts, such as example 130 shown in
Other criteria may also be defined for detecting a situation for correction. In some embodiments, a measured I-count is to be corrected if it is 3σ less than the model predicted I-count and 50% smaller than the model predicted I-count. These exemplary conditions are formulated based on individual I-count values measured at different time instances, so they may not be able to capture characteristics of data patterns that extend over a period of time (e.g., several hours or a day). In addition, when hard thresholds are used without identifying the specific type of data characteristic, correction based on such an individual I-count based scheme may yield inconsistent results for I-counts having the same type of underlying data characteristics. Reliably recognizing that multiple I-counts are involved in the same data characteristic pattern is therefore crucially important. The present teaching detects different correction situations based on profile metrics, obtained from a profile with multiple I-counts, to capture persistent characteristics associated with different types of data patterns. Based on detected data patterns, the present teaching then determines correction schemes based on characteristics associated with different types of data patterns. Details about the profile metrics-based detection of correction situations will be provided below.
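The individual I-count criterion mentioned above can be sketched as a simple predicate. The sketch assumes that σ is the standard deviation of the residuals and that "50% smaller" means the measured value is below half of the predicted value; the name `needs_correction` and its parameters are hypothetical.

```python
def needs_correction(measured, predicted, sigma, k_sigma=3.0, max_ratio=0.5):
    """Per-I-count check: both conditions must hold for a correction.

    1) measured lies more than k_sigma * sigma below the model prediction;
    2) measured is below max_ratio * prediction (assumed reading of
       "50% smaller than the model predicted I-count").
    """
    far_below = measured < predicted - k_sigma * sigma
    much_smaller = measured < max_ratio * predicted
    return far_below and much_smaller
```

Because the predicate looks at one I-count at a time, it cannot see patterns spanning several hours, which is the limitation the paragraph above points out.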
The second task is related to how to correct measured I-counts associated with a detected type of data failure. As illustrated in
Each of these exemplary data situations exhibits different characteristic patterns, which may call for corrections appropriate for each specific situation. For the second task associated with correction, the present teaching discloses an adaptive approach to dynamically determine how to correct measured I-counts in a profile according to the types of data characteristics detected based on profile metrics computed from the profile data.
The present teaching is directed to improving the quality of impression counts associated with content displayed on a display site. Specific aspects of the present teaching relate to detecting different types of data characteristics based on metrics computed from profile data and then, based on the specific type(s) of detected characteristics, carrying out correction appropriate for the detected types of data pattern.
As discussed herein, the impression counts with respect to each site may differ, e.g., because the crowd gathering patterns around different sites may be different. Accordingly, the forecasting TS model for each site is established based on the I-counts from that specific site in order to capture the characteristics specific to the site. As shown in
As discussed herein, in some situations, MI-counts collected from a site may not represent the overall impression volume with respect to the site. A service provider, such as a wireless smartphone service carrier, may estimate the impression volume associated with a site based on registered home location data of its customers. Then the service provider's known market share may be used to estimate the overall impression volume of the site by, e.g., scaling up or extrapolating the collected MI-count data volume according to known market share information. For example, if a wireless phone carrier has a market share of 50%, then the MI-count volume estimated by the carrier with respect to a specific site may be doubled (scaled up according to the market share) to come up with the total market population for the site as an estimate of the entire market. This approach may be applied to each site so that the overall estimated I-count volume may adapt to the market share distribution of each locale and the home location of the customers that visit this locale. The market I-count generation unit 220 is provided to take collected MI-counts for each site and generate an overall market I-count via, e.g., extrapolation based on market share information associated with the geographical region around the site.
The total population I-counts for each site derived via extrapolation may then be used to develop a forecasting TS model for that site. Different forecasting models may be used, including, without limitation, statistical models and machine learned forecasting models. Statistical models may include, e.g., Moving Average, Exponential Smoothing, Box-Jenkins, Drift Method, Naive Method, multiple linear regression, autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), and autoregressive integrated moving average with explanatory variable (ARIMAX). In some applications, machine learning may also be used to derive a forecasting TS model based on, e.g., training data related to each site. These time series models may be univariate, bivariate, or multivariate. It is understood that these exemplary forecasting TS models are merely for illustration purposes instead of as limitations. Any other appropriate forecasting time series models may also be used.
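As one illustration of the listed statistical models, simple exponential smoothing can be sketched in a few lines. This is a minimal sketch only, not the model used by any particular embodiment; the function name and the default smoothing factor `alpha` are assumptions.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecasts from simple exponential smoothing.

    forecasts[t] is the estimate for series[t] built only from series[:t];
    forecasts[0] is initialised to the first observation (a common convention,
    assumed here).
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    forecasts = [level]
    for value in series[:-1]:
        # Blend the newest observation into the running level.
        level = alpha * value + (1.0 - alpha) * level
        forecasts.append(level)
    return forecasts
```

A per-site model in practice would likely need a seasonal component (e.g., hour-of-day), which this sketch omits for brevity.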
Based on the extrapolated market I-counts from different sites, the forecasting TS model generator 230 generates a forecasting TS model for each site. This yields a plurality of forecasting TS models 240, each of which corresponds to a site, captures the characteristics of the impression patterns related to that site, and may be used to estimate or forecast I-counts. As discussed herein, discrepancies may exist between market I-counts and the estimates from a corresponding forecasting TS model. Such residuals may be used to determine how a measured I-count may be corrected. The residual determination unit 250 is provided for computing, respectively, residuals between the measured and estimated I-counts for each site.
The hybrid I-count determiner 260 in
Each MI-count associated with a site may be aggregated with respect to a pre-defined time unit (e.g., every hour), determined, e.g., by a total number of viewers appearing within a certain distance of the site. An MI-count associated with a specific site (e.g., a billboard) may be stamped with a time (e.g., a particular hour of a specific day). Thus, MI-counts associated with a site form a sequence with the MI-counts arranged in accordance with their corresponding times. Each of the MI-counts for each site may then be used to generate, at 215, the market I-counts for the site via, e.g., extrapolation to obtain a sequence of market I-counts for the site. At 225, each market I-counts sequence for a site may then be utilized, by the forecasting TS model generator 230, to derive a corresponding forecasting TS model for the site.
In some embodiments, some of the market I-counts associated with a site may be excluded from being used to establish a forecasting TS model. For example, the market I-counts in a profile subject to correction may not be used to derive the forecasting TS model. For instance, the I-counts from the last day of a collection period may form a profile that is subject to correction so that they may not be used for deriving a TS model. This is illustrated in
As discussed herein, residuals between measured and estimated (from forecasting TS models) I-counts may play a role in detecting situations where corrections may be needed. To facilitate that, the residual determination unit 250 may be invoked to compute, at 235, residuals for each site between market I-counts and corresponding I-count values from the TS model. The market I-counts, the corresponding forecasting TS model, and residuals computed for each site may then be input to the hybrid I-count determiner 260. Based on these inputs, the I-count determiner 260 may access, at 245, a metric-based replacement model 270 to compute certain metrics for detecting different types of data characteristics and then carries out, at 255, hybrid correction to the profile data before it outputs, at 265, the corrected I-counts. Details related to the hybrid I-count determiner 260 are provided below with reference to
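The residual computation at 235 can be sketched as an element-wise difference. The sign convention (market I-count minus model estimate) and the function names are assumptions made for illustration.

```python
import statistics

def compute_residuals(market_counts, estimated_counts):
    """Residual per time unit: market I-count minus TS model estimate (assumed sign)."""
    if len(market_counts) != len(estimated_counts):
        raise ValueError("sequences must have the same length")
    return [m - e for m, e in zip(market_counts, estimated_counts)]

def residual_sigma(residuals):
    """Population standard deviation of the residuals."""
    return statistics.pstdev(residuals)
```

The standard deviation returned by `residual_sigma` is the kind of residual statistic that a per-I-count correction condition may rely on.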
The profile metrics determiner 300 receives profile data and computes different profile metrics therefrom. Such computed profile metrics characterize certain properties of the I-counts in the profile and may be used to detect different types of data characteristics that may give rise to a need for correction.
A profile may be described based on its characteristics, which may be captured via metrics. For example, metric IMARM characterizes a profile distribution where impression counts are missing at random and quantifies the strength of the characteristics. Metric NIM may characterize the random nature or noisiness of a profile and indicate the level of noisiness. Metric ALIM may characterize the situation that impression counts in a profile are anomalously low. There may be other types of data patterns with various characteristics, and each may be captured and quantified using corresponding characterizing metrics. For each profile, one or more metrics may be computed for detecting one or more types of data characteristics, and the specific metric(s) to be used may be determined based on the nature of the application at hand.
Depending on the data properties at issue, a statistical metric may be formulated so that the targeted properties in profile data representing a specific type of characteristics may be captured. For example, IMARM may be formulated to detect the level of correlation between the measured and the estimated I-counts in a period of, e.g., one day. If the measured impressions for each hour of the latest 24-hour interval are taken as the independent variable, and the estimated impressions for each hour of the latest 24-hour interval are taken as the dependent variable, then the R-squared formulation may be adopted to measure the correlation. The p-value may also be used as a statistical measure to discern whether there is a relation between the independent variable (measured I-counts) and the dependent variable (estimated I-counts). Typically, a p-value less than 0.05 may be used as a threshold to identify existence of a correlated relationship. As another example, the slope of a line fit between the estimated and measured impressions may increase from 1.0 (one) when the fraction of missing impressions increases from zero.
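The R-squared and slope quantities described above can be obtained from an ordinary least-squares line fit. The sketch below assumes measured I-counts as the independent variable and estimated I-counts as the dependent variable, as stated in the text; the function name `fit_line` is hypothetical, and the p-value is omitted because it requires a t-distribution that the standard library does not provide.

```python
def fit_line(measured, estimated):
    """Least-squares fit estimated = slope * measured + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(measured)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(measured) / n
    mean_y = sum(estimated) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    syy = sum((y - mean_y) ** 2 for y in estimated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, estimated))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("fit is undefined for a constant sequence")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

If exactly half of the impressions were missing uniformly in every hour, the measured values would be half the estimates and the fitted slope would be 2.0 with an R-squared of 1.0, consistent with the slope rising above 1.0 as the missing fraction grows.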
With respect to ALIM, its computation may be designed to capture situations where impressions are inconsistently reported or not reported at all at, e.g., hourly intervals. To reflect that, metric ALIM may be, e.g., defined as the ratio of the measured daily impressions to the estimated daily impressions. With respect to NIM, it may be designed to be high if the profile data exhibits the characteristics as shown in
With respect to a Fourier-transformed profile, the day-to-day variability may manifest itself as white noise. Such a feature may be captured by averaging the Fourier amplitudes from a quarter of the Nyquist frequency to the Nyquist frequency. The amplitude at 0 cycles/day may be much higher, as it reflects the impressions averaged across the time series. In some embodiments, NIM may be formulated using the following definition:
where V denotes the noisy impression metric, n denotes the number of points in the time series profile at an hour granularity, and f corresponds to a vector of n points of the absolute value of the Fourier transform of the time series profile. Different definitions for these exemplary metrics may also be used so long as the characteristics of the underlying types of data failure can be captured.
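One possible reading of the description above, offered as a sketch and not as the definition referenced in the text, averages the Fourier amplitudes from a quarter of the Nyquist frequency up to the Nyquist frequency and normalizes by the amplitude at zero frequency. The normalization, the index bounds `n // 8` and `n // 2`, and the function names are all assumptions.

```python
import cmath
import math

def dft_magnitudes(series):
    """Absolute values of the discrete Fourier transform for bins 0..n//2."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series)))
        for k in range(n // 2 + 1)
    ]

def noisy_impression_metric(series):
    """Assumed form of V: mean high-frequency amplitude relative to the DC amplitude."""
    n = len(series)
    if n < 8:
        raise ValueError("need at least 8 points")
    f = dft_magnitudes(series)
    if f[0] == 0.0:
        raise ValueError("zero-frequency amplitude is zero")
    low, high = n // 8, n // 2  # quarter-Nyquist bin to Nyquist bin (assumed)
    band = f[low:high + 1]
    return (sum(band) / len(band)) / f[0]
```

Under this assumed form, a perfectly flat profile yields a value near zero, while a profile that alternates between a high value and zero every hour yields a clearly positive value. Because the normalization is assumed, values from this sketch are not necessarily comparable with thresholds stated elsewhere for NIM.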
The metrics as discussed herein may then be used by the metric-based replacement label determiner 320 to determine what types of data pattern are present in the profile. To rely on such computed metrics to recognize each type of data pattern, certain thresholds or conditions may be defined in terms of metric values with respect to each type of data pattern, so that an assessment of whether the metrics satisfy the conditions related to each type of data pattern can be carried out in order to decide whether each type of data pattern is present in the profile. For example, with respect to a data pattern with characteristics due to impressions missing at random, assuming that R-squared and slope metrics are used, thresholds with regard to the values of R-squared and slope may be provided so that, if the values of the metrics exceed the thresholds, it is deemed that the data pattern associated with the impressions-missing-at-random situation does exist. As discussed herein, in some embodiments, the threshold for R-squared may be set at 0.7 and the threshold for slope may be set at 1.2; the condition for detecting this type of data pattern may be that metrics R-squared and slope are larger than 0.7 and 1.2, respectively. So, when the calculated R-squared is larger than 0.7 and the calculated slope is larger than 1.2, the data characteristics attributed to impressions missing at random are detected. In some embodiments, other conditions may also be incorporated to define data characteristics caused by impressions missing at random. For example, when the p-value is also used, a threshold for the p-value may be set at 0.05 and a detection condition may be added that the p-value also needs to be below 0.05 in order to detect the situation associated with data missing at random.
Similarly, in some embodiments, the threshold for the anomalously low impression metric may be set at 0.3 and the condition for detecting the anomalously low impression data situation may be defined as the metric being less than 0.3. Furthermore, the threshold for the noisy impression metric may be set at 0.025 and the condition for detecting a data pattern associated with noisy impressions may be defined as the noisy impression metric being larger than 0.025. These thresholds and the accordingly defined conditions enable the detection of different types of situations associated with specific data characteristics.
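The example thresholds above can be combined into a small detection routine. The threshold values come from the text; the function names, the pattern labels, and the way the metrics are passed in are hypothetical.

```python
def anomalously_low_metric(measured_daily_total, estimated_daily_total):
    """ALIM as the ratio of measured to estimated daily impressions."""
    if estimated_daily_total <= 0:
        raise ValueError("estimated daily total must be positive")
    return measured_daily_total / estimated_daily_total

def detect_patterns(r_squared, slope, alim, nim, p_value=None):
    """Return the set of data pattern types whose example conditions are met."""
    detected = set()
    missing_at_random = r_squared > 0.7 and slope > 1.2
    if p_value is not None:
        # Optional extra condition when a p-value is available.
        missing_at_random = missing_at_random and p_value < 0.05
    if missing_at_random:
        detected.add("missing_at_random")
    if alim < 0.3:
        detected.add("anomalously_low")
    if nim > 0.025:
        detected.add("noisy")
    return detected
```

The returned set may contain zero, one, or several pattern types, which is the information a replacement label can then be derived from.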
In some embodiments, thresholds set in relation to different metrics in conditions for detecting different types of data patterns may be specified by, e.g., a subject matter expert upon examining the profiles in different types of data situations and the characteristics of the metrics computed from such profiles. In some embodiments, machine learning may be used to learn such thresholds based on training data including, e.g., profiles with data pattern types classified (labeled as ground truth) and corresponding metrics computed therefrom. With machine learning, thresholds for different metrics with respect to different types of data characteristics may be learned so long as the classification of the data pattern type is labeled as such.
As discussed herein, the need for correction may arise when certain type(s) of data characteristics are detected from a profile. The specific correction to be applied may be determined according to how many types of data patterns, and in what combination, are recognized from the profile data. In some embodiments, based on the detected type(s) of data situation, a correction label may be set according to some preset criteria, e.g., specified by a metric-based replacement model 270. For instance, the metric-based replacement model 270 may define that if the anomalously low impression type of data characteristics is detected, then all measured I-counts in the profile may be replaced with the estimated I-counts from the forecasting TS model.
The replacement label set according to the detected type(s) of data pattern may dictate that impacted I-counts are to be corrected in a certain way. For instance, the replacement label may be set “full,” indicating that all measured I-counts in a profile from a site are to be replaced with the estimated I-counts from a corresponding forecasting TS model for the site. If the replacement label is set “none,” it may instruct that none of the measured I-counts in the profile is to be replaced. With these two replacement label values, the measured I-counts in the profile may be handled in the same way (either no replacement for all or replacement for all). Optionally, the replacement label may also be set “intermediate solution” when the conditions for “full” and “none” are not met. An “intermediate solution” may indicate a correction mode in which, instead of all I-counts being handled the same way, the correction with respect to each I-count is assessed and carried out individually depending on whether the value of the measured I-count satisfies some preset conditions. In some embodiments, the “intermediate solution” correction mode may be implemented using an imputation approach as discussed herein. In this case, statistics (such as standard deviation σ) of residuals between measured and estimated I-counts may be used to define a condition under which a measured I-count is to be corrected.
According to the set replacement label, the I-count determiner 350 is invoked to generate a hybrid I-count output according to the value of the replacement label (set by the metric-based replacement label determiner 320). As discussed herein, when the label is either “none” or “full,” the I-count determiner 350 handles all profile I-counts uniformly, i.e., it either retains the measured I-counts when the label is “none” or uses the I-counts estimated from the forecasting TS model 240 when the replacement label is “full.” The hybrid I-count output is generated by the I-count determiner 350 in these two correction modes.
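The two uniform correction modes can be sketched as a simple dispatch on the replacement label. The function name `uniform_correction` is hypothetical; the label strings follow the text.

```python
def uniform_correction(label, measured, estimated):
    """Handle all I-counts in a profile the same way, based on the label.

    "none": keep every measured I-count.
    "full": replace every measured I-count with the TS model estimate.
    """
    if len(measured) != len(estimated):
        raise ValueError("profile sequences must have the same length")
    if label == "none":
        return list(measured)
    if label == "full":
        return list(estimated)
    raise ValueError(f"label {label!r} is not a uniform correction mode")
```

Any other label, such as an intermediate mode, is deliberately rejected here because it calls for per-I-count handling instead of a uniform choice.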
Optionally, when the replacement label is set “intermediate solution,” the I-count determiner 350 may activate the intermediate solution unit 330 to perform correction on an individual I-count in the profile when a certain condition is met by the I-count. When the intermediate solution is implemented using imputation, whether correction is needed for each measured I-count may be assessed against the imputation criteria stored in 340. In some embodiments, the imputation criteria 340 may be defined based on statistics associated with residuals between measured I-counts and TS model estimated I-counts. In this case, the intermediate solution unit 330 may receive residuals from the residual determination unit 250 (as shown in
Once the replacement label is set, the I-count determiner 350 is activated to perform correction on the profile data in accordance with the replacement label. As illustrated in
Optionally, in some embodiments, an intermediate correction mode may also be incorporated by setting the replacement label “IS” or intermediate solution. As shown in
In some embodiments, the correction scheme in the intermediate solution mode may be implemented using imputation, which determines whether to correct based on preset imputation criteria 340.
The imputation criteria may be provided based on the need of each application. As discussed herein, the imputation criteria may be formulated based on statistics of the residuals between the measured and estimated I-counts. One example is illustrated in
If the measured I-count meets the imputation criteria, the correction is carried out: the estimated I-count from the TS model is used to replace, at 565, the measured I-count, and the corrected I-count is output at 570. It is then determined, at 575, whether there are more measured I-counts in the profile to be processed. If there are no more measured I-counts in the profile, the process of the intermediate solution unit 330 ends at 580. Otherwise, the process proceeds to step 550 to access the next measured I-count. As seen, in this intermediate solution process, each measured I-count in the profile is individually handled for correction so that some of the measured I-counts may be corrected and some may not, depending on their values as compared with the imputation criteria 340. The outputs from the intermediate solution unit 330 are sent to the I-count determiner 350 so that these I-counts are used as the result of the hybrid I-count determiner 260.
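The per-I-count loop described above can be sketched as follows. The imputation criteria are passed in as a predicate because their exact form depends on the application; the function name and the example predicate in the usage note are hypothetical.

```python
def intermediate_solution(measured, estimated, meets_criteria):
    """Correct each measured I-count individually.

    meets_criteria(measured_value, estimated_value) -> bool decides whether
    the measured value is replaced by the TS model estimate.
    """
    if len(measured) != len(estimated):
        raise ValueError("profile sequences must have the same length")
    corrected = []
    for measured_value, estimated_value in zip(measured, estimated):
        if meets_criteria(measured_value, estimated_value):
            corrected.append(estimated_value)  # replace with the estimate
        else:
            corrected.append(measured_value)   # keep the measurement
    return corrected
```

For example, with a hypothetical criterion `lambda m, e: m < 0.5 * e`, the profile `[100, 10, 95]` against estimates `[100, 100, 100]` keeps the first and third values and replaces only the second.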
The present teaching as described herein for determining corrected I-counts for each site in a hybrid mode may be applied to any application involving an impression count time series associated with a site displaying content to attract an audience, so that inaccurate impression counts due to either failure or specific data characteristics may be adjusted according to an understanding of the nature of the data. In this way, the I-counts derived may be made more accurate on-the-fly based on the present teaching as disclosed herein. In some example applications, the improved I-counts associated with a site displaying an advertisement may be used to determine, e.g., a payment to a host of the site by a corresponding advertiser. In this example, the improved I-counts derived based on the present teaching enable more accurate estimation of the price of displaying advertisements on different types of public display means.
To implement various modules, units, and their functionalities as described in the present disclosure, computer hardware platforms may be used as the hardware platform(s) for one or more of the elements described herein. The hardware elements, operating systems and programming languages of such computers are conventional in nature, and it is presumed that those skilled in the art are adequately familiar therewith to adapt those technologies to appropriate settings as described herein. A computer with user interface elements may be used to implement a personal computer (PC) or other type of workstation or terminal device, although a computer may also act as a server if appropriately programmed. It is believed that those skilled in the art are familiar with the structure, programming, and general operation of such computer equipment and as a result the drawings should be self-explanatory.
Computer 700, for example, includes COM ports 750 connected to and from a network connected thereto to facilitate data communications. Computer 700 also includes a central processing unit (CPU) 720, in the form of one or more processors, for executing program instructions. The exemplary computer platform includes an internal communication bus 710, program storage and data storage of different forms (e.g., disk 770, read only memory (ROM) 730, or random-access memory (RAM) 740), for various data files to be processed and/or communicated by computer 700, as well as possibly program instructions to be executed by CPU 720. Computer 700 also includes an I/O component 760, supporting input/output flows between the computer and other components therein such as user interface elements 880. Computer 700 may also receive programming and data via network communications.
Hence, aspects of the methods of information analytics and management and/or other processes, as outlined above, may be embodied in programming. Program aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Tangible non-transitory “storage” type media include any or all of the memory or other storage for the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide storage at any time for the software programming.
All or portions of the software may at times be communicated through a network such as the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, in connection with information analytics and management. Thus, another type of media that may bear the software elements includes optical, electrical, and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links, or the like, also may be considered as media bearing the software. As used herein, unless restricted to tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
Hence, a machine-readable medium may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, which may be used to implement the system or any of its components as shown in the drawings. Volatile storage media include dynamic memory, such as a main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire and fiber optics, including the wires that form a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a physical processor for execution.
It is noted that the present teachings are amenable to a variety of modifications and/or enhancements. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software only solution, e.g., an installation on an existing server. In addition, the techniques as disclosed herein may be implemented as a firmware, firmware/software combination, firmware/hardware combination, or a hardware/firmware/software combination.
In the preceding specification, various example embodiments have been described with reference to the accompanying drawings. It will, however, be evident that various modifications and changes may be made thereto, and additional embodiments may be implemented, without departing from the broader scope of the present teaching as set forth in the claims that follow. The specification and drawings are accordingly to be regarded in an illustrative rather than restrictive sense.