The present invention relates to inferring and/or forecasting demographic information for unmonitored media network content providers, such as television networks.
Television advertising time, usually in the form of commercials, accounts for a significant portion of the total marketing spend of organizations in a number of geographic markets, including the United States. Typically, television advertisements are marketed on the basis of, among other things, estimated reach, with reach being defined as the total number of people or households exposed or tuned in, at least once, to a television network during a given period of time. Historically, the estimated reach of a television network at a given time has been determined by extrapolating the recorded viewing activities of a sample population obtained from media ratings and measurement companies (e.g., Nielsen Media Research, Comscore, Arbitron, etc.) to forecast and measure behaviors of larger audiences. The directly recorded viewing activity data, which also includes the viewers' age and gender demographics, is used as the “currency” for the television industry when marketing its advertising time and is hereinafter referred to as “currency data.” Using the currency data, advertising time slots aired during programming that is forecast to attract a large number of viewers with the desired age and gender demographics are typically sold at higher prices per unit than time slots during programming that is forecast to attract fewer viewers, or viewers with unfavorable demographic statistics.
In recent decades, the number of media content providers, such as television networks, radio networks, and the like (hereinafter referred to as a “media network”) on which advertising time and/or media presentation opportunities are available for sale has increased dramatically. However, currency data is not always available for all of these media networks and/or for all of a media network's program and advertisement opportunities available for sale to a given purchaser. When currency data is unavailable or insufficient for a particular media network, that media network may be referred to herein as an “unmeasured media network.” Furthermore, even when such data exists, it may not be sufficient to accurately forecast and measure potential viewership for the purpose of establishing currency data. With the availability of new device-based TV viewership data sets, such as set-top-box (STB) or connected or smart TV data, the viewing activity for many media networks, especially television networks, regardless of their size or overall popularity, may be collected in sample sizes sufficient to infer and/or forecast currency data for these networks. This inferred and/or forecast currency data may then be used to, for example, establish values for advertising time slots and provide viewership and demographic information to ensure an advertiser's desired audiences are reached.
Turning now to
Exemplary currency data sources 120a-120n include media ratings and measurement companies (e.g., Nielsen Media Research, Comscore, Arbitron, etc.). Currency data may include viewership data for a particular network or group of networks as well as demographic information regarding the counted viewers. Exemplary demographic information includes age, marital status, gender, race, preferred language, ethnicity, geographic location, household size, number of devices in household, and purchasing behavior.
The gathered viewership and currency information may be transmitted from set top boxes 110a-110n and/or currency data sources 120a-120n to a central repository of the STB or other data provider (130a) or measurement company (130b) and/or directly to CPU 72 periodically, continuously, and/or upon request. The received viewership and currency information may be stored in CPU 72 and/or a STB/currency data storage device 140a/b. Client device 145 may be any client computing device (e.g., desktop, laptop, and/or tablet computer) by which a user and/or administrator of system 100 may interact with system components and provide instructions to the components. CPU 72 may be any computing device configured to execute one or more of the processes described herein. More specifically, CPU 72 may be configured to receive viewership and currency data, transform that data into one or more data series, compare the data series to each other so as to determine one or more factors and/or coefficients to describe the relationships between the series of data, apply the factors to viewership data so as to infer and/or forecast demographic information for the viewership data, determine whether the inferred demographic information aligns and/or is consistent with known demographic information, calculate one or more error metrics for adjusting the factors and/or coefficients, and apply the error-adjusted factors and/or coefficients to the viewership data. The inferred and/or forecasted device and/or demographic and/or currency data may be stored in an inferred/forecasted currency data storage device 150.
Initially, currency data regarding a monitored network may be received (step 202). The received currency data may include viewership statistics and demographic information about the viewers of the network. Exemplary viewership information may include a viewership count for a series of time intervals throughout a given time period (e.g., hour, day, month, etc.). The currency data may be received from a single source or a plurality of sources, such as currency data sources 120a-120n.
In step 204, set-top-box (hereinafter, “STB”) or other viewership data regarding the monitored media network may be received from one or more STB or other sources 110a-110n. Exemplary sources for STB or other viewership data include cable television providers, satellite television providers, website visit logs, consumer electronics devices, connected or smart TVs, and the like. In most instances, the viewership data does not include demographic information or includes only partial or incomplete demographic information.
The viewership data included in the currency data received in step 202 may be transformed into a first series of data (step 206). The viewership data received in step 204 may then be transformed into a second series of data (step 208). An objective of the transformations of steps 206 and 208 is to adjust the viewership data, received from different sources, so that it may be meaningfully compared. For instance, time intervals between or over which data captures occur may be measured using any unit of measure (e.g., seconds, minutes, hours, days, weeks, etc.), and the transformations of steps 206 and 208 may include adjusting the viewership data so that the time intervals by which the viewership counts are incremented are consistent with one another.
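The interval adjustment described for steps 206 and 208 might be sketched as follows. This is a minimal illustration, not the method of the specification: the function name `to_common_interval`, the use of minute offsets as timestamps, and the choice to average (rather than sum) the counts falling in each bucket are all assumptions made for the example.

```python
from collections import defaultdict


def to_common_interval(samples, interval_minutes):
    """Re-bucket (minute_offset, viewer_count) pairs onto a common interval.

    Counts that fall into the same bucket are averaged (an assumption;
    a real pipeline might sum or weight them instead).
    """
    buckets = defaultdict(list)
    for minute, count in samples:
        # Integer division assigns each sample to its interval bucket.
        buckets[minute // interval_minutes].append(count)
    return [sum(buckets[k]) / len(buckets[k]) for k in sorted(buckets)]


# Hypothetical device data captured every 15 minutes.
stb_samples = [(0, 100), (15, 120), (30, 110), (45, 130), (60, 90), (75, 110)]
# Hypothetical currency data already reported hourly.
currency_samples = [(0, 5000), (60, 4500)]

second_series = to_common_interval(stb_samples, 60)
first_series = to_common_interval(currency_samples, 60)
```

After this step both series carry one value per hour, so they can be compared position by position.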
In some embodiments, the received currency data and/or viewership data may include data regarding multiple media networks. In these embodiments, the transformations of step 206 and/or 208 may include sorting the received currency data and/or viewership data by media network and/or isolating received currency data and/or viewership data for a particular media network of interest for further processing.
In other embodiments, the transformations of steps 206 and/or 208 may include spatial alignment of the two series of data, which incorporates aligning the first and second series of data according to one or more criteria (e.g., geographic observations, demographic observations, network observations, etc.).
In the exemplary data of
In addition, because viewership counts may be received from a variety of sources, the magnitudes of the counts across the various sources may vary. While the absolute magnitudes of the counts are not especially important for purposes of the present invention, relative magnitudes of the counts within each data set reported by each data source are useful. Hence, the transformation of steps 206 and/or 208 may include adjusting the relative magnitudes for different data sources so that the viewership data in the first series of data and second series of data may be compared with one another. For example,
In some instances, the comparison of the first series of data 310 and the second series of data 320 may include factoring the first series of data 310 and the second series of data 320 to determine one or more factors and/or regression coefficients describing transformations between the first series of data 310 and the second series of data 320. In this way, execution of step 210 may enable transformations of viewership data through factoring so that one data set may be compared with and/or substituted for another.
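One way to picture the factoring between the two series is an ordinary least-squares line fit, sketched below. The specification does not fix a particular regression technique at this point, so simple linear regression is swapped in here as an assumption; `fit_line` and the sample values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: share of variance in y explained by the fit.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical device counts (x) and currency counts (y) on aligned intervals.
device_counts = [1.0, 2.0, 3.0, 4.0]
currency_counts = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r_squared = fit_line(device_counts, currency_counts)
```

Here the slope plays the role of a factor that maps device counts onto the currency scale, and the R² value gives a rough sense of how well one series can stand in for the other.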
Next, the factors and/or coefficients are applied to the viewership data and an inference of demographic information for the viewership data is made (step 212).
The inferred demographic information may then be compared to the known demographic information for any monitored network and, in step 216, it may be determined whether the inferred demographic information for the viewership data aligns with, or is sufficiently similar to, the known demographic information. An exemplary set of error metrics is provided in table 303 of
When the inferred demographic data does not align with the known demographic data, one or more adjustments to the factors and/or coefficients may be determined in order to improve the alignment of the inferred demographic data with the known demographic data (step 218). The adjustments may include restricting the basis networks to those found to be of similar scale within the viewership data and the currency data, and restricting the basis to those networks found to have high correlations and stable factors from one time period to the next.
Steps 212-218 may be iteratively repeated until the inferred demographic information is sufficiently aligned with the known demographic information for the monitored networks. The adjusted factors and/or coefficients may then be applied to the viewership data of the unmonitored networks (step 220).
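The restriction of the basis to networks with high correlations and stable factors could be sketched as a simple filter. The thresholds `min_corr` and `max_factor_drift`, the dictionary layout, and the function names are hypothetical, and the sketch covers only the correlation and factor-stability criteria, not the scale comparison.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def restrict_basis(candidates, min_corr=0.8, max_factor_drift=0.2):
    """Keep networks whose device and currency series correlate strongly and
    whose factor changed little between two periods.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    kept = []
    for name, data in candidates.items():
        corr = pearson(data["device"], data["currency"])
        previous_factor, current_factor = data["factors"]
        drift = abs(current_factor - previous_factor) / abs(previous_factor)
        if corr >= min_corr and drift <= max_factor_drift:
            kept.append(name)
    return kept


# Hypothetical candidate basis networks.
candidates = {
    "net_a": {"device": [10, 20, 30, 40], "currency": [105, 195, 310, 400], "factors": (10.0, 10.5)},
    "net_b": {"device": [10, 20, 30, 40], "currency": [300, 100, 250, 120], "factors": (10.0, 10.0)},
    "net_c": {"device": [10, 20, 30, 40], "currency": [100, 200, 300, 400], "factors": (10.0, 20.0)},
}
basis = restrict_basis(candidates)
```

In this toy data, `net_b` is dropped for weak correlation and `net_c` for an unstable factor, leaving `net_a` as the only basis network.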
Each of these tables, 400, 401, and 402, represents a different portion of a data table for a particular monitored network (network 1), for which we calculate a reconstructed series with the aim of estimating the errors in the process. Column 1 of tables 400, 401, and 402 provides the date (year, month, day) associated with a particular data point. Column 2 of tables 400, 401, and 402 provides the Daypart (i.e., part of the day), which for the data of tables 400, 401, and 402 is the overnight week portion of the daypart. Column 3 provides raw viewership counts in the device data for network 1 (represented on table 400 as “Raw_f18.54_Network1”). Column 4 provides reference data for Network 1 during the measured time intervals (represented on table 400 as “Reference_f18.54_Network1”). The reference data for network 1 refers to the known demographic information for Network 1 during the time intervals for which data is captured. Column 5 provides factored data for Network 1 during the measured time intervals (represented on table 400 as “Factored_f18.54_Network1”). Factored data may be generated via execution of step 212 on the raw data of column 3. Column 6 provides reconstructed data for Network 1 during the measured time interval (represented on table 400 as “Reconstructed_f18.54_Network1”). Reconstructed data may be generated via execution of step 220 using the coefficients and factors specified in table 303
y=3.5449x+110343
R^2=0.25911
where:
In step 502, viewership data regarding an unmonitored network may be received from one or more sources 110a-110n. The received viewership data may then be transformed into a third series of data so that it may be compared to and/or correlated with the first and/or second series of data (step 504). The transformation of step 504 may be similar to the transformations of steps 206 and 208 as discussed above with regard to
Next, the coefficients in terms of the basis networks are determined in step 506, which may be similar to steps 212 to 218 of
The time series data sets 121 may then be factored with regard to one another to generate network factors 130. The factoring may include development of mappings used to express one of the time series data sets 121 and/or time series currency data sets 122 as a function of one or more of the other time series data sets 121 and/or time series currency data sets 122. In many instances, network factor generation 130 may involve regression analysis so that individual factors for each of the time series data sets 121 and/or time series currency data sets 122 with respect to individual ones of the other time series data sets 121 and/or time series currency data sets 122 may be developed. The quality of the factors may be assessed using generally accepted statistical quality measures (e.g., root mean square error computations, mean absolute percentage error computations, etc.).
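The two quality measures named above, root mean square error and mean absolute percentage error, can be computed as in the following sketch. The function names and the sample values are illustrative assumptions; the MAPE version shown assumes no reference value is zero.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between reference and factored series."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every reference value is nonzero.
    """
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical reference (currency) values and factored device values.
reference = [100.0, 200.0, 400.0]
factored = [110.0, 190.0, 400.0]

error_rmse = rmse(reference, factored)
error_mape = mape(reference, factored)
```

RMSE is expressed in viewers and is dominated by large networks or dayparts, while MAPE is scale-free, which may make it easier to compare factor quality across networks of different sizes.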
A demographic viewership composition for any given network and time interval may be determined by calculating the demographic composition of the network 136 using the time series currency data 122, which includes the demographic characteristics of the network viewers. The demographic composition 137 may then be applied to the same network in the comparative time series viewing data 121 to infer viewership demographics for an unmeasured network in terms of the time series currency data 122. Often, application of demographic composition 137 is done using the best linear representation of the unmeasured network's time series viewing data as determined using stepwise regression 134 of the initially received data set for the unmeasured network 110. The stepwise regression may be used to describe the time series for an unmeasured network as a linear combination of the time series of a set of basis measured networks (for which currency data is available) from viewership data.
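The demographic composition step alone (not the stepwise regression) might look like the sketch below: shares are computed from the currency data of a measured network and then applied to a factored device total. The demographic labels, the counts, and both function names are hypothetical.

```python
def demographic_composition(currency_by_demo):
    """Fraction of a network's currency audience in each demographic group."""
    total = sum(currency_by_demo.values())
    return {demo: count / total for demo, count in currency_by_demo.items()}


def apply_composition(composition, factored_total):
    """Split a factored device viewership total by demographic share."""
    return {demo: share * factored_total for demo, share in composition.items()}


# Hypothetical currency counts for one measured network and time interval.
currency_counts = {"f18_54": 300, "m18_54": 200, "other": 500}
composition = demographic_composition(currency_counts)

# Hypothetical factored device total for the same network and interval.
inferred = apply_composition(composition, 2000)
```

Because the shares sum to one, the inferred demographic counts always add back up to the factored total they were derived from.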
Unmeasured network viewership measurements and forecasts 142 in terms of the time series currency data 122 may then be calculated by, for example, applying factors 131, demographic composition 137 fractions for the measured networks making up the best linear representation of the unmeasured network, and a basis network coefficient for the time series viewing data 121. An exemplary formula for generating the unmeasured network viewership estimate 140 is as follows:
v(t) = a_1 f_1 v_1(t) + a_2 f_2 v_2(t) + \dots + a_n f_n v_n(t)
where: a=linear coefficient for the basis network;
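The exemplary formula for v(t) can be evaluated directly, as in the sketch below. Only the meaning of a is given in the text; reading f as the factor for each basis network and v_i(t) as that network's time series viewing data is an assumption, as are the function name and the numbers.

```python
def estimate_unmeasured(basis):
    """Evaluate v(t) = sum_i a_i * f_i * v_i(t) for every time step t.

    basis: list of (a_i, f_i, series_i) tuples, where a_i is the linear
    coefficient for basis network i, f_i is assumed to be its factor, and
    series_i is assumed to be its viewing time series (all the same length).
    """
    length = len(basis[0][2])
    return [
        sum(a * f * series[t] for a, f, series in basis)
        for t in range(length)
    ]


# Two hypothetical basis networks observed over two time intervals.
basis_networks = [
    (0.5, 10.0, [100, 200]),
    (0.25, 8.0, [50, 100]),
]
estimate = estimate_unmeasured(basis_networks)
```

With these values the first interval evaluates to 0.5·10·100 + 0.25·8·50 = 600 and the second to 1200.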
Hence, systems and methods for inferring and forecasting viewership and demographic data for unmonitored media networks have been herein described.
This application is a NONPROVISIONAL of U.S. Provisional Patent Application No. 61/917,977 entitled “Measured Networks Basis Factoring Method To Estimate Reach in Unmeasured Networks” filed on 19 Dec. 2013, which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
61917977 | Dec 2013 | US