Program providers supply content segments to viewers over various communications networks. Content segments may include broadcast television programs. Content segments may include video programs streamed, for example, over the Internet. Content segments also may include video advertisements that accompany, or in some way relate to the video programs. Content segments may be accessed using an application on a mobile device. Other content segments and other distribution methods are possible.
Sponsors provide sponsored content segments to promote products and services. Sponsors may use one or more different media (e.g., television, radio, print, online) to promote the products and services. Sponsors may create a promotional campaign that uses sponsored content segments appearing in different media. The sponsored content segments may be for the same products and services although the sponsored content segments appear in different media. Thus, individuals may be exposed to sponsored content segments in a first media, a second media, and so on.
Program providers may be interested in knowing what content segments are accessed or viewed by which viewers. One way to determine this “viewing history” is by sampling a large population and making inferences about the viewing history based on the sample results. Sponsors may want to know how effective their promotional campaigns are. One way to determine promotional campaign effectiveness is to measure media consumption metrics such as an amount of time an individual spends exposed to the media, a number of times a specific content segment has been seen and/or heard by the individual, and the number of exposures to sponsored content segments among the different media, for example. These techniques may be inaccurate and unreliable, and may be costly to implement.
A method, executed by a processor, for estimating media metrics from large population data includes formatting and storing panel data, the panel data comprising observed viewing data of a plurality of individual panelists and demographic data for the plurality of panelists, the panel being drawn from a large population; accessing the large population data, the large population data comprising household-level viewing data and household-level demographics; training a model to estimate viewing audience size based on the observed panel data; estimating, using the trained model, audience size for each household in the large population data; estimating a viewing score for each individual viewer in a plurality of households in the large population data; and combining the estimates of audience size and viewing score to produce probabilities that each of the viewers in the household viewed a specific media event.
The detailed description refers to the following Figures in which like numerals refer to like items, and in which:
Program providers may be interested in knowing what content segments are accessed or viewed by which viewers. One way to determine this “viewing history” is by sampling a large population and making inferences about the viewing history based on the sample results. One way to sample a viewing population is through the use of individual panelists (viewers in the sample population) and metering devices that record and report on the individual panelists' viewing history. For example, an individual panelist (i.e., a viewer) may agree to installation of a meter at the panelist's residence. The meter records the individual panelist's television viewing and Internet activity, and reports the data to a remote server. Note that this approach works in a household having more than one viewer. For example, each household member may be recruited as a panelist. Alternately, a subset of the household members may participate as panelists. Note that the panelists may agree to being measured. Furthermore, individual panelists would agree to sign in to measurement, and measurement may be suspended at any time (incognito) by a panelist.
In contrast to metering individual panelists, viewing history data may be collected by a single metering device installed at a household. For example, a television set top box (STB) may record television viewing data. This approach cannot distinguish viewing by individual household members, but may be less costly to implement.
Sponsors may want to know how effective their promotional campaigns are. One way to determine effectiveness is to measure media consumption metrics such as an amount of time an individual spends exposed to the media, a number of times a specific content segment has been seen and/or heard by the individual, and the number of exposures to sponsored content segments among the different media, for example.
Reach is an example of a media consumption metric. In the context of an individual, reach is a binary metric; either an individual has been exposed to the media (for example, exposed to a sponsored content segment) or the individual has not been so exposed. Reach may be defined on an individual basis or over a population group. However, for a population, reach may be expressed as a percentage. Reach also may be defined over multiple media types. Reach may be measured by a panel and overall reach of the population from which the panel is drawn may be estimated or inferred from the panel data.
As noted above, one way to obtain high quality data for a media consumption study is to recruit a statistically representative sample of households and individuals and install meters on every media device of the participating households so as to record the viewing date, time, audio signature, and identity of the panelists viewing the media. In this approach, all household members and guests may be required to register their media viewing using, for example, a remote control. While providing high quality user-level media consumption data, this approach is hard to scale to a large population due to high recruitment and meter installation costs. For example, a small, high quality panel may not provide statistically reliable data for all population segments of interest because of the costs associated with obtaining the data from so many possible population groups. Thus, the cost associated with building a large-scale, high quality panel, where individual viewers in a household are metered may be prohibitive.
To overcome this problem with determining television consumption data at the level of individual users in a household, disclosed herein are methods and systems that infer individual consumption in a large population using a model that is trained on a sample of high quality television program viewing data. The systems and methods disclosed herein begin by recruiting a high quality panel, or acquiring data from an existing high quality panel. Note that such a panel may record data from multiple types of media. For example, the panel may be a single source panel (SSP) that records media consumption data for television viewing, Internet activity, radio listening, and other types of media consumption. The SSP also records panelist demographic data.
Next, considering, for example, television as the media being consumed, the systems and methods use data such as that which may be obtained by buying television STB log data from cable or satellite TV providers. This approach to acquiring data on a large scale may be less costly than attempting to record such data using individually-metered panelists. One limitation of data from STB logs is that, with a STB, television viewing is logged at the household level. This household-level data recordation may prevent direct calculation of many television audience measures that require individual viewer-level data.
To overcome this limitation with STB log data, the systems and methods use a process that infers individual viewing from collected household-level viewing data. One aspect of this process of household-to-individual conversion is to take into account the fact that watching television often is a group activity. Another aspect of this process is to do soft classification as opposed to hard classification for every television viewing event observed in the household. Here, a television viewing event may be defined as an uninterrupted period that a television was tuned to a specific channel. The process includes estimating a size of the household audience and then splitting the estimate among all viewers in the household. Individual viewer-level television viewing probabilities, as outputs of the process then are aggregated to compute television audience measures, such as reach and target rating points. In addition, the process may be extended to estimate incremental reach.
Block 2 trains a model, using panel data for an individual, to estimate all model parameters. A regression model, for example, may be used to estimate the coefficients of the predictor X.
In block 3, the trained model is applied to STB log data obtained for a sample of households in the population. Block 3 may include two stages. In a first stage (block 3A), shared viewing of the television viewing events is predicted using the trained model applied to STB log data. In a second stage (block 3B), individual viewing is scored.
In block 4, the shared viewing predictions and individual scoring of block 3 are combined. Finally, in block 5 aggregated campaign metrics such as reach and target rating point are estimated using the processed STB log data. The processes of
Note that for a single-viewer household (i.e., a household supplying the STB log data), the household television viewing directly converts to individual scoring, and thus the modeling of block 3 is not needed.
For a two-viewer household, each television viewing event is a manifestation of three possible viewer-level television viewing scenarios: only viewed by the first viewer, only viewed by the second viewer, or viewed by both viewers. The process for shared viewing prediction begins with a “soft assignment” of the household television viewing event to one of the three scenarios. In an equivalent form, the process involves assigning one viewing probability to each of the two viewers, and each probability accounts for both solo viewing and shared viewing.
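The equivalence between the three-scenario soft assignment and the per-viewer probabilities can be illustrated with a minimal sketch. The function name `viewer_probabilities` and the example numbers are hypothetical and are not taken from the disclosure; the sketch only assumes that the three scenario probabilities sum to one.

```python
def viewer_probabilities(p_only_first, p_only_second, p_both):
    """Convert the three scenario probabilities for one household viewing
    event into one viewing probability per viewer (hypothetical helper)."""
    total = p_only_first + p_only_second + p_both
    if abs(total - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    # Each viewer's probability accounts for solo viewing plus shared viewing.
    p_first = p_only_first + p_both
    p_second = p_only_second + p_both
    return p_first, p_second


# Illustrative values only: 50% viewer 1 alone, 25% viewer 2 alone, 25% both.
p1, p2 = viewer_probabilities(0.5, 0.25, 0.25)
expected_audience = p1 + p2  # equals 1 + p_both
```

Under this reading, the two per-viewer probabilities sum to one plus the shared-viewing probability, which is the expected number of viewers for the event.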
For a household of three or more viewers, the process of block 3,
More generally, regardless of household size, the process of measuring television viewing based on data from STB logs 1) estimates the effective viewership for each television viewing event, and 2) splits the estimate among all viewers.
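As a rough illustration of the two general steps (estimate the effective viewership, then split it among viewers), the sketch below splits an estimated audience size in proportion to per-viewer scores and caps each probability at one. The proportional rule, the cap, and the name `split_audience` are assumptions made for illustration; the disclosure's exact combining formula is not reproduced here.

```python
def split_audience(audience_size, scores):
    """Split an estimated audience size among household viewers in
    proportion to their scores (assumed rule, for illustration only)."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one viewer must have a positive score")
    # Cap at 1.0 because a single viewer cannot count more than once.
    return [min(1.0, audience_size * s / total) for s in scores]


# Three-viewer household, estimated audience size 1.5, illustrative scores.
probs = split_audience(1.5, [2.0, 1.0, 1.0])
```

Because of the cap, the probabilities may sum to slightly less than the estimated audience size when one viewer dominates the scores; a fuller implementation might redistribute the excess.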
Returning to the scenario of a two-viewer household, in an embodiment, as noted above, a first aspect of the prediction process includes two stages: first, predict shared viewing, and second, score individual viewing (i.e., block 3,
For both stages, one set of predictive signals (i.e., the predictors X) comes from demographic data, such as household income, demographic region, number of children in the household, as well as each viewer's age, gender, education levels, and occupation, for example. Another set of predictive signals X comes from the set top box data, such as the time of the day, day of the week, channel, and genre keywords extracted from an electronic program guide, for example.
To predict shared viewing for one household for one television viewing event, the process of block 3A begins with the determination of y as an indicator of shared viewing or not for a vector of predictors X using a regression model such as:
$$\operatorname{logit}(E[y \mid X]) = \operatorname{logit}(p) = X\beta \qquad \text{(Eqn 1)}$$
where the expected value of y given X is p, and β is the coefficient vector of the predictors. Thus, if p is the probability of shared viewing, the estimated audience size for the television viewing event is 1+p, a value that is bounded between 1 and 2. The vector of predictors X includes household-level demographics and television viewing event meta-information as noted above. For viewer-level demographics, such as age, gender, education level, and occupation type, the process uses an unordered combination of viewer-level variables. An example of such an unordered combination for gender has three levels: (female, female), (female, male), and (male, male). Other variables, such as education, may have many more than three levels. The first stage may be conducted for each predictor X.
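The relationship between Equation 1 and the estimated audience size can be sketched as follows. The coefficient values would come from the trained model; the vectors used here are placeholders, and the helper names (`shared_viewing_probability`, `audience_size_two_viewers`, `unordered_pair`) are hypothetical.

```python
import math


def shared_viewing_probability(x, beta):
    """Inverse logit of the linear predictor X·β, i.e. p in Equation 1."""
    linear = sum(xi * bi for xi, bi in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-linear))


def audience_size_two_viewers(x, beta):
    """Estimated audience size 1 + p for a two-viewer household."""
    return 1.0 + shared_viewing_probability(x, beta)


def unordered_pair(a, b):
    """Order-independent combination of two viewer-level values, so that
    (male, female) and (female, male) map to the same level."""
    return tuple(sorted((a, b)))


# Placeholder predictor and coefficient vectors, for illustration only.
size = audience_size_two_viewers([1.0, 2.0], [0.3, -0.1])
level = unordered_pair("male", "female")  # ("female", "male")
```

Since p lies strictly between 0 and 1 for any finite linear predictor, the estimated audience size stays between 1 and 2, as stated above.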
Put another way, the model has been trained on metered data collected from individuals (i.e., panelists) in a high quality panel. The trained model then is used to predict shared viewing in the household data; that is, the shared viewing data obtained from the panelists is used to train the model, which in turn is used to predict shared viewing in a household. The household demographics are known (these data are supplied as part of the purchase of the STB log data). The STB log data also indicates when a television viewing event occurred in the household. The model then predicts if the television viewing event in the household was a shared viewing event or not. This completes the first stage of the prediction process (i.e., the first stage of the process of block 3,
The second stage of the prediction process scores individual viewing. In the second stage, for each household television viewing event, a scoring process scores each viewer in the household to reflect the degree of confidence that the viewer has viewed that television viewing event. The score is determined as a product of two factors: (1) a probability of having television viewing, and (2) time-varying channel preferences measured in an amount of television viewing time.
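A minimal sketch of the two-factor score follows. It assumes, purely for illustration, that the time-varying channel preference is stored as viewing minutes keyed by a (daypart, channel) pair; that data layout and the name `viewing_score` are not specified in the disclosure.

```python
def viewing_score(p_viewing, preference_minutes, daypart, channel):
    """Score one viewer for one household viewing event as the product of
    (1) the viewer's probability of having television viewing and
    (2) the viewer's channel preference for that time, in viewing minutes."""
    minutes = preference_minutes.get((daypart, channel), 0.0)
    return p_viewing * minutes


# Hypothetical preference table for a single viewer.
prefs = {("evening", "news"): 120.0, ("morning", "sports"): 30.0}
score = viewing_score(0.5, prefs, "evening", "news")
```

A viewer with no recorded viewing time for the channel and daypart receives a score of zero under this sketch; a production system might smooth such cases instead.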
Following the process of block 3,
and the probability of having viewer 2 as the viewer is
As noted above, in a household of three or more viewers, the modeling process (block 3,
where X is the vector of predictors. In the model represented by equation 2, the audience size estimation is scaled to fall between 0 and 1. This scaling allows use of the same model framework as in equation 1. The construction of the predictors follows that for a two-viewer household except for the combination of viewer-level demographics. For demographic predictors with many possible levels, such as education, the combination of all viewers' variables may result in a very large number of levels. In an embodiment, the process of block 3 may use a percentage-based approach to valuing certain predictors X. For example, for the education level predictor, the value of X may be expressed as a percentage of the household that has reached a specified education level.
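The percentage-based predictor can be sketched as below. The ordered list of education levels and the function name `share_at_or_above` are illustrative assumptions; the value is returned as a fraction between 0 and 1 rather than a percentage.

```python
def share_at_or_above(levels, threshold, order):
    """Fraction of household viewers whose level is at or above `threshold`,
    given `order`, a list of levels from lowest to highest."""
    if not levels:
        raise ValueError("household must have at least one viewer")
    ranks = {name: i for i, name in enumerate(order)}
    qualifying = sum(1 for lvl in levels if ranks[lvl] >= ranks[threshold])
    return qualifying / len(levels)


# Hypothetical four-viewer household and education scale.
order = ["high_school", "bachelor", "graduate"]
household = ["high_school", "bachelor", "graduate", "bachelor"]
x_education = share_at_or_above(household, "bachelor", order)
```

This replaces a combinatorial set of levels with a single numeric predictor whose range does not grow with household size.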
In executing the processes of
The process of
The viewing locations 20i include first media devices 24i and second media devices 26i through which viewers (e.g., panelists) 22i are exposed to media from sponsor 40 and program provider 60. A viewing location 20i may be the residence of a panelist 22i who operates media devices 24i and 26i to access, through router 25i, resources such as Web sites and to receive television programs, radio programs, and other media. The media devices 24i and 26i may be fixed or mobile. For example, media device 24i may be an Internet connected “smart” television (ITV); a “basic” or “smart” television connected to a set top box (STB) or other Internet-enabled device; a Blu-ray™ player; a game box; and a radio, for example. Media device 26i may be a tablet, a smart phone, a laptop computer, or a desk top computer, for example. The media devices 24i and 26i may include browsers. A browser may be a software application for retrieving, presenting, and traversing resources such as at the Web sites. The browser may record certain data related to the Web site visits. The media devices 24i and 26i also may include applications. The panelist 22i may cause the media devices 24i or 26i to execute an application, such as a mobile banking application, to access online banking services. The application may involve use of a browser or other means, including cellular means, to connect to the online banking services.
The viewing location 20A may be a single panelist viewing location and may include a meter 27A that records and reports data collected during exposure of sponsored content segments 42 and programs 62 to the panelist 22A. The example meter 27A may be incorporated into the router 25A through which all media received at the viewing location 20A passes.
Alternately, in an example of a two-viewer viewing location, panelists 22N1 and 22N2 operate media devices 24N and 26N. In operating these media devices, the panelists 22Ni may operate separate meters 27N1 and 27N2 for each media device. The meters 27N1 and 27N2 may send the collected data to the analytics service 70.
The sponsor 40 operates server 44 to provide sponsored content segments that are served with programs 62 provided by the program provider 60. For example, the server 44 may provide sponsored content segments to serve with broadcast television programming. The sponsored content segments 42 may include audio, video, and animation features. The sponsored content segments 42 may be in a rich media format. The sponsor 40 may provide a promotional campaign that includes sponsored content segments to be served across different media types or a single media type. The cross-media sponsored content segments 42 may be complementary; that is, related to the same product or service.
More specifically, the sponsor 40 may develop a promotional campaign to provide sponsored content segments for airing as part of a television broadcast and sponsored content segments for airing at one or more of the Web sites. In an alternative, the television portion of the promotional campaign may include both traditional sponsored content segments that air during the program breaks, product placement content displays, wherein specific products are incorporated into the television programs (e.g., a specific brand of automobile is used in a television comedy show), and content displays that may be placed in fixed positions in a television program (e.g., a content display for an automobile insurance company that appears on a stadium wall during airing of a soccer game). The online portion of the promotional campaign may include sponsored content segments that are shown on Web pages. The online sponsored content segments may use some creative features from corresponding television sponsored content segments. For example, an online sponsored content segment for an automobile company may show an image of a sport utility vehicle (SUV) that was promoted during a television program. The promotional campaign also may address other forms of media such as in mobile applications.
The network 50 may be any communications network that allows the transmission of signals, media, messages, voice, and data among the entities shown in
The program provider 60 delivers programs for consumption by the panelists 22i and also for consumption by members of a large population from which the panelists 22i are recruited. The programs 62 may be broadcast television programs. Alternately, the programs 62 may be radio programs, Internet Web sites, or any other media. The programs 62 include provisions for serving and displaying sponsored content segments 42. The program provider 60 may receive the sponsored content segments 42 from the sponsor and incorporate the sponsored content segments into the programs 62. Alternately, the panelist's media devices may request a sponsored content segment 42 when those media devices display a program 62.
The analytics service 70, which operates analytics server 72, may collect data related to sponsored content segments 42 and programs 62 to which a panelist was exposed. In an embodiment, such data collection is performed through a panelist program where panelists 22 are recruited to voluntarily provide such data. The actual data collection may be performed by way of surveys and/or by collection by the meters 27. The collected data are sent to and stored in analytics server 72. The analytics service 70 also collects (or buys) STB log data 90 for a large population. The service 70 then processes the data according to program 200, stores the results of the processing, and may report the results to another entity such as the sponsor 40.
The first panel 62A, as used in an aspect of the herein disclosed processes, may be a high quality panel of individuals who agree to record information related to their television viewing activities and to provide demographic data. High quality, as used herein, means the data supplied by the panelists are accurate and complete. The panel 62A also, as noted above, may be a single source panel, in which media consumption for multiple media types is recorded by the panelists. However, for purposes of estimating media metrics according to the process of
A population is represented by data obtained through STB logs. The data may include demographic data for households and viewers in the households.
In
The database 82 includes a computer-readable storage medium on which is encoded the machine instructions comprising the system 200 (see
The system 200 may be used to provide an estimate of reach (and incremental reach) for arbitrary population subsets or sectors based on one or more characteristics X of the population or population subset or sector. Thus, a population may be divided according to those viewers having one or more characteristics X in common. In this approach, vector Xi represents population characteristics (variables), both demographics and viewing habits.
In
The data collection module 210 includes the machine instructions to execute the processes of
The model training module 220 uses the acquired panelist data to train a model, such as a regression model.
The estimator module 230 also may be used to acquire, format, and save STB log data. The estimator module 230 applies a trained model to estimate shared viewing based on the household data from the STB log data and then to score individual viewing by viewers from the households providing the STB log data.
The combiner module 240 combines the predicted shared viewing and scored individual viewing according to the processes of block 4,
The media metrics estimation module 250 estimates media metrics such as reach, TRP, and incremental reach. The ultimate goal of estimating granular-level (as described herein, individual viewer-level) television viewing probabilities is to combine the viewer-level estimates into aggregated campaign-level measures. One such campaign-level metric is television campaign reach. Given the viewer-level television viewing estimates derived from STB logs by using a model trained on a high quality panel, estimation of television campaign reach may proceed as follows, according to the herein disclosed systems.
First, the system 200 provides for the estimation of a viewer-level campaign reach indicator for each viewer. Second, the system 200 provides for computation of a weighted average of the viewer-level reach, where the weight adjusts for viewer representation in the high quality panel to compare to the actual demographic representation of similar viewers in the larger population represented by the STB logs:
Using the STB log data, the system 200 replaces the observed viewer-level campaign reach with an estimated reach probability. Next, the system 200 translates the viewing probabilities of a sequence of television viewing events that contain a specific sponsored event or advertisement to the estimated reach probability. Assuming that all sponsored events in a campaign are viewed independently of each other, the overall reach probability may be stated as:
$$\hat{r}_i = 1 - \prod_k (1 - p_k) \qquad \text{(Eqn 4)}$$
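Equation 4 can be evaluated directly from a viewer's per-event viewing probabilities. The following is a minimal sketch; the function name `reach_probability` is hypothetical, and the independence assumption is the one stated above.

```python
import math


def reach_probability(event_probs):
    """Estimated reach probability for one viewer under Equation 4:
    one minus the probability that the viewer missed every event."""
    return 1.0 - math.prod(1.0 - p for p in event_probs)


# Two campaign events, each viewed with probability 0.5 (illustrative).
r_hat = reach_probability([0.5, 0.5])
```

With no events in the sequence the product is empty and the reach probability is zero, which matches the intuition that a viewer cannot be reached by a campaign the viewer had no opportunity to see.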
Another widely-used television viewing metric is television target rating point (TRP), which may be defined as the product of overall campaign reach and average reach frequency within a specific population. TRP may be computed by summing viewer-level reach frequency within the population. The process for computing TRP begins with computation of viewer-level campaign reach frequency fi for each viewer i. As before, viewer-level campaign reach frequency for each panelist may be observed, and the data used to train a model, which in turn is applied to the larger population represented by the STB log data. The overall campaign reach for the population represented by the STB log data then is:
where wi is a weight adjusting the demographic representation of the viewer in the population.
Next, the process, using the STB log data, estimates viewer reach frequency based on the viewing probabilities of a sequence of events that contain campaign-level events. Viewer-level reach frequency may be stated as
$$\hat{f}_i = \sum_k p_k \qquad \text{(Eqn 6)}$$
This computation does not rely on an assumption of independence among campaign-level sponsored event viewing.
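Equation 6 and a weighted aggregation over viewers can be sketched as follows. The normalization by the sum of the weights is an assumption made for illustration, since the disclosure's aggregation formula is not reproduced in this text; the function names are hypothetical.

```python
def reach_frequency(event_probs):
    """Estimated viewer-level reach frequency under Equation 6: the
    expected number of campaign events the viewer saw."""
    return sum(event_probs)


def weighted_average(values, weights):
    """Weighted average over viewers, normalized by total weight
    (assumed normalization)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(v * w for v, w in zip(values, weights)) / total


# Two viewers with illustrative per-event probabilities and equal weights.
f_hat = [reach_frequency([0.5, 0.25, 0.25]), reach_frequency([1.0, 1.0, 1.0])]
avg_frequency = weighted_average(f_hat, [1.0, 1.0])
```

Because expectation is linear, the sum of the per-event probabilities gives the expected frequency whether or not the events are viewed independently, which is why no independence assumption is needed here.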
The above-disclosed systems and processes also may be applied to estimate incremental reach based on the STB log data. Incremental reach may be defined as a number of unique viewers exposed to advertising from a specific campaign, via a specific media channel, who were not exposed to advertising from that specific campaign on any other media channel. For example, a viewer may be exposed to an advertisement on a television broadcast but not to the same or a similar advertisement online. Thus, the STB log data may be combined with online panel data to build a single source panel to measure cross-media usage. Then, the probabilistic television viewing data may be leveraged to produce cross-media campaign measures, such as incremental reach of online video advertising relative to television advertising.
To estimate incremental reach, the system 200 estimates a second reach metric such as online reach. In this example, for each viewer i, the television reach for a specific campaign may be expressed as ri and online reach for the campaign as ri′. Overall incremental reach may be estimated by first computing viewer-level incremental reach as ri′(1−ri) for viewer i. Next, the system 200 computes a weighted average of the viewer-level incremental reach according to
where wi is used to adjust viewer i according to the viewer's demographic representation.
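The viewer-level incremental reach ri′(1−ri) and its weighted average can be sketched as below. Normalizing by the sum of the weights is an assumption made for illustration, and the function name `incremental_reach` is hypothetical.

```python
def incremental_reach(tv_reach, online_reach, weights):
    """Weighted average of viewer-level incremental reach r'_i * (1 - r_i),
    where r_i is television reach and r'_i is online reach for viewer i."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    per_viewer = [r_on * (1.0 - r_tv) for r_tv, r_on in zip(tv_reach, online_reach)]
    return sum(w * x for w, x in zip(weights, per_viewer)) / total


# Two equally weighted viewers; the second was certainly reached by television,
# so online exposure adds nothing for that viewer (illustrative values).
inc = incremental_reach([0.5, 1.0], [0.5, 0.5], [1.0, 1.0])
```

The per-viewer term is the probability that the viewer was reached online and not reached by television, so it shrinks toward zero as television reach approaches one.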
In block 410, the system 200 trains a regression model using the observed panelist data. In block 415, the system 200 acquires STB log data from a large population. The system 200 may format and store the STB log data.
In block 420, the system 200 estimates audience size in each household represented by the STB log data. In block 425, the system estimates an individual viewing score for each individual viewer in a household. In block 430, the system 200 combines the estimated audience size and the viewing score to produce viewer-level probabilities for each viewer in each household for each household television viewing event. The process 400 then ends.
Using the STB log data, in block 520, the system 200 replaces the observed viewer-level campaign reach with an estimated reach probability. In block 525, the system 200 translates the viewing probabilities of a sequence of television viewing events that contain a specific sponsored event or advertisement to the estimated reach probability to provide the overall reach probability according to $\hat{r}_i = 1 - \prod_k (1 - p_k)$. The method 500 then ends.
The method 550 then ends.
Certain of the devices shown in the herein described figures include a computing system. The computing system includes a processor (CPU) and a system bus that couples various system components, including a system memory such as read only memory (ROM) and random access memory (RAM), to the processor. Other system memory may be available for use as well. The computing system may include more than one processor or a group or cluster of computing systems networked together to provide greater processing capability. The system bus may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. A basic input/output system (BIOS) stored in the ROM or the like may provide basic routines that help to transfer information between elements within the computing system, such as during start-up. The computing system further includes data stores, which maintain a database according to known database management systems. The data stores may be embodied in many forms, such as a hard disk drive, a magnetic disk drive, an optical disk drive, a tape drive, or another type of computer readable media which can store data that are accessible by the processor, such as magnetic cassettes, flash memory cards, digital versatile disks, cartridges, random access memories (RAM), and read only memory (ROM). The data stores may be connected to the system bus by a drive interface. The data stores provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the computing system.
To enable human (and in some instances, machine) user interaction, the computing system may include an input device, such as a microphone for speech and audio, a touch sensitive screen for gesture or graphical input, keyboard, mouse, motion input, and so forth. An output device can include one or more of a number of output mechanisms. In some instances, multimodal systems enable a user to provide multiple types of input to communicate with the computing system. A communications interface generally enables the computing system to communicate with one or more other computing devices using various communication and network protocols.
The preceding disclosure refers to flowcharts and accompanying descriptions to illustrate the embodiments represented in
Embodiments disclosed herein can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the herein disclosed structures and their equivalents. Some embodiments can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage medium for execution by one or more processors. A computer storage medium can be, or can be included in, a computer-readable storage device, a computer-readable storage substrate, or a random or serial access memory. The computer storage medium can also be, or can be included in, one or more separate physical components or media such as multiple CDs, disks, or other storage devices. The computer readable storage medium does not include a transitory signal.
The herein disclosed methods can be implemented as operations performed by a processor on data stored on one or more computer-readable storage devices or received from other sources.
A computer program (also known as a program, module, engine, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.