This disclosure relates generally to neural networks and, more particularly, to neural network processing of return path data to estimate household member and visitor demographics.
Audience measurement entities (AMEs), such as The Nielsen Company (US), LLC, may extrapolate ratings metrics and/or other audience measurement data for a total television viewing audience from a relatively small sample of panel homes. The panel homes may be well studied and are typically chosen to be representative of an audience universe as a whole. Furthermore, to help supplement panel data, an AME, such as The Nielsen Company (US), LLC, may reach agreements with pay-television provider companies to obtain the television tuning information derived from set top boxes and/or other devices/software, which is referred to herein, and in the industry, as return path data.
The figures are not to scale. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts, elements, etc.
Descriptors “first,” “second,” “third,” etc., are used herein when identifying multiple elements or components which may be referred to separately. Unless otherwise specified or understood based on their context of use, such descriptors are not intended to impute any meaning of priority or ordering in time but merely as labels for referring to multiple elements or components separately for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for ease of referencing multiple elements or components.
Example methods, apparatus, systems and articles of manufacture (e.g., physical storage media) to implement neural network processing of return path data to estimate household demographics are disclosed herein. Examples of such demographic estimation systems disclosed herein include a feature generator to generate features from return path data reported from set-top boxes associated with return path data households. Example demographic estimation systems disclosed herein also include a neural network to process the features generated from the return path data to predict demographic classification probabilities for the return path data households. Example demographic estimation systems disclosed herein further include a demographic assignment engine to assign one or more demographic categories to respective ones of the return path data households based on the predicted demographic classification probabilities.
These and other example methods, apparatus, systems and articles of manufacture (e.g., physical storage media) to implement neural network processing of return path data to estimate household demographics are disclosed in further detail below.
As noted above, AMEs extrapolate ratings metrics and/or other audience measurement data for a total television viewing audience from a relatively small sample of panelist households, also referred to herein as panel homes. The panel homes may be well studied and are typically chosen to be representative of an audience universe as a whole. However, accurately representing the geographic distribution and demographic diversity that exists in the total audience population with a small sample of panel homes remains a challenge. Incorporating additional streams of information about media exposure to the total audience population can fill in gaps or biases inherent to any statistical sample.
To help supplement panel data, an AME, such as The Nielsen Company (US), LLC, may reach agreements with pay-television provider companies to obtain the television tuning information derived from set top boxes, which is referred to herein, and in the industry, as return path data (RPD). Set-top box (STB) data includes all the data collected by the set-top box. STB data may include, for example, tuning events and/or commands received by the STB (e.g., power on, power off, change channel, change input source, start presenting media, pause the presentation of media, record a presentation of media, volume up/down, etc.). STB data may additionally or alternatively include commands sent to a content provider by the STB (e.g., switch input sources, record a media presentation, delete a recorded media presentation, the time/date a media presentation was started, the time a media presentation was completed, etc.), heartbeat signals, or the like. The set-top box data may additionally or alternatively include a household identification (e.g. a household ID) and/or a STB identification (e.g. a STB ID).
Return path data includes any data receivable at a media service provider (e.g., a cable television service provider, a satellite television service provider, a streaming media service provider, a content provider, etc.) via a return path to the service provider from a media consumer site. As such, return path data includes at least a portion of the set-top box data. Return path data may additionally or alternatively include data from any other consumer device with network access capabilities (e.g., via a cellular network, the internet, other public or private networks, etc.). For example, return path data may include any or all of linear real time data from an STB, guide user data from a guide server, click stream data, key stream data (e.g., any click on the remote, such as volume, mute, etc.), interactive activity (such as Video On Demand) and any other data (e.g., data from middleware). RPD can additionally or alternatively be from the network (e.g., via Switched Digital software) and/or any cloud-based data (such as a remote server DVR) from the cloud.
RPD can provide insight into the media exposure associated with a larger segment of the audience population. This is because RPD typically provides a rich stream of television viewing information for a much larger number of households than are included in an AME's panel homes. However, unlike the well-studied AME panel homes, the demographic details of pay-television subscribers are typically unknown. This lack of demographic details in the RPD can result in technical problems preventing, or at least limiting, the ability to effectively use RPD to supplement the AME's panel data because monitoring the behavioral profiles of various audience demographics requires knowledge of the demographic composition of the subscriber homes providing the RPD.
Neural network processing of set-top box RPD to estimate household demographics as disclosed herein provides a technical solution to the technical problem of combining RPD with panel data for audience measurement. As disclosed in further detail below, example neural-network-based demographic estimation systems implemented in accordance with teachings of this disclosure use panel data collected from monitored AME panel homes as a training set for training a neural network (e.g., a recurrent neural network) to be able to predict, from RPD tuning data describing historical television tuning behavior, probabilities of different household demographic characteristics being associated with respective ones of the RPD households reporting the RPD. Disclosed example neural-network-based demographic estimation systems then use the predicted probabilities of different household demographic characteristics to assign demographic compositions to households. In this way, example neural-network-based demographic estimation systems assign demographic compositions to the subscriber homes providing the RPD, thereby allowing the RPD to be combined with or to otherwise enhance the panel data driving an AME's audience measurement systems. Such example neural-network-based demographic estimation systems are also referred to as implementing an example Household Demographic Assignment Model (HDAM) to assign demographic compositions to households.
In some disclosed examples, the HDAM implemented by the neural-network-based demographic estimation system predicts household level demographic assignments based on television viewing data, but predicts the household level demographics for primary household members and not long term visitors. As used herein, a long term visitor is a person who visits the home more than once and/or for an extended period of time, such as a person who visits at least once every two weeks or who stays in the home for at least one month in a calendar year. A long term visitor has a primary residence elsewhere, is not a household member, and typically watches and/or listens to television in the home during at least a portion of his or her visits or extended stay. However, other criteria can be used to classify individuals as long term visitors, taking into account factors such as improvement to household compliance. Some examples disclosed herein modify the HDAM model to predict demographics of long term visitors while being consistent with the long term visitor distributions obtained from panel data obtained by the AME by monitoring its panelists, such as Nielsen People Monitoring (NPM) panel data generated by The Nielsen Company (US), LLC.
In some examples disclosed herein, the modification to the HDAM model to predict demographics of long term visitors uses long term visitor data available from the AME's panel data. Using this panel data, the percentages of total people in each age-gender bucket that are long term visitors can be determined. Taking the estimates for the total number of people in each age-gender bucket, these estimated totals can be modified to include visitors based on the panel visitor percentages. By modifying the population targets applied to the HDAM model, the HDAM model can be modified to predict household compositions that contain both long term visitors and primary household members.
Examples disclosed herein distinguish the long term visitors from the primary household members. In some examples, the expected aggregate level long term visitor population and age-gender distributions are known from the AME's panel data. The output of the HDAM model can be thought of as a vector containing the total number of long term visitors and primary household members in each of the various demographic categories (e.g., age-gender buckets). Knowing the percentage of long term visitors in each demographic category (e.g., each age-gender group), a population of long term visitors can be assigned to the homes to satisfy the expected distribution across demographic categories. In examples disclosed herein, a set of visitor vectors is created that satisfies the panel informed consistency requirements. After creating the visitor vectors, homes in which these visitor vectors can be placed are determined. For example, the set of possible (candidate) homes (from the HDAM predictions) to which each visitor vector could be assigned is determined. Using the probabilities that respective ones of the individual homes have a long term visitor, the visitor vectors are placed in the homes, prioritizing the homes that are more likely to have long term visitors. Examples disclosed herein produce long term visitor assignments to the predicted homes that automatically satisfy the panel informed consistency requirements without disrupting/impacting the HDAM predictions for primary household members. As a result, primary household member assignments and long term visitor assignments can be achieved by examples disclosed herein.
Turning to the figures, a block diagram of an example processing flow 100 to estimate demographic classification probabilities from set-top box RPD using a neural network in accordance with teachings of this disclosure is illustrated in
In the data collection phase 105 of the neural network training branch 120, example panelist tuning data 130 is collected from meters monitoring media exposure in panel homes recruited by an AME. Panelist tuning data 130 can include any data collectable by the meters, such as, but not limited to, data identifying media presented by media devices in the panel homes, demographic data identifying characteristics of the panelists in the panel homes, etc. In the feature generation phase 110 of the neural network training branch 120, example features 135 are generated from the collected panelist tuning data 130 and arranged to form feature vectors, as described in further detail below. In the neural network demographic probability prediction phase 115 of the neural network training branch 120, a neural network 140 is trained to predict, from the features 135 generated from the collected panelist tuning data 130, probabilities of different household demographic characteristics being associated with the different panel homes, as described in further detail below.
In the data collection phase 105 of the neural network application branch 125, example RPD tuning data 145 is collected from set-top boxes of one or more pay television providers (e.g., cable television service providers, satellite television service providers, streaming media service providers, content providers, etc.). A set-top box may also refer to any decoder, receiver, integrated receiver-decoder (IRD), media device, etc., from which the RPD tuning data 145 may be collected. In the feature generation phase 110 of the neural network application branch 125, example features 150 are generated from the collected RPD tuning data 145 and arranged to form feature vectors, as described in further detail below. In the neural network demographic probability prediction phase 115 of the neural network application branch 125, the trained neural network 155 is applied to the features 150 generated from the collected RPD tuning data 145 to predict example estimated probabilities 160 of different household demographic characteristics being associated with the different RPD subscriber households that reported the RPD tuning data 145, as described in further detail below.
A block diagram of an example processing flow 200 to use the estimated demographic classification probabilities 160 predicted by the example processing flow 100 of
A block diagram of an example neural-network-based demographic estimation system 300 structured to implement the processing flows 100 and 200 of
In the illustrated example, the panel tuning data collector 310 collects, via the network interface 305 in communication with one or more example networks 355, the panelist tuning data 130 from example meters 360A-B monitoring media exposure associated with example media devices 365A-B (e.g., televisions, radios, computers, tablet devices, smart phones, etc.) in panel homes recruited by an AME. The panel tuning data collector 310 stores the collected panelist tuning data 130 in the panelist database 315. In the illustrated example, the RPD data collector 320 collects, via the network interface 305 in communication with the one or more networks 355, the RPD tuning data 145 from one or more example service providers 370 that collect the RPD tuning data 145 from example individual STBs 375 in the subscriber households. Additionally or alternatively, in some examples, the RPD data collector 320 collects the RPD tuning data 145 from one or more of the individual STBs 375 in the subscriber households directly via the network interface 305 in communication with the one or more networks 355. The RPD data collector 320 stores the collected RPD tuning data 145 in the RPD database 325.
The feature generator 330 of the illustrated example generates the features and feature vectors used by the example demographic prediction neural network 335. In some examples, RPD tuning data consists of sequential logs of when respective set top boxes were tuned to different stations. Individuals (e.g., audience members) transfer between multiple networks over the course of a contiguous television viewing session, and this pattern of activity may provide additional information about the household beyond the tuning record in isolation. To capture this behavior, the feature generator 330 compiles the STB records of television tuning into “view blocks” that aggregate the viewing behavior of one or more unknown viewers into a fixed number of features summarizing each contiguous viewing session. In some examples, view block durations are capped at 1 hour, or some other duration, to account for situations in which multiple viewers may take control of a television without necessarily turning the television off between sessions. In the illustrated example, each view block contains F features recording information about the start time of the view block, channel click rate, duration of the viewing sessions and a listing of the television stations visited during the session.
The feature generator 330 of the illustrated example groups view blocks by household and a group of N view blocks is assembled into a two-dimensional (N×F) matrix containing a record of the view blocks generated by a household over a given observation period. In some examples, the feature generator 330 aggregates relevant household level features, including the number of television tuners, and the amount of television watched, with the view block data, into an H dimensional (1×H) additional feature vector for each household.
In some examples, each view block is a (1×173) feature vector describing a corresponding television viewing session. As such, the corresponding (N×F) matrix has an F dimension of 173 for this example. Table 1 illustrates the contents of an example view block represented as a (1×173) feature vector.
The first three features in Table 1 are self-explanatory. The “Channel Change Rate” feature of Table 1 is the ratio of the number of times the channel changed during the view block to the duration of the view block in minutes. The “Minutes Viewing Each Network” feature is the total number of minutes each television station was watched. In the example of Table 1, view blocks are capped at 60 minutes duration and, thus, the summation of these features over all networks is to be <=60.0 minutes. In some such examples, a viewing session may thereby be associated with one or more view blocks. In the example of Table 1, each station is randomly assigned an index value between 4 and 173.
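The view block construction described above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the `ViewBlock` class, the `build_view_blocks` function and the `(station, start_minute, end_minute)` record layout are hypothetical names and formats assumed here. The sketch takes one contiguous viewing session, splits it into view blocks capped at 60 minutes, and computes the channel change rate (channel changes per minute of the view block) and the minutes viewed per station.

```python
from dataclasses import dataclass, field

MAX_BLOCK_MINUTES = 60.0  # cap on view block duration stated in the text


@dataclass
class ViewBlock:
    """Summary of one view block (hypothetical container, not from the source)."""
    start_minute: float
    duration: float = 0.0
    channel_changes: int = 0
    minutes_by_station: dict = field(default_factory=dict)

    @property
    def channel_change_rate(self):
        # Ratio of channel changes to view block duration in minutes.
        return self.channel_changes / self.duration if self.duration > 0 else 0.0


def build_view_blocks(tuning_records):
    """Split one contiguous session into view blocks of at most 60 minutes.

    tuning_records: list of (station, start_minute, end_minute) tuples,
    sorted by time and contiguous (assumed input format).
    """
    blocks = []
    block = None
    last_station = None
    for station, start, end in tuning_records:
        t = start
        while t < end:
            if block is None:
                block = ViewBlock(start_minute=t)
                last_station = None  # first station of a block is not a change
            room = MAX_BLOCK_MINUTES - block.duration
            chunk = min(end - t, room)
            if last_station is not None and last_station != station:
                block.channel_changes += 1
            last_station = station
            block.minutes_by_station[station] = (
                block.minutes_by_station.get(station, 0.0) + chunk
            )
            block.duration += chunk
            t += chunk
            if block.duration >= MAX_BLOCK_MINUTES - 1e-9:
                blocks.append(block)  # block is full; the session continues in a new one
                block = None
    if block is not None:
        blocks.append(block)
    return blocks
```

With this sketch, a 75 minute session yields two view blocks, consistent with the statement that a viewing session may be associated with one or more view blocks. Mapping each block onto the (1×173) layout of Table 1 is omitted because the table contents are not reproduced here.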
In some examples, view blocks (from panel households) containing less than 5 minutes of television viewing behavior are not used to train the demographic prediction neural network 335. The view blocks for each household (e.g., panel households for neural network training and RPD households for neural network application) are then stacked into a two-dimensional matrix with, for example, 400 rows (e.g., N=400). In some examples, households that generated fewer than 400 unique view blocks are zero padded by the feature generator 330 until they have 400 rows, while those with greater than 400 are truncated by the feature generator 330 to the first 400 rows. The two-dimensional arrays from each household are then stacked by the feature generator 330 to form a three-dimensional matrix that can be fed into the demographic prediction neural network 335.
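The filtering, padding, truncation and stacking steps above can be illustrated with a short NumPy sketch. The function names `household_matrix` and `stack_households` and the separate `durations` argument are assumptions made for illustration. The 5 minute filter is exposed as an optional parameter because the text describes it only for panel (training) households.

```python
import numpy as np

N_BLOCKS = 400      # rows per household matrix (N=400 in the example)
MIN_MINUTES = 5.0   # training-time filter on short view blocks


def household_matrix(view_blocks, durations, n_blocks=N_BLOCKS, min_minutes=None):
    """Return an (n_blocks x F) matrix for one household.

    view_blocks: (n, F) array of view block feature vectors.
    durations:   (n,) minutes of viewing in each view block.
    min_minutes: if given, drop view blocks shorter than this many minutes.
    """
    view_blocks = np.asarray(view_blocks, dtype=np.float32)
    if min_minutes is not None:
        keep = np.asarray(durations) >= min_minutes
        view_blocks = view_blocks[keep]
    kept = view_blocks[:n_blocks]                      # truncate to the first n_blocks rows
    padded = np.zeros((n_blocks, view_blocks.shape[1]), dtype=np.float32)
    padded[: len(kept)] = kept                         # remaining rows stay zero (padding)
    return padded


def stack_households(matrices):
    """Stack per-household (N x F) matrices into a (households x N x F) array."""
    return np.stack(matrices, axis=0)
```

For example, calling `household_matrix` for each household and passing the results to `stack_households` yields an array of shape (households, 400, 173) when F=173.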
In some examples, the feature generator 330 augments viewing data with three household level features, H, that are merged into the demographic prediction neural network 335 following a recurrent layer, as described below. Table 2 illustrates an example set of the three household level features, H, corresponding to (i) a total amount of tuning reported for the given household across the different durations of time covered by the view blocks (e.g., a 24 hour period) (corresponding to Index 0 in the table), (ii) a number of view blocks reported for the given household across the different durations of time (corresponding to Index 1 in the table), and (iii) a total number of tuners included in the given household (corresponding to Index 2 in the table).
In the illustrated example, the demographic prediction neural network 335 is structured to predict 20 variables (e.g., a 1×20 vector) representing probabilities of different household level demographics being present in a household (although other numbers of variables representing other demographics could additionally or alternatively be predicted in other example implementations of the demographic prediction neural network 335). In the illustrated example, fourteen household demographic target variables predicted by the demographic prediction neural network 335 indicate the respective probabilities (e.g., likelihoods) of 14 different age gender combinations being present in the household, examples of which are represented in Table 3.
In addition to the presence variables of Table 3, in some examples, the demographic prediction neural network 335 predicts six additional target variables describing the demographic profile of the head of household (HOH), examples of which are represented in Table 4.
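The data flow described above (view blocks into a recurrent layer, household level features merged after the recurrent layer, and a 1×20 output of probabilities) can be sketched with NumPy as follows. This is only an illustrative forward pass under stated assumptions: the plain tanh recurrent cell, the hidden size of 32, the single dense output layer, the sigmoid output activation and the random weights are all assumptions made here and are not taken from the disclosure.

```python
import numpy as np


def init_params(rng, n_features=173, n_household=3, hidden=32, n_outputs=20):
    """Randomly initialized weights (hidden size and scale are assumptions)."""
    scale = 0.01
    return {
        "W_x": rng.normal(0.0, scale, (n_features, hidden)),
        "W_h": rng.normal(0.0, scale, (hidden, hidden)),
        "b_h": np.zeros(hidden),
        "W_o": rng.normal(0.0, scale, (hidden + n_household, n_outputs)),
        "b_o": np.zeros(n_outputs),
    }


def predict(params, view_blocks, household_features):
    """Forward pass: (B, N, F) view blocks and (B, H) features -> (B, 20) probabilities."""
    batch = view_blocks.shape[0]
    h = np.zeros((batch, params["W_h"].shape[0]))
    # Simple recurrent layer: consume the N view blocks one at a time.
    for t in range(view_blocks.shape[1]):
        h = np.tanh(view_blocks[:, t, :] @ params["W_x"] + h @ params["W_h"] + params["b_h"])
    # Merge the household level features after the recurrent layer.
    merged = np.concatenate([h, household_features], axis=1)
    logits = merged @ params["W_o"] + params["b_o"]
    # Sigmoid treats each of the 20 targets as an independent probability (assumption).
    return 1.0 / (1.0 + np.exp(-logits))
```

A trained network would replace the random weights with values learned from panel data; the sketch only shows how the (N×F) view block matrix and the (1×H) household vector could combine into a 1×20 prediction per household.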
An example implementation of the demographic prediction neural network 335 of
In the example demographic prediction neural network 335 of
Table 5 lists example dimensions of the data at each stage of the example demographic prediction neural network 335 of
In some examples, to prevent the demographic prediction neural network 335 from over-fitting, and enable it to better generalize, the feature generator 330 shuffles the order of blocks fed into the demographic prediction neural network 335 during each training epoch.
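A minimal NumPy sketch of such per-epoch shuffling follows. The function name `shuffle_view_blocks` is hypothetical, and the choice to permute each household's rows independently (including any zero-padded rows) is an assumption; the text does not specify these details.

```python
import numpy as np


def shuffle_view_blocks(batch, rng):
    """Return a copy of a (households, N, F) array with each household's
    view block rows permuted independently."""
    shuffled = np.empty_like(batch)
    for i, household in enumerate(batch):
        order = rng.permutation(household.shape[0])
        shuffled[i] = household[order]
    return shuffled
```

Calling this function once at the start of each training epoch, with a shared `numpy.random.Generator`, presents the same view blocks in a different order each time without altering their contents.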
Returning to
As illustrated in the example of
subject to a set of constraints. The example constraints of
Referring to
The example constraints of
The example constraints of
The example constraints of
In some examples, the household demographic assignment engine 340 implements simulated annealing to further adjust the demographic category assignments made for the RPD households. An example operation of the household demographic assignment engine 340 to perform simulated annealing is illustrated in
In some examples, the household demographic assignment engine 340 breaks the demographic assignment problem illustrated in
Returning to
Examples disclosed above in connection with
Further examples disclosed herein assign long term visitors (e.g., virtual long term visitors) as well as household members (also referred to as primary household members, e.g., virtual members for whom the household is their residence) to the RPD households to satisfy the demographic presence categories assigned to the households and known, or estimated, UEs of numbers of long term visitors associated with the RPD households (e.g., from panel data and/or other audience measurement techniques). At a high-level, some such disclosed visitor assignment techniques modify the targets (e.g., UEs) used by the example demographic category assignment techniques disclosed above to incorporate visitors in the counts for the numbers of people in the different demographic classifications being used. By modifying the input targets, the example demographic category assignment techniques disclosed above (also referred to as household demographic assignment model (HDAM) techniques) are able to predict total household compositions that include (but at this point do not distinguish among) both household members and long term visitors. Disclosed example visitor assignment techniques also include creating visitor vectors that conform to provided visitor distribution targets (e.g., from panel data and/or other audience measurement techniques), and then assigning the visitor vectors to RPD households having predicted total household compositions that are able to support the sizes and demographic compositions of the respective visitor vectors.
A block diagram of an example neural-network-based demographic estimation system structured to implement the processing flows of
In the illustrated example of
$$f_i = \frac{N_{v,i}}{N_{p,i} + N_{v,i}} \qquad \text{(Equation 1)}$$
In Equation 1 above, $N_{v,i}$ is the number of visitors in the demographic category (e.g., age-gender bucket) $i$ (e.g., obtained from panel/NPM data), and $N_{p,i}$ is the number of primary household members in the demographic category (e.g., age-gender bucket) $i$ (e.g., obtained from panel/NPM data). The scale factor for the demographic category $i$ is found to be:
$$S_i = 1 - f_i = \frac{N_{p,i}}{N_{p,i} + N_{v,i}} \qquad \text{(Equation 2)}$$
The scale factor $S_i$ in Equation 2 above is used by the example demographic targets adjuster 1050 to adjust (e.g., divide) the UE for the total number of RPD household members in demographic category $i$, which is used as the demographic presence UE for demographic category $i$ for the HDAM technique, as described above in connection with the neural-network-based demographic estimation system 300. In this way, the example demographic targets adjuster 1050 adjusts the presence UEs used by the HDAM technique to account for the presence of long term visitors.
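The target adjustment of Equations 1 and 2 can be illustrated with a short Python sketch. The function names, the dictionary layout and the category labels used in the usage note are hypothetical. The sketch computes the visitor fraction and scale factor for each demographic category from panel counts and divides each UE by the scale factor.

```python
def visitor_fraction(n_primary, n_visitors):
    """Equation 1: fraction of people in a category who are visitors."""
    return n_visitors / (n_primary + n_visitors)


def scale_factor(n_primary, n_visitors):
    """Equation 2: one minus the visitor fraction."""
    return n_primary / (n_primary + n_visitors)


def adjust_targets(universe_estimates, panel_primary, panel_visitors):
    """Divide each category's UE by its scale factor so the adjusted
    target counts long term visitors as well as primary members.

    All three arguments are dicts keyed by demographic category.
    Assumes every category has at least one primary member in the panel.
    """
    adjusted = {}
    for category, ue in universe_estimates.items():
        s = scale_factor(panel_primary[category], panel_visitors[category])
        adjusted[category] = ue / s
    return adjusted
```

For instance, if panel data shows 90 primary household members and 10 long term visitors in a category, the visitor fraction is 0.1, the scale factor is 0.9, and a UE of 9,000 is adjusted to approximately 10,000.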
In some examples, the demographic targets adjuster 1050 may adjust the demographic targets using target rates instead of the scale factors described above. In some such examples, it is assumed that the HDAM technique disclosed in connection with the neural-network-based demographic estimation system 300 has been further modified to assign individual household members to the RPD households based on the demographic categories (e.g., demographic presence) assigned to the RPD households, as disclosed above. The demographic targets adjuster 1050 determines the target rates of occurrences of different numbers of people in the different demographic categories (e.g., as determined from panel data and/or other audience measurement techniques). The example HDAM technique disclosed above can be further modified to scale the demographic categories (e.g., demographic presence) assigned to RPD households by the target rates found by the example demographic targets adjuster 1050 to assign individual members to the RPD households in accordance with those targets. For example, if the target rates of occurrence of a first demographic category are 80% that one person in a household will be in that category, 15% that two people in a household will be in that same category, and 5% that three people in a household will be in that category, then a modified HDAM technique can select RPD households that have been assigned the first demographic category such that 80% of those households will be assigned one individual in that category, 15% of those households will be assigned two individuals in that category, and 5% of those households will be assigned three individuals in that category.
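The 80%/15%/5% example above can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions not stated in the text: quotas are rounded with a largest-remainder rule so that they sum to the number of households, and households are matched to quotas at random.

```python
import random


def allocate_counts(n_households, target_rates):
    """Turn target rates into whole-household quotas that sum to n_households
    (largest-remainder rounding, an assumption of this sketch)."""
    raw = {k: rate * n_households for k, rate in target_rates.items()}
    quotas = {k: int(v) for k, v in raw.items()}
    leftover = n_households - sum(quotas.values())
    by_remainder = sorted(raw, key=lambda k: raw[k] - quotas[k], reverse=True)
    for k in by_remainder[:leftover]:
        quotas[k] += 1
    return quotas


def assign_member_counts(household_ids, target_rates, rng):
    """Assign each household a number of individuals in one demographic
    category so that the overall split follows the target rates.

    household_ids: households already assigned the category.
    target_rates:  dict mapping number of individuals -> target rate.
    rng:           random.Random instance (random selection is an assumption).
    """
    ids = list(household_ids)
    rng.shuffle(ids)
    quotas = allocate_counts(len(ids), target_rates)
    assignment = {}
    position = 0
    for count, quota in sorted(quotas.items()):
        for household in ids[position:position + quota]:
            assignment[household] = count
        position += quota
    return assignment
```

With 100 households and target rates of `{1: 0.80, 2: 0.15, 3: 0.05}`, this sketch assigns one individual to 80 households, two individuals to 15 households, and three individuals to 5 households.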
The visitor assignment engine 1055 of the illustrated example of
In the illustrated example of
The visitor vector assigner 1125 of the illustrated example of
The output of the example visitor vector assigner 1125 includes demographic assignments for primary household members and long term visitors for the different RPD households. This output is provided by the example visitor assignment engine 1055 to the example ratings calculator 350, where the ratings calculator 350 performs calculations based on the output that includes both primary household members and long term visitors.
In some examples, the visitor vector assigner 1125 can be simplified under a constraint that only one visitor is to be assigned to a given RPD household. Such a simplification further assumes that the number of visitors is less than or equal to the number of RPD households. In such an example, the example visitor vector assigner 1125 selects one of the demographic categories (e.g., one of the age-gender buckets) to use in assigning the visitor vectors. The example visitor vector assigner 1125 identifies the set of RPD households that are candidates for a visitor in the selected demographic category (e.g., selected age-gender bucket) and orders that set of RPD households based on the probability of having a visitor in that demographic category (e.g., age-gender bucket). The example visitor vector assigner 1125 selects the RPD household with the highest visitor probability and labels one of the people assigned to that RPD household in that demographic category (e.g., age-gender bucket) as a long term visitor. In such an example, the example visitor vector assigner 1125 iterates until all visitors in the selected demographic category (e.g., age-gender bucket) have been placed in RPD households and over the different demographic categories until the target number of visitors have been assigned to the RPD homes.
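The simplified one-visitor-per-household assignment described above can be sketched in Python as follows. The function name and data layouts are hypothetical. The sketch also assumes that a household is a candidate for a category whenever it has at least one person assigned in that category and has not already received a visitor; any further candidate criteria are not modeled.

```python
def assign_single_visitors(compositions, visitor_probs, visitor_targets):
    """Greedy assignment of at most one long term visitor per household.

    compositions:    household id -> {category: number of people assigned}
    visitor_probs:   household id -> {category: probability of a visitor}
    visitor_targets: category -> number of visitors to place
    Returns a dict mapping household id -> category of the person labeled
    as a long term visitor.
    """
    visitors = {}
    for category, target in visitor_targets.items():
        # Candidate households: someone in this category, no visitor yet.
        candidates = [
            household
            for household, composition in compositions.items()
            if composition.get(category, 0) > 0 and household not in visitors
        ]
        # Most likely visitor households first.
        candidates.sort(
            key=lambda household: visitor_probs[household].get(category, 0.0),
            reverse=True,
        )
        for household in candidates[:target]:
            visitors[household] = category
    return visitors
```

Sorting the candidates once and taking the top entries is equivalent to repeatedly selecting the remaining household with the highest visitor probability. Because categories are processed in sequence, the order in which they are visited can affect which households remain available for later categories.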
While an example manner of implementing the neural-network-based demographic estimation system 300 is illustrated in
While an example manner of implementing the neural-network-based demographic estimation system 1000 is illustrated in
In examples disclosed herein, the example feature generator 330 implements means for generating features from return path data. The example neural network 335 implements means for processing the features generated from the return path data to predict demographic classification probabilities for return path data households. The example demographic assignment engine 340 implements means for assigning one or more demographic categories to respective ones of return path data households. The example visitor assignment engine 1055 implements means for assigning virtual visitors to at least a subset of the respective ones of the return path data households. The example demographic targets adjuster 1050 implements means for updating demographic targets to account for the presence of visitors. The example visitor vector generator 1120 implements means for generating a visitor vector containing a first number of visitors. The example visitor vector assigner 1125 implements means for assigning the visitor vector to a first one of the return path data households. The example visitor demographic distribution calculator 1105 implements means for determining respective percentages of visitors in ones of the one or more demographic categories. The example visitor household distribution calculator 1110 implements means for determining respective percentages of the return path data households having corresponding numbers of visitors.
A flowchart representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the example neural-network-based demographic estimation system 300 is shown in
Flowcharts representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the example neural-network-based demographic estimation system 1000 are shown in
The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a packaged format, etc. Machine readable instructions as described herein may be stored as data (e.g., portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, etc. in order to make them directly readable and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and stored on separate computing devices, wherein the parts when decrypted, decompressed, and combined form a set of executable instructions that implement a program such as that described herein. In another example, the machine readable instructions may be stored in a state in which they may be read by a computer, but require addition of a library (e.g., a dynamic link library), a software development kit (SDK), an application programming interface (API), etc. in order to execute the instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. 
Thus, the disclosed machine readable instructions and/or corresponding program(s) are intended to encompass such machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.
As mentioned above, the example processes of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. 
Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
An example program 1200 that may be executed to implement the example neural-network-based demographic estimation system 300 of
At block 1220, the example RPD data collector 320 of the neural-network-based demographic estimation system 300 collects RPD tuning data, as described above. At block 1225, the example feature generator 330 generates feature vectors (e.g., such as the vectors described in Table 1 above) for the RPD households based on the collected RPD tuning data, as described above. At block 1230, the feature generator 330 applies the RPD feature vectors generated at block 1225 to the trained demographic prediction neural network 335 of the neural-network-based demographic estimation system 300 to predict demographic classification probabilities for the respective RPD households, as described above. At block 1235, the example household demographic assignment engine 340 of the neural-network-based demographic estimation system 300 obtains the demographic assignment constraints from the example constraint database 345, as described above. At block 1240, the example household demographic assignment engine 340 of the neural-network-based demographic estimation system 300 uses the demographic classification probabilities determined at block 1230 to assign demographic categories to respective ones of the RPD households, as described above. At block 1245, the example ratings calculator 350 of the neural-network-based demographic estimation system 300 augments/combines the panel tuning data collected at block 1205, which already has associated demographic data, with the RPD tuning data collected at block 1220 based on the demographic categories assigned to the respective ones of the RPD households at block 1240, as described above.
An example program 1300 may be executed to implement the example neural-network-based demographic estimation system 1000 of
At block 1320, the example RPD data collector 320 of the neural-network-based demographic estimation system 1000 collects RPD tuning data, as described above. At block 1325, the example feature generator 330 generates feature vectors (e.g., such as the vectors described in Table 1 above) for the RPD households based on the collected RPD tuning data, as described above. At block 1330, the feature generator 330 applies the RPD feature vectors generated at block 1325 to the trained demographic prediction neural network 335 of the neural-network-based demographic estimation system 1000 to predict demographic classification probabilities for the respective RPD households, as described above. At block 1335, the demographic targets adjuster 1050 updates the demographic targets for the household demographic assignment engine 340. As described in further detail below, the example flowcharts of
At block 1340, the example household demographic assignment engine 340 obtains the demographic assignment constraints from the example constraint database 345, as described above in connection with
At block 1350, the example visitor assignment engine 1055 assigns the visitors to households. The example visitor assignment engine 1055 uses the demographic category assignments of respective RPD households determined at block 1345 by the example household demographic assignment engine 340 to assign the visitors to the respective RPD households. As described in further detail below, the example flowchart of
At block 1355, the example ratings calculator 350 of the neural-network-based demographic estimation system 1000 augments/combines the panel tuning data collected at block 1305, which already has associated demographic data, with the RPD tuning data collected at block 1320 based on the demographic categories assigned to the respective ones of the RPD households at block 1345 and the visitors assigned to the respective ones of the RPD households at block 1350, as described above.
A first example program 1340a may be executed to implement the example demographic targets adjuster 1050 of
A second example alternative program 1340b that may be executed to implement the example demographic targets adjuster 1050 of
At block 1510, the example household demographic assignment engine 340 assigns individuals to the households in accordance with the target rates found at block 1505. Referring to the same example above for block 1505, the example household demographic assignment engine 340 selects RPD households that have been assigned the first demographic category such that 80% of those households will be assigned one individual in that category, 15% of those households will be assigned two individuals in that category, and 5% of those homes will be assigned three individuals in that category. In the example program 1340b of
An example program 1350 may be executed to implement the example visitor assignment engine 1055 of
At block 1615, the example visitor demographic distribution calculator 1105 determines the percentages of long term visitors in ones of the demographic categories. The example visitor demographic distribution calculator 1105 uses the panelist data from the panelist database 315 to determine what percentage of people in each of the demographic categories (e.g. age-gender bucket) are long term visitors. At block 1620, the example visitor household distribution calculator 1110 determines the percentages of households with corresponding numbers of long term visitors. The example visitor household distribution calculator 1110 uses the panelist data from the panelist database 315 to determine the percentage of households that have 1, 2, 3, 4, etc. visitors respectively. For example, the example visitor household distribution calculator 1110 determines what percentage of households have one visitor, and then what percentage of household have two visitors, etc.
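The two percentages computed at blocks 1615 and 1620 can be illustrated with a minimal Python sketch. The record layout (the keys `household_id`, `category` and `is_long_term_visitor`) and the function name are hypothetical and chosen only for illustration; the panelist database 315 is not described at this level of detail.

```python
from collections import Counter

def visitor_distributions(panelists):
    """Compute the two visitor distributions from panelist records.

    `panelists` is an iterable of dicts with the hypothetical keys
    'household_id', 'category' and 'is_long_term_visitor'.
    """
    people_per_category = Counter()
    visitors_per_category = Counter()
    visitors_per_household = Counter()
    households = set()

    for person in panelists:
        households.add(person["household_id"])
        people_per_category[person["category"]] += 1
        if person["is_long_term_visitor"]:
            visitors_per_category[person["category"]] += 1
            visitors_per_household[person["household_id"]] += 1

    # Block 1615: share of people in each category who are long term visitors.
    visitor_rate = {
        category: visitors_per_category[category] / total
        for category, total in people_per_category.items()
    }

    # Block 1620: share of households having 0, 1, 2, ... long term visitors.
    histogram = Counter(visitors_per_household[h] for h in households)
    household_rate = {
        count: n / len(households) for count, n in sorted(histogram.items())
    }
    return visitor_rate, household_rate
```

In this sketch, households with no long term visitors appear under the key `0`, so the household shares sum to one.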
At block 1625, the example visitor vectors generator 1120 creates visitor vectors. The example visitor vectors generator 1120 uses the percentages determined by the example visitor demographic distribution calculator 1105 and the example visitor household distribution calculator 1110 to create the visitor vectors. As described in further detail below, the example flowchart of
At block 1635, the example visitor vector assigner 1125 assigns visitor vectors to the households. The example visitor vector assigner 1125 uses the panelist data from the panelist database 315 when assigning the visitor vectors created by the visitor vectors generator 1120 at block 1625 to the RPD households. As described in further detail below, the example flowcharts of
An example program 1625 may be executed to implement the example visitor vectors generator 1120 of
At block 1710, the example visitor vector generator 1120 creates a visitor pool. The example visitor vector generator 1120 creates a visitor pool with the total number of expected visitors in each demographic category (e.g., each age-gender bucket). At block 1715, the example visitor vector generator 1120 generates a visitor vector of a selected size. In examples disclosed herein, the selected size is determined based on a random number generator, where the probability of a given size being selected corresponds to the input percentage of households that have that given number (e.g., 1, 2, 3, 4, etc.) of visitors, as determined by the example visitor household distribution calculator 1110. However, other selection methods may additionally or alternatively be used.
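As a hedged illustration of blocks 1710 and 1715, the following Python sketch builds a visitor pool and draws a vector size with probability proportional to the household percentages. The function names and the dictionary inputs are assumptions made for this sketch and do not come from the disclosure.

```python
import random

def build_visitor_pool(expected_visitors):
    """Block 1710: one pool entry per expected visitor.

    `expected_visitors` maps a demographic category (e.g., an age-gender
    bucket) to the expected number of visitors in that category.
    """
    pool = []
    for category, count in expected_visitors.items():
        pool.extend([category] * count)
    return pool

def draw_vector_size(size_percentages, rng):
    """Block 1715: pick a visitor vector size at random.

    `size_percentages` maps a visitor count (1, 2, 3, ...) to the share of
    households having that many visitors. `rng` is a random.Random instance
    so that the draw can be made reproducible.
    """
    sizes = list(size_percentages)
    weights = [size_percentages[size] for size in sizes]
    return rng.choices(sizes, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` keeps repeated runs comparable; any other weighted selection method could be substituted, consistent with the statement that other selection methods may be used.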
At block 1720, the example visitor vector generator 1120 selects a number of visitors from the visitor pool based on the selected size of the visitor vector and places the selected visitors into the generated visitor vector. The example visitor vector generator 1120 selects a number of visitors from the visitor pool corresponding to the selected size of the visitor vector determined at block 1715. The example visitor vector generator 1120 then places the selected visitors into the visitor vector generated at block 1715. At block 1725, the example visitor vector generator 1120 determines if there are any visitors left in the visitor pool. If the example visitor vector generator 1120 determines that there are visitors left in the visitor pool, the example program 1625 of
A first example program 1635a may be executed to implement the example visitor vector assigner 1125 of
At block 1810, the example visitor vector assigner 1125 determines, for each household in the set of households, the probability that the household includes at least one visitor. In some examples, the probabilities can be the same for all RPD households such that each home is equally likely to include a visitor. In some examples, the demographic prediction neural network 335 disclosed above can be adapted to output a probability to predict, based on the panelist tuning data from the panelist database 315, the likelihood that a given RPD household has a visitor.
At block 1815, the example visitor vector assigner 1125 selects one visitor vector from the generated visitor vectors from the example visitor vector generator 1120. At block 1820, the example visitor vector assigner 1125 generates a list of valid households for placement of the selected visitor vector. An RPD household is valid if the RPD household is assigned (by the example household demographic assignment engine 340 with the modified demographic targets from the demographic targets adjuster 1050) the same number of individuals, or more, in each demographic category as are included in the selected visitor vector. The visitor vector assigner 1125 also ensures that at least one individual who is not a long term visitor remains in an adult demographic category assigned to the RPD household.
At block 1825, the example visitor vector assigner 1125 selects a household from the list of valid households determined at block 1820 that has the highest probability of having a visitor determined at block 1810. At block 1830, the example visitor vector assigner 1125 assigns the visitor vector to the selected RPD household. At block 1835, the example visitor vector assigner 1125 removes the selected household from the remaining set of available households for visitor assignments.
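The selection loop of blocks 1815 through 1835 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: visitor vectors and household member assignments are represented as dictionaries mapping a demographic category to a count, the set of adult categories is supplied by the caller, and a vector with no valid household is simply skipped. None of these representations, nor the function names, is specified by the disclosure.

```python
def is_valid(members, vector, adult_categories):
    """Block 1820 validity test for one household (hypothetical helper)."""
    # The household needs at least as many individuals in each category
    # as the visitor vector contains.
    for category, count in vector.items():
        if members.get(category, 0) < count:
            return False
    # At least one adult who is not a long term visitor must remain.
    adults = sum(members.get(c, 0) for c in adult_categories)
    adult_visitors = sum(vector.get(c, 0) for c in adult_categories)
    return adults - adult_visitors >= 1

def assign_visitor_vectors(vectors, households, visitor_prob, adult_categories):
    """Greedy assignment of visitor vectors to households.

    `households` maps a household id to its assigned member counts and
    `visitor_prob` maps a household id to the block 1810 probability.
    """
    available = dict(households)
    assignments = {}
    for vector in vectors:                                   # block 1815
        valid = [h for h, members in available.items()
                 if is_valid(members, vector, adult_categories)]  # block 1820
        if not valid:
            continue  # assumption: vectors with no valid household are skipped
        best = max(valid, key=lambda h: visitor_prob[h])     # block 1825
        assignments[best] = vector                           # block 1830
        del available[best]                                  # block 1835
    return assignments
```

Because each selected household is removed from the available set, a household receives at most one visitor vector in this sketch.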
At block 1840, the example visitor vector assigner 1125 determines if there are visitor vectors left. If the example visitor vector assigner 1125 determines that there are visitor vectors left, the example program 1635a of
A second alternative program 1635b may be executed to implement the example visitor vector assigner 1125 of
At block 1905, the example visitor vector assigner 1125 determines, for each household in the set of households, the probability that the household includes at least one visitor. In some examples, the probabilities can be the same for all RPD households such that each home is equally likely to include a visitor. In some examples, the demographic prediction neural network 335 disclosed above can be adapted to output a probability to predict, based on the panelist tuning data from the panelist database 315, the likelihood that a given RPD household has a visitor.
At block 1910, the example visitor vector assigner 1125 selects one demographic category. At block 1915, the example visitor vector assigner 1125 identifies a set of RPD households that can have a visitor in the selected demographic category. At block 1920, the example visitor vector assigner 1125 orders the set of the RPD households based on the probability of having a visitor in the selected demographic category determined at block 1905. At block 1925, the example visitor vector assigner 1125 selects the household from the ordered set of households. In examples disclosed herein, the example visitor vector assigner 1125 selects the RPD household from the ordered set that has the highest probability of having at least one visitor. At block 1930, the example visitor vector assigner 1125 assigns the visitor of the selected demographic category to the selected RPD household. At block 1935, the example visitor vector assigner 1125 removes the selected household from the ordered set.
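The per-category alternative of blocks 1910 through 1935 can be sketched in a similar way. In this hedged Python illustration, a household is assumed to be eligible for a visitor in a category when it has at least one assigned individual in that category, and a single per-household probability from block 1905 is used for the ordering. Both choices, along with the function name and data layout, are assumptions of the sketch.

```python
def assign_visitors_by_category(visitors_per_category, households, visitor_prob):
    """Assign visitors one demographic category at a time.

    `visitors_per_category` maps a category to the number of visitors to
    place, `households` maps a household id to its assigned member counts,
    and `visitor_prob` maps a household id to its visitor probability.
    """
    assignments = {h: [] for h in households}
    for category, n_visitors in visitors_per_category.items():    # block 1910
        # Block 1915: households that can have a visitor in this category
        # (assumed: at least one assigned individual in the category).
        eligible = [h for h, members in households.items()
                    if members.get(category, 0) > 0]
        # Block 1920: order by descending visitor probability.
        ordered = sorted(eligible, key=lambda h: visitor_prob[h], reverse=True)
        for _ in range(n_visitors):
            if not ordered:
                break  # assumption: stop when no eligible household remains
            chosen = ordered.pop(0)               # blocks 1925 and 1935
            assignments[chosen].append(category)  # block 1930
    return assignments
```

Unlike the vector-based sketch, a household here may receive visitors from several categories, since the ordered set is rebuilt for each category.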
At block 1940, the example visitor vector assigner 1125 determines if there are visitors left in the demographic category. If the example visitor vector assigner 1125 determines that there are visitors left in the demographic category, the example program 1635b of
At block 1945, the example visitor vector assigner 1125 determines if there are demographic categories left. If the example visitor vector assigner 1125 determines that there are demographic categories left, the example program 1635b of
The processor platform 2000 of the illustrated example includes a processor 2012. The processor 2012 of the illustrated example is hardware. For example, the processor 2012 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor 2012 may be a semiconductor based (e.g., silicon based) device. In this example, the processor 2012 implements the example panel tuning data collector 310, the example RPD data collector 320, the example feature generator 330, the example household demographic assignment engine 340 and the example ratings calculator 350.
The processor 2012 of the illustrated example includes a local memory 2013 (e.g., a cache). The processor 2012 of the illustrated example is in communication with a main memory including a volatile memory 2014 and a non-volatile memory 2016 via a link 2018. The link 2018 may be implemented by a bus, one or more point-to-point connections, etc., or a combination thereof. The volatile memory 2014 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 2016 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 2014, 2016 is controlled by a memory controller.
The processor platform 2000 of the illustrated example also includes an interface circuit 2020. The interface circuit 2020 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface. In this example, the interface circuit 2020 implements the network interface 305.
In the illustrated example, one or more input devices 2022 are connected to the interface circuit 2020. The input device(s) 2022 permit(s) a user to enter data and/or commands into the processor 2012. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, a trackbar (such as an isopoint), a voice recognition system and/or any other human-machine interface. Also, many systems, such as the processor platform 2000, can allow the user to control the computer system and provide data to the computer using physical gestures, such as, but not limited to, hand or body movements, facial expressions, and face recognition.
One or more output devices 2024 are also connected to the interface circuit 2020 of the illustrated example. The output devices 2024 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker(s). The interface circuit 2020 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 2020 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 2026. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
The processor platform 2000 of the illustrated example also includes one or more mass storage devices 2028 for storing software and/or data. Examples of such mass storage devices 2028 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives. In some examples, the mass storage device(s) 2028 may implement the panelist database 315, the RPD database 325 and/or the constraint database 345. Additionally or alternatively, in some examples the volatile memory 2014 may implement the panelist database 315, the RPD database 325 and/or the constraint database 345.
The machine executable instructions 2032 corresponding to the instructions of
The processor platform 2100 of the illustrated example includes a processor 2112. The processor 2112 of the illustrated example is hardware. For example, the processor 2112 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor 2112 may be a semiconductor based (e.g., silicon based) device. In this example, the processor 2112 implements the example panel tuning data collector 310, the example RPD data collector 320, the example feature generator 330, the example household demographic assignment engine 340, the example demographic targets adjuster 1050, the example visitor assignment engine 1055 and the example ratings calculator 350.
The processor 2112 of the illustrated example includes a local memory 2113 (e.g., a cache). The processor 2112 of the illustrated example is in communication with a main memory including a volatile memory 2114 and a non-volatile memory 2116 via a link 2118. The link 2118 may be implemented by a bus, one or more point-to-point connections, etc., or a combination thereof. The volatile memory 2114 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 2116 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 2114, 2116 is controlled by a memory controller.
The processor platform 2100 of the illustrated example also includes an interface circuit 2120. The interface circuit 2120 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface. In this example, the interface circuit 2120 implements the network interface 305.
In the illustrated example, one or more input devices 2122 are connected to the interface circuit 2120. The input device(s) 2122 permit(s) a user to enter data and/or commands into the processor 2112. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, a trackbar (such as an isopoint), a voice recognition system and/or any other human-machine interface. Also, many systems, such as the processor platform 2100, can allow the user to control the computer system and provide data to the computer using physical gestures, such as, but not limited to, hand or body movements, facial expressions, and face recognition.
One or more output devices 2124 are also connected to the interface circuit 2120 of the illustrated example. The output devices 2124 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker(s). The interface circuit 2120 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 2120 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 2126. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
The processor platform 2100 of the illustrated example also includes one or more mass storage devices 2128 for storing software and/or data. Examples of such mass storage devices 2128 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives. In some examples, the mass storage device(s) 2128 may implement the panelist database 315, the RPD database 325 and/or the constraint database 345. Additionally or alternatively, in some examples the volatile memory 2114 may implement the panelist database 315, the RPD database 325 and/or the constraint database 345.
The machine executable instructions 2132 corresponding to the instructions of
From the foregoing, it will be appreciated that example methods, apparatus and articles of manufacture have been disclosed that implement neural network processing of set-top box return path data to estimate household demographics. An example neural-network-based demographic estimation system 1000 disclosed above uses a neural network having a time distributed dense layer (TDDL) followed by a long short term memory (LSTM) recurrent network layer to predict demographic classifications of households (e.g., panel households for training, and RPD households after training) from viewing data (e.g., panelist tuning data for training, and RPD tuning data after training). The example neural-network-based demographic estimation system 1000 groups viewing data for a household into view blocks which describe respective viewing sessions, where a view block indicates the day of the week, the day of the year, the quarter hour of the day, the channel change rate, and the minutes each possible network was viewed. In some examples, view blocks are capped at 60 minutes. In some examples, view blocks for a given household are combined and processed by the TDDL to produce a condensed feature set for the viewing sessions of the household. The condensed feature set is then processed by the LSTM to produce a condensed summary feature vector that summarizes the viewing history for the household. The condensed summary feature vector is merged with additional household features, such as total TV consumption, number of view blocks recorded and number of TV tuners in the household, to produce a merged summary feature vector for the household. The merged summary feature vector is then applied to one or more additional hidden layers, which output a classification vector indicating the probability that the household belongs in the different possible demographic classes. 
Mixed integer programming is then used to solve an objective function based on the demographic classification probabilities output from the neural network, and subject to a set of constraints, to assign one or more demographic categories to respective ones of the RPD households providing the RPD tuning data.
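The shape of the constrained assignment problem can be illustrated with a small Python sketch. Note that this sketch deliberately replaces the mixed integer programming solver with an exhaustive search, which is only practical for a handful of households, and that both the objective (the sum of the selected classification probabilities) and the constraint (a maximum number of households per category) are assumptions chosen for illustration. For simplicity, it also assigns exactly one category to each household.

```python
from itertools import product

def assign_categories(probabilities, max_per_category):
    """Brute-force stand-in for the mixed integer program (illustration only).

    `probabilities[h][c]` is the predicted probability that household h
    belongs to category c. `max_per_category[c]` is a hypothetical cap on
    how many households may be assigned category c.
    """
    n_categories = len(max_per_category)
    best_score = float("-inf")
    best_assignment = None
    # Enumerate every way of giving each household exactly one category.
    for assignment in product(range(n_categories), repeat=len(probabilities)):
        counts = [assignment.count(c) for c in range(n_categories)]
        if any(counts[c] > max_per_category[c] for c in range(n_categories)):
            continue  # violates a constraint
        score = sum(probabilities[h][c] for h, c in enumerate(assignment))
        if score > best_score:
            best_score, best_assignment = score, assignment
    return best_assignment, best_score
```

The constraint is what makes the problem more than a per-household argmax: a household may be moved to its second most likely category so that the category caps are respected.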
The disclosed methods, apparatus and articles of manufacture improve the efficiency of using a computing device by enabling RPD tuning data to be combined with panelist tuning data in an audience measurement processing system. Combining RPD tuning data with available panel data can greatly increase the amount of data accessible by the audience measurement processing system for predicting audience metrics (e.g., ratings). Such an increased amount of data can improve the statistical completeness of the input data and thereby decrease the associated statistical bias of the results produced by the audience measurement processing system. The disclosed methods, apparatus and articles of manufacture are accordingly directed to one or more improvement(s) in the functioning of a computer.
Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.
This patent arises from a continuation-in-part of U.S. patent application Ser. No. 16/230,620, titled “NEURAL NETWORK PROCESSING OF RETURN PATH DATA TO ESTIMATE HOUSEHOLD DEMOGRAPHICS” and filed on Dec. 21, 2018, which claims the benefit of and priority to U.S. Provisional Application Ser. No. 62/743,925, titled “NEURAL NETWORK PROCESSING OF SET-TOP BOX RETURN PATH DATA TO ESTIMATE HOUSEHOLD DEMOGRAPHICS” and filed on Oct. 10, 2018. This patent also claims the benefit and priority to U.S. Provisional Application Ser. No. 62/841,641, titled “NEURAL NETWORK PROCESSING OF RETURN PATH DATA TO ESTIMATE HOUSEHOLD MEMBER AND VISITOR DEMOGRAPHICS” and filed on May 1, 2019. Priority to U.S. patent application Ser. No. 16/230,620, U.S. Provisional Application Ser. No. 62/743,925 and U.S. Provisional Application Ser. No. 62/841,641 is claimed. U.S. patent application Ser. No. 16/230,620, U.S. Provisional Application Ser. No. 62/743,925 and U.S. Provisional Application Ser. No. 62/841,641 are hereby incorporated herein by reference in their respective entireties.
Number | Date | Country
---|---|---
62743925 | Oct 2018 | US
62841641 | May 2019 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 16230620 | Dec 2018 | US
Child | 16706398 | | US