This disclosure relates generally to computer systems for monitoring audiences, and, more particularly, to methods and apparatus to generate audience metrics using third-party privacy-protected cloud environments.
Media can be presented to and/or accessed by an audience via the Internet. A media provider can log impressions corresponding to media accesses by audience members. The media provider can generate audience-based media access metrics based on the logged media impressions.
The figures are not to scale. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts. Connection references (e.g., attached, coupled, connected, and joined) are to be construed broadly and may include intermediate members between a collection of elements and relative movement between elements unless otherwise indicated. As such, connection references do not necessarily imply that two elements are directly connected and in fixed relation to each other.
Descriptors “first,” “second,” “third,” etc. are used herein when identifying multiple elements or components which may be referred to separately. Unless otherwise specified or understood based on their context of use, such descriptors are not intended to impute any meaning of priority, physical order or arrangement in a list, or ordering in time but are merely used as labels for referring to multiple elements or components separately for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for ease of referencing multiple elements or components.
Audience measurement entities (AMEs) usually collect large amounts of audience measurement information from their panelists including the number of unique audience members for particular media and the number of impressions corresponding to each of the audience members. Unique audience size, as used herein, refers to the total number of unique people (e.g., non-duplicate people) who had an impression of (e.g., were exposed to) a particular media item, without counting duplicate audience members. As used herein, an impression is defined to be an event in which a home or individual accesses and/or is exposed to media (e.g., an advertisement, content, a group of advertisements and/or a collection of content). Impression count, as used herein, refers to the number of times audience members are exposed to a particular media item. The unique audience size associated with a particular media item will always be equal to or less than the number of impressions associated with the media item because, while all audience members by definition have at least one impression of the media, an individual audience member may have more than one impression. That is, the unique audience size is equal to the impression count only when every audience member was exposed to the media only a single time (i.e., the number of audience members equals the number of impressions). Where at least one audience member is exposed to the media multiple times, the unique audience size will be less than the total impression count because multiple impressions will be associated with individual audience members. Thus, unique audience size refers to the number of unique people in an audience (without double counting any person) exposed to media for which audience metrics are being generated. Unique audience size may also be referred to as unique audience, deduplicated audience size, deduplicated audience, or audience.
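The relationship between impression count and unique audience size described above can be illustrated with a minimal sketch. The log layout, the identifiers, and the function name audience_metrics are assumptions made for illustration only and are not part of the disclosed examples.

```python
from collections import Counter

# Hypothetical impression log: one (audience_member_id, media_id) pair per impression.
impressions = [
    ("person_a", "ad_1"),
    ("person_b", "ad_1"),
    ("person_a", "ad_1"),  # second exposure of the same person to the same media
    ("person_c", "ad_1"),
    ("person_a", "ad_2"),
]

def audience_metrics(log, media_id):
    """Return (impression_count, unique_audience_size) for one media item."""
    # Count impressions per person, restricted to the requested media item.
    per_person = Counter(person for person, media in log if media == media_id)
    impression_count = sum(per_person.values())
    unique_audience_size = len(per_person)
    return impression_count, unique_audience_size

count, unique = audience_metrics(impressions, "ad_1")
# The unique audience size can never exceed the impression count.
assert unique <= count
print(count, unique)  # 4 3
```

In this sketch the two values are equal only when every audience member appears in the log exactly once for the media item, which mirrors the condition stated above.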
Techniques for monitoring user access to Internet-accessible media, such as digital television (DTV) media and digital content ratings (DCR) media, have evolved significantly over the years. Internet-accessible media is also known as digital media. In the past, such monitoring was done primarily through server logs. In particular, media providers serving media on the Internet would log the number of requests received for their media at their servers. Basing Internet usage research on server logs is problematic for several reasons. For example, server logs can be tampered with either directly or via zombie programs, which repeatedly request media from the server to increase the server log counts. Also, media is sometimes retrieved once, cached locally and then repeatedly accessed from the local cache without involving the server. Server logs cannot track such repeat views of cached media. Thus, server logs are susceptible to both over-counting and under-counting errors.
As Internet technology advanced, the limitations of server logs were overcome through methodologies in which the Internet media to be tracked was tagged with monitoring instructions. In particular, monitoring instructions (also known as a media impression request or a beacon request) are associated with the hypertext markup language (HTML) of the media to be tracked. When a client requests the media, both the media and the impression request are downloaded to the client. The impression requests are, thus, executed whenever the media is accessed, be it from a server or from a cache.
The beacon instructions cause monitoring data reflecting information about the access to the media (e.g., the occurrence of a media impression) to be sent from the client that downloaded the media to a monitoring server. Typically, the monitoring server is owned and/or operated by an AME (e.g., any party interested in measuring or tracking audience exposures to advertisements, media, and/or any other media) that did not provide the media to the client and who is a trusted third party for providing accurate usage statistics (e.g., The Nielsen Company, LLC). Advantageously, because the beaconing instructions are associated with the media and executed by the client browser whenever the media is accessed, the monitoring information is provided to the AME irrespective of whether the client is associated with a panelist of the AME. In this manner, the AME is able to track every time a person is exposed to the media on a census-wide or population-wide level. As a result, the AME can reliably determine the total impression count for the media without having to extrapolate from panel data collected from a relatively limited pool of panelists within the population. Frequently, such beacon requests are implemented in connection with third-party cookies. Since the AME is a third party relative to the first party serving the media to the client device, the cookie sent to the AME in the impression request to report the occurrence of the media impression of the client device is a third-party cookie. Third-party cookie tracking is used by audience measurement servers to track access to media by client devices from first-party media servers.
Tracking impressions by tagging media with beacon instructions using third-party cookies is insufficient, by itself, to enable an AME to reliably determine the unique audience size associated with the media if the AME cannot identify the individual user associated with the third-party cookie. That is, the unique audience size cannot be determined because the collected monitoring information does not uniquely identify the person(s) exposed to the media. Under such circumstances, the AME cannot determine whether two reported impressions are associated with the same person or two separate people. The AME may set a third-party cookie on a client device reporting the monitoring information to identify when multiple impressions occur using the same device. However, cookie information does not indicate whether the same person used the client device in connection with each media impression. Furthermore, the same person may access media using multiple different devices that have different cookies so that the AME cannot directly determine when two separate impressions are associated with the same person or two different people.
Furthermore, the monitoring information reported by a client device executing the beacon instructions does not provide an indication of the demographics or other user information associated with the person(s) exposed to the associated media. To at least partially address this issue, the AME establishes a panel of users who have agreed to provide their demographic information and to have their Internet browsing activities monitored. When an individual joins the panel, that person provides corresponding detailed information concerning the person's identity and demographics (e.g., gender, race, income, home location, occupation, etc.) to the AME. The AME sets a cookie on the panelist computer that enables the AME to identify the panelist whenever the panelist accesses tagged media and, thus, sends monitoring information to the AME. Additionally or alternatively, the AME may identify the panelists using other techniques (independent of cookies) by, for example, prompting the user to login or identify themselves. While AMEs are able to obtain user-level information for impressions from panelists (e.g., identify unique individuals associated with particular media impressions), most of the client devices providing monitoring information from the tagged pages are not panelists. Thus, the identity of most people accessing media remains unknown to the AME such that it is necessary for the AME to use statistical methods to impute demographic information based on the data collected for panelists to the larger population of users providing data for the tagged media. However, panel sizes of AMEs remain small compared to the general population of users.
There are many database proprietors operating on the Internet. These database proprietors provide services to large numbers of subscribers. In exchange for the provision of services, the subscribers register with the database proprietors. Examples of such database proprietors include social network sites (e.g., Facebook, Twitter, MySpace, etc.), multi-service sites (e.g., Yahoo!, Google, Axiom, Catalina, etc.), online retailer sites (e.g., Amazon.com, Buy.com, etc.), credit reporting sites (e.g., Experian), streaming media sites (e.g., YouTube, Hulu, etc.), etc. These database proprietors set cookies and/or other device/user identifiers on the client devices of their subscribers to enable the database proprietors to recognize their subscribers when their subscribers visit website(s) on the Internet domains of the database proprietors.
The protocols of the Internet make cookies inaccessible outside of the domain (e.g., Internet domain, domain name, etc.) on which they were set. Thus, a cookie set in, for example, the YouTube.com domain (e.g., a first party) is accessible to servers in the YouTube.com domain, but not to servers outside that domain. Therefore, although an AME (e.g., a third party) might find it advantageous to access the cookies set by the database proprietors, they are unable to do so. However, techniques have been developed that enable an AME to leverage media impression information collected in association with demographic information in subscriber databases of database proprietors to collect more extensive Internet usage (e.g., beyond the limited pool of individuals participating in an AME panel) by extending the impression request process to encompass partnered database proprietors and by using such partners as interim data collectors. In particular, this task is accomplished by structuring the AME to respond to impression requests from clients (who may not be a member of an audience measurement panel and, thus, may be unknown to the AME) by redirecting the clients from the AME to a database proprietor, such as a social network site partnered with the AME, using an impression response. Such a redirection initiates a communication session between the client accessing the tagged media and the database proprietor. For example, the impression response received from the AME may cause the client to send a second impression request to the database proprietor along with a cookie set by that database proprietor. In response to receiving this impression request, the database proprietor (e.g., Facebook) can access the cookie it has set on the client to thereby identify the client based on the internal records of the database proprietor.
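The redirect-based flow described above can be sketched as a small in-memory simulation. This is a simplified illustration under stated assumptions, not an implementation of any actual AME or database proprietor system: the domain names, the Client class, the handler functions, and the subscriber table are all hypothetical, and real systems would exchange HTTP requests and redirect responses rather than call functions directly.

```python
AME_DOMAIN = "ame.example"   # hypothetical AME domain
DBP_DOMAIN = "dbp.example"   # hypothetical database proprietor domain

# Hypothetical subscriber records held only by the database proprietor.
SUBSCRIBERS = {"sub-123": {"age": 34, "gender": "F"}}

ame_log = []                      # census impressions seen by the AME
dbp_demographic_impressions = []  # impressions the proprietor matched to a subscriber

class Client:
    """Holds cookies keyed by the domain that set them."""
    def __init__(self, cookies):
        self.cookies = cookies

    def cookie_for(self, domain):
        # A server only ever receives the cookie set on its own domain.
        return self.cookies.get(domain)

def ame_handle(media_id, cookie):
    """AME logs the impression, then answers with a redirect to the proprietor."""
    ame_log.append({"media_id": media_id, "ame_cookie": cookie})
    return {"redirect_to": DBP_DOMAIN, "media_id": media_id}

def dbp_handle(media_id, cookie):
    """Proprietor logs a demographic impression if its own cookie identifies a subscriber."""
    demographics = SUBSCRIBERS.get(cookie)
    if demographics is None:
        return False
    dbp_demographic_impressions.append({"media_id": media_id, **demographics})
    return True

def report_impression(client, media_id):
    """First request goes to the AME; the response triggers a second request to the proprietor."""
    response = ame_handle(media_id, client.cookie_for(AME_DOMAIN))
    target = response["redirect_to"]
    return dbp_handle(response["media_id"], client.cookie_for(target))

subscriber = Client({DBP_DOMAIN: "sub-123"})
stranger = Client({})
print(report_impression(subscriber, "ad_1"))  # True
print(report_impression(stranger, "ad_1"))    # False
```

Note that the AME handler never sees the proprietor's cookie; only the second request, addressed to the proprietor's own domain, carries it.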
In the event the client corresponds to a subscriber of the database proprietor (as determined from the cookie associated with the client), the database proprietor logs/records a database proprietor demographic impression in association with the client/user. As used herein, a demographic impression is an impression that can be matched to particular demographic information of a particular subscriber or registered user of the services of a database proprietor. The database proprietor has the demographic information for the particular subscriber because the subscriber would have provided such information when setting up an account to subscribe to the services of the database proprietor.
Sharing of demographic information associated with subscribers of database proprietors enables AMEs to extend or supplement their panel data with substantially reliable demographics information from external sources (e.g., database proprietors), thus extending the coverage, accuracy, and/or completeness of their demographics-based audience measurements. Such access also enables the AME to monitor persons who would not otherwise have joined an AME panel. Any web service provider having a database identifying demographics of a set of individuals may cooperate with the AME. Such web service providers may be referred to as “database proprietors” and include, for example, wireless service carriers, mobile software/service providers, social media sites (e.g., Facebook, Twitter, MySpace, etc.), online retailer sites (e.g., Amazon.com, Buy.com, etc.), multi-service sites (e.g., Yahoo!, Google, Experian, etc.), and/or any other Internet sites that collect demographic data of users and/or otherwise maintain user registration records. The use of demographic information from disparate data sources (e.g., high-quality demographic information from the panels of an audience measurement entity and/or registered user data of database proprietors) results in improved reporting effectiveness of metrics for both online and offline advertising campaigns.
The above approach to generating audience metrics by an AME depends upon the beacon requests (or tags) associated with the media to be monitored to enable an AME to obtain census-wide impression counts (e.g., impressions that include the entire population exposed to the media regardless of whether the audience members are panelists of the AME). Further, the above approach also depends on third-party cookies to enable the enrichment of the census impressions with demographic information from database proprietors. However, in more recent years, there has been a movement away from the use of third-party cookies by third parties. Thus, while media providers (e.g., database proprietors) may still use first-party cookies to collect first-party data, the elimination of third-party cookies prevents the tracking of Internet media by AMEs (outside of client devices associated with panelists for which the AME has provided a meter to track Internet usage behavior). Furthermore, independent of the use of cookies, some database proprietors are moving towards the elimination of third-party impression requests or tags (e.g., redirect instructions) embedded in media (e.g., beginning in 2020, third-party tags will no longer be allowed on YouTube.com and other Google Video Partner (GVP) sites). As technology moves in this direction, AMEs (e.g., third parties) will no longer be able to track census-wide impressions of media in the manner they have in the past. Furthermore, AMEs will no longer be able to send a redirect request to a client accessing media to cause a second impression request to a database proprietor to associate the impression with demographic information. Thus, the only Internet media monitoring that AMEs will be able to directly perform in such a system will be with panelists that have agreed to be monitored using different techniques that do not depend on third-party cookies and/or tags.
Examples disclosed herein overcome at least some of the limitations that arise out of the elimination of third-party cookies and/or third-party tags by enabling the merging of high-quality demographic information from the panels of an AME with media impression data that continues to be collected by database proprietors. As mentioned above, while third-party cookies and/or third-party tags may be eliminated, database proprietors that provide and/or manage the delivery of media accessed online are still able to track impressions of the media (e.g., via first-party cookies and/or first-party tags). Furthermore, database proprietors are still able to associate demographic information with the impressions whenever the impressions can be matched to a particular subscriber of the database proprietor for which demographic information has been collected (e.g., when the user registered with the database proprietor). In some examples, the AME panel data and the database proprietor impressions data are merged in a privacy-protected cloud environment maintained by the database proprietor.
More particularly,
As used herein, a media impression is defined as an occurrence of access and/or exposure to media 108 (e.g., an advertisement, a movie, a movie trailer, a song, a web page banner, etc.). Examples disclosed herein may be used to monitor for media impressions of any one or more media types (e.g., video, audio, a web page, an image, text, etc.). In examples disclosed herein, the media 108 may be primary content and/or advertisements. Examples disclosed herein are not restricted for use with any particular type of media. On the contrary, examples disclosed herein may be implemented in connection with tracking impressions for media of any type or form in a network.
In the illustrated example of
In some examples, the media 108 is associated with a unique impression identifier (e.g., a consumer playback nonce (CPN)) generated by the database proprietor 102. In some examples, the impression identifier serves to uniquely identify a particular impression of the media 108. Thus, even though the same media 108 may be served multiple times, each time the media 108 is served the database proprietor 102 will generate a new and different impression identifier so that each impression of the media 108 can be distinguished from every other impression of the media. In some examples, the impression identifier is encoded into a uniform resource locator (URL) used to access the primary content (e.g., a particular YouTube video) along with which the media 108 (as an advertisement) is served. In some examples, with the impression identifier (e.g., CPN) encoded into the URL associated with the media 108, the audience measurement meter 115 extracts the identifier at the time that a media impression occurs so that the AME 104 is able to associate a captured impression with the impression identifier.
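As one way to picture how a meter might recover an impression identifier that is encoded into a URL, consider the following minimal sketch. The query parameter name "cpn", the example URL, and the function name extract_impression_id are assumptions for illustration; the disclosure does not specify how the identifier is encoded.

```python
from urllib.parse import urlparse, parse_qs

def extract_impression_id(url, param="cpn"):
    """Return the impression identifier carried in the URL query string, or None.

    The parameter name is an assumption; an actual deployment may encode
    the identifier differently.
    """
    query = parse_qs(urlparse(url).query)
    values = query.get(param)
    return values[0] if values else None

# Hypothetical URL of primary content served together with the monitored media.
url = "https://video.example/watch?v=abc123&cpn=Zx9Q-impression-0001"
print(extract_impression_id(url))  # Zx9Q-impression-0001
```

Returning None when the parameter is absent lets the caller fall back to another path, such as the identifier exchange described for mobile devices.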
In some examples, the meter 115 may not be able to obtain the impression identifier (e.g., CPN) to associate with a particular media impression. For instance, in some examples where the panelist client device 112 is a mobile device, the meter 115 collects a mobile advertising identifier (MAID) and/or an identifier for advertisers (IDFA) that may be used to uniquely identify client devices 110 (e.g., the panelist client devices 112 being monitored by the AME 104). In some examples, the meter 115 reports the MAID and/or IDFA for the particular device associated with the meter 115 to the AME 104. The AME 104, in turn, provides the MAID and/or IDFA to the database proprietor 102 in a double blind exchange through which the database proprietor 102 provides the AME 104 with the impression identifiers (e.g., CPNs) associated with the client device 110 identified by the MAID and/or IDFA. Once the AME 104 receives the impression identifiers for the client device 110 (e.g., a particular panelist client device 112), the impression identifiers are associated with the impressions previously collected in connection with the device.
In the illustrated example, the database proprietor 102 logs each media impression occurring on any of the client devices 110 within the privacy-protected cloud environment 106. In some examples, logging an impression includes logging the time the impression occurred and the type of client device 110 (e.g., whether a desktop device, a mobile device, a tablet device, etc.) on which the impression occurred. Further, in some examples, impressions are logged along with the impression's unique impression identifier. In this example, the impressions and associated identifiers are logged in a campaign impressions database 116. The campaign impressions database 116 stores all impressions of the media 108 regardless of whether any particular impression was detected from a panelist client device 112 or a non-panelist client device 114. Furthermore, the campaign impressions database 116 stores all impressions of the media 108 regardless of whether the database proprietor 102 is able to match any particular impression to a particular subscriber of the database proprietor 102. As mentioned above, in some examples, the database proprietor 102 identifies a particular user (e.g., subscriber) associated with a particular media impression based on a cookie stored on the client device 110. In some examples, the database proprietor 102 associates a particular media impression with a user that was signed into the online services of the database proprietor 102 at the time the media impression occurred. In some examples, in addition to logging such impressions and associated identifiers in the campaign impressions database 116, the database proprietor 102 separately logs such impressions in a matchable impressions database 118. 
As used herein, a matchable impression is an impression that the database proprietor 102 is able to match to at least one of a particular subscriber (e.g., because the impression occurred on a client device 110 on which a user was signed into the database proprietor 102) or a particular client device 110 (e.g., based on a first-party cookie of the database proprietor 102 detected on the client device 110). In some examples, if the database proprietor 102 cannot match a particular media impression (e.g., because no user was signed in at the time the media impression occurred and there is no recognizable cookie on the associated client device 110) the impression is omitted from the matchable impressions database 118 but is still logged in the campaign impressions database 116.
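The logging rule described above, in which every impression is stored in the campaign impressions database while only matchable impressions are also stored in the matchable impressions database, can be sketched as follows. The record fields, the in-memory lists standing in for the two databases, and the function name log_impression are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Impression:
    impression_id: str                        # unique per impression (e.g., a CPN)
    timestamp: str
    device_type: str                          # e.g., "desktop", "mobile", "tablet"
    signed_in_user: Optional[str] = None      # set when a subscriber was signed in
    first_party_cookie: Optional[str] = None  # set when a first-party cookie was detected

campaign_impressions = []   # stands in for the campaign impressions database
matchable_impressions = []  # stands in for the matchable impressions database

def log_impression(imp):
    """Log every impression; additionally log it as matchable when it can be
    tied to a signed-in subscriber or to a device via a first-party cookie."""
    campaign_impressions.append(imp)
    # Prefer the signed-in user; fall back to the device cookie.
    match_key = imp.signed_in_user or imp.first_party_cookie
    if match_key:
        matchable_impressions.append((match_key, imp))
        return True
    return False

log_impression(Impression("cpn-001", "2020-01-01T10:00:00", "desktop", signed_in_user="u9"))
log_impression(Impression("cpn-002", "2020-01-01T10:05:00", "mobile", first_party_cookie="ck-7"))
log_impression(Impression("cpn-003", "2020-01-01T10:09:00", "tablet"))
print(len(campaign_impressions), len(matchable_impressions))  # 3 2
```

The third impression has neither a signed-in user nor a recognizable cookie, so it appears only in the campaign list.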
As indicated above, the matchable impressions database 118 includes media impressions (and associated unique impression identifiers) that the database proprietor 102 is able to match to a particular user that has registered with the database proprietor 102. In some examples, the matchable impressions database 118 also includes user-based covariates that correspond to the particular user to which each impression in the database was matched. As used herein, a user-based covariate refers to any item(s) of information collected and/or generated by the database proprietor 102 that can be used to identify, characterize, quantify, and/or distinguish particular users and/or their associated behavior. For example, user-based covariates may include the name, age, and/or gender of the user (and/or any other demographic information about the user) collected at the time the user registered with the database proprietor 102, and/or the relative frequency with which the user uses the different types of client devices 110, the number of media items the user has accessed during a most recent period of time (e.g., the last 30 days), the search terms entered by the user during a most recent period of time (e.g., the last 30 days), feature embeddings (numerical representations) of classifications of videos viewed and/or searches entered by the user, etc. As mentioned above, the matchable impressions database 118 also includes impressions matched to particular client devices 110 (based on first-party cookies), even when the impressions cannot be matched to particular users (based on the users being signed in at the time). In some such examples, the impressions matched to particular client devices 110 are treated as distinct users within the matchable impressions database 118. However, as no particular user can be identified, such impressions in the matchable impressions database 118 will not be associated with any user-based covariates.
Although only one campaign impressions database 116 is shown in the illustrated example, the privacy-protected cloud environment 106 may include any number of campaign impressions databases 116, with each database storing impressions corresponding to different media campaigns associated with one or more different advertisers (e.g., product manufacturers, service providers, retailers, advertisement servers, etc.). In other examples, a single campaign impressions database 116 may store the impressions associated with multiple different campaigns. In some such examples, the campaign impressions database 116 may store a campaign identifier in connection with each impression to identify the particular campaign to which the impression is associated. Similarly, in some examples, the privacy-protected cloud environment 106 may include one or more matchable impressions databases 118 as appropriate. Further, in some examples, the campaign impressions database 116 and the matchable impressions database 118 may be combined and/or represented in a single database.
In the illustrated example of
As shown in the illustrated example, whereas the database proprietor 102 is able to collect impressions from both panelist client devices 112 and non-panelist client devices 114, the AME 104 is limited to collecting impressions from panelist client devices 112. In some examples, the AME 104 also collects the impression identifier associated with each collected media impression so that the collected impressions may be matched with the impressions collected by the database proprietor 102 as described further below. In the illustrated example, the impressions (and associated impression identifiers) of the panelists are stored in an AME panel data database 122 that is within an AME first party data store 124 in an AME proprietary cloud environment 126. In some examples, the AME proprietary cloud environment 126 is a cloud-based storage system (e.g., a Google Cloud Project) provided by the database proprietor 102 that includes functionality to enable interfacing with the privacy-protected cloud environment 106 also maintained by the database proprietor 102. As mentioned above, the privacy-protected cloud environment 106 is governed by privacy constraints that prevent any party (with some limited exceptions for the database proprietor 102) from accessing private information associated with particular individuals. By contrast, the AME proprietary cloud environment 126 is indicated as proprietary because it is exclusively controlled by the AME such that the AME has full control and access to the data without limitation. While some examples involve the AME proprietary cloud environment 126 being a cloud-based system that is provided by the database proprietor 102, in other examples, the AME proprietary cloud environment 126 may be provided by a third-party entity distinct from the database proprietor 102.
While the AME 104 is limited to collecting impressions (and associated identifiers) from only panelists (e.g., via the panelist client devices 112), the AME 104 is able to collect panel data that is much more robust than merely media impressions. As mentioned above, the panelist client devices 112 are associated with users that have agreed to participate on a panel of the AME 104. Participation in a panel includes the provision of detailed demographic information about the panelist and/or all members in the panelist's household. Such demographic information may include age, gender, race, ethnicity, education, employment status, income level, geographic location of residence, etc. In addition to such demographic information, which may be collected at the time a user enrolls as a panelist, the panelist may also agree to enable the AME 104 to track and/or monitor various aspects of the user's behavior. For example, the AME 104 may monitor panelists' Internet usage behavior including the frequency of Internet usage, the times of day of such usage, the websites visited, and the media exposed to (from which the media impressions are collected).
AME panel data (including media impressions and associated identifiers, demographic information, and Internet usage data) is shown in
In some examples, there may be multiple different techniques and/or methodologies used to collect the AME panel data that depend on the particular circumstances involved. For example, different monitoring techniques and/or different types of audience measurement meters 115 may be employed for media accessed via a desktop computer relative to the media accessed via a mobile computing device. In some examples, the audience measurement meter 115 may be implemented as a software application that panelists agree to install on their devices to monitor all Internet usage activity on the respective devices. In some examples, the meter 115 may prompt a user of a particular device to identify themselves so that the AME 104 can confirm the identity of the user (e.g., whether it was the mother or daughter in a panelist household). In some examples, prompting a user to self-identify may be considered overly intrusive. Accordingly, in some such examples, the circumstances surrounding the behavior of the user of a panelist client device 112 (e.g., time of day, type of content being accessed, etc.) may be analyzed to infer the identity of the user to some confidence level (e.g., the accessing of children's content in the early afternoon would indicate a relatively high probability that a child is using the device at that point in time). In some examples, the audience measurement meter 115 may be a separate hardware device that is in communication with a particular panelist client device 112 and enabled to monitor the Internet usage of the panelist client device 112.
In some examples, the processes and/or techniques used by the AME 104 to capture panel data (including media impressions and who in particular was exposed to the media) can differ depending on the nature of the panelist client device 112 through which the media was accessed. For instance, in some examples, the identity of the individual using the client device 112 may be based on the individual responding to a prompt to self-identify. In some examples, such prompts are limited to desktop client devices because such a prompt is viewed as overly intrusive on a mobile device. However, without specifically prompting a user of a mobile device to self-identify, there often is no direct way to determine whether the user is the primary user of the device (e.g., the owner of the device) or someone else (e.g., a child of the primary user). Thus, there is the possibility of misattribution of media impressions within the panel data collected using mobile devices. In some examples, to overcome the issue of misattribution in the panel data, the AME 104 may develop a machine learning model that can predict the true user of a mobile device (or any device for that matter) based on information that the AME 104 does know for certain and/or has access to. For example, inputs to the machine learning model may include the composition of the panelist household, the type (e.g., genre and/or category) of the content, the daypart or time of day when the content was accessed, etc. In some examples, the truth data used to generate and validate such a model may be collected through field surveys in which the above input features are tracked and/or monitored for a subset of panelists that have agreed to be monitored in this manner (which is more intrusive than the typical passive monitoring of content accessed via mobile devices).
As mentioned above, in some examples, the AME panel data (stored in the AME panel data database 122) is merged with the database proprietor impressions data (stored in the matchable impressions database 118) within the privacy-protected cloud environment 106 to take advantage of the combination of the disparate sets of data to generate more robust and/or reliable audience measurement metrics. In particular, the database proprietor impressions data provides the advantage of volume. That is, the database proprietor impressions data corresponds to a much larger number of impressions than the AME panel data because the database proprietor impressions data includes census wide impression information that includes all impressions collected from both the panelist client devices 112 (associated with a relatively small pool of audience members) and the non-panelist client devices 114. The AME panel data provides the advantage of high-quality demographic data for a statistically significant pool of audience members (e.g., panelists) that may be used to correct for errors and/or biases in the database proprietor impressions data.
One source of error in the database proprietor impressions data is that the demographic information for matchable users collected by the database proprietor 102 during user registration may not be truthful. In particular, in some examples, many database proprietors impose age restrictions on their user accounts (e.g., a user must be at least 13 years of age, at least 18 years of age, etc.). However, when a person registers with the database proprietor 102, the user typically self-declares their age and may, therefore, lie about their age (e.g., an 11 year old may say they are 18 to bypass the age restrictions for a user account). Independent of age restrictions, a particular user may choose to enter an incorrect age for any other reason or no reason at all (e.g., a 44 year old may choose to assert they are only 25). Where a database proprietor 102 does not verify the self-declared age of users, there is a relatively high likelihood that the ages of at least some registered users of the database proprietor stored in the matchable impressions database 118 (as a particular user-based covariate) are inaccurate. Further, it is possible that other self-declared demographic information (e.g., gender, race, ethnicity, income level, etc.) may also be falsified by users during registration. As described further below, the AME panel data (which contains reliable demographic information about the panelists) can be used to correct for inaccurate demographic information in the database proprietor impressions data.
Another source of error in the database proprietor impressions data is based on the concept of misattribution, which arises in situations where multiple different people use the same client device 110 to access media. In some examples, the database proprietor 102 associates a particular impression to a particular user based on the user being signed into a platform provided by the database proprietor. For example, if a particular person signs into their Google account and begins watching a YouTube video on a particular client device 110, that person will be attributed with an impression for an ad served during the video because the person was signed in at the time. However, there may be instances where the person finishes using the client device 110 but does not sign out of his or her Google account. Thereafter, a second different person (e.g., a different member in the family of the first person) begins using the client device 110 to view another YouTube video. Although the second person is now accessing media via the client device 110, ad impressions during this time will still be attributed to the first person because the first person is the one who is still indicated as being signed in. Thus, there are likely to be circumstances where the actual person exposed to media 108 is misattributed to a different registered user of the database proprietor 102. The AME panel data (which includes an indication of the actual person using the panelist client devices 112 at any given moment) can be used to correct for misattribution in the demographic information in the database proprietor impressions data. As mentioned above, in some situations, the AME panel data may itself include misattribution errors. Accordingly, in some examples, the AME panel data may first be corrected for misattribution before the AME panel data is used to correct misattribution in the database proprietor impressions data.
An example methodology to correct for misattribution in the database proprietor impressions data is described in Singh et al., U.S. Pat. No. 10,469,903, which is hereby incorporated herein by reference in its entirety.
Another problem with the database proprietor impressions data is that of non-coverage. Non-coverage refers to impressions recorded by the database proprietor 102 that cannot be matched to a particular registered user of the database proprietor 102. The inability of the database proprietor 102 to match a particular impression to a particular user can occur for several reasons including that the user is not signed in at the time of the media impression, that the user has not established an account with the database proprietor 102, that the user has enabled Limited Ad Tracking (LAT) to prevent the user account from being associated with ad impressions, or that the content associated with the media being monitored corresponds to children's content (for which user-based tracking is not performed). While the inability of the database proprietor 102 to match and assign a particular impression to a particular user is not necessarily an error in the database proprietor impressions data, it does undermine the ability to reliably estimate the total unique audience size for (e.g., the number of unique individuals that were exposed to) a particular media item. For example, assume that the database proprietor 102 records a total of 11,000 impressions for media 108 in a particular advertising campaign. Further assume that of those 11,000 impressions, the database proprietor 102 is able to match 10,000 impressions to a total of 5,000 different users (e.g., each user was exposed to the media on average 2 times) but is unable to match the remaining 1,000 impressions to particular users. 
Relying solely on the database proprietor impressions data, in this example, there is no way to determine whether the remaining 1,000 impressions should also be attributed to the 5,000 users already exposed at least once to the media 108 (for a total audience size of 5,000 people) or if one or more of the remaining 1,000 impressions should be attributed to other users not among the 5,000 already identified (for a total audience size of up to 6,000 people (if every one of the 1,000 impressions was associated with a different person not included in the matched 5,000 users)). In some examples disclosed herein, the AME panel data can be used to estimate the distribution of impressions across different users associated with the non-coverage portion of impressions in the database proprietor impressions data to thereby estimate a total audience size for the relevant media 108.
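The bounds in the foregoing example can be checked with a short calculation. The following Python sketch is illustrative only: the function name audience_size_bounds and its parameters are hypothetical and are not part of the disclosed apparatus. It assumes that each unmatched impression can add at most one previously unseen person to the audience.

```python
from typing import Tuple


def audience_size_bounds(matched_users: int, unmatched_impressions: int) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the unique audience size.

    Hypothetical helper: assumes every unmatched impression belongs either
    to an already matched user or to exactly one additional person.
    """
    if matched_users < 0 or unmatched_impressions < 0:
        raise ValueError("counts must be non-negative")
    # Lower bound: all unmatched impressions belong to already matched users.
    # If nobody was matched, at least one person produced the impressions.
    lower = matched_users
    if matched_users == 0 and unmatched_impressions > 0:
        lower = 1
    # Upper bound: every unmatched impression belongs to a different new person.
    upper = matched_users + unmatched_impressions
    return lower, upper


# Values from the example: 5,000 matched users, 1,000 unmatched impressions.
lower, upper = audience_size_bounds(matched_users=5_000, unmatched_impressions=1_000)
print(lower, upper)
```

The database proprietor impressions data alone cannot narrow this range; the AME panel data is what allows an estimate between the two bounds.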
Another confounding factor to the estimation of the total unique audience size for media based on the database proprietor impressions data is the existence of multiple user accounts of a single user. More particularly, in some situations a particular individual may establish multiple accounts with the database proprietor 102 for different purposes (e.g., a personal account, a work account, a joint account shared with other individuals, etc.). Such a situation can result in a larger number of different users being identified as audience members to media 108 than the actual number of individuals exposed to the media 108. For example, assume that a particular person registers three user accounts with the database proprietor 102 and is exposed to the media 108 once while signed into each of the three different accounts for a total of three impressions. In this scenario, the database proprietor 102 would match each impression to a different user based on the different user accounts making it appear that three different people were exposed to the media 108 when, in fact, only one person was exposed to the media three different times. Examples disclosed herein use the AME panel data in conjunction with the database proprietor impressions data to estimate an actual unique audience size from the potentially inflated number of apparently unique users exposed to the media 108.
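The inflation described in the three-account scenario can be illustrated as follows. This Python sketch is a simplified illustration under stated assumptions: the account and person labels are hypothetical, and the account-to-person mapping is assumed to be known, whereas in practice it is not directly available to the database proprietor 102 and is instead estimated using the AME panel data.

```python
# One impression logged per account (hypothetical account labels).
logged_accounts = ["account_1", "account_2", "account_3"]

# Assumed ground truth for illustration: all three accounts belong to one person.
account_to_person = {
    "account_1": "person_X",
    "account_2": "person_X",
    "account_3": "person_X",
}

# Audience size as it appears when counting unique accounts.
apparent_audience = len(set(logged_accounts))

# Audience size when accounts are collapsed to the individuals behind them.
actual_audience = len({account_to_person[a] for a in logged_accounts})

print(apparent_audience, actual_audience)
```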
In the illustrated example of
In some examples, the AME intermediary merged data is analyzed by an adjustment factor analyzer 134 to calculate adjustment or calibration factors that may be stored in an adjustment factors database 136 within an AME output data store 138 of the AME proprietary cloud environment 126. In some examples, the adjustment factor analyzer 134 calculates different types of adjustment factors to account for different types of errors and/or biases in the database proprietor impressions data. For instance, a multi-account adjustment factor corrects for the situation of a single user accessing media using multiple different user accounts associated with the database proprietor 102. A signed-out adjustment factor corrects for non-coverage associated with users that access media while signed out of their account associated with the database proprietor 102 (so that the database proprietor 102 is unable to associate the impression with the users). In some examples, the adjustment factor analyzer 134 is able to directly calculate the multi-account adjustment factor and the signed-out adjustment factor in a deterministic manner.
While the multi-account adjustment factors and the signed-out adjustment factors may be deterministically calculated, correcting for falsified or otherwise incorrect demographic information (e.g., incorrectly self-declared ages) of registered users of the database proprietor 102 cannot be solved in such a direct and deterministic manner. Rather, in some examples, a machine learning model is developed to analyze and predict the correct ages of registered users of the database proprietor 102. Specifically, as shown in
As mentioned above, there are many different types of covariates collected and/or generated by the database proprietor 102. In some examples, the covariates provided by the database proprietor 102 may include a certain number (e.g., 100) of the top search result click entities and/or video watch entities for every user during a most recent period of time (e.g., for the last month). These entities are integer identifiers (IDs) that map to a knowledge graph of all entities for the search result clicks and/or videos watched. That is, as used in this context, an entity corresponds to a particular node in a knowledge graph maintained by the database proprietor 102. In some examples, the total number of unique IDs in the knowledge graph may number in the tens of millions. More particularly, for example, YouTube videos are classified across roughly 20 million unique video entity IDs and Google search results are classified across roughly 25 million unique search result entity IDs. In addition to the top search result click entities and/or video watch entities, the database proprietor 102 may also provide embeddings for these entities. An embedding is a numerical representation (e.g., a vector array of values) of some class of similar objects, images, words, and the like. For example, a particular user that frequently searches for and/or views cat videos may be associated with a feature embedding representative of the class corresponding to cats. Thus, feature embeddings translate relatively high dimensional vectors of information (e.g., text strings, images, videos, etc.) into a lower dimensional space to enable the classification of different but similar objects.
In some examples, multiple embeddings may be associated with each search result click entity and/or video watch entity. Accordingly, assuming the top 100 search result entities and video watch entities are provided among the covariates and that 16-dimensional embeddings are provided for each such entity, this results in a 100×16 matrix of values for every user, which may be too much data to process during generation of the demographic correction models as described above. Therefore, in some examples, the dimensionality of the matrix is reduced to a more manageable size to be used as an input feature for the demographic correction model generation.
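One possible way to reduce a per-user 100×16 matrix to a smaller input feature is sketched below in Python. The reduction technique is not specified here; mean pooling across entities is used purely as an assumed stand-in, and the names mean_pool, NUM_ENTITIES, and EMBEDDING_DIM, as well as the randomly generated values, are hypothetical.

```python
import random
from typing import List

NUM_ENTITIES = 100   # top entities per user, per the example above
EMBEDDING_DIM = 16   # embedding dimensions per entity, per the example above

# Placeholder data: random values stand in for real entity embeddings.
rng = random.Random(0)
user_matrix = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in range(NUM_ENTITIES)
]


def mean_pool(matrix: List[List[float]]) -> List[float]:
    """Average the rows of a matrix column by column (assumed technique)."""
    if not matrix:
        raise ValueError("matrix must contain at least one row")
    n_rows = len(matrix)
    return [sum(column) / n_rows for column in zip(*matrix)]


# 100 x 16 values are reduced to a single 16-value feature vector.
user_feature = mean_pool(user_matrix)
print(len(user_feature))
```

Mean pooling discards the ordering of the entities; other reductions could equally be substituted without changing the surrounding description.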
In some examples, a process is implemented to track different demographic correction model experiments over time to achieve high quality (e.g., accurate) models and also for auditing purposes. Accomplishing this objective within the context of the privacy-protected cloud environment 106 presents several unique challenges because the model features (e.g., inputs and hyperparameters) and model performance (e.g., accuracy) are stored separately to satisfy the privacy constraints of the environment.
In some examples, a model analyzer 144 may implement and/or use one or more demographic correction models to generate predictions and/or inferences as to the actual demographics (e.g., actual ages) of users associated with media impressions logged by the database proprietor 102. That is, in some examples, as shown in
As described above, in some examples, the database proprietor 102 may identify a particular user as corresponding to a particular impression based on the user being signed into the database proprietor 102. However, there are circumstances where the individual corresponding to the user account is not the actual person that was exposed to the relevant media. Accordingly, merely inferring a correct demographic (e.g., age) of the user associated with the signed-in user account may not yield the correct demographic of the actual person to which a particular media impression should be attributed. In other words, whereas the AME panelist data and the database proprietor impressions data are matched at the impression level, demographic correction is implemented at the user level. Therefore, before generating the demographic correction model, a method to reduce logged impressions to individual users is first implemented so that the demographic correction model can be reliably implemented.
With inferences made to correct for inaccurate demographic information of database proprietor users (e.g., falsified self-declared ages) and stored in the model inferences database 146, the AME 104 may be interested in extracting audience measurement metrics based on the corrected data. However, as mentioned above, the data contained inside the privacy-protected cloud environment 106 is subject to privacy constraints. In some examples, the privacy constraints ensure that the data can only be extracted for review and/or analysis in aggregate so as to protect the privacy of any particular individual represented in the data (e.g., a panelist of the AME 104 and/or a registered user of the database proprietor 102). Accordingly, in some examples, a data aggregator 148 aggregates the audience measurement data associated with particular media campaigns before the data is provided to an aggregated campaign data database 150 in the AME output data store 138 of the AME proprietary cloud environment 126.
The data aggregator 148 may aggregate data in different ways for different types of audience measurement metrics. For instance, at the highest level, the aggregated data may provide the total impression count and total number of users (e.g., estimated audience size) exposed to the media 108 for a particular media campaign. As mentioned above, the total number of users reported by the data aggregator 148 is based on the total number of unique user accounts matched to impressions but does not include the individuals associated with impressions that were not matched to a particular user (e.g., non-coverage). However, the total number of unique user accounts does not account for the fact that a single individual may correspond to more than one user account (e.g., multi-account users), and does not account for situations where a person other than a signed-in user was exposed to the media 108 (e.g., misattribution). These errors in the aggregated data may be corrected based on the adjustment factors stored in the adjustment factors database 136. Further, in some examples, the aggregated data may include an indication of the demographic composition of the users represented in the aggregated data (e.g., number of males vs females, number of users in different age brackets, etc.).
Additionally or alternatively, in some examples, the data aggregator 148 may provide aggregated data that is associated with a particular aspect of a media campaign. For instance, the data may be aggregated based on particular sites (e.g., all media impressions served on YouTube.com). In other examples, the data may be aggregated based on placement information (e.g., aggregated based on particular primary content videos accessed by users when the media advertisement was served). In other examples, the data may be aggregated based on device type (e.g., impressions served via a desktop computer versus impressions served via a mobile device). In other examples, the data may be aggregated based on a combination of one or more of the above factors and/or based on any other relevant factor(s).
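The different aggregation views described above can be sketched as a grouping operation. The following Python example is a minimal illustration, not the implementation of the data aggregator 148; the record fields (user, site, device), the sample records, and the function name aggregate are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical impression records for a single media campaign.
impressions = [
    {"user": "u1", "site": "YouTube.com", "device": "mobile"},
    {"user": "u1", "site": "YouTube.com", "device": "mobile"},
    {"user": "u2", "site": "YouTube.com", "device": "desktop"},
    {"user": "u3", "site": "YouTube.com", "device": "mobile"},
]


def aggregate(records: List[dict], keys: Sequence[str]) -> Dict[Tuple, dict]:
    """Count impressions and unique users for each combination of keys."""
    counts: Dict[Tuple, int] = defaultdict(int)
    users: Dict[Tuple, set] = defaultdict(set)
    for record in records:
        group = tuple(record[k] for k in keys)
        counts[group] += 1
        users[group].add(record["user"])
    return {g: {"impressions": counts[g], "users": len(users[g])} for g in counts}


campaign_total = aggregate(impressions, [])               # highest-level view
by_device = aggregate(impressions, ["device"])            # device-type view
by_site_device = aggregate(impressions, ["site", "device"])  # combined view
print(campaign_total, by_device)
```

Passing an empty key list yields the campaign-level totals, while adding keys produces the finer-grained views.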
In some examples, the privacy constraints imposed on the data within the privacy-protected cloud environment 106 include a limitation that data cannot be extracted (even when aggregated) for fewer than a threshold number of individuals (e.g., 50 individuals). Accordingly, if the particular metric being sought includes fewer than the threshold number of individuals, the data aggregator 148 will not provide such data. For instance, if the threshold number of individuals is 50 but there are only 46 females in the age range of 18-25 that were exposed to particular media 108, the data aggregator 148 would not provide the aggregate data for females in the 18-25 age bracket. Such privacy constraints can leave gaps in the audience measurement metrics, particularly in locations where the number of panelists is relatively small. Accordingly, in some examples, when audience measurement is not available for a particular demographic segment of interest in a particular region (e.g., a particular country), the audience measurement metrics in one or more comparable region(s) may be used to impute the metrics for the missing data in the first region of interest. In some examples, the particular metrics imputed from comparable regions are based on a comparison of audience metrics for which data is available in both regions. For instance, while data for females in the 18-25 bracket may be unavailable, assume that data for females in the 26-35 age bracket is available. The metrics associated with the 26-35 age bracket in the region of interest may be compared with metrics for the 26-35 age bracket in other regions and the regions with the closest metrics to the region of interest may be selected for use in calculating imputation factor(s).
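The threshold check and the selection of comparable regions can be sketched as follows. This Python example is illustrative only. The function names, the segment count of 120, the region labels, and the metric values are hypothetical, and the use of an absolute difference on a single scalar metric is an assumption, since the comparison measure is not specified above.

```python
from typing import Dict, List, Tuple

MIN_INDIVIDUALS = 50  # example threshold from the description above


def releasable_segments(counts: Dict[Tuple[str, str], int],
                        threshold: int = MIN_INDIVIDUALS) -> Dict[Tuple[str, str], int]:
    """Keep only segments with at least the threshold number of individuals."""
    return {segment: n for segment, n in counts.items() if n >= threshold}


def closest_regions(target_metric: float,
                    other_regions: Dict[str, float],
                    k: int = 1) -> List[str]:
    """Rank other regions by closeness of an available metric (assumed measure)."""
    ranked = sorted(other_regions, key=lambda r: abs(other_regions[r] - target_metric))
    return ranked[:k]


# 46 females aged 18-25 is below the threshold, so that segment is withheld.
segment_counts = {("F", "18-25"): 46, ("F", "26-35"): 120}
released = releasable_segments(segment_counts)

# Compare the available 26-35 bracket metric against hypothetical regions.
comparable = closest_regions(0.30, {"region_1": 0.10, "region_2": 0.28, "region_3": 0.45})
print(released, comparable)
```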
As shown in the illustrated example, both the adjustment factors database 136 and the aggregated campaign data database 150 are included within the AME output data store 138 of the AME proprietary cloud environment 126. As mentioned above, in some examples, the AME proprietary cloud environment 126 is provided by the database proprietor 102 and enables data to be provided to and retrieved from the privacy-protected cloud environment. In some examples, the aggregated campaign data and the adjustment factors are subsequently transferred to a separate computing apparatus 152 of the AME 104 for analysis by an audience metrics analyzer 154. In some examples, the separate computing apparatus 152 may be omitted with its functionality provided by the AME proprietary cloud environment 126. In other examples, the AME proprietary cloud environment 126 may be omitted with the adjustment factors and the aggregated data provided directly to the computing apparatus 152. Further, in this example, the AME panel data database 122 is within the AME first party data store 124, which is shown as being separate from the AME output data store 138. However, in other examples, the AME first party data store 124 and the AME output data store 138 may be combined.
In the illustrated example of
The example input interface 202 receives and/or otherwise obtains input data from the AME privacy-protected data store 132 (
In some examples, the AME intermediary merged data includes survey data collected by the AME 104. For example, the survey data is collected for one or more panelists of the AME 104, and the survey data indicates demographics, device type, device usage statistics, and/or user account information corresponding to the one or more panelists. Example Table 1 below illustrates example survey data for an example household, where the household includes three example users (e.g., a first example user A, a second example user B, and a third example user C). In this example, Table 1 includes columns corresponding to the user account information. In this example, user A does not have a user account registered with the database proprietor 102. User B has a Gmail account registered with the database proprietor 102, and User C has a Gmail account and a YouTube account registered with the database proprietor 102. Users A, B, and C use a device to access media via YouTube, and user B is the primary user of the device. Furthermore, in this example, user A corresponds to a first demographic (e.g., males aged 2-12), user B corresponds to a second demographic (e.g., males aged 13-17), and user C corresponds to a third demographic (e.g., males aged 18-20). In other examples, any number of users corresponding to any number of different demographics and/or user accounts can be used instead. In some examples, the survey data aggregates data across multiple households and/or multiple panelists associated with the AME 104.
The example panelist detector 204 identifies panelists (e.g., AME panelists) that also have database proprietor accounts of the database proprietor 102. For example, the panelists are registered subscribers of the database proprietor 102. In some examples, the panelist detector 204 identifies the panelists from the panel data collected by the AME 104 and/or from the survey data. In example Table 1 above, the panelists include users A, B, and/or C.
The example sign-in rate calculator 206 determines an actual sign-in rate of the panelists based on the panel data and the database proprietor impressions data. For example, the sign-in rate calculator 206 determines a panelist-subscriber (PS) impressions count of PS impressions of the panelists based on the database proprietor impressions data. In this example, the PS impressions are associated with media accessed by and/or presented to the panelists while the panelists are signed into subscriber accounts of the database proprietor 102. Stated differently, the PS impressions do not correspond to media accesses that occur while the panelists are signed out of the database proprietor 102. Furthermore, the sign-in rate calculator 206 determines a panelist total (PT) impressions count of PT impressions of the panelists based on the panel data. In this example, the PT impressions correspond to total impressions collected by the AME 104 for the panelists. For example, the PT impressions are associated with any media accesses by the panelists regardless of whether they are signed into the database proprietor 102.
The sign-in rate calculator 206 determines the actual sign-in rate as a percentage of the PS impressions count relative to the PT impressions count by dividing the PS impressions count by the PT impressions count (e.g., actual sign-in rate=PS/PT). In an example, assuming that the panelists experienced 60 impressions while signed into the database proprietor 102 (e.g., PS impressions count=60) and an additional 40 impressions while signed out of the database proprietor 102 for a PT impressions count of 100 (e.g., 60+40), the actual sign-in rate for the panelists would be 60% (e.g., PS/PT=60/100=60%).
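The PS/PT calculation can be expressed as a short function. The following Python sketch is a minimal illustration of the arithmetic only; the function name actual_sign_in_rate and its input validation are hypothetical and are not attributed to the sign-in rate calculator 206.

```python
def actual_sign_in_rate(ps_impressions: int, pt_impressions: int) -> float:
    """Return the actual sign-in rate PS/PT as a fraction between 0 and 1."""
    if pt_impressions <= 0:
        raise ValueError("PT impressions count must be positive")
    if not 0 <= ps_impressions <= pt_impressions:
        raise ValueError("PS impressions count must be between 0 and PT")
    return ps_impressions / pt_impressions


# Values from the example: 60 signed-in and 40 signed-out impressions.
signed_in_impressions = 60
signed_out_impressions = 40
pt_count = signed_in_impressions + signed_out_impressions

rate = actual_sign_in_rate(signed_in_impressions, pt_count)
print(f"{rate:.0%}")
```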
The example adjustment factor generator 208 determines audience adjustment factors (e.g., non-coverage and/or misattribution adjustment factors) based on the survey data from the AME intermediary merged data 130. In some examples, the audience adjustment factors are misattribution adjustment factors used to correct misattribution error(s) in the database proprietor impressions data. In other examples, the audience adjustment factors are non-coverage adjustment factors used to correct non-coverage error(s) in the database proprietor impressions data. For example, the adjustment factor generator 208 determines 0% audience adjustment factors (e.g., zero sign-out rate (SOR) non-coverage and misattribution adjustment factors, zero SOR audience adjustment factors) corresponding to a sign-out rate of 0%. A 0% sign-out rate (e.g., zero percent SOR) means that a registered user of the database proprietor 102 does not sign out of the database proprietor 102 after using a device.
In some examples, the misattribution adjustment factor accounts for media impressions misattributed to a subscriber audience member (e.g., a registered subscriber signed into the database proprietor 102 via a device) that should be attributed to a different person (e.g., a different, guest user actually using the device to access media while the subscriber audience member was still signed in). As such, when the sign-out rate is 0%, logged impressions for media accesses by the device are associated with the registered user, regardless of whether the registered user is the person using the device. Stated differently, there is a high chance for misattribution of impressions when the sign-out rate is 0%. In some examples, the non-coverage adjustment factor accounts for impressions that the database proprietor 102 is unable to match with particular individuals. Thus, for a 0% sign-out rate (e.g., a user that does not sign out), there is no chance for non-coverage because the database proprietor 102 will match any media impressions to the registered user signed into the user account, such that all impressions of that user will be treated as covered.
Example Table 2 below shows example zero SOR audience adjustment factors corresponding to a sign-out rate of 0%. In this example, the zero SOR audience adjustment factors of Table 2 below are misattribution adjustment factors. As such, the zero SOR audience adjustment factors are provided in matrix form. In other examples, when the zero SOR audience adjustment factors are non-coverage adjustment factors, the zero SOR audience adjustment factors are provided in vector form. In this example, a first demographic corresponds to males aged 2 to 12, a second demographic corresponds to males aged 13 to 17, and a third demographic corresponds to males aged 18 to 20. In other examples, one or more different demographics may be used instead. In example Table 2 below, the demographic labels down the left-hand side are the actual demographics (e.g., true demographics) of audience members corresponding to the logged impressions, and demographic labels across the top labelled “logged demographics” are the demographics identified by the database proprietor 102 as corresponding to the logged impressions.
Example Table 2 below illustrates a probability that, for each logged impression, the database proprietor logs a demographic i given that the true demographic for the logged impression is j. In this example, because user A of Table 1 above does not have an account registered with the database proprietor 102, the database proprietor 102 does not log impressions associated with user A (e.g., corresponding to the first demographic M: 2-12). As such, when user A is using the device to access media, impressions for the media accesses experienced by user A are incorrectly reported as user B (e.g., the second demographic M: 13-17) or user C (e.g., the third demographic M: 18-20) based on whether user B or user C is signed into the device when the accesses occur. For example, based on a first row of the below Table 2 and a 0% sign-out rate, for an impression associated with the first actual demographic (j) (e.g., males aged 2-12), a probability that the database proprietor 102 correctly reports a first logged demographic (i) (M: 2-12) for the first actual demographic (j) (M: 2-12) for the impression is 0, a probability that the database proprietor 102 incorrectly reports a second logged demographic (i) (M: 13-17) for the first actual demographic (j) (M: 2-12) for the impression is 0.5, and a probability that the database proprietor 102 incorrectly reports a third logged demographic (i) (M: 18-20) for the first actual demographic (j) (M: 2-12) for the impression is 0.5.
In this example, media accesses by user B are logged as impressions for user B (e.g., M: 13-17) or for user C (e.g., M: 18-20) based on which user is signed into the device when the media accesses occur. For example, based on a second row of the below Table 2 and a 0% sign-out rate, for an impression associated with the second actual demographic (j) (e.g., males aged 13-17), a probability that the database proprietor 102 incorrectly reports the first logged demographic (i) (M: 2-12) for the second actual demographic (j) (M: 13-17) for the impression is 0, a probability that the database proprietor 102 correctly reports the second logged demographic (i) (M: 13-17) for the second actual demographic (j) (M: 13-17) for the impression is 0.6, and a probability that the database proprietor 102 incorrectly reports the third logged demographic (i) (M: 18-20) for the second actual demographic (j) (M: 13-17) for the impression is 0.4.
Similarly, media accesses by user C are logged as impressions for user B (e.g., M: 13-17) or for user C (e.g., M: 18-20) based on which user is signed into the device when the media accesses occur. In this example, a probability that a logged impression is reported as user B is greater than the probability that the logged impression is reported as user C because user B is the primary user of the device. Stated differently, impressions are more likely to be logged for the second demographic (e.g., M: 13-17) associated with user B than for the third demographic (M: 18-20) associated with user C. For example, based on a third row of the below Table 2 and a 0% sign-out rate, for an impression associated with the third actual demographic (j) (e.g., males aged 18-20), a probability that the database proprietor 102 incorrectly reports the first logged demographic (i) (M: 2-12) for the third actual demographic (j) (M: 18-20) for the impression is 0, a probability that the database proprietor 102 incorrectly reports the second logged demographic (i) (M: 13-17) for the third actual demographic (j) (M: 18-20) for the impression is 0.8, and a probability that the database proprietor 102 correctly reports the third logged demographic (i) (M: 18-20) for the third actual demographic (j) (M: 18-20) for the impression is 0.2.
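The zero SOR misattribution adjustment factors described for Table 2 can be represented as a matrix whose rows are actual demographics (j) and whose columns are logged demographics (i). The following Python sketch uses the probabilities stated in the preceding paragraphs; the variable and function names are hypothetical, and the list-of-lists layout is only one possible representation.

```python
from typing import List

DEMOGRAPHICS = ["M: 2-12", "M: 13-17", "M: 18-20"]

# Rows: actual demographic (j). Columns: logged demographic (i).
ZERO_SOR_FACTORS: List[List[float]] = [
    [0.0, 0.5, 0.5],  # actual M: 2-12 (user A, no account, never logged as self)
    [0.0, 0.6, 0.4],  # actual M: 13-17 (user B)
    [0.0, 0.8, 0.2],  # actual M: 18-20 (user C)
]


def prob_logged_given_actual(factors: List[List[float]], actual: str, logged: str) -> float:
    """Look up the probability that `logged` is reported when `actual` is true."""
    j = DEMOGRAPHICS.index(actual)
    i = DEMOGRAPHICS.index(logged)
    return factors[j][i]


# At a 0% sign-out rate every impression is logged against some signed-in user,
# so each row of probabilities sums to 1.
row_sums = [sum(row) for row in ZERO_SOR_FACTORS]

print(prob_logged_given_actual(ZERO_SOR_FACTORS, "M: 18-20", "M: 13-17"))
```

The first column is entirely zero because no impression is ever logged for the first demographic, which has no registered account in this example household.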
The adjustment factor generator 208 also determines 100% audience adjustment factors (e.g., full SOR non-coverage and misattribution adjustment factors, full SOR audience adjustment factors) corresponding to a sign-out rate of 100%. A 100% sign-out rate (e.g., a full SOR, a one hundred percent SOR) means that a registered user of the database proprietor 102 signs out of a subscriber account of the database proprietor 102 via a device after the user uses that device to access media for which the database proprietor 102 logs impressions. In some examples, a 100% sign-out rate results from a registered user being always signed into the database proprietor 102 from a particular device when media accesses occur, and the registered user being signed out when the user is not using the device to access media. For a 100% sign-out rate for a particular user, the database proprietor 102 will correctly match that user to any media impressions resulting from the user accessing media via that particular device. Stated differently, for a 100% sign-out rate for a particular user, there is no chance for misattribution of impressions for that user when the user is registered with the database proprietor 102. However, with such a 100% sign-out rate, the likelihood of non-coverage occurring is relatively high because any time a different guest person who does not have a subscriber account with the database proprietor uses that particular client device 110 to access media, the database proprietor 102 will not log a media impression caused by the guest person.
Example Table 3 below shows example full SOR audience adjustment factors (e.g., full SOR misattribution adjustment factors) corresponding to a 100% sign-out rate. In this example, no impressions are logged for user A because user A does not have a subscriber account with the database proprietor 102. As such, as shown in the first row of Table 3 below, each of the full SOR audience adjustment factors corresponding to the first actual demographic (j) (e.g., males aged 2-12) is zero. Furthermore, in this example, only those media accesses by user B are logged as impressions for user B since user B is signed in when the media accesses occur and is signed out when not using the device. Similarly, only those media accesses by user C are logged as impressions for user C since user C is signed in when the media accesses occur and is signed out when not using the device. Stated differently, no impressions are misattributed to user B or user C since both user B and user C sign out of their respective accounts after using the device to access media. As such, as shown in a second row of the below Table 3 and a 100% sign-out rate, for an impression associated with the second actual demographic (j) (e.g., males aged 13-17), a probability that the database proprietor 102 correctly reports a second logged demographic (i) (M: 13-17) for the second actual demographic (j) (M: 13-17) for the impression is 1, and a probability that the database proprietor 102 incorrectly reports a first logged demographic (i) (M: 2-12) or a third logged demographic (i) (M: 18-20) for the second actual demographic (j) (M: 13-17) for the impression is 0. 
Similarly, as shown in a third row of the below Table 3 and a 100% sign-out rate, for an impression associated with the third actual demographic (j) (e.g., males aged 18-20), a probability that the database proprietor 102 correctly reports the third logged demographic (i) (M: 18-20) for the third actual demographic (j) (M: 18-20) for the impression is 1, and a probability that the database proprietor 102 incorrectly reports the first logged demographic (i) (M: 2-12) or the second logged demographic (i) (M: 13-17) for the third actual demographic (j) (M: 18-20) for the impression is 0.
The example weighting controller 210 weights the zero SOR audience adjustment factors and the full SOR audience adjustment factors. In such examples, the weighting controller 210 weights the zero SOR and full SOR audience adjustment factors based on the actual sign-in rate calculated by the sign-in rate calculator 206. For example, the weighting controller 210 multiplies each of the zero SOR audience adjustment factors by the actual sign-in rate of 60%, and multiplies each of the full SOR audience adjustment factors by a sign-out rate (e.g., 40%), where the sign-out rate corresponds to the actual sign-in rate subtracted from 100% (e.g., 100%-60%=40%).
Example Table 4 below illustrates example calculated signed-out adjustment factors corresponding to the zero SOR audience adjustment factors and the full SOR audience adjustment factors of Tables 2 and 3, respectively. In this example, the adjustment factor generator 208 determines a weighted average of the zero SOR audience adjustment factors of Table 2 and the full SOR audience adjustment factors of Table 3. In particular, based on Equations 1 to 9 below, the adjustment factor generator 208 combines (e.g., sums) corresponding ones of the weighted zero SOR and full SOR misattribution factors to determine a signed-out adjustment factor for each demographic pair in Table 4.
Equation 1: (60%×0)+(40%×0)=0
Equation 2: (60%×0.5)+(40%×0)=0.3
Equation 3: (60%×0.5)+(40%×0)=0.3
Equation 4: (60%×0)+(40%×0)=0
Equation 5: (60%×0.6)+(40%×1)=0.76
Equation 6: (60%×0.4)+(40%×0)=0.24
Equation 7: (60%×0)+(40%×0)=0
Equation 8: (60%×0.8)+(40%×0)=0.48
Equation 9: (60%×0.2)+(40%×1)=0.52
For example, as shown in a first row and first column of the below Table 4, when the actual demographic (j) and the corresponding logged demographic (i) is the first demographic (e.g., males aged 2-12), a first signed-out adjustment factor is 0. In such an example, the weighting controller 210 determines a first product of the actual sign-in rate and the first zero SOR audience adjustment factor (e.g., 60%×0) of the first row and the first column of Table 2, and determines a second product of the sign-out rate and the second full SOR audience adjustment factor (e.g., 40%×0) of the first row and the first column of Table 3. Further, based on Equation 1 above, the adjustment factor generator 208 determines the first signed-out adjustment factor by determining a sum of the first product and the second product (e.g., 60%×0+40%×0). In some examples, the adjustment factor generator 208 similarly determines a signed-out adjustment factor for each demographic pair in Table 4 below.
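The weighting and summation of Equations 1 to 9 can be sketched in a few lines of Python. This is only an illustrative sketch, not the disclosed implementation: the function name `signed_out_factors` and the list-of-lists layout (rows for the actual demographic (j), columns for the logged demographic (i)) are assumptions, and the matrix entries are the zero SOR and full SOR values that appear as operands in Equations 1 to 9.

```python
# Zero SOR (0% sign-out) factors for the first household, read from the
# first operand of Equations 1 to 9 (rows: actual demographic j,
# columns: logged demographic i).
ZERO_SOR = [
    [0.0, 0.5, 0.5],  # actual M: 2-12
    [0.0, 0.6, 0.4],  # actual M: 13-17
    [0.0, 0.8, 0.2],  # actual M: 18-20
]

# Full SOR (100% sign-out) factors, read from the second operand.
FULL_SOR = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def signed_out_factors(zero_sor, full_sor, sign_in_rate):
    """Hypothetical helper: weight the zero SOR factors by the sign-in rate
    and the full SOR factors by the sign-out rate, then sum element-wise."""
    sign_out_rate = 1.0 - sign_in_rate
    return [
        [sign_in_rate * z + sign_out_rate * f for z, f in zip(z_row, f_row)]
        for z_row, f_row in zip(zero_sor, full_sor)
    ]


# Actual sign-in rate of 60%, so the sign-out rate is 40%.
table_4 = signed_out_factors(ZERO_SOR, FULL_SOR, sign_in_rate=0.60)
for row in table_4:
    print([round(value, 2) for value in row])
```

Each printed row corresponds to one actual demographic, and the rounded entries match the right-hand sides of Equations 1 to 9 in order.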
In this example, Tables 1, 2, 3, and 4 above correspond to one example household for which survey data is collected. In some examples, the AME 104 collects the survey data for multiple households. In such examples, the adjustment factor generator 208 generates zero SOR audience adjustment factors, full SOR audience adjustment factors, and signed-out adjustment factors for each of the multiple households. The adjustment factor generator 208 then aggregates the signed-out adjustment factors for each demographic pair across the multiple households. In one example, zero SOR audience adjustment factors, full SOR audience adjustment factors, and signed-out adjustment factors for a second example household are shown below in example Tables 5, 6, and 7, respectively. In this example, the users in the second household do not correspond to any of the given demographics (e.g., males aged 2-12, males aged 13-17, and males aged 18-20), so that no impressions are logged for the given demographics. As such, the zero SOR audience adjustment factors, full SOR audience adjustment factors, and signed-out adjustment factors are zero.
In some examples, the adjustment factor generator 208 determines aggregate signed-out misattribution adjustment factors by combining the signed-out misattribution adjustment factors for the first and second households. For example, the adjustment factor generator 208 sums corresponding ones of the signed-out misattribution adjustment factors of Table 4 and Table 7 for each demographic pair, and the aggregate signed-out misattribution adjustment factors are shown in example Table 8 below. For example, the adjustment factor generator 208 sums a first signed-out misattribution adjustment factor for the first household (e.g., corresponding to a first row and a first column of Table 4) and a second signed-out misattribution adjustment factor for the second household (e.g., corresponding to a first row and a first column of Table 7) to determine a first aggregate signed-out misattribution adjustment factor in a first row and a first column of Table 8 below (e.g., 0+0). The adjustment factor generator 208 similarly determines the aggregate signed-out misattribution adjustment factor for each demographic pair in Table 8.
In some examples, the adjustment factor generator 208 generates normalized signed-out misattribution adjustment factors by normalizing the aggregate signed-out misattribution adjustment factors of Table 8 above. For example, the adjustment factor generator 208 determines a first sum of the aggregate misattribution adjustment factors across the first row of Table 8 (e.g., 0+0.3+0.3=0.6), a second sum of the aggregate misattribution adjustment factors across the second row of Table 8 (e.g., 0+0.76+0.24=1), and a third sum of the aggregate misattribution adjustment factors across the third row of Table 8 (e.g., 0+0.48+0.52=1). In some examples, the adjustment factor generator 208 divides each aggregate signed-out misattribution adjustment factor in the first row of Table 8 by the first sum (e.g., 0/0.6=0, 0.3/0.6=0.5, and 0.3/0.6=0.5), divides each aggregate signed-out misattribution adjustment factor in the second row of Table 8 by the second sum (e.g., 0/1=0, 0.76/1=0.76, and 0.24/1=0.24), and divides each aggregate signed-out misattribution adjustment factor in the third row of Table 8 by the third sum (e.g., 0/1=0, 0.48/1=0.48, and 0.52/1=0.52) to generate the normalized signed-out misattribution adjustment factors shown in Table 9 below.
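The aggregation across households and the row normalization described above can be illustrated with a short Python sketch. The helper names `aggregate` and `normalize_rows` are hypothetical, the first-household values are the results of Equations 1 to 9, and the second household is all zeros as stated in the text. Leaving an all-zero row as zeros is an assumption made here to avoid division by zero; the text does not address that case.

```python
def aggregate(households):
    """Element-wise sum of equally shaped per-household factor matrices."""
    rows, cols = len(households[0]), len(households[0][0])
    return [
        [sum(h[r][c] for h in households) for c in range(cols)]
        for r in range(rows)
    ]


def normalize_rows(matrix):
    """Divide each entry by the sum of its row.

    Assumption: a row that sums to zero is returned unchanged (all zeros).
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


# Signed-out factors for the first household (results of Equations 1 to 9).
first_household = [
    [0.0, 0.30, 0.30],
    [0.0, 0.76, 0.24],
    [0.0, 0.48, 0.52],
]
# The second household has no users in the given demographics.
second_household = [[0.0, 0.0, 0.0] for _ in range(3)]

aggregate_factors = aggregate([first_household, second_household])
normalized_factors = normalize_rows(aggregate_factors)
for row in normalized_factors:
    print([round(value, 2) for value in row])
```

Only the first row changes under normalization in this example, because the second and third rows already sum to 1.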
The example update controller 212 provides the signed-out adjustment factors of Table 4 corresponding to the first household, the signed-out adjustment factors of Table 7 corresponding to the second household, and/or the normalized signed-out misattribution adjustment factors of Table 9 to the adjustment factors database 136 (
In some examples, the update controller 212 determines whether to update the normalized signed-out misattribution adjustment factors. For example, the update controller 212 determines that the normalized signed-out misattribution adjustment factors are to be updated in response to the adjustment factor analyzer 134 receiving and/or otherwise accessing new panel data and/or new database proprietor impressions data via the input interface 202. In response to determining that the normalized signed-out misattribution adjustment factors are to be updated, the update controller 212 directs the adjustment factor generator 208 to determine new normalized signed-out misattribution adjustment factors based on the new panel data and/or the new database proprietor impressions data.
In some examples, the audience adjustment factors are non-coverage adjustment factors that are applied to the database proprietor impressions data to correct for non-coverage associated with users that access media while signed out of their database proprietor accounts. In some examples, the non-coverage adjustment factors are provided in vector form, and the non-coverage adjustment factors are calculated similarly to the misattribution adjustment factors of Tables 2 to 9 above. For example, the adjustment factor generator 208 determines the 0% audience adjustment factors (e.g., zero SOR non-coverage adjustment factors) corresponding to a sign-out rate of 0%. In some examples, the non-coverage adjustment factor accounts for impressions that the database proprietor 102 is unable to match with particular individuals because the impressions occur when no registered user of the database proprietor 102 is signed in. Thus, for a 0% sign-out rate (e.g., a user that does not sign out), there is no chance for non-coverage because the database proprietor 102 will match any media impressions to the registered user signed into the user account. Thus, based on the survey data for the first household shown in Table 1, the media accesses by each of users A, B, and C are covered as impressions by the database proprietor 102. Example Table 10 below shows the zero SOR non-coverage adjustment factors corresponding to the first household. In this example, the zero SOR non-coverage adjustment factor is 1 when impressions associated with the user for a given demographic are covered (e.g., logged) by the database proprietor 102, and the zero SOR non-coverage adjustment factor is 0 when impressions associated with the user for a given demographic are not covered (e.g., not logged) by the database proprietor 102.
The adjustment factor generator 208 determines the 100% audience adjustment factors (e.g., full SOR non-coverage adjustment factors) corresponding to a sign-out rate of 100%. For a 100% sign-out rate, impressions for users B and C are logged by the database proprietor 102 when the users are signed into the database proprietor 102 via a device. In this example, user A does not have an account registered with the database proprietor 102, and both users B and C sign out of their respective accounts when not using the device. As such, media accesses by user A are not covered in the impressions logged by the database proprietor 102 when the sign-out rate is 100%. Example Table 11 below shows the full SOR non-coverage adjustment factors corresponding to the first household. In this example, the first demographic (e.g., M: 2-12) corresponding to user A is not covered, and the second and third demographics (e.g., M: 13-17 and M: 18-20) corresponding to users B and C are covered.
In some examples, the weighting controller 210 weights the zero SOR non-coverage adjustment factors and the full SOR non-coverage adjustment factors based on the actual sign-in rate. For example, the weighting controller 210 multiplies each of the zero SOR non-coverage adjustment factors by the actual sign-in rate of 60%, and multiplies each of the full SOR non-coverage adjustment factors by the sign-out rate (e.g., 40%). Example Table 12 below illustrates example calculated signed-out non-coverage adjustment factors corresponding to the zero SOR non-coverage adjustment factors and the full SOR non-coverage adjustment factors of Tables 10 and 11, respectively. In this example, the adjustment factor generator 208 determines a weighted average of the zero SOR non-coverage adjustment factors of Table 10 and the full SOR non-coverage adjustment factors of Table 11. In particular, based on Equations 10, 11, and 12 below, the adjustment factor generator 208 combines (e.g., sums) corresponding ones of the weighted zero SOR and full SOR non-coverage adjustment factors to determine a signed-out non-coverage adjustment factor for each demographic shown in Table 12.
Equation 10: (60%×1)+(40%×0)=0.6
Equation 11: (60%×1)+(40%×1)=1
Equation 12: (60%×1)+(40%×1)=1
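Equations 10 to 12 apply the same weighting to vectors rather than matrices. The Python sketch below is illustrative only: the function name `signed_out_non_coverage` is hypothetical, and the two input vectors are the zero SOR and full SOR non-coverage values that appear as operands in Equations 10 to 12 (1 for a covered demographic, 0 for a demographic that is not covered).

```python
# Non-coverage factors per demographic, ordered M: 2-12, M: 13-17, M: 18-20.
ZERO_SOR_NON_COVERAGE = [1.0, 1.0, 1.0]  # 0% sign-out: every user is covered
FULL_SOR_NON_COVERAGE = [0.0, 1.0, 1.0]  # 100% sign-out: user A is not covered


def signed_out_non_coverage(zero_sor, full_sor, sign_in_rate):
    """Hypothetical helper: weighted sum of the zero SOR and full SOR
    non-coverage factors for each demographic."""
    sign_out_rate = 1.0 - sign_in_rate
    return [
        sign_in_rate * z + sign_out_rate * f
        for z, f in zip(zero_sor, full_sor)
    ]


table_12 = signed_out_non_coverage(
    ZERO_SOR_NON_COVERAGE, FULL_SOR_NON_COVERAGE, sign_in_rate=0.60
)
print([round(value, 2) for value in table_12])
```

The first entry is below 1 because user A, who has no subscriber account, is covered only for the share of impressions weighted by the 60% sign-in rate.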
In this example, Table 12 above corresponds to one example household for which survey data is collected. In some examples, the adjustment factor generator 208 generates zero SOR non-coverage adjustment factors, full SOR non-coverage adjustment factors, and signed-out non-coverage adjustment factors for multiple households. In some examples, zero SOR non-coverage adjustment factors, full SOR non-coverage adjustment factors, and signed-out non-coverage adjustment factors for the second example household are shown below in example Tables 13, 14, and 15, respectively. In such examples, the users in the second household do not correspond to any of the given demographics (e.g., males aged 2-12, males aged 13-17, and males aged 18-20), so that no impressions are logged for the given demographics. As such, the zero SOR non-coverage adjustment factors, the full SOR non-coverage adjustment factors, and the signed-out non-coverage adjustment factors are zero.
In some examples, the adjustment factor generator 208 generates aggregate signed-out non-coverage adjustment factors by combining the signed-out non-coverage adjustment factors for the first and second households. For example, the adjustment factor generator 208 generates the aggregate signed-out non-coverage adjustment factors shown in a third column of example Table 16 below.
In this example, a first column of Table 16 below illustrates a total number of devices available to the given demographic for accessing media. In this example, a first device is available in the first household and a second device is available in the second household, corresponding to a total of two devices across the first and second households. As such, the total devices corresponding to each of the first demographic (e.g., M: 2-12), the second demographic (e.g., M: 13-17), and the third demographic (e.g., M: 18-20) is 2, as shown in the first column of Table 16.
In some examples, the adjustment factor generator 208 determines covered devices for each of the demographics in Table 16, and the covered devices are shown in a second column of Table 16. In this example, the adjustment factor generator 208 determines the covered devices by summing corresponding ones of the signed-out non-coverage adjustment factors for the first household (e.g., corresponding to Table 12 above) and the second household (e.g., corresponding to Table 15 above). For example, the adjustment factor generator 208 determines first covered devices corresponding to the first demographic (e.g., M: 2-12) by summing the signed-out non-coverage adjustment factors corresponding to the first demographic across the first and second households (e.g., 0.6+0=0.6). The adjustment factor generator 208 similarly generates second covered devices (e.g., 1+0=1) corresponding to the second demographic (e.g., M: 13-17) and third covered devices (e.g., 1+0=1) corresponding to the third demographic (e.g., M: 18-20).
In this example, the adjustment factor generator 208 determines a coverage ratio for each of the demographics in Table 16, and the coverage ratios are shown in the third column of Table 16. In some examples, the coverage ratio for a given demographic corresponds to a ratio of the total devices for the given demographic to the covered devices for the given demographic. For example, the adjustment factor generator 208 determines a first coverage ratio corresponding to the first demographic (e.g., M: 2-12) by dividing the total devices for the first demographic by the first covered devices (e.g., 2/0.6=3.33). Similarly, the adjustment factor generator 208 determines a second coverage ratio corresponding to the second demographic (e.g., M: 13-17) by dividing the total devices for the second demographic by the second covered devices (e.g., 2/1=2), and determines a third coverage ratio corresponding to the third demographic (e.g., M: 18-20) by dividing the total devices for the third demographic by the third covered devices (e.g., 2/1=2). In some examples, the coverage ratios of the third column of Table 16 correspond to the aggregate signed-out non-coverage adjustment factors for the first and second households.
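The covered-device sums and coverage ratios of Table 16 can be reproduced with the following Python sketch. It is illustrative only: the function name `coverage_ratios` is hypothetical, the ratio is computed as total devices divided by covered devices to match the worked values (e.g., 2/0.6), and returning `None` for a demographic with no covered devices is an assumption, since the text does not say how that case is handled.

```python
def coverage_ratios(total_devices, covered_devices):
    """Hypothetical helper: total devices divided by covered devices,
    computed per demographic.

    Assumption: the ratio is undefined (None) when no devices are covered.
    """
    ratios = []
    for total, covered in zip(total_devices, covered_devices):
        ratios.append(total / covered if covered > 0 else None)
    return ratios


# Signed-out non-coverage factors per demographic (M: 2-12, 13-17, 18-20).
first_household = [0.6, 1.0, 1.0]   # results of Equations 10 to 12
second_household = [0.0, 0.0, 0.0]  # no users in the given demographics

# Covered devices: sum of the per-household factors for each demographic.
covered = [a + b for a, b in zip(first_household, second_household)]

# One device in each of the two households is available to each demographic.
total = [2, 2, 2]

ratios = coverage_ratios(total, covered)
print([round(r, 2) for r in ratios])
```

A ratio above 1 indicates that logged impressions for that demographic would need to be scaled up to account for devices the database proprietor does not cover.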
In some examples, the example update controller 212 provides the signed-out non-coverage adjustment factors of Table 16 to the adjustment factors database 136 (
While an example manner of implementing the example adjustment factor analyzer 134 of
A flowchart representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the adjustment factor analyzer of
The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data or a data structure (e.g., portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers) located at the same or different locations of a network or collection of networks (e.g., in the cloud, in edge devices, etc.). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc. in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and stored on separate computing devices, wherein the parts when decrypted, decompressed, and combined form a set of executable instructions that implement one or more functions that may together form a program such as that described herein.
In another example, the machine readable instructions may be stored in a state in which they may be read by processor circuitry, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc. in order to execute the instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, machine readable media, as used herein, may include machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.
The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.
As mentioned above, the example process of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. 
Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” item, as used herein, refers to one or more of that item. The terms “a” (or “an”), “one or more”, and “at least one” can be used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., a single unit or processor. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.
At block 304, the example adjustment factor analyzer 134 determines an actual sign-in rate for the identified AME panelists. For example, the example sign-in rate calculator 206 of
At block 306, the example adjustment factor analyzer 134 determines a 0% audience adjustment factor (e.g., a zero SOR audience adjustment factor). For example, the example adjustment factor generator 208 of
At block 308, the example adjustment factor analyzer 134 determines a 100% audience adjustment factor (e.g., a full SOR audience adjustment factor). For example, the example adjustment factor generator 208 determines the full SOR audience adjustment factor for a sign-out rate of 100%. In some examples, the 100% audience adjustment factor corresponds to at least one of a misattribution adjustment factor or a non-coverage adjustment factor.
At block 310, the example adjustment factor analyzer 134 generates a weighted zero SOR audience adjustment factor and a weighted full SOR audience adjustment factor. For example, the example adjustment factor analyzer 134 weights the zero SOR audience adjustment factor and the full SOR audience adjustment factor based on the actual sign-in rate to determine the weighted zero SOR audience adjustment factor and the weighted full SOR audience adjustment factor. In some examples, the example weighting controller 210 of
At block 312, the example adjustment factor analyzer 134 generates a signed-out adjustment factor based on the weighted zero SOR audience adjustment factor and the weighted full SOR audience adjustment factor. For example, the example adjustment factor generator 208 combines the weighted zero SOR and full SOR audience adjustment factors to define the signed-out adjustment factor. In some examples, the example adjustment factor generator 208 determines a signed-out adjustment factor for each demographic pair (e.g., as shown in Table 4 above).
At block 314, the example adjustment factor analyzer 134 provides the signed-out adjustment factor to the adjustment factors database 136. For example, the example update controller 212 of
At block 316, the example adjustment factor analyzer 134 reduces error based on the signed-out adjustment factor. For example, the error in the aggregated campaign data 150 of
At block 318, the example adjustment factor analyzer 134 determines whether to update the signed-out adjustment factor. In some examples, the example update controller 212 determines that the signed-out adjustment factor is to be updated in response to the example input interface 202 receiving and/or otherwise obtaining new AME panel data and/or new database proprietor impressions data from the AME intermediary merged data 130. For example, in response to the example update controller 212 determining that the signed-out adjustment factor is to be updated (e.g., block 318 returns a result of YES), control returns to block 302. Alternatively, in response to the example update controller 212 determining that the signed-out adjustment factor is not to be updated (e.g., block 318 returns a result of NO), the example instructions 300 of
The processor platform 400 of the illustrated example includes a processor 412. The processor 412 of the illustrated example is hardware. For example, the processor 412 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example input interface 202, the example panelist detector 204, the example sign-in rate calculator 206, the example adjustment factor generator 208, the example weighting controller 210, and the example update controller 212.
The processor 412 of the illustrated example includes a local memory 413 (e.g., a cache). The processor 412 of the illustrated example is in communication with a main memory including a volatile memory 414 and a non-volatile memory 416 via a bus 418. The volatile memory 414 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 416 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 414, 416 is controlled by a memory controller.
The processor platform 400 of the illustrated example also includes an interface circuit 420. The interface circuit 420 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
In the illustrated example, one or more input devices 422 are connected to the interface circuit 420. The input device(s) 422 permit(s) a user to enter data and/or commands into the processor 412. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
One or more output devices 424 are also connected to the interface circuit 420 of the illustrated example. The interface circuit 420 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 420 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 426. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
The processor platform 400 of the illustrated example also includes one or more mass storage devices 428 for storing software and/or data. Examples of such mass storage devices 428 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.
The machine executable instructions 432 of
From the foregoing, it will be appreciated that example methods, apparatus and articles of manufacture have been disclosed that enable the generation of accurate and reliable audience measurement metrics for Internet-based media without the use of third-party cookies and/or tags that have been the standard approach for monitoring Internet media for many years. This is accomplished by merging AME panel data with database proprietor impressions data within a privacy-protected cloud based environment. The nature of the cloud environment and the privacy constraints imposed thereon as well as the nature in which the database proprietor collects the database proprietor impression data present technological challenges contributing to limitations in the reliability and/or completeness of the data. However, examples disclosed herein overcome these difficulties by generating adjustment factors based on the AME panel data. The disclosed methods, apparatus and articles of manufacture are accordingly directed to one or more improvement(s) in the functioning of a computer.
Example 1 includes an apparatus including a panelist detector to identify audience measurement panelists associated with database proprietor accounts, the audience measurement panelists from merged data, the merged data including panel data collected by a first server of an audience measurement entity and database proprietor impressions data collected by a second server of a database proprietor based on network communications from client devices, the merged data stored in computer memory of a privacy-protected cloud environment, the database proprietor impressions data including a computer-generated error based on the network communications from the client devices, a sign-in rate calculator to determine an actual sign-in rate of the audience measurement panelists based on first impressions represented in the database proprietor impressions data and second impressions represented in the panel data, the first impressions corresponding to access to media via the client devices, an adjustment factor generator to determine a first audience adjustment factor corresponding to a first sign-out rate and a second audience adjustment factor corresponding to a second sign-out rate, and a weighting controller to generate a first weighted audience adjustment factor and a second weighted audience adjustment factor by weighting the first and second audience adjustment factors by the actual sign-in rate, the adjustment factor generator to determine a signed-out adjustment factor based on the first and second weighted audience adjustment factors.
Example 2 includes the apparatus of Example 1, where the first and second audience adjustment factors are misattribution adjustment factors, the computer-generated error resulting from misattribution of one of the first impressions to a first user when the one of the first impressions corresponds to a second user different from the first user.
Example 3 includes the apparatus of Example 1, where the first and second audience adjustment factors are non-coverage adjustment factors, the computer-generated error resulting from non-coverage of one or more of the first impressions corresponding to a user for which demographics are not identifiable by the second server of the database proprietor.
Example 4 includes the apparatus of Example 1, where the second impressions correspond to total impressions logged by the audience measurement entity for the audience measurement panelists.
Example 5 includes the apparatus of Example 1, where the first impressions occur when the audience measurement panelists are signed-in to corresponding ones of the database proprietor accounts.
Example 6 includes the apparatus of Example 1, where the actual sign-in rate is a percentage corresponding to a first number of the first impressions relative to a second number of the second impressions.
Example 7 includes the apparatus of Example 1, where the first sign-out rate is zero percent and the second sign-out rate is one hundred percent, the first sign-out rate corresponding to the audience measurement panelists being signed-in to the database proprietor accounts during accesses to the media, the second sign-out rate corresponding to the audience measurement panelists not being signed-in to the database proprietor accounts during one or more of the accesses to the media.
Example 8 includes the apparatus of Example 1, and further includes an update controller to store the signed-out adjustment factor in a database, the adjustment factor generator to determine a second signed-out adjustment factor in response to receiving at least one of second panel data or second database proprietor impressions data from the merged data.
Example 9 includes the apparatus of Example 8, where the update controller applies the signed-out adjustment factor to the database proprietor impressions data to reduce the computer-generated error.
Example 10 includes an apparatus including memory, and processor circuitry to execute computer readable instructions to at least identify audience measurement panelists associated with database proprietor accounts, the audience measurement panelists from merged data, the merged data including panel data collected by a first server of an audience measurement entity and database proprietor impressions data collected by a second server of a database proprietor based on network communications from client devices, the merged data stored in computer memory of a privacy-protected cloud environment, the database proprietor impressions data including a computer-generated error based on the network communications from the client devices, determine an actual sign-in rate of the audience measurement panelists based on first impressions represented in the database proprietor impressions data and second impressions represented in the panel data, the first impressions corresponding to access to media via the client devices, determine a first audience adjustment factor corresponding to a first sign-out rate and a second audience adjustment factor corresponding to a second sign-out rate, generate a first weighted audience adjustment factor and a second weighted audience adjustment factor by weighting the first and second audience adjustment factors by the actual sign-in rate, and determine a signed-out adjustment factor based on the first and second weighted audience adjustment factors.
Example 11 includes the apparatus of Example 10, where the first and second audience adjustment factors are misattribution adjustment factors, the computer-generated error resulting from misattribution of one of the first impressions to a first user when the one of the first impressions corresponds to a second user different from the first user.
Example 12 includes the apparatus of Example 10, where the first and second audience adjustment factors are non-coverage adjustment factors, the computer-generated error resulting from non-coverage of one or more of the first impressions corresponding to a user for which demographics are not identifiable by the second server of the database proprietor.
Example 13 includes the apparatus of Example 10, where the second impressions correspond to total impressions logged by the audience measurement entity for the audience measurement panelists.
Example 14 includes the apparatus of Example 10, where the first impressions occur when the audience measurement panelists are signed-in to corresponding ones of the database proprietor accounts.
Example 15 includes the apparatus of Example 10, where the actual sign-in rate is a percentage corresponding to a first number of the first impressions relative to a second number of the second impressions.
Example 16 includes the apparatus of Example 10, where the first sign-out rate is zero percent and the second sign-out rate is one hundred percent, the first sign-out rate corresponding to the audience measurement panelists being signed-in to the database proprietor accounts during accesses to the media, the second sign-out rate corresponding to the audience measurement panelists not being signed-in to the database proprietor accounts during one or more of the accesses to the media.
Example 17 includes the apparatus of Example 10, where the processor circuitry is to execute the computer readable instructions to store the signed-out adjustment factor in a database, the processor circuitry to determine a second signed-out adjustment factor in response to receiving at least one of second panel data or second database proprietor impressions data from the merged data.
Example 18 includes the apparatus of Example 10, where the processor circuitry is to execute the computer readable instructions to apply the signed-out adjustment factor to the database proprietor impressions data to reduce the computer-generated error.
Example 19 includes a non-transitory computer readable medium comprising instructions that, when executed, cause processor circuitry to at least identify audience measurement panelists associated with database proprietor accounts, the audience measurement panelists from merged data, the merged data including panel data collected by a first server of an audience measurement entity and database proprietor impressions data collected by a second server of a database proprietor based on network communications from client devices, the merged data stored in computer memory of a privacy-protected cloud environment, the database proprietor impressions data including a computer-generated error based on the network communications from the client devices, determine an actual sign-in rate of the audience measurement panelists based on first impressions represented in the database proprietor impressions data and second impressions represented in the panel data, the first impressions corresponding to access to media via the client devices, determine a first audience adjustment factor corresponding to a first sign-out rate and a second audience adjustment factor corresponding to a second sign-out rate, generate a first weighted audience adjustment factor and a second weighted audience adjustment factor by weighting the first and second audience adjustment factors by the actual sign-in rate, and determine a signed-out adjustment factor based on the first and second weighted audience adjustment factors.
Example 20 includes the non-transitory computer readable medium of Example 19, where the first and second audience adjustment factors are misattribution adjustment factors, the computer-generated error resulting from misattribution of one of the first impressions to a first user when the one of the first impressions corresponds to a second user different from the first user.
Example 21 includes the non-transitory computer readable medium of Example 19, where the first and second audience adjustment factors are non-coverage adjustment factors, the computer-generated error resulting from non-coverage of one or more of the first impressions corresponding to a user for which demographics are not identifiable by the second server of the database proprietor.
Example 22 includes the non-transitory computer readable medium of Example 19, where the second impressions correspond to total impressions logged by the audience measurement entity for the audience measurement panelists.
Example 23 includes the non-transitory computer readable medium of Example 19, where the first impressions occur when the audience measurement panelists are signed-in to corresponding ones of the database proprietor accounts.
Example 24 includes the non-transitory computer readable medium of Example 19, where the actual sign-in rate is a percentage corresponding to a first number of the first impressions relative to a second number of the second impressions.
Example 25 includes the non-transitory computer readable medium of Example 19, where the first sign-out rate is zero percent and the second sign-out rate is one hundred percent, the first sign-out rate corresponding to the audience measurement panelists being signed-in to the database proprietor accounts during accesses to the media, the second sign-out rate corresponding to the audience measurement panelists not being signed-in to the database proprietor accounts during one or more of the accesses to the media.
Example 26 includes the non-transitory computer readable medium of Example 19, where the instructions, when executed, cause the processor circuitry to store the signed-out adjustment factor in a database, the processor circuitry to determine a second signed-out adjustment factor in response to receiving at least one of second panel data or second database proprietor impressions data from the merged data.
Example 27 includes the non-transitory computer readable medium of Example 19, where the instructions, when executed, cause the processor circuitry to apply the signed-out adjustment factor to the database proprietor impressions data to reduce the computer-generated error.
Example 28 includes a method including identifying audience measurement panelists associated with database proprietor accounts, the audience measurement panelists from merged data, the merged data including panel data collected by a first server of an audience measurement entity and database proprietor impressions data collected by a second server of a database proprietor based on network communications from client devices, the merged data stored in computer memory of a privacy-protected cloud environment, the database proprietor impressions data including a computer-generated error based on the network communications from the client devices, determining an actual sign-in rate of the audience measurement panelists based on first impressions represented in the database proprietor impressions data and second impressions represented in the panel data, the first impressions corresponding to access to media via the client devices, determining a first audience adjustment factor corresponding to a first sign-out rate and a second audience adjustment factor corresponding to a second sign-out rate, generating a first weighted audience adjustment factor and a second weighted audience adjustment factor by weighting the first and second audience adjustment factors by the actual sign-in rate, and determining a signed-out adjustment factor based on the first and second weighted audience adjustment factors.
Example 29 includes the method of Example 28, where the first and second audience adjustment factors are misattribution adjustment factors, the computer-generated error resulting from misattribution of one of the first impressions to a first user when the one of the first impressions corresponds to a second user different from the first user.
Example 30 includes the method of Example 28, where the first and second audience adjustment factors are non-coverage adjustment factors, the computer-generated error resulting from non-coverage of one or more of the first impressions corresponding to a user for which demographics are not identifiable by the second server of the database proprietor.
Example 31 includes the method of Example 28, where the second impressions correspond to total impressions logged by the audience measurement entity for the audience measurement panelists.
Example 32 includes the method of Example 28, where the first impressions occur when the audience measurement panelists are signed-in to corresponding ones of the database proprietor accounts.
Example 33 includes the method of Example 28, where the actual sign-in rate is a percentage corresponding to a first number of the first impressions relative to a second number of the second impressions.
Example 34 includes the method of Example 28, where the first sign-out rate is zero percent and the second sign-out rate is one hundred percent, the first sign-out rate corresponding to the audience measurement panelists being signed-in to the database proprietor accounts during accesses to the media, the second sign-out rate corresponding to the audience measurement panelists not being signed-in to the database proprietor accounts during one or more of the accesses to the media.
Example 35 includes the method of Example 28, and further includes determining a second signed-out adjustment factor in response to receiving at least one of second panel data or second database proprietor impressions data from the merged data.
Example 36 includes the method of Example 28, and further includes applying the signed-out adjustment factor to the database proprietor impressions data to reduce the computer-generated error.
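As a non-limiting illustration, the sign-in rate and weighting operations recited in Examples 28, 33, and 34 may be sketched in Python as follows. The sketch rests on assumptions that the foregoing examples do not fix: the function names `sign_in_rate` and `signed_out_adjustment_factor` are hypothetical, the sign-in rate is carried as a fraction between 0 and 1 rather than as a percentage, the factor for the zero percent sign-out rate is weighted by the sign-in rate, the factor for the one hundred percent sign-out rate is weighted by the complement of the sign-in rate, and the two weighted factors are combined by summation. Other weightings and combinations are possible.

```python
def sign_in_rate(signed_in_impressions: int, total_panel_impressions: int) -> float:
    """Return the actual sign-in rate as a fraction in [0, 1].

    signed_in_impressions: first impressions, logged by the database
        proprietor while panelists were signed in.
    total_panel_impressions: second impressions, the total logged by the
        audience measurement entity for the same panelists.
    """
    if total_panel_impressions <= 0:
        raise ValueError("total_panel_impressions must be positive")
    if not 0 <= signed_in_impressions <= total_panel_impressions:
        raise ValueError("signed_in_impressions must lie in [0, total]")
    return signed_in_impressions / total_panel_impressions


def signed_out_adjustment_factor(
    factor_at_zero_sign_out: float,
    factor_at_full_sign_out: float,
    rate: float,
) -> float:
    """Combine two audience adjustment factors into a signed-out factor.

    Assumption (not stated in the examples): the weights are the sign-in
    rate and its complement, and the weighted factors are summed.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    # First weighted factor: the factor for a 0% sign-out rate.
    first_weighted = rate * factor_at_zero_sign_out
    # Second weighted factor: the factor for a 100% sign-out rate.
    second_weighted = (1.0 - rate) * factor_at_full_sign_out
    return first_weighted + second_weighted


if __name__ == "__main__":
    # Hypothetical counts and factors, chosen only for illustration.
    rate = sign_in_rate(signed_in_impressions=75, total_panel_impressions=100)
    print(signed_out_adjustment_factor(1.0, 2.0, rate))
```

Under these assumptions, a panel that is always signed in yields the factor determined for the zero percent sign-out rate, a panel that is never signed in yields the factor determined for the one hundred percent sign-out rate, and intermediate sign-in rates interpolate linearly between the two.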
Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.
This patent arises from a non-provisional patent application that claims the benefit of U.S. Provisional Patent Application No. 63/024,260, which was filed on May 13, 2020. U.S. Provisional Patent Application No. 63/024,260 is hereby incorporated herein by reference in its entirety. Priority to U.S. Provisional Patent Application No. 63/024,260 is hereby claimed. Additionally, U.S. patent application Ser. No. 17/316,168, entitled “METHODS AND APPARATUS TO GENERATE COMPUTER-TRAINED MACHINE LEARNING MODELS TO CORRECT COMPUTER-GENERATED ERRORS IN AUDIENCE DATA,” which was filed on May 10, 2021, U.S. patent application Ser. No. 17/317,404, entitled “METHODS AND APPARATUS TO GENERATE AUDIENCE METRICS USING THIRD-PARTY PRIVACY-PROTECTED CLOUD ENVIRONMENTS,” which was filed on May 11, 2021, U.S. patent application Ser. No. 17/317,461, entitled “METHODS AND APPARATUS FOR MULTI-ACCOUNT ADJUSTMENT IN THIRD-PARTY PRIVACY-PROTECTED CLOUD ENVIRONMENTS,” which was filed on May 11, 2021, U.S. patent application Ser. No. 17/317,616, entitled “METHODS AND APPARATUS TO GENERATE AUDIENCE METRICS USING THIRD-PARTY PRIVACY-PROTECTED CLOUD ENVIRONMENTS,” which was filed on May 11, 2021, U.S. patent application Ser. No. ______ (Attorney Docket No. 20004/81256312US01), entitled “METHODS AND APPARATUS TO GENERATE AUDIENCE METRICS USING THIRD-PARTY PRIVACY-PROTECTED CLOUD ENVIRONMENTS,” which was filed on May 12, 2021, and U.S. patent application Ser. No. ______ (Attorney Docket No. 81242158U502), entitled “METHODS AND APPARATUS TO ADJUST DEMOGRAPHIC INFORMATION OF USER ACCOUNTS TO REFLECT PRIMARY USERS OF THE USER ACCOUNTS,” which was filed on May 12, 2021, are hereby incorporated herein by reference in their entireties.
Number | Date | Country
---|---|---
63024260 | May 2020 | US