Embodiments recognize advertising and ad insertion in television. Since its inception, television has been used to show product advertisements. In its modern form, advertising occurs during breaks over the duration of a show. In the U.S., advertising rates are determined primarily by Nielsen ratings, an audience measurement system that uses statistical sampling to estimate viewership. Nielsen uses indirect means to estimate viewership, as they only record the time and channel the TV is tuned to, but have no techniques to determine whether viewers were actually present.
The Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Embodiments contemplate the design of a system for ad impression verification in adaptive multimedia delivery systems employing adaptation to user behavior and viewing conditions.
One or more embodiments described herein can be used in multimedia delivery systems for mobile devices (e.g., smart phones, tablets, laptops) and home devices such as set-top boxes, streaming devices (e.g., Chromecast, Roku, Apple TV), gaming consoles (e.g., XBox and PlayStation), consumer/commercial TVs and SmartTVs, and Personal Computers. One or more embodiments may support the use of existing multimedia delivery frameworks including, but not limited to, IPTV, progressive download, bandwidth adaptive streaming standards (such as MPEG and 3GPP DASH) and existing streaming technologies such as Apple's HTTP Live Streaming.
Embodiments contemplate detection, estimation, and/or adaptation to user presence, proximity and/or ambient lighting conditions. Embodiments also contemplate user proximity estimation based on input from sensors in mobile devices. Embodiments further contemplate volume control and/or audio bitstream selection based on an estimate of one or more of these parameters: user's location, age, gender, ambient noise level and/or multiple users. Also, embodiments contemplate detection, estimation and/or adaptation to user presence and/or attention to advertisements delivered via various mechanisms, perhaps at various locations.
Embodiments contemplate one or more techniques for determining a media content impression, where the media content may be communicated via a communication network to a client device. Techniques may include receiving, from the client device, a first data that may correspond to a user proximate to the client device during a period of time. Techniques may also include receiving, from the client device, a second data corresponding to a state of the client device during the period of time. Techniques may include receiving, from the client device, an indication of at least one specific media content presented by the client device during the period of time. Techniques may also include determining a measurement of a user impression of the at least one specific media content based on the first data and the second data. The measurement of the user impression may provide an indication of a user attention to the at least one specific media content during the period of time.
Embodiments contemplate a wireless transmit/receive unit (WTRU) in communication with a wireless communication network. The WTRU may comprise a processor that may be configured to identify a first data corresponding to a user proximate to the WTRU during a period of time. The processor may be configured to identify a second data corresponding to a state of the WTRU during the period of time. The processor may be configured to determine at least one specific media content presented by the WTRU during the period of time. The processor may be configured to determine a measurement of a user impression of the at least one specific media content based on the first data and the second data. The measurement of the user impression may provide an indication of a user attention to the at least one specific media content during the period of time.
Embodiments contemplate one or more techniques for modifying a media content, where the media content may be communicated via a communication network to a client device. Techniques may include receiving, from the client device, a first data corresponding to a user proximate to the client device during a period of time. Techniques may include receiving, from the client device, an indication of at least one specific media content presented by the client device during the period of time. Techniques may include determining an adjustment of the at least one specific media content based on the first data. The adjustment may form an adjusted specific media content. Techniques may include providing the adjusted specific media content to the client device during at least one of: the period of time or another period of time.
A more detailed understanding may be had from the following description, given by way of example in conjunction with the accompanying drawings wherein:
A detailed description of illustrative embodiments will now be described with reference to the various Figures. Although this description provides a detailed example of possible implementations, it should be noted that the details are intended to be exemplary and in no way limit the scope of the application. As used herein, the articles “a” and “an”, absent further qualification or characterization, may be understood to mean “one or more” or “at least one”, for example.
As shown in
The communications system 100 may also include a base station 114a and a base station 114b. Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the core network 106/107/109, the Internet 110, and/or the networks 112. By way of example, the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.
The base station 114a may be part of the RAN 103/104/105, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc. The base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown). The cell may further be divided into cell sectors. For example, the cell associated with the base station 114a may be divided into three sectors. Thus, in one embodiment, the base station 114a may include three transceivers, i.e., one for each sector of the cell. In another embodiment, the base station 114a may employ multiple-input multiple output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell.
The base stations 114a, 114b may communicate with one or more of the WTRUs 102a, 102b, 102c, 102d over an air interface 115/116/117, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), visible light, etc.). The air interface 115/116/117 may be established using any suitable radio access technology (RAT).
More specifically, as noted above, the communications system 100 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 114a in the RAN 103/104/105 and the WTRUs 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High-Speed Uplink Packet Access (HSUPA).
In another embodiment, the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 115/116/117 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A).
In other embodiments, the base station 114a and the WTRUs 102a, 102b, 102c may implement radio technologies such as IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1×, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
The base station 114b in
The RAN 103/104/105 may be in communication with the core network 106/107/109, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 102a, 102b, 102c, 102d. For example, the core network 106/107/109 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication. Although not shown in
The core network 106/107/109 may also serve as a gateway for the WTRUs 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or other networks 112. The PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS). The Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and the internet protocol (IP) in the TCP/IP internet protocol suite. The networks 112 may include wired or wireless communications networks owned and/or operated by other service providers. For example, the networks 112 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 103/104/105 or a different RAT.
Some or all of the WTRUs 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities, i.e., the WTRUs 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wireless networks over different wireless links. For example, the WTRU 102c shown in
The processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment. The processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While
The transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 115/116/117. For example, in one embodiment, the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals. In another embodiment, the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. In yet another embodiment, the transmit/receive element 122 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wireless signals.
In addition, although the transmit/receive element 122 is depicted in
The transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122. As noted above, the WTRU 102 may have multi-mode capabilities. Thus, the transceiver 120 may include multiple transceivers for enabling the WTRU 102 to communicate via multiple RATs, such as UTRA and IEEE 802.11, for example.
The processor 118 of the WTRU 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128. In addition, the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132. The non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 118 may access information from, and store data in, memory that is not physically located on the WTRU 102, such as on a server or a home computer (not shown).
The processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the WTRU 102. The power source 134 may be any suitable device for powering the WTRU 102. For example, the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
The processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 102. In addition to, or in lieu of, the information from the GPS chipset 136, the WTRU 102 may receive location information over the air interface 115/116/117 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
The processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
As shown in
The core network 106 shown in
The RNC 142a in the RAN 103 may be connected to the MSC 146 in the core network 106 via an IuCS interface. The MSC 146 may be connected to the MGW 144. The MSC 146 and the MGW 144 may provide the WTRUs 102a, 102b, 102c with access to circuit-switched networks, such as the PSTN 108, to facilitate communications between the WTRUs 102a, 102b, 102c and traditional land-line communications devices.
The RNC 142a in the RAN 103 may also be connected to the SGSN 148 in the core network 106 via an IuPS interface. The SGSN 148 may be connected to the GGSN 150. The SGSN 148 and the GGSN 150 may provide the WTRUs 102a, 102b, 102c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102a, 102b, 102c and IP-enabled devices.
As noted above, the core network 106 may also be connected to the networks 112, which may include other wired or wireless networks that are owned and/or operated by other service providers.
The RAN 104 may include eNode-Bs 160a, 160b, 160c, though it will be appreciated that the RAN 104 may include any number of eNode-Bs while remaining consistent with an embodiment. The eNode-Bs 160a, 160b, 160c may each include one or more transceivers for communicating with the WTRUs 102a, 102b, 102c over the air interface 116. In one embodiment, the eNode-Bs 160a, 160b, 160c may implement MIMO technology. Thus, the eNode-B 160a, for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 102a.
Each of the eNode-Bs 160a, 160b, 160c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the uplink and/or downlink, and the like. As shown in
The core network 107 shown in
The MME 162 may be connected to each of the eNode-Bs 160a, 160b, 160c in the RAN 104 via an S1 interface and may serve as a control node. For example, the MME 162 may be responsible for authenticating users of the WTRUs 102a, 102b, 102c, bearer activation/deactivation, selecting a particular serving gateway during an initial attach of the WTRUs 102a, 102b, 102c, and the like. The MME 162 may also provide a control plane function for switching between the RAN 104 and other RANs (not shown) that employ other radio technologies, such as GSM or WCDMA.
The serving gateway 164 may be connected to each of the eNode-Bs 160a, 160b, 160c in the RAN 104 via the S1 interface. The serving gateway 164 may generally route and forward user data packets to/from the WTRUs 102a, 102b, 102c. The serving gateway 164 may also perform other functions, such as anchoring user planes during inter-eNode B handovers, triggering paging when downlink data is available for the WTRUs 102a, 102b, 102c, managing and storing contexts of the WTRUs 102a, 102b, 102c, and the like.
The serving gateway 164 may also be connected to the PDN gateway 166, which may provide the WTRUs 102a, 102b, 102c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102a, 102b, 102c and IP-enabled devices.
The core network 107 may facilitate communications with other networks. For example, the core network 107 may provide the WTRUs 102a, 102b, 102c with access to circuit-switched networks, such as the PSTN 108, to facilitate communications between the WTRUs 102a, 102b, 102c and traditional land-line communications devices. For example, the core network 107 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the core network 107 and the PSTN 108. In addition, the core network 107 may provide the WTRUs 102a, 102b, 102c with access to the networks 112, which may include other wired or wireless networks that are owned and/or operated by other service providers.
As shown in
The air interface 117 between the WTRUs 102a, 102b, 102c and the RAN 105 may be defined as an R1 reference point that implements the IEEE 802.16 specification. In addition, each of the WTRUs 102a, 102b, 102c may establish a logical interface (not shown) with the core network 109. The logical interface between the WTRUs 102a, 102b, 102c and the core network 109 may be defined as an R2 reference point, which may be used for authentication, authorization, IP host configuration management, and/or mobility management.
The communication link between each of the base stations 180a, 180b, 180c may be defined as an R8 reference point that includes protocols for facilitating WTRU handovers and the transfer of data between base stations. The communication link between the base stations 180a, 180b, 180c and the ASN gateway 182 may be defined as an R6 reference point. The R6 reference point may include protocols for facilitating mobility management based on mobility events associated with each of the WTRUs 102a, 102b, 102c.
As shown in
The MIP-HA may be responsible for IP address management, and may enable the WTRUs 102a, 102b, 102c to roam between different ASNs and/or different core networks. The MIP-HA 184 may provide the WTRUs 102a, 102b, 102c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102a, 102b, 102c and IP-enabled devices. The AAA server 186 may be responsible for user authentication and for supporting user services. The gateway 188 may facilitate interworking with other networks. For example, the gateway 188 may provide the WTRUs 102a, 102b, 102c with access to circuit-switched networks, such as the PSTN 108, to facilitate communications between the WTRUs 102a, 102b, 102c and traditional land-line communications devices. In addition, the gateway 188 may provide the WTRUs 102a, 102b, 102c with access to the networks 112, which may include other wired or wireless networks that are owned and/or operated by other service providers.
Although not shown in
Embodiments contemplate viewing conditions adaptive multimedia delivery. Embodiments contemplate a multimedia delivery system that may use information about a user's viewing conditions to adapt encoding and/or a delivery process, perhaps for example to minimize usage of network bandwidth, power, and/or other system resources. The system may use sensors (e.g., front-facing camera, ambient light sensor, accelerometer, etc.) of the user equipment (e.g., smart phone or tablet) to detect the presence of the viewer. The adaptation system may use this information to determine parameters of visual content that a viewer may be able to see, and may adjust encoding and delivery parameters accordingly. This adaptation mechanism may allow the delivery system to achieve an improved (e.g., best possible) user experience, while perhaps saving network bandwidth and/or other system resources. Embodiments contemplate detection and/or adaptation to a user presence, perhaps using one or more sets of techniques to accommodate one or more sets of sensors (e.g., IR remote control, range finder, TV camera, smart phone or tablets used as remote controls and/or second screens, etc.) and/or capabilities available at home. A high-level diagram of an example bandwidth adaptive multimedia system for delivering content on a mobile and/or a home device is shown in
Embodiments contemplate that user presence, proximity to screen, and/or attention to video content can be established, perhaps using built-in sensors (camera, accelerometer, etc.) in mobile devices and/or using built-in sensors in a TV, set-top box, remote control, or other TV-attached devices (game consoles, Kinect, etc.) in a home environment, among other environments. Information about user presence and/or proximity can be used to optimize multimedia delivery.
Embodiments recognize advertising and/or ad insertion in television. Since its inception, television has been used to show product advertisements. In its modern form, advertising occurs during breaks over the duration of a show. In the U.S., advertising rates are determined primarily by Nielsen ratings, which is an audience measurement system that uses statistical sampling to estimate viewership. The Nielsen system uses indirect means to estimate viewership, as it only records the time and channel to which the TV is tuned. But the Nielsen ratings have no techniques to determine whether viewers were actually present or how viewers may be responding to what they are seeing.
TV networks may distribute content to local affiliates and cable TV providers nationwide. These TV streams may carry advertisements meant to be shown at the national level, but also may allow for regional and/or local ads to be inserted in the stream. In analog TV, in-band dual-tone multi-frequency (DTMF) subcarrier audio “cue tones” may be used to trigger the cutover from a show and/or national ad to regional and/or local ads. In digital TV (e.g., IPTV), embodiments recognize that the Society of Cable Telecommunications Engineers (SCTE) has developed a set of standards for digital program insertion (e.g., SCTE 30 and 35) that may be used to (e.g., seamlessly) insert ads in TV systems by means of digital “cue messages”, as shown in
Embodiments recognize online advertising and ad insertion in Digital Media Delivery. A large number of web sites hosting media content (e.g., YouTube, Hulu, Facebook, CBS, Yahoo, etc.) may obtain revenue by showing advertisements to users during a multimedia delivery session (e.g., progressive download or streaming). Ads may be shown at the beginning (“pre-roll”), end (“post-roll”), and/or during (“mid-roll”) the delivery session. There may be certain rules that may be inserted to alter a user's control of the playback, perhaps for example when a video ad may be rendered, among other scenarios. For example, users may be prevented from skipping and/or fast-forwarding through the ad.
Embodiments recognize one or more different models by which advertisers may compensate web publishers for inserting ads in their content. In the “CPM” model, advertisers may pay for every thousand displays of their message to potential customers (e.g., Cost Per M, where M is the Roman numeral for one thousand). Each instance in which an ad is displayed may be called an “impression”, and the accuracy of counting and/or verifying such impressions may be useful to gauge. Embodiments contemplate that an impression that can be verified as one that was watched by the viewer might be worth more than an impression that may have no certainty of reaching the viewer's attention. Embodiments recognize other compensation models such as the “cost per click” (“CPC”) and/or the “cost per action” (“CPA”) models.
Embodiments contemplate Online Advertising and Ad Impression Verification. Embodiments recognize that a number of agencies and associations measure ad impressions and develop techniques for measuring ad impressions. Some are:
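By way of illustration only, the CPM arithmetic may be sketched as follows. The function names and the rates used are hypothetical and are not taken from any embodiment; the second function merely illustrates how verified impressions might be billed at a different rate than unverified ones.

```python
def cpm_cost(impressions: int, cpm_rate: float) -> float:
    """Cost of `impressions` ad displays billed at `cpm_rate` per thousand."""
    if impressions < 0 or cpm_rate < 0:
        raise ValueError("impressions and cpm_rate must be non-negative")
    return impressions / 1000 * cpm_rate


def blended_cost(verified: int, unverified: int,
                 verified_cpm: float, unverified_cpm: float) -> float:
    """Total cost when verified and unverified impressions carry different rates.

    The two-rate split is an assumption made for illustration.
    """
    return cpm_cost(verified, verified_cpm) + cpm_cost(unverified, unverified_cpm)


if __name__ == "__main__":
    # 250,000 impressions at a hypothetical rate of 4.00 per thousand
    print(cpm_cost(250_000, 4.0))
    # 100,000 verified at 6.00 plus 150,000 unverified at 4.00 (hypothetical rates)
    print(blended_cost(100_000, 150_000, 6.0, 4.0))
```

Under these assumed rates, a verified impression contributes more revenue per display than an unverified one, which is the motivation for verifying impressions.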
Embodiments recognize that the IAB describes a detailed set of methods and common practices for ad verification, although it focuses on techniques related to image ads, such as determining whether an ad has been served (e.g., using cookies or invisible/transparent images), whether the page with ads was requested by a human (to prevent fraud by inflating the number of impressions), or by determining the location of an ad within a web page (e.g., visible by the user on page load, referred to as “above the fold”).
In broadcast and cable TV, embodiments recognize that it presently might not be possible to verify ad impressions in a direct manner because there is no built-in feedback mechanism in the content delivery system (e.g., via a content delivery network (CDN)). In video streaming for laptops and PCs with an Internet connection, embodiments recognize that some attempts have been made to determine user presence by serving ads only when a user is active, using mouse or keyboard activity to make such a determination.
Embodiments recognize Targeted Online Advertisements. Targeted advertising is a type of advertising whereby advertisements may be placed so as to reach consumers based on various traits such as demographics, psychographics, behavioral variables (e.g., such as product purchase history), or other second-order activities which may serve as a proxy for these consumer traits. Embodiments recognize that most targeted new media advertising currently uses second-order proxies for targeting, such as tracking online or mobile web activities of consumers, associating historical webpage consumer demographics with new consumer web page access, using a search word as the basis for implied interest, and/or contextual advertising.
Addressable advertising systems may serve ads directly based on demographic, psychographic, and/or behavioral attributes that may be associated with the consumer(s) exposed to the ad. These systems may be digital and/or may be addressable (and in some embodiments perhaps must be addressable) in that the end point which may serve the ad (e.g., set-top box, website, or digital sign) may be capable of rendering an ad independently of any other end points, perhaps based on consumer attributes specific to that end point at the time the ad is served, among other factors. Addressable advertising systems may use consumer traits associated with the end point or end points as the basis for selecting and/or serving ads.
Embodiments recognize Demographic Estimation. The value of targeted advertisements may be substantially greater than network wide ads. The specificity with which the targeting is performed may be useful. Embodiments recognize techniques for estimation of age from facial stills. Embodiments recognize approaches to estimating other anthropometric parameters such as race, ethnicity, etc. These techniques may rely on image data as an input, perhaps for example in order to estimate demographic/anthropometric parameters. There are also approaches to demographic age estimation based on other sensor inputs, such as for example accelerometers, gyroscopes, IR cameras, etc.
Embodiments recognize that accelerometers may be used to monitor a user's essential physiological kinetic tremor, which has characteristics that may correlate to age. Embodiments recognize the use of a smart phone platform for tremor parameter estimation. Other sensors (e.g., gyroscope) may also be used to obtain and/or complement this information. For additional demographic data, the accelerometer data may be mined for gender, height, and/or weight.
Embodiments contemplate that detection of user presence, the user's attention to visual content, and/or demographic and/or anthropometric information can be useful for introducing a new (e.g., heretofore undefined) category of ad impressions, “certified ad impressions” (CAI) (a phrase used for explanation and not limitation), which can provide a more accurate basis for measuring effectiveness and/or successful reach of ads to target markets and/or derivation of compensation for their placements. Embodiments contemplate one or more techniques by which such certified ad impressions (CAI) can be obtained and/or used in systems for delivery of content (e.g., visual content) to the end users.
The techniques described herein may be used separately or in any combination. In some embodiments, the respective techniques may result in varying degrees of certainty of ad verification. The degree of certainty may also be computed and/or reported by the ad impression verification system. One or more embodiments described herein contemplate details on the information that clients may generate to enable ad impression verification. Embodiments contemplate client-side techniques as well as server-side techniques.
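As one non-limiting illustration of how a degree of certainty might be computed when several techniques are used in combination, the sketch below merges per-technique certainties with a noisy-OR rule. The rule and the function name are assumptions made here for illustration; in particular, noisy-OR treats the techniques as statistically independent, which may not hold in practice, and the embodiments do not prescribe any particular combination rule.

```python
from math import prod
from typing import Iterable


def combined_certainty(certainties: Iterable[float]) -> float:
    """Combine per-technique certainties (each in [0, 1]) with a noisy-OR rule.

    Assumption: the techniques err independently, so the chance that all of
    them are wrong is the product of their individual error probabilities.
    """
    values = list(certainties)
    for c in values:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"certainty out of range: {c}")
    # With no techniques reporting, prod() of an empty sequence is 1,
    # so the combined certainty is 0.
    return 1.0 - prod(1.0 - c for c in values)


if __name__ == "__main__":
    # Hypothetical certainties from, e.g., a camera-based and a sensor-based technique
    print(combined_certainty([0.5, 0.5]))
```

A value computed in this manner could be reported by the ad impression verification system alongside each impression.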
One or more embodiments contemplate client-side solutions. In one or more embodiments, user presence detection may be performed at the reproduction end 3002, as shown in the example of
Referring to
One or more embodiments contemplate server-side solutions. Some embodiments contemplate techniques for user presence detection that might not require any changes to the multimedia client.
One or more embodiments may be used in a variety of multimedia delivery frameworks, including but not limited to IPTV, progressive download, and/or bandwidth adaptive streaming. One or more embodiments may also be used with existing cable TV (or even broadcast TV) by capturing user presence detection information (e.g., in a set-top box or other device) and, either continuously and/or periodically (e.g., daily or weekly) uploading this information via the Internet or other data network to an ad agency server.
One or more embodiments contemplate using camera and/or IR imaging devices in reproduction devices. In one or more embodiments, it may be assumed that a mobile and/or home multimedia device (television, monitor, or set top box) may include a provision for monitoring viewers that are within the field of view of a camera(s). A picture (or series of pictures) may be taken using the camera(s), followed by application of computer vision tools (e.g., face and facial feature detectors) for detecting the presence and/or demographics of viewers.
Embodiments contemplate that specific tools for user presence and/or attention detection can include face detection algorithms (e.g. Viola-Jones framework). Certain human body features—such as eyes, nose, etc., may be further detected and/or used for increasing assurance that a detected user is facing the screen while an ad is being played. Eye tracking techniques may be used to ensure viewers are actually watching the screen. The duration of time for which a user was detected facing the screen during ad playback can be used as a component metric of a user's interest/attention to the ad content. Human body feature detection and/or eye tracking may be used, perhaps to further improve accuracy of results among other reasons.
Techniques like face detection and/or human body feature detection may return the detection result, perhaps along with the probability that the detection is correct. In particular, face detection algorithms may be sensitive to occlusion (e.g., part of the face is not visible), illumination, and/or expression. Some face detection implementations may provide probability as part of their results. For example, Android's face detection API returns a confidence factor between 0 and 1 which indicates how certain it is that what has been found is actually a face. This is also the case for OpenCV's face detection API. Embodiments contemplate that this probability may be used by the ad verification system to classify and/or rank the results and/or take further actions (e.g., bill high probability results at a higher rate).
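As a non-limiting illustration, the classification and ranking of detection results by confidence may be sketched as follows. The tier boundaries, the `FaceDetection` record, and the function names are assumptions introduced for this example and are not part of any particular face detection API.

```python
from dataclasses import dataclass

# Hypothetical tier boundaries; the description only states that results
# may be classified and/or ranked by the reported probability.
HIGH_CONFIDENCE = 0.8
LOW_CONFIDENCE = 0.4


@dataclass
class FaceDetection:
    frame_time_s: float  # offset of the analyzed frame within the ad, in seconds
    confidence: float    # detector confidence in the range [0, 1]


def classify(detection: FaceDetection) -> str:
    """Map a detector confidence to a coarse tier (high / medium / low)."""
    if not 0.0 <= detection.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if detection.confidence >= HIGH_CONFIDENCE:
        return "high"
    if detection.confidence >= LOW_CONFIDENCE:
        return "medium"
    return "low"


def rank(detections):
    """Return detections ordered from most to least confident."""
    return sorted(detections, key=lambda d: d.confidence, reverse=True)
```

An ad verification system could, for example, forward only the "high" tier for billing at a premium rate and retain the remaining results for auditing.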
Embodiments recognize techniques for demographic data estimation. In some embodiments, perhaps following an ad impression, among other scenarios, verification of the ad impression and/or estimated user demographics, e.g., age, gender, ethnicity, etc., may be passed to the ad agency via the content server or directly to an agency server. This information may be used by the agency to assess whether their ads are reaching their desired target market segment.
In some embodiments, it may be possible to use advanced computer vision techniques for recognizing emotion from facial expressions. The results for emotion may also be reported to the ad verification system where they could be used to determine the impact of an ad campaign.
One or more embodiments may be used with certain TVs and/or gaming consoles (e.g., Xbox/Kinect) that may be equipped with cameras and/or IR laser and/or sensors for gesture recognition. In such scenarios, the functions of user presence detection and/or pose estimation may already be implemented by gaming consoles and this information may be used as input.
In one or more embodiments, a “User Presence Result” that may be sent back by the client may contain one or more of the items listed below. Additional information (e.g., anthropometric, biometric and/or emotional state) obtained using techniques described herein may also be part of the report.
Time, date, channel and/or content being watched;
Whether user presence was detected (e.g., true or false);
Confidence level and/or probability of accuracy of user presence detection; and/or
Estimated demographics data (e.g., if available).
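As a non-limiting illustration, the items listed above may be carried in a simple record that a client serializes before reporting. The field names and the JSON encoding are assumptions made for this sketch; the description does not prescribe a wire format.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Optional


@dataclass
class UserPresenceResult:
    timestamp: str     # time and date of playback (ISO 8601 assumed here)
    channel: str       # channel being watched
    content_id: str    # content (e.g., ad) being watched
    user_present: bool # whether user presence was detected
    confidence: Optional[float] = None  # probability that the detection is accurate
    demographics: dict = field(default_factory=dict)  # estimated data, if available

    def to_json(self) -> str:
        """Serialize the result for reporting (hypothetical encoding)."""
        return json.dumps(asdict(self), sort_keys=True)
```

Optional fields default to empty values so that a client lacking demographic estimation may still report presence.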
Embodiments recognize privacy concerns by some users. The concern has no technical basis, as imaging devices are not really used to record anything. This concern may gradually disappear as more and more TV devices using cameras for gesture recognition and gaming enter society. Embodiments contemplate one or more techniques to manage privacy concerns:
Assurance that only non-personal/non user identifying information may be shared; and/or
The front facing camera may be disabled altogether.
In one or more embodiments it may be assumed that a mobile device and/or gaming console control contains a set of sensors capable of detecting movement (e.g., accelerometer, gyroscope). Embodiments recognize the use of an accelerometer to classify the viewing position of a smart phone or tablet, for example: a user is holding the device in hand, the device is on the user's lap (for tablets), the user is in motion, the device has been placed on a stand, or on a table facing up/down. The information of the viewing position may be sent to the ad server and/or content provider where it may be used to verify ad impression. Advertisers may use this information differently. For example, some may verify an ad impression if the user is holding a device in hand (e.g., perhaps only if so), while others may charge different rates depending on the viewing position.
User presence may also be determined by using a microphone, touch sensors, and/or proximity sensors, etc. More uses of sensors are contemplated. For example, one or more of:
One or more embodiments may be used in a home environment, for example as mobile devices are now being used as remote controls for TVs. Similarly, mobile devices may also be used as second screens for delivering video content and/or supplementary information (e.g. scheduling information, program metadata, and/or advertisements) from the Internet and/or by cable TV providers. In such scenarios, among others, sensors may be used to determine user presence. Embodiments contemplate that age estimation can be performed in a number of ways. Gender, height, and/or weight may be estimated in a number of ways as well.
The estimated user age and gender may be passed to the ad agency via the content server or directly to an agency server, perhaps following an ad impression, and/or perhaps in addition to verification of ad impression, among other scenarios. This information may be used by the agency to assess whether their ads are reaching their desired target market segment. A flowchart of an example technique is shown in
Embodiments contemplate inferring a user's state/activity from his/her input. In one or more embodiments, it may be assumed that the mobile and/or home multimedia device has capabilities for detecting user activity, such as touching the screen to control the media (volume, fast forward, pause or rewind, etc.) and/or by operating a remote control. It can be established that a user is present, perhaps for example when the interaction occurs. That type of interaction may be reported to the ad server and/or content provider, where for example it may be used to verify ad impression.
One or more embodiments contemplate adapting the ads based on detected user activity. For example, the user might be multi-tasking and/or the video window that shows ads may be minimized. This information may be reported back to the ad tracking server, perhaps for example when this type of user activity may be detected, and perhaps so that the ad may be made to become more interesting to get the user's attention. The adaptation may be done in real-time and/or after some period of time (e.g., after an activity analysis period, an ad impression analysis period, and/or at a later presentation of the advertisement). An example implementation of such a user presence detection is illustrated in
Embodiments contemplate using input from microphones. Some TV and gaming consoles come equipped with external or built-in microphones and/or some may use accessories such as a Skype camera that come equipped with a microphone array. The microphones may be used to capture the viewer's speech, which could then be used to determine user presence. Some recent TVs (e.g., Samsung 2013 TV with “Smart Interaction”) can perform speech recognition requiring the user to speak into the remote control. In some embodiments, perhaps if speech recognition were to be done on the TV set itself, among other scenarios, this may also be used in determining user presence. Such techniques may be complementary to other techniques described herein, perhaps to further improve the accuracy of determining user presence, among other reasons.
Embodiments contemplate inference of a user presence by analysis of multimedia traffic. One or more embodiments described herein may include detection of the user at the reproduction end (e.g., client-side) and signaling of this information to an ad-verification server. A factor in such embodiments may be a user's privacy concerns, in that a user's presence may be identified at the premises where the user is located (e.g., home or office) and then may be sent to another entity in the network.
Perhaps to address such privacy issues, among other scenarios, embodiments contemplate that server-side techniques may determine a user presence by indirect means where no additional equipment may be required at the premises, perhaps for example beyond what may be used for conducting a user adaptive video delivery session.
One or more embodiments may assume that the client has built-in logic for user adaptive multimedia delivery and/or may select content adaptively based on a user activity. Embodiments contemplate situations where, for example, a multimedia client may reside in a mobile device, and it may be playing a presentation including the set of example encoded streams illustrated in
More specifically, streams “720p_A28” and/or “720p_A14” may be suitable for watching videos when a user may be holding the phone in hand, for example. These streams may be selected when the client may have sufficient bandwidth to load them (e.g., perhaps selected only when sufficient bandwidth is available). In some embodiments, the highest rate stream up to a bandwidth capacity that may be available may be loaded, perhaps for example without such a bandwidth estimation.
One or more embodiments on the server side contemplate logic to estimate effective bandwidth of connection between the client and the server. In some embodiments, this can be inferred by analysis of TCP operation in a way it may implement transmission of data from a server to the client. Some embodiments contemplate the comparison of estimated available bandwidth with the rate of video stream(s) requested by the multimedia client.
In some embodiments, perhaps if the result of such a comparison shows that a sufficient amount of bandwidth is available, but the client has decided to select a stream normally dedicated to “in hand” watching of the content (e.g., requests a stream at a lower bit rate than the available bandwidth)—this may imply that the user may be holding the phone when an ad is being rendered, and this in turn, can be used for verification of ad impression, for example.
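As a non-limiting illustration, the server-side comparison described above may be sketched as follows. The stream names “720p_A28” and “720p_A14” are taken from the description; the bit rates, the additional “720p_full” stream, and the function name are hypothetical values chosen only for this example.

```python
from typing import Optional

# Stream table: name -> (bit rate in kbit/s, intended for in-hand viewing?).
# All rates and the "720p_full" entry are assumptions for illustration.
STREAMS = {
    "720p_full": (4000, False),
    "720p_A28": (1500, True),
    "720p_A14": (800, True),
}


def infer_in_hand(requested: str, estimated_bw_kbps: float,
                  streams=STREAMS) -> Optional[bool]:
    """Infer in-hand viewing from a stream request and the estimated bandwidth.

    Returns True when an in-hand stream was requested although the bandwidth
    could have carried a higher-rate stream not dedicated to in-hand viewing,
    False when a stream not dedicated to in-hand viewing was requested, and
    None when the request may simply reflect limited bandwidth.
    """
    rate, in_hand = streams[requested]
    if not in_hand:
        return False
    could_afford_more = any(
        other_rate > rate and other_rate <= estimated_bw_kbps and not other_in_hand
        for other_rate, other_in_hand in streams.values()
    )
    return True if could_afford_more else None
```

Returning None in the bandwidth-limited case keeps the server from certifying an impression when the low-rate request has an ordinary network explanation.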
Embodiments contemplate that smart phones or tablets with user adaptive streaming clients, and the like, may be used in one or more of the described client-side embodiments, as these devices may already have a number of built-in sensors that may be capable of providing more information that can be used to detect user presence. This information may be combined with server-side analytic techniques to improve the accuracy of the detection.
Embodiments contemplate reporting user presence results and/or ad impression verification. Embodiments recognize that in many streaming systems, the client may receive a description at the beginning of the session listing the components of the multimedia presentation (e.g., audio, video, closed caption, etc.) and/or a name of one or more, or each, component, perhaps so they may be retrieved from the content server, among other reasons. Components may be encoded at different levels (e.g., bit rates or quality levels) and/or may be partitioned into segments, for example to enable adaptation (e.g., to bandwidth or quality). In such scenarios, among others, advertisements may be added (e.g., perhaps easily added) to a presentation by inserting them into the description, perhaps at the time when the description may be first retrieved (e.g., for on-demand content) and/or by updating it during the session (e.g., for live events). An example of a multimedia presentation description with an advertisement is shown in
In some embodiments, the client may retrieve the description from the content provider, and/or may request one or more, or each, of the segments of the ad/show, for example, perhaps to play back the presentation in
Embodiments contemplate one or more techniques that the client may use for reporting user presence results. These techniques may be used separately or in combination with the client-side techniques described herein. In some embodiments, clients in some server-side techniques might not report back results, perhaps because user presence detection may be performed at the server, among other reasons.
One or more embodiments contemplate that user presence results may be reported to the content provider. In some embodiments, clients may report back user presence results to the content provider (e.g.,
In some embodiments, results may be reported during a streaming session. Perhaps as part of a streaming session, among other scenarios, the HTTP GET request from the client may include special headers to report the user presence results to the server. The results may refer to a previously fetched ad, and/or they may include sufficient information to identify the ad (e.g., “contentId”), the time it was played, and/or the corresponding user presence results. One or more of these headers may be logged by the server, and/or may be sent to the ad server for ad impression verification, reporting, and/or billing, etc. The following shows a sample set of example custom HTTP headers:
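The header listing itself is not reproduced in this text. As a purely illustrative sketch, a client might attach presence results for a previously fetched ad to its next segment request as shown below. Only “contentId” is named in the description; the header names, the URL, and the helper functions are assumptions.

```python
import urllib.request


def presence_headers(content_id: str, played_at: str,
                     present: bool, confidence: float) -> dict:
    """Build hypothetical custom headers describing a previously played ad."""
    return {
        "X-Ad-Content-Id": content_id,            # identifies the ad ("contentId")
        "X-Ad-Played-At": played_at,              # time the ad was played
        "X-User-Presence": "true" if present else "false",
        "X-User-Presence-Confidence": f"{confidence:.2f}",
    }


def build_segment_request(segment_url: str, headers: dict) -> urllib.request.Request:
    """Create (but do not send) an HTTP GET for the next segment with the headers attached."""
    return urllib.request.Request(segment_url, headers=headers)
```

Because the results ride on a request the client would issue anyway, no additional connection is needed during the streaming session.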
In some embodiments, more detailed results may be provided by the client. For example, clients may provide the actual sensor readings, perhaps so that the ad agency server may perform more sophisticated analysis of the data for determining user presence, for auditing, and/or other purposes.
In some embodiments, the ad server may use the results received from the client, for example to do ad impression verification. Ad agencies may have different criteria to certify impressions. For example, some may require a 90% confidence, perhaps while others may bill advertisers at different rates based on the confidence level.
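As a non-limiting illustration, the two certification policies mentioned above may be sketched as follows. The 90% threshold comes from the description; the tier boundaries, the rate multipliers, and the function names are assumptions.

```python
def certify_by_threshold(confidence: float, threshold: float = 0.9) -> bool:
    """Policy A: certify the impression only at or above a fixed confidence."""
    return confidence >= threshold


# Policy B: (minimum confidence, fraction of the base rate billed).
# These tiers are hypothetical and would be set by each ad agency.
RATE_TIERS = [(0.9, 1.0), (0.7, 0.6), (0.5, 0.3)]


def tiered_rate(confidence: float, base_rate: float) -> float:
    """Policy B: bill a fraction of the base rate that depends on confidence."""
    for min_confidence, multiplier in RATE_TIERS:
        if confidence >= min_confidence:
            return base_rate * multiplier
    return 0.0  # below the lowest tier, the impression is not billed
```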
In some embodiments, one result at a time may be reported, perhaps in scenarios where HTTP headers might not be extended, among other scenarios. Results in headers may be compressed, encoded, encrypted, and/or otherwise obfuscated, perhaps to prevent eavesdropping, among other reasons, for example.
Embodiments contemplate reporting one or more results outside of a streaming session. In some embodiments, a client may report user presence results outside of a streaming session, perhaps to eliminate dependencies and/or to minimize data traffic during streaming, for example, among other reasons. Results may be reported to the server on a per-ad basis, may be aggregated by the client and/or reported periodically (e.g., once every 10 minutes), and/or at the end of a session (e.g., upon user logout). Any method for uploading data may be used by the client, for example using HTTP POST, SOAP/HTTP, FTP, email, and/or any other data transfer method. In some embodiments, clients may already know the address of the content provider, perhaps because they requested content from the provider, among other reasons. In some embodiments, techniques may be used to report multiple results, perhaps by sending multiple entries at a time, for example.
In some embodiments, perhaps if using HTTP POST, among other scenarios, the request may use a set of custom HTTP headers, as described herein, and/or may include the results in the body of the HTTP request, as shown in the example below.
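The example request is not reproduced in this text. As a purely illustrative sketch, a client might aggregate several results and place them in the body of a single HTTP POST as shown below. The JSON layout, the field names other than “contentId”, the URL, and the function name are assumptions.

```python
import json
import urllib.request


def build_report_request(url: str, results: list) -> urllib.request.Request:
    """Create (but do not send) an HTTP POST carrying multiple presence results."""
    body = json.dumps({"results": results}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Two aggregated entries reported at once (hypothetical values).
example_results = [
    {"contentId": "ad-123", "playedAt": "2014-09-19T20:15:00Z",
     "userPresent": True, "confidence": 0.9},
    {"contentId": "ad-456", "playedAt": "2014-09-19T20:25:00Z",
     "userPresent": False, "confidence": 0.7},
]
request = build_report_request("http://content.example.com/presence", example_results)
```

Passing the request to `urllib.request.urlopen` would transmit it; the sketch stops short of that so it can be examined without network access.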
In some embodiments, a simplified example of using SOAP/HTTP that may be used for user presence results is shown below.
Embodiments contemplate that user presence results may be reported to one or more Ad Agency Servers. In some embodiments, clients may also report user presence results directly to the ad agency server (e.g.,
As described herein, clients may report user presence results on a per-ad basis, periodically, and/or at the end of the session. Also, clients may use HTTP POST, SOAP/HTTP, FTP, email, and/or any other data transfer method.
Embodiments contemplate an ad verification proxy at the client. In one or more embodiments as described herein, the ad server may process the results received from clients and may verify ad impressions based on the results. The architecture of the system (e.g., of the ad server) may be adjusted (e.g., reduced complexity) by using an ad verification proxy (e.g.,
In some embodiments, the proxy may get the server's address from another module in the multimedia client, perhaps for example if results may be sent to the content provider (e.g.,
As described herein, the ad verification proxy may report user presence results on a per-ad basis, periodically, and/or at the end of the session. Also, clients may use HTTP POST, SOAP/HTTP, FTP, email, and/or any other data transfer method.
In one or more embodiments, ad impression results may include the ad ID and/or whether an ad impression may be true or false. Results may also include additional information (e.g., emotional state, demographics, etc.) for reporting, and/or billing, etc. In some embodiments, results may or might not include low-level data (e.g., accelerometer reading, confidence level, etc.), perhaps because the proxy may have already verified the impression. Such data may be reported to the server for auditing and/or other purposes. A sample ad impression example result message sent to the ad agency server using HTTP/SOAP is shown below.
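The sample message is not reproduced in this text. As a purely illustrative sketch, an ad verification proxy might build a SOAP 1.1 envelope carrying the ad ID and the impression result as shown below. The application namespace, the element names, and the function name are assumptions; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://adagency.example.com/impressions"     # hypothetical namespace


def build_impression_message(ad_id: str, impression: bool, extra=None) -> str:
    """Return a SOAP envelope reporting whether an ad impression occurred."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    result = ET.SubElement(body, f"{{{APP_NS}}}AdImpressionResult")
    ET.SubElement(result, f"{{{APP_NS}}}AdId").text = ad_id
    ET.SubElement(result, f"{{{APP_NS}}}Impression").text = (
        "true" if impression else "false"
    )
    # Optional additional information (e.g., emotional state, demographics).
    for key, value in (extra or {}).items():
        ET.SubElement(result, f"{{{APP_NS}}}{key}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")
```

Low-level data such as accelerometer readings is omitted here, consistent with a proxy that has already verified the impression.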
Embodiments contemplate one or more techniques for calculating an attention score. The attention score may, for example, provide advertisers with a quantification and/or characterization of a user's impression of an advertisement and/or the advertisement's effectiveness.
As described herein, sensors from mobile devices and/or face detection algorithms may provide results that may be reported in a raw format to the content provider and/or ad agency. Embodiments contemplate that raw data may be different across devices (e.g., smartphone, tablet, laptop, etc.) and/or operating systems (e.g., Android, iOS, Windows, etc.). These differences may motivate the content provider and/or ad agency to understand the data being reported and/or to implement one or more algorithms to transform raw data into information that may be used to determine whether an ad impression occurred.
Embodiments contemplate that raw data may be synthesized by one or more techniques, which may provide more useful information that can be used to determine whether an ad impression occurred.
Embodiments contemplate one or more techniques that may synthesize raw data from various sources and/or may output information (e.g., “an attention score”) that may be used for ad impression verification. An example technique is shown in
Embodiments contemplate user presence detection, device interaction detection, and/or ad impression verification for content other than ads, such as but not limited to, for example, TV shows, newscasts, movies, teleconferences, educational seminars, etc. As described herein, audience measurement systems may estimate viewership (e.g., perhaps may only estimate viewership), as determining exact numbers of viewers may be difficult. Embodiments contemplate that one or more techniques may yield more accurate viewership numbers by detecting user presence during the time a show or movie plays.
Embodiments contemplate viewer detection. Embodiments recognize that face detection in frameworks such as Android OS, iOS, or other mobile device operating systems, may provide results with some level of granularity such that these results may be interpreted in a variety of ways. For example, in Android OS, face detection may return one or more of the following set of information for each face detected in a video frame:
Number of faces detected; and/or
For each detected face:
Embodiments contemplate that face detection results may be obtained several times per second (e.g., 10-30 face detection results per second). Over the time an ad plays (e.g., 10-60 seconds), a (e.g., relatively large) number of results may be obtained. Embodiments contemplate it may be useful to summarize this information and/or combine it with other data (e.g., sensors) to obtain a more reliable result, perhaps for example to detect user presence.
Embodiments contemplate one or more user detection algorithms. In some embodiments, it may be assumed that the camera in the mobile device is able to provide face detection results. Other devices may be used for user detection, perhaps for example if the camera feature might not be available. In some embodiments, an ambient light sensor may be assumed to be available. Other devices may be used to determine illumination level (e.g., analyzing pixel data from camera), perhaps for example if an ambient light sensor might not be available.
As shown in
The number of faces that may be detected over the analysis period may vary, perhaps for example due to viewers that may be coming in or out of the field of view of the camera, due to occlusion, rotation, tilt, and/or due to limitations of the face detection algorithm used in the mobile device. An example of face detection is shown in
In some embodiments, perhaps at least some of the face detection results may be invalid because of poor lighting conditions. That is, for example, perhaps even if camera face detection may be available, but the viewer(s) may be in a dark room, face detection may yield zero faces. In such scenarios, among others, readings from an ambient light sensor (ALS) may be used to determine whether results of face detection may be valid. Other techniques may be used for detecting user presence, perhaps for example if the ALS reading may show that the viewing takes place under dark conditions which may render face detection ineffective. In some embodiments, it may be inferred that content on the screen may be difficult to see, perhaps for example if the ALS reading shows that viewing takes place under extremely high lighting conditions (e.g., outdoors on a sunny day). This information may be used to determine whether an ad and/or content is being watched and/or watched effectively.
In some embodiments, perhaps as the number of faces detected may vary over time, a summary of the results may be obtained by using one or more statistical analysis techniques. For example, the average number of viewers over the analysis period may be used to determine user presence. In such scenarios, among others, it may be the case where the average number of viewers is a non-integer number. In some embodiments, rounding or a floor operation may be used to obtain an integer number of viewers. In some embodiments, a median operation may be used to obtain the number of viewers over the analysis period.
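As a non-limiting illustration, the summarization of per-frame face counts, gated by an ambient light sensor reading, may be sketched as follows. The darkness threshold, the method names, and the function name are assumptions made for this example.

```python
import math
import statistics

DARK_LUX = 10.0  # hypothetical ALS threshold below which face detection is distrusted


def summarize_viewers(face_counts, ambient_lux, method="median"):
    """Reduce per-frame face counts to one integer viewer count.

    Returns None when the ALS reading indicates dark conditions (or when no
    counts exist), signalling that other presence techniques should be used.
    """
    if ambient_lux < DARK_LUX or not face_counts:
        return None
    if method == "median":
        # median_low always returns one of the observed (integer) counts
        return statistics.median_low(face_counts)
    mean = statistics.mean(face_counts)
    if method == "floor":
        return math.floor(mean)
    if method == "round":
        # Python rounds exact halves to the nearest even integer
        return round(mean)
    raise ValueError(f"unknown method: {method}")
```

A median is less affected than an average by a few frames in which a viewer is briefly occluded or turns away.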
In some embodiments, face detection results might not be available to the viewer detection module, perhaps for example because no camera might be available in the device, and/or because the user (e.g., due to privacy concerns or other reasons) might not grant permission for the camera to be used for ad impression verification. In such scenarios, among others, other techniques (e.g., use device state detection) may be used for ad impression verification.
Embodiments contemplate device state detection. Embodiments recognize that sensors such as accelerometers and/or gyroscopes may be in modern mobile devices (e.g., Android, iOS and Microsoft smartphones and tablets, etc.). The input from these sensors may be used to determine the device state (e.g., in hand, on a stand, on a table facing up or down, etc.). The device state information may be useful as it may be used to gauge user interest and/or attention while an ad is playing. For example, it may be inferred that a user's attention is likely on the screen of the device, perhaps for example if the user holds the mobile device in the user's hand while the ad is playing. An ad impression may be more likely, perhaps for example if it may be detected that the mobile device is held in the user's hand, than if the user puts the device on a table, and/or perhaps on a table facing down.
Accelerometer and gyroscope data may be analyzed to determine the device state. Embodiments contemplate that these sensors may produce noisy data, that is, raw data may vary, perhaps significantly, between readings. Advanced signal processing techniques may be used to analyze the data and/or produce a meaningful result. Statistical analysis, among other techniques, may be used.
In statistical analysis, data may be analyzed over a period of time (e.g., one second) to obtain a figure of merit that represents the data. Examples of figures of merit are the average, median, variance, and/or standard deviation. Any of these (or a combination of them or other figures of merit) may be used to represent the data over the analysis period. For device state detection, the variance may be useful as it may capture the variations of the data over the analysis period. A device state may be reliably determined, perhaps for example, based on these variations. In some embodiments, variance may be calculated using the example equation shown below:
where “x” is the data from accelerometer and/or gyroscope (X, Y and Z axis), and “N” is the number of data points over the analysis period.
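The equation itself is not reproduced in this text. One standard form consistent with the definitions of “x” and “N” above is the population variance, shown here as an assumption (the original may instead have used the sample form with a divisor of N − 1), with μ denoting the mean of the data over the analysis period:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2
```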
Variance may be used to determine device state, perhaps for example using the classifier shown in
Again referring to
Embodiments contemplate consideration of “Gyro” sensor data, perhaps for example but not limited to, if the variance may be below Th, which may indicate that motion level is very low (e.g., close to zero). Gyro sensor data may indicate an actual orientation of the device, perhaps using a z-axis Gyro (Gyro(z)), for example. It may be assumed that a device is propped up (e.g., on a stand) and/or may be at a reasonable viewing angle, perhaps for example if the z-axis position may exceed a threshold Tu. This may indicate that a user may have propped up the device to watch the screen.
It may be assumed the device is on a surface facing up, perhaps for example if the z-axis position may be less than Tu and/or may be larger than another threshold Td. In some embodiments, this may be interpreted as a user who may have put down the device and/or might or might not be watching the screen while it is on the surface. Otherwise, the device may be facing down and/or there may be a high probability that the screen may not be visible to any users.
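As a non-limiting illustration, the threshold-based classification described above may be sketched as follows. The thresholds Th, Tu and Td are named in the description, but their values are not given, so the caller supplies them. Treating a variance at or above Th as “in hand” is an assumption, as is the use of a single stream of sensor samples rather than separate X, Y and Z axes.

```python
import statistics


def device_state(samples, gyro_z, th, tu, td):
    """Classify the device state from motion variance and z-axis orientation.

    samples: sensor readings over the analysis period (single axis, simplified)
    gyro_z:  z-axis orientation value, Gyro(z)
    th:      variance threshold Th separating motion from rest
    tu, td:  orientation thresholds Tu and Td, with tu > td
    """
    variance = statistics.pvariance(samples)  # population variance (divisor N)
    if variance >= th:
        return "in_hand"             # assumption: noticeable motion implies handling
    if gyro_z > tu:
        return "on_stand"            # propped up at a reasonable viewing angle
    if gyro_z > td:
        return "on_surface_face_up"  # put down; user may or may not be watching
    return "face_down"               # screen likely not visible to any user
```

The state labels returned here are hypothetical names chosen for readability.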
Embodiments contemplate ad impression analysis. In some embodiments, the output of the “viewer/user presence detection” modules and/or the “device state detection” modules described herein may be used by the “ad impression verification analysis” modules described herein to calculate an “attention score”. Since the “viewer/user presence detection” and/or “device state detection” modules may output different information, the “ad impression verification analysis” modules may perform different analysis based on the differing inputs.
For example, referring to
For example, the attention score may be one of the states listed below. Other states are contemplated and may be used.
The example classifier technique such as the one shown in
Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable media include electronic signals (transmitted over wired or wireless connections) and computer-readable storage media. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.
This application claims the benefit of U.S. Provisional Application No. 61/880,815, titled “Verification of Ad Impressions in User-Adaptive Multimedia Delivery Framework”, filed on Sep. 20, 2013, and U.S. Provisional Application No. 61/892,422, titled “Verification of Ad Impressions in User-Adaptive Multimedia Delivery Framework”, filed on Oct. 17, 2013, the disclosures of both applications hereby incorporated by reference in their respective entirety as if fully disclosed herein, for all purposes.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2014/056663 | 9/19/2014 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
61880815 | Sep 2013 | US | |
61892422 | Oct 2013 | US |