Advertising is a tool for marketing goods and services, attracting customer patronage, or otherwise communicating a message to a widespread audience. Advertisements are typically presented through various types of media including, but not limited to, television, radio, print, billboards (or other outdoor signage), the Internet, digital signage, and mobile device screens.
Digital signs, such as LED, LCD, plasma, and projected images, can be found in public and private environments, such as retail stores and corporate locations. The components of a typical digital signage installation include display screen(s), media player(s), and a content management server. Sometimes two or more of these components are present in a single device, but typically there is a display screen, a media player, and a content management server that is connected to the media player over a private network. One content management server may support multiple media players, and one media player may support multiple screens.
Regardless of the medium, whether a digital sign or otherwise, advertisements are presented to the general public with the intention of commanding the attention of the audience and inducing prospective customers to purchase the advertised goods or services, or otherwise be receptive to the message being conveyed.
The present disclosure may be better understood and its numerous features and advantages made apparent by referencing the accompanying drawings.
Conventional mass advertising, including digital signage, is a non-selective medium. As a consequence, it is difficult to reach a precisely defined market segment. The volatility of the market segment, especially with placement of digital signs in public settings, is heightened by the constantly changing composition of audiences. In many circumstances, the content may be selected and delivered for display by a digital sign based on a general understanding of consumer tendencies considering time of day, geographic coverage, etc. For large-scale deployments in public venues (e.g., malls, airports, hospitals, etc.), there are numerous simultaneous audience members within the immediate range. Typical digital signage implementations do not serve customized content to multiple users, which may make it difficult for the message to have its intended impact.
As described herein, systems and methods for providing targeted content to multiple users are provided. An image and a set of faces in the image are determined. Each face in the set of faces corresponds with a user of a display screen. For each face in the set, an identifier is determined, content targeted for the face is selected, a location of the face within the image is determined, and using the face location, a location on the display screen for providing a visualization of the targeted content is determined, such that targeted content is displayed close to the user or directly in front of the user.
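For purposes of illustration, the overall flow may be sketched as follows. This is a minimal sketch, not a definitive implementation; the helper operations (face detection, identification, content selection, and coordinate mapping) are hypothetical callables, each of which is detailed in the remainder of this disclosure.

    # A minimal sketch of the overall method. Each helper is supplied as a
    # callable; concrete sketches of each appear later in this disclosure.
    def serve_targeted_content(image, detect_faces, identify, select_content,
                               map_to_display, show):
        for face in detect_faces(image):          # set of faces in the image
            face_id = identify(face)              # identifier used for tracking
            content = select_content(face)        # content targeted to this face
            location = map_to_display(face.box)   # face location -> screen location
            show(content, location, face_id)      # display close to the user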
The content computer 18 is a video image analysis computing device that is configured to analyze visual images taken by the imaging device 12. The imaging device 12 can be configured to take video images (i.e., a series of sequential video frames that capture motion) at any desired frame rate, or it can take still images. The term “content computer” refers to the computing device that is interconnected to the imaging device 12; the term “imaging device” is not intended to limit that device to a camera per se, and a variety of types of imaging devices can be used. It should also be recognized that the term “computer” as used herein is to be understood broadly as referring to a personal computer, portable computer, content server, a network PC, a personal digital assistant (PDA), a cellular telephone, or any other computing device that is capable of performing the functions for receiving input from and/or providing control or driving output to the various devices associated with the interactive display system.
The imaging device 12 is positioned near a changeable display device 20, such as a CRT, LCD screen, plasma display, LED display, display wall, projection display (front or rear projection) or other type of display device. For a digital signage application, this display device can be a large size public display, and can be a single display, or multiple individual displays that are combined together to provide a single composite image in a tiled display. This can include projected image(s) that can be tiled together or combined or superimposed in various ways to create a display. The display device can also be comprised of multiple independent displays, each corresponding to a region of an image. For example, system 10 may use a dedicated display for each zone. An audio broadcast device, such as an audio speaker 22, can also be positioned near the display to broadcast audio content along with the video content provided on the display.
The digital display system 10 also includes a display computer 24 that is interconnected to provide the desired video and/or audio output to the display 20 and the audio speaker 22. The content computer 18 is interconnected to the display computer 24, allowing feedback and analysis from the content computer 18 to be used by the display computer 24. The display computer 24 may also provide feedback to the content computer 18 regarding camera settings to allow the change of focus, zoom, field of view, and physical orientation of the camera (e.g., pan, tilt, roll), if the mechanisms to do so are associated with the camera.
A single computer can be used to control both the imaging device 12 and the display 20. For example, the single computer can be programmed to handle all functions of video image analysis, content selection, determination of display coordinates, and control of the imaging device, as well as controlling output to the display. In such a configuration, the content computer 18 is configured to perform the functions of the display computer 24, in addition to its own functions. Moreover, the content computer 18 and the display computer 24 may be embedded into the display 20 or the imaging device 12, or their functionalities may be split between the two computers in various ways.
Additionally, the digital display system 10 can be a network or part of a network or it can be interconnected to a network. The network can be a local area network (LAN), or any other type of computer network, including a global web of interconnected computers and computer networks, such as the Internet.
The content computer 18 can be any type of personal computer, portable computer, or workstation computer that includes a processing unit, a system memory, and a system bus that couples the processing unit to the various components of the computer. The processing unit may include one or more processors, each of which may be in the form of any one of various commercially available processors. Generally, each processor receives instructions and data from a read-only memory and/or a random access memory. The content computer 18 can also include a hard drive, a floppy drive, and a CD-ROM drive that are connected to the system bus by respective interfaces. The hard drive, floppy drive, and CD-ROM drive contain respective computer-readable media disks that provide non-volatile or persistent storage for data, data structures, and computer-executable instructions. Other computer-readable storage devices (e.g., magnetic tape drives, flash memory devices, and digital versatile disks) can also be used with the content computer 18.
The imaging device 12 is oriented toward an audience 14 of individual people, who are gathered in the audience area, designated by outline 16. While the audience area is shown as a definite outline having a particular shape, this is intended to represent that there is some area near the imaging device 12 in which an audience can be viewed. The audience area can be of a variety of shapes, and can comprise the entirety of the field of view 17 of the imaging device, or some portion of the field of view. For example, some individuals can be near the audience area and perhaps even within the field of view of the imaging device, and yet not be within the audience area that will be analyzed by the content computer 18.
In operation, the imaging device 12 captures an audience view, which may involve capturing a single snapshot or a series of frames/video. It can involve capturing the entire camera field of view, or a portion of the field of view (e.g., a particular region, black/white vs. color, etc.). Additionally, it is to be understood that multiple imaging devices can be used simultaneously to capture video images for processing.
Content computer 18 detects faces in the snapshot or frame. Any face or object detection methodology may be used. In certain deployments (e.g., public settings), the displays are becoming larger and often can serve multiple users at the same time. For example, the frame may include numerous people, some of whom might be looking away from the imaging device 12 and/or screen 20 or might be engaging in some other action which prevents the detection of faces. Where faces are detected, the content computer 18 determines the location of the face within the frame. For example, the camera coordinates of the face are determined.
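As one illustrative sketch (any face detection methodology may be used), detection could be performed with OpenCV's bundled Haar cascade classifier; the rectangles returned correspond to face locations in camera pixel coordinates:

    import cv2

    # A sketch of face detection on a captured frame using OpenCV's bundled
    # Haar cascade. Any face or object detection methodology may be used.
    def detect_faces(frame):
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Each detection is (x, y, w, h) in camera pixel coordinates; faces
        # turned away from the camera typically go undetected.
        detections = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                              minNeighbors=5)
        # Return bounding rectangles as (x1, y1, x2, y2).
        return [(x, y, x + w, y + h) for (x, y, w, h) in detections]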
For each face, the camera coordinates are used to determine where the targeted content is shown on the display screen of display 20. In order to increase the reachability of the content's message to the intended recipient (i.e., target audience member), the content is displayed within proximity to the intended recipient (e.g., in front of the recipient). The specific proximal location on the display 20 is determined by mapping the camera coordinates to display coordinates. In another embodiment, the specific proximal location on the display 20 is determined by identifying a zone within which the face is located, and using the corresponding zone in the display screen to present the content. Furthermore, the display 20 can serve targeted content to multiple audience members at the same time. For example, where the locations of the audience members vary, specific content for each of the audience members may be displayed.
The display 20 (and the audio speaker 22) provides the selected content to the targeted audience members. The content can be in the form of commercial advertisements, entertainment, political advertisements, survey questions, or any other type of content.
System 200 further includes a content computer 205, which includes a facial detection module 220, a location mapping module 230, a fixed zone mapping table 240, a transformation matrix 245, a calibration module 260, and a content player 250.
The facial detection module 220 is configured to detect faces within a snapshot, a series of frames or a video stream (hereinafter, “frame”). Various methods of facial detection may be used. For example, facial detection begins by locating a face outline, a mouth, eyes, etc. The facial detection module 220 determines boundaries of the detected face, and extracts facial attributes.
The location mapping module 230 is operatively coupled to the facial detection module 220 and is configured to determine a unique identifier for the face (as calculated by a face identifier module 231), by tracking face boundaries between the frames and assigning unique identifiers. The bounding rectangle provides a rough location of the face within the frame. Furthermore, the location mapping module 230 is configured to determine the display region, in display pixel coordinates, that is closest to the audience member that corresponds to the detected face by using the transformation matrix 245, for example, to map camera coordinates of the bounding rectangle to correlating display pixel coordinates.
In another embodiment, the location mapping module 230 is configured to determine the display region, in display pixel coordinates, using the fixed zone mapping table 240, for example, to map the user location (i.e., the location of the face in the image) to a predefined zone in the display. More specifically, camera coordinates of the detected face are mapped to a fixed zone (“interaction zone”) in the display screen of display 270. Each zone services a distinct set of audience members, based on the audience member's location (i.e., the face location in the image captured by the data source 210). For example, a sample fixed zone mapping table is shown below:
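TABLE 1 (sample fixed zone mapping; the coordinate ranges are symbolic placeholders)

    Camera coordinate range      Zone    Display coordinate range
    (x1, y1) to (x2, y2)          1      (x′1, y′1) to (x′2, y′2)
    (x3, y3) to (x4, y4)          2      (x′3, y′3) to (x′4, y′4)
    ...                          ...     ...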
Using the fixed zone mapping table as shown, a camera coordinate (x1, y1) maps to zone 1 in the display. Zone 1 is defined in this example as being made up of the range (x′1, y′1) to (x′2, y′2) in display coordinates. When the content is actually displayed, it will be shown in zone 1 of the display screen. One methodology for defining the camera and display coordinates/zones is described below under Zone Configuration.
The calibration module 260 is operatively coupled to the transformation matrix 245 and the content player 250. Where the system 200 is configured to display content that is closest to the audience member, without any predefined zones, calibration module 260 is configured to calculate the transformation matrix 245 or a function to determine how an audience member's face (as seen by the data source, e.g., camera) is mapped to the display screen of display 270. Several methods may be used for such calculation, for example, using the relative size of the display and location, orientation and field of view of the camera or other image capture device.
The transformation matrix 245 may also be calculated by asking an audience member to step in front of a marker that is determined, for example, by the calibration module 260 and shown in the display screen of display 270. The markers may be displayed at known display pixel coordinates. The markers allow the system 200 to capture the audience member's image using, for example, a camera, locate the audience member's face on the camera image, and record the camera pixel coordinates. The display coordinates and the corresponding camera pixel coordinates are compared to calculate the transformation matrix 245 or function. In another embodiment, instead of stepping in front of the marker, the audience member may hold up a clearly recognizable object in front of the marker on the display.
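As an illustrative sketch of such a calibration (assuming OpenCV is available and that four or more display/camera point correspondences have been collected as described above), the transformation matrix 245 may be estimated as a homography:

    import cv2
    import numpy as np

    # A sketch of computing the transformation matrix 245 from calibration
    # markers: camera_pts[i] is where the audience member's face (or a held
    # object) appeared in the camera image while aligned with the marker
    # shown at display_pts[i]. At least four correspondences are needed.
    def calibrate(camera_pts, display_pts):
        cam = np.asarray(camera_pts, dtype=np.float32)
        disp = np.asarray(display_pts, dtype=np.float32)
        # Estimate a homography mapping camera pixels to display pixels.
        matrix, _ = cv2.findHomography(cam, disp, method=cv2.RANSAC)
        return matrix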
The content player 250 is operatively coupled to the calibration module 260 and the location mapping module 230. The content player 250 is configured to receive display coordinates from the location mapping module 230 and send an instruction to display 270 to show the selected content for a particular audience member in the zone or display region closest to the user.
The display 270 is operatively coupled to content player 250 and is configured to provide a visualization of the selected content in a location of the display screen specified by the display coordinates.
At step 305, an audience view is captured. For example, imaging device(s) may capture an image of the audience members who are gathered in an audience area. The image (e.g., frame) is provided to the content computer for further analysis. At step 310, facial detection is performed for each image. Specifically, the boundaries of each detected face are determined, for example, by generating a bounding rectangle. In one embodiment, facial attributes are determined in addition to the boundaries.
One of the faces in the image is retrieved, at step 315. An identifier for the face is determined at step 320. In one embodiment, each detected face is associated with a unique identifier. The identifier is used to track faces of audience members, for example, as they move across frames. Faces which have not been seen before receive a new identifier. On the other hand, if a detected face was previously seen, for example in the preceding frame, the identifier of the face in the preceding frame is assigned to the face in the current frame. The process of determining identifiers is further described below.
At step 330, content is selected. The content (e.g., multimedia content) may be stored in a repository and selected based on attributes of the detected face. The attributes may include such features as the age, gender, and ethnicity of the detected face. Content selection is further described below.
Using the face location, a location on the display screen within which to provide a visualization of the selected content is determined, at step 350. The location on the display may be determined in various manners. In one embodiment, a display screen is partitioned into multiple predefined zones. Each zone is slated to serve a distinct set of audience members, depending on the location of the audience member's face in the image. In other words, each facial location in an image maps to a particular zone in the display screen, such that the selected content is displayed specifically in that zone. The use of the zones enables the display screen to serve targeted content to multiple audience members at the same time. It should be noted that the process of selecting the content and determining the display screen location can occur in parallel. The display screen location can be determined before the content is actually selected. The selected content may then be presented to the user.
In another embodiment, the screen is not partitioned into zones; rather, the display screen location is determined by directly mapping the camera coordinates to display coordinates. Specifically, the face location (in camera coordinates) is mapped to the corresponding display screen coordinates, such that the content is displayed closest to the audience member. This is another way to serve targeted content to multiple users at the same time. Determining the location on the display screen is further described below.
At step 360, it is determined whether there are additional faces in the image. If there are, processing continues back to step 315 where another face is retrieved and the display screen location is determined for that face. After all faces in the image have been analyzed, the process can repeat itself, whereby another image is received. The face table may be cleaned, for example by removing identifiers in the table for faces that are no longer present in the image.
Each detected face is associated with a unique identifier. As previously described, the identifier can be used to track faces of audience members, for example, as they move across frames. To accomplish this, a face table may maintain records of historical (i.e., previously detected) faces. The face table includes the face identifier (“face ID”) and camera pixel coordinates (x1, y1, x2, y2) of the bounding rectangle associated with a detected face, e.g., the top left corner (x1, y1) and the lower right corner (x2, y2). The face table may also include camera pixel reference coordinates (x, y). The reference coordinates may be midpoint coordinates, which can be calculated in many ways using, for example, the center of the face rectangle, the nose, the midpoint between the eyes, etc. For example, the midpoint coordinates may be calculated as x=(x1+x2)/2 and y=(y1+y2)/2. Furthermore, the face table may include values of various attributes of the detected face, as determined by facial analysis or detection methodologies. The attributes may include age or age group of the face, gender, ethnicity, etc. In another embodiment, the face table includes the ID of the content that is being played for the user and the location of the content in the display.
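A minimal sketch of one face table record follows; the field names are illustrative, not prescribed by this disclosure:

    from dataclasses import dataclass
    from typing import Optional

    # A sketch of one face table record; field names are illustrative.
    @dataclass
    class FaceRecord:
        face_id: int
        x1: int                            # bounding rectangle, top-left x
        y1: int                            # bounding rectangle, top-left y
        x2: int                            # bounding rectangle, lower-right x
        y2: int                            # bounding rectangle, lower-right y
        age: Optional[int] = None          # attribute from facial analysis
        gender: Optional[str] = None
        ethnicity: Optional[str] = None
        content_id: Optional[int] = None   # content being played for the user
        zone: Optional[int] = None         # location of the content on display

        def reference_point(self):
            # Midpoint coordinates: x = (x1 + x2) / 2, y = (y1 + y2) / 2.
            return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)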
At step 322, it is determined whether a face is known. Specifically, it is determined whether the bounding rectangle of the current face significantly overlaps with that of a previously detected face. The rationale is that any movement of an audience member from one frame to the next should satisfy a minimum overlap threshold. If there is not enough overlap between the bounding rectangles in the current frame and a previous frame, it is unlikely that the current face belongs to an audience member who has simply moved between frames; rather, it is more likely that the current face is that of a new audience member. The face table may be referenced to obtain the bounding rectangle coordinates of the previously detected faces.
In another embodiment, a unique signature may be calculated based on facial features and used to determine if a face is known or unknown. More specifically, a unique signature of the current face is determined, using the facial features. This unique signature of the current face is then compared to the unique signatures of historical faces. If there is a match between the signature of the current face with that of a historical face, it is determined that the face is known.
Where there is no match at step 322, it is determined that the current face is a new face, and a new face ID is assigned at step 328. A new record is added to the face table, at step 329, to reflect the data of the new face. For example, the face ID assigned at step 328 is recorded along with the bounding rectangle coordinates and/or reference coordinates.
On the other hand, where a match is determined at step 322, it is determined that the same audience member has moved along a trajectory in real space. At step 324, the face ID of the matching historical face is assigned to the current face. Furthermore, at step 326, the record of the matching historical face is updated with the information of the current face. Specifically, the bounding rectangle coordinates and reference coordinates are updated in the face table to reflect the values of the current face.
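A minimal sketch of steps 322 through 329 follows, assuming the overlap test is implemented as intersection-over-union against a fixed threshold; the 0.5 threshold value is an assumption for illustration:

    # A sketch of steps 322-329: match the current face against the face
    # table by bounding-rectangle overlap; assign a new face ID when no
    # historical face overlaps sufficiently. The threshold is illustrative.
    def overlap(a, b):
        # Intersection-over-union of rectangles given as (x1, y1, x2, y2).
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union else 0.0

    def assign_face_id(rect, face_table, next_id, threshold=0.5):
        # face_table maps face ID -> last known bounding rectangle.
        for face_id, known_rect in face_table.items():
            if overlap(rect, known_rect) >= threshold:  # step 322: known face
                face_table[face_id] = rect              # step 326: update record
                return face_id, next_id                 # step 324: reuse its ID
        face_table[next_id] = rect                      # step 329: new record
        return next_id, next_id + 1                     # step 328: new face ID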
As such, the audience member may be tracked among the images or frames. By doing so, the content may follow the trajectory of the targeted audience member when it is displayed. For example, if the audience member is associated initially with zone 1 and then moves to zone 2, the content selected for that member may be displayed in zone 2 (after initially being displayed in zone 1).
The selection of content for specific targets may be performed in many ways. In one embodiment, the face image is extracted from the image/frame, at step 333. Various facial features may then be extracted, at step 335.
A facial pattern storage may be implemented to maintain a listing of facial attributes, such as age, gender, and ethnicity. The extracted facial features are mapped to attributes, at step 337. For example, the attributes of the face may be 30, female, and Asian, which correspond to the age, gender, and ethnicity attributes, respectively. At step 339, the attribute values of the face are recorded, for example, in the face table. At step 340, content is selected based on the attributes. Various methods of content selection may be used. For example, certain advertisements may be targeted specifically for an age and gender group.
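One minimal sketch of attribute-based selection follows, assuming a repository keyed by age group and gender with a generic fallback; the repository contents, content names, and age-group boundaries are illustrative assumptions:

    # A sketch of step 340: select content from a repository keyed by
    # attributes of the detected face. All entries are illustrative.
    REPOSITORY = {
        ("18-34", "female"): "ad_cosmetics.mp4",
        ("18-34", "male"): "ad_gaming.mp4",
        ("35-54", "female"): "ad_travel.mp4",
        ("35-54", "male"): "ad_autos.mp4",
    }
    DEFAULT_CONTENT = "ad_general.mp4"

    def select_content(age, gender):
        if 18 <= age < 35:
            group = "18-34"
        elif 35 <= age < 55:
            group = "35-54"
        else:
            return DEFAULT_CONTENT        # no targeted content for this group
        return REPOSITORY.get((group, gender), DEFAULT_CONTENT)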
The placement of the content on the display screen can impact the efficacy with which the content's message can reach the intended target. Moreover, multiple audience members are serviced with targeted content on a single display (e.g., large scale deployment such as a display wall) by selective placement of the content.
To determine the location on the display screen where the content will be presented, it is determined whether the digital display system is configured with fixed zones or fluid mapping, at step 352.
Fixed Zones
In one embodiment, the display screen is partitioned into Z number of zones, where each zone services a distinct set of audience members. At step 353, the zone width is determined by dividing the total number of pixels along the partitioned axis by the total number of zones Z.
At step 354, the system determines the zone in which the reference point of the face is located. For example, the midpoint of the face in camera pixel coordinates is determined, i.e., (x, y). The reference point coordinates (e.g., midpoint coordinates) map to a particular zone. For example, a fixed zone mapping table may include a mapping of ranges of camera coordinates to zones. It is determined which range the midpoint coordinates fall into, and the corresponding zone is identified. The zones may be partitioned in many ways, e.g., in various orientations, configurations, numbers of zones, etc.
In one embodiment, assuming the display screen is partitioned into Z horizontal zones, each one serving a distinct set of audience members, the display pixel coordinates along the x-axis for zone i (where i = 1, 2, …, Z) may be determined by:

zone i spans from (i − 1) × Zw to (i × Zw) − 1

where Zw = zone width in display pixels = INT(Dw / Z), and Dw = total display width in pixels. It should be noted that the INT function returns the integer portion of a number.
The corresponding zone is reserved for the current face, at step 355. Specifically, it is determined whether the zone is free and available for presenting the content targeted for the audience member associated with the current face. A zone reservation table may be implemented to maintain a list of zone identifiers, and a face ID to which the zone is assigned. Once the zone has been assigned to a face ID, the content targeted to that face ID may be displayed in the particular zone. In one embodiment, when the content has been displayed for a set period of time, the zone assignment may be cleared, allowing other assignments to take place. As such, the reservation table can be thought of as a resource allocation mechanism. Other resource allocation methodologies may be implemented as well.
At step 356, the content display area is determined to be that of the reserved zone. The content may then be presented within the boundaries of the reserved zone.
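A minimal sketch of steps 353 through 356 follows, assuming horizontal zones, known camera and display widths, and a simple dictionary as the zone reservation table; other resource allocation methodologies may be substituted:

    # A sketch of steps 353-356 for Z horizontal zones. The zone lookup,
    # reservation policy, and width computations follow the description
    # above; the specific data structures are assumptions.
    def reserve_zone(midpoint_x, camera_width, display_width,
                     num_zones, reservations, face_id):
        # Step 354: zone in which the face's reference point falls.
        zone = min(int(midpoint_x * num_zones / camera_width), num_zones - 1)
        # Step 355: reserve the zone if it is free (or already assigned
        # to this face ID); otherwise the caller may retry later.
        if reservations.get(zone) not in (None, face_id):
            return None
        reservations[zone] = face_id
        # Step 356: display area of the reserved zone, in display pixels.
        zone_width = int(display_width / num_zones)          # Zw = INT(Dw / Z)
        return (zone * zone_width, (zone + 1) * zone_width - 1)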
Fluid Mapping
In another embodiment, fluid mapping is performed to determine the most impactful placement of the content on the display screen. Instead of partitioning the display screen into fixed zones, the content is presented such that it is closest to the audience member.
At step 357, the reference point pixel coordinates (which are in camera coordinates) are mapped to display coordinates. The camera coordinates may be translated to display coordinates, for example, by applying the transformation matrix 245, or by scaling by the ratio of the display resolution to the camera resolution (e.g., x_display = x_camera × Dw/Cw and y_display = y_camera × Dh/Ch, where Cw × Ch is the camera resolution and Dw × Dh is the display resolution).
The result is a display coordinate (x, y). At step 358, the content display area is determined to be that of the display coordinate. The content may then be presented on the display screen such that it is centered on the display coordinate (x, y). As such, instead of fixed zones, fluid mapping enables the content to be placed on the display screen in the same relative location at which the face appears in the image.
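A minimal sketch of steps 357 and 358 follows, assuming the transformation matrix 245 was computed as a 3×3 homography (as in the calibration sketch above) and that the content's width and height in display pixels are known:

    import cv2
    import numpy as np

    # A sketch of steps 357-358: map the face reference point from camera
    # coordinates to display coordinates with the transformation matrix 245
    # (assumed here to be a 3x3 homography), then center the content on the
    # mapped display coordinate (x, y).
    def fluid_map(ref_point, matrix, content_w, content_h):
        src = np.array([[ref_point]], dtype=np.float32)      # shape (1, 1, 2)
        (x, y), = cv2.perspectiveTransform(src, matrix)[0]
        return (int(x - content_w / 2), int(y - content_h / 2),
                int(x + content_w / 2), int(y + content_h / 2))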
Both fixed zones and fluid mapping allow multiple audience members to be serviced at the same time, since the entire display screen is not occupied by a single piece of content.
Zone Configuration
Typically, displays and cameras have different resolutions. In many scenarios, the display view has the greater resolution. For example, a liquid-crystal display (LCD) television has a pixel resolution of 1920×1080, whereas many cameras capture images at a lower resolution. As described herein, users may define the boundaries of the zones in the camera view, the display view, and/or the mapping between the camera view and the display view.
In one embodiment, the corresponding zones in the display view 620 are also user-defined. In another embodiment, the corresponding zones in the display view 620 are determined by translating the camera coordinates of each zone to display coordinates. Table 1 can be generated by the user's selection of zones using an interface for the camera view and/or display view.
In one embodiment, the camera 602 captures an image of an audience, which includes member 603. It is determined that member 603 is within zone 2 in the camera coordinates, by determining the bounding rectangle of the face of member 603 in the camera coordinates. The bounding rectangle is expressed using two-dimensional (x, y) coordinates. When an advertisement is selected, it is presented to the member 603 on display 607, which services zone 2 of floor plan 604.
In one embodiment, zones 1-3 are within one field of view of the camera, and zones 4-6 are within another. Zone 1 in camera view 710 is defined in z coordinates as a member being between two and three feet from the camera. Other zones are defined similarly.
The camera 720 is oriented toward an audience (including member 730) gathered in an audience area within the field of view 721 of the camera 720, and an audience (including member 735) gathered in an audience area within the field of view 722.
Each of the independent displays 726-728 and 746-748 corresponds to a distinct region of an image. For example, display 726 corresponds to zone 1 in camera coordinates of floor plan 724, display 727 corresponds to zone 2 in camera coordinates, and display 728 corresponds to zone 3 in camera coordinates. Furthermore, display 746 corresponds to zone 4 in camera coordinates of floor plan 724, display 747 corresponds to zone 5 in camera coordinates, and display 748 corresponds to zone 6 in camera coordinates.
In one embodiment, the camera 720 captures an image of an audience, which includes member 730. It is determined that member 730 is within zone 2 in the camera coordinates, by determining the bounding rectangle of the face of member 730 in the camera coordinates. The bounding rectangle is expressed using two-dimensional (x, y) coordinates. For an image which includes member 735, it is determined that the bounding rectangle, in two-dimensional (x, y) coordinates, is the same as that of member 730. As such, the ad targeted for member 735 may be erroneously presented on display 727, rather than display 747.
By using the z coordinate, the zone location of member 735 may be correctly identified as zone 5 which corresponds to display 747, using a distance 750 measurement of member 735 to the camera 720. Likewise, the zone location of member 730 may be correctly identified as zone 2 which corresponds to display 727, using a distance 755 measurement of member 730 to the camera 720.
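A minimal sketch of such z-coordinate disambiguation follows, assuming a distance measurement to the camera is available for each member; the depth-band boundaries extend the two-to-three-feet example above and are otherwise illustrative:

    # A sketch of zone disambiguation using the z coordinate: two members
    # whose faces share the same (x, y) bounding rectangle are separated by
    # their measured distance to the camera. Boundaries are illustrative.
    ZONES_BY_DEPTH = [
        (2.0, 3.0, (1, 2, 3)),   # zones 1-3: members two to three feet away
        (3.0, 6.0, (4, 5, 6)),   # zones 4-6: members farther from the camera
    ]

    def zone_for(x_index, distance_ft):
        # x_index (0, 1, or 2) comes from the (x, y) mapping; the depth band
        # selects which row of zones (and displays) services the member.
        for near, far, zones in ZONES_BY_DEPTH:
            if near <= distance_ft < far:
                return zones[x_index]
        return None              # member outside all configured depth bands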
The computer system 800 may additionally include a computer-readable storage media reader 812, a communications system 814 (e.g., a modem, a network card (wireless or wired), an infra-red communication device, etc.), and working memory 818, which may include RAM and ROM devices as described above. In some embodiments, the computer system 800 may also include a processing acceleration unit 816, which can include a digital signal processor (DSP), a special-purpose processor, and/or the like.
The computer-readable storage media reader 812 can further be connected to a computer-readable storage medium 810, together (and, in one embodiment, in combination with storage device 808) comprehensively representing remote, local, fixed, and/or removable storage devices plus any tangible non-transitory storage media, for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information (e.g., instructions and data). Computer-readable storage medium 810 may be non-transitory, such as hardware storage devices (e.g., RAM, ROM, EPROM (erasable programmable ROM), EEPROM (electrically erasable programmable ROM), hard drives, and flash memory). The communications system 814 may permit data to be exchanged with the network and/or any other computer described above with respect to the system 800. Computer-readable storage medium 810 includes a multi-user content module executable 827.
The computer system 800 may also comprise software elements, which are machine readable instructions, shown as being currently located within a working memory 818, including an operating system 820 and/or other code 822, such as an application program (which may be a client application, Web browser, mid-tier application, etc.). Alternate embodiments of a computer system 800 may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets), or both. Further, connection to other computing devices such as network input/output devices may be employed.
The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made.
Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example of a generic series of equivalent or similar features.