1. Field of the Invention
The present invention is a system and method for designing a comprehensive media audience measurement platform, spanning site, display, and crowd characterization, data sampling planning, and data extrapolation.
2. Background of the Invention
The role of digital media for advertisement in public spaces is becoming increasingly important. The task of measuring the degree of media exposure is also very important, both as a guide to equipment installation (equipment kind, position, size, and orientation) and as a rating for content programming. As the number of such displays grows, measuring the viewing behavior of the audience using human intervention can be very costly.
Unlike traditional broadcast media, the viewing typically occurs in public spaces where a very large number of unknown people can assemble to comprise an audience. It is therefore hard to survey the audience using traditional telephone or mail interviews, and on-site interviews can be both very costly and potentially highly biased.
There are technologies to perform automated measurement of viewing behavior; the viewing behavior in this context is called 'viewership'. These automatic viewership measurement systems can be maintained at little cost once installed, and can provide a continuous stream of viewership measurement data. These systems typically employ electro-optical visual sensor devices, such as video cameras or infrared cameras, and provide consistent sampling of the viewing behavior based on visual observations. However, due to the high initial installation cost, sensor placement planning is extremely important. While the equipment delivers consistent viewership measurements, it is limited to measuring the view from its individual fixed position (and orientation, in most cases). Furthermore, relocating the equipment can affect the integrity of the data.
In a typical media display scenario, it is unrealistic, if not impossible, to detect and record all instances of viewership occurring in the site using these sensors. Any optical sensor has a limited field of coverage, and its area of coverage can also depend on its position and orientation. A large venue can be covered by multiple sensors, and their individual lens focal lengths, positions, and orientations need to be determined.
The data delivered from these sensors also needs to be properly interpreted, because the viewership data that each device provides has been spatially sampled from the whole viewership at the site. The ultimate goal of the audience measurement system is to estimate the site-wide viewership for the display; it is crucial to extrapolate the site-wide viewership data from the sampled viewership data in a mathematically sound way.
The present invention provides a comprehensive solution to the problem of automatic media measurement, from the problem of sensor placement for effective sampling to the method of extrapolating spatially sampled data.
There have been prior attempts for measuring the degree of public exposure for the media, including broadcast media or publicly displayed media.
U.S. Pat. No. 4,858,000 of Lu, et al. (hereinafter Lu U.S. Pat. No. 4,858,000) disclosed an image recognition method and system for identifying predetermined individual members of a viewing audience in a monitored area. A pattern image signature is stored corresponding to each predetermined individual member of the viewing audience to be identified. An audience scanner includes audience locating circuitry for locating individual audience members in the monitored area. A video image is captured for each of the located individual audience members in the monitored area. A pattern image signature is extracted from the captured image. The extracted pattern image signature is then compared with each of the stored pattern image signatures to identify a particular one of the predetermined audience members. These steps are repeated to identify all of the located individual audience members in the monitored area.
U.S. Pat. No. 5,771,307 of Lu, et al. (hereinafter Lu U.S. Pat. No. 5,771,307) disclosed a passive identification apparatus for identifying a predetermined individual member of a television viewing audience in a monitored viewing area, where a video image of a monitored viewing area is captured first. A template matching score is provided for an object in the video image. An Eigenface recognition score is provided for an object in the video image. The Eigenface score may be provided by comparing an object in the video image to reference files. The template matching score and the Eigenface recognition score are fused to form a composite identification record from which a viewer may be identified. Body shape matching, viewer tracking, viewer sensing, and/or historical data may be used to assist in viewer identification. The reference files may be updated as recognition scores decline.
U.S. Pat. No. 6,958,710 of Zhang, et al. (hereinafter Zhang) disclosed systems, methods and devices for gathering data concerning exposure of predetermined survey participants to billboards. A portable transmitter is arranged to transmit a signal containing survey participant data, and a receiver located proximately to the billboard serves to receive the signal transmitted by the transmitter.
U.S. Pat. No. 7,176,834 of Percy, et al. (hereinafter Percy) disclosed a method directed to utilizing monitoring devices for determining the effectiveness of various locations, such as media display locations, for an intended purpose. The monitoring devices are distributed to a number of study respondents. The monitoring devices track the movements of the respondents. While various technologies may be used to track the movements of the respondents, at least some of the location tracking of the monitoring device utilize a satellite location system, such as the global positioning system. These movements of the respondent and monitoring device at some point coincide with exposure to a number of media displays. Geo data collected by the monitoring device are downloaded to a download server, for determining to which media displays the respondent was exposed. The exposure determinations are made by a post-processing server.
U.S. Pat. Application No. 20070006250 of Croy, et al. (hereinafter Croy) disclosed portable audience measurement architectures and methods for portable audience measurement. The disclosed system contains a plurality of portable measurement devices configured to collect audience measurement data from media devices, a plurality of data collection servers configured to collect audience measurement data from the plurality of portable measurement devices, and a central data processing server. A portable measurement device establishes a communication link with a data collection server in a peer-to-peer manner and transfers the collected audience measurement data to the data collection server. Because the portable measurement device is not dedicated to a particular local data collection server, the portable measurement device periodically or aperiodically broadcasts a message attempting to find a data collection server with which to establish a communication link.
U.S. patent application Ser. No. 11/818,554 of Sharma, et al. (hereinafter Sharma) presented a method and system for automatically measuring viewership of people for displayed objects, such as in-store marketing elements, static signage, POP displays, various digital media, retail TV networks, and kiosks, by counting the number of viewers who actually viewed the displayed object vs. passers-by who may appear in the vicinity of the displayed object but do not actually view the displayed object, and the duration of viewing by the viewers, using a plurality of means for capturing images and a plurality of computer vision technologies, such as face detection, face tracking, and the 3-dimensional face pose estimation of the people, on the captured visual information of the people.
Lu U.S. Pat. No. 4,858,000 and Lu U.S. Pat. No. 5,771,307 introduce systems for measuring viewing behavior of broadcast media by identifying viewers from a predetermined set of viewers, based primarily on facial recognition. Sharma introduces a method to measure viewership of displayed objects using computer vision algorithms. The present invention also aims to measure viewing behavior of an audience using visual information; however, it focuses on publicly displayed media for a general, unknown audience. The present invention utilizes an automated method similar to Sharma to measure the audience viewership by processing data from visual sensors. The present invention provides not only a method of measuring viewership, but also a solution to a much broader class of problems, including the data sampling plan and the data extrapolation method based on the site, display, and crowd analysis, to design an end-to-end comprehensive audience measurement system.
All of the systems presented in Zhang, Percy, and Croy involve portable hardware and a central communication/storage device for tracking the audience and transmitting/storing the measured data. They rely on a predetermined number of survey participants to carry the devices so that their behavior can be measured based on the proximity of the devices to the displayed media. The present invention can measure the behavior of a very large audience without relying on recruited participants or carry-on devices. It can accurately detect not only the proximity of the audience to the media display, but also the actual viewing time and duration based on the facial images of the audience. These prior inventions can only collect limited measurement data sampled from a small number of participants; the present invention, in contrast, provides a scheme to extrapolate the measurement data sampled from the camera views, so that the whole site-wide viewership data can be estimated.
There have been prior attempts for estimating an occupancy map of people based on the trajectory of the people.
U.S. Pat. No. 6,141,041 of Carlbom, et al. (hereinafter Carlbom) disclosed a method and apparatus for deriving an occupancy map reflecting an athlete's coverage of a playing area based on real time tracking of a sporting event. The method according to the invention includes a step of obtaining a spatiotemporal trajectory corresponding to the motion of an athlete and based on real time tracking of the athlete. The trajectory is then mapped over the geometry of the playing area to determine a playing area occupancy map indicating the frequency with which the athlete occupies certain areas of the playing area, or the time spent by the athlete in certain areas of the playing area. The occupancy map is preferably color coded to indicate different levels of occupancy in different areas of the playing area, and the color coded map is then overlaid onto an image (such as a video image) of the playing area. The apparatus according to the invention includes a device for obtaining the trajectory of an athlete, a computational device for obtaining the occupancy map based on the obtained trajectory and the geometry of the playing area, and devices for transforming the map to the camera view, generating a color coded version of the occupancy map, and overlaying the color coded map on a video image of the playing area.
It is one of the features of the invention to estimate the occupancy map of the crowd with a method similar to Carlbom. The invention makes use of other concepts, such as the visibility map and viewership relevancy map, for the purpose of characterizing audience behavior, and ultimately for the purpose of finding an optimal camera placement plan.
There have been attempts for designing a camera platform or a placement method for the purpose of monitoring a designated area.
U.S. Pat. No. 3,935,380 of Coutta, et al. (hereinafter Coutta) disclosed a closed circuit TV surveillance system for retail and industrial establishments in which one or more cameras are movable along a rail assembly suspended from the ceiling which enables the cameras to be selectively trained on any area of interest within the establishment. Employing two cameras, one may be both tilted and horizontally trained to observe any location within the line of sight of the camera and the other one particularly tilted and trained to observe the amount showing on a cash register.
U.S. Pat. No. 6,437,819 of Loveland, et al. (hereinafter Loveland) disclosed an automated system for controlling multiple pan/tilt/zoom video cameras in such a way as to allow a person to be initially designated and tracked thereafter as he/she moves through the various camera fields of view. Tracking is initiated either by manual selection of the designated person on the system monitor through the usage of a pointing device, or by automated selection of the designated person using software. The computation of the motion control signal is performed on a computer through software using information derived from the cameras connected to the system, and is configured in such a way as to allow the system to pass tracking control from one camera to the next, as the designated person moves from one region to another. The system self-configuration is accomplished by the user's performance of a specific procedure involving the movement and tracking of a marker throughout the facility.
U.S. Pat. No. 6,879,338 of Hashimoto, et al. (hereinafter Hashimoto) disclosed asymmetrical camera systems, which are adapted to utilize a greater proportion of the image data from each camera as compared to symmetrical camera systems. Specifically, an outward facing camera system in accordance with one embodiment of the invention includes a plurality of equatorial cameras distributed evenly about an origin point in a plane. The outward facing camera system also includes a first plurality of polar cameras tilted above the plane. Furthermore, some embodiments of the invention include a second plurality of polar cameras tilted below the plane. The equatorial cameras and polar cameras are configured to capture a complete coverage of an environment.
Coutta presented a method to place multiple cameras for monitoring a retail environment, especially the cash register area. Because the area to be monitored is highly constrained, the method does not require a sophisticated methodology to optimize the camera coverage as the present invention aims to provide.
Loveland presents a pan/tilt/zoom camera system for the purpose of monitoring an area and tracking people one by one, while the present invention aims to find an optimal placement of cameras so that the cameras have maximal concurrent coverage of the area and of multiple people at the same time.
Hashimoto employs multiple outward facing cameras to have a full coverage of the surroundings, while the present invention provides a methodology to place cameras to have maximal coverage given the constraints of the number of cameras and the constraints of the site, display, and the measurement algorithm.
There have been prior attempts for counting or monitoring people in a designated area by automated means.
U.S. Pat. No. 5,866,887 of Hashimoto, et al. (hereinafter Hashimoto U.S. Pat. No. 5,866,887) disclosed a method to measure the number of people passing by a certain area. A plurality of rows of sensors is provided on a ceiling, each row having a plurality of distance variation measuring sensors. The distance variation measuring sensors each include a light emitter and a light receiver arranged orthogonally to the direction in which human bodies pass. The number of passers is detected on the basis of the number of the distance variation measuring sensors that have detected a human body. The traveling direction of human bodies is detected on the basis of the change in the distance to the human bodies measured by the distance variation measuring sensors.
U.S. Pat. No. 6,987,885 of Gonzalez-Banos, et al. (hereinafter Gonzalez-Banos) disclosed systems, apparatuses, and methods that determine the number of people in a crowd using visual hull information. In one embodiment, an image sensor generates a conventional image of a crowd. A silhouette image is then determined based on the conventional image. The intersection of the silhouette image cone and a working volume is determined. The projection of the intersection onto a plane is determined. Planar projections from several image sensors are aggregated by intersecting them, forming a subdivision pattern. Polygons that are actually empty are identified and removed. Upper and lower bounds of the number of people in each polygon are determined and stored in a tree data structure. This tree is updated as time passes and new information is received from image sensors. The number of people in the crowd is equal to the lower bound of the root node of the tree.
U.S. Pat. No. 6,697,104 of Yakobi, et al. (hereinafter Yakobi) disclosed a video based system and method for detecting and counting persons traversing an area being monitored. The method includes the steps of initialization of at least one end unit forming part of a video imaging system, the end unit having at least one camera installed, the camera producing images within the field of view of the camera of at least part of the area being monitored, the end unit including a plurality of counters; digitizing the images and storing the digitized images; detecting objects of potential persons from the digitized images; comparing the digitized images of objects detected in the area being monitored with digitized images stored in the working memory unit to determine whether the detected object is a new figure that has entered the area being monitored or a known figure that has remained within the area being monitored, and to determine that a figure which was not detected has left the area being monitored; and incrementing at least one of the plurality of counters with an indication of the number of persons that have passed through the area being monitored.
U.S. Pat. No. 7,139,409 of Paragios, et al. (hereinafter Paragios) disclosed a system and method for automated and/or semi-automated analysis of video for discerning patterns of interest in video streams. In a preferred embodiment, the invention is directed to identify patterns of interest in indoor settings. In one aspect, the invention deals with the change detection problem using a Markov Random Field approach, where information from different sources are naturally combined with additional constraints to provide the final detection map. A slight modification is made of the regularity term within the MRF model that accounts for real-discontinuities in the observed data. The defined objective function is implemented in a multi-scale framework that decreases the computational cost and the risk of convergence to local minima. The crowdedness measure used is a geometric measure of occupancy that is quasi-invariant to objects translating on the platform.
U.S. Pat. No. 7,203,338 of Ramaswamy, et al. (hereinafter Ramaswamy) disclosed methods and apparatus to count people appearing in an image. One disclosed method reduces objects appearing in a series of images to one or more blobs; for each individual image in a set of the images of the series of images, represents the one or more blobs in the individual image by one or more symbols in a histogram; and analyzes the symbols appearing in the histogram to count the people in the image.
U.S. Pat. Application No. 20060171570 of Brendley, et al. (hereinafter Brendley) disclosed a “Smartmat” system that monitors and identifies people, animals and other objects that pass through a control volume. Among other attributes, an exemplary system implementation can count, classify and identify objects, such as pedestrians, animals, bicycles, wheelchairs, vehicles, rollerbladers and other objects, either singly or in groups. Exemplary Smartmat implementations differentiate objects based on weight, footprint and floor/wall pressure patterns, such as footfall patterns of pedestrians and other patterns. The system may be applied to security monitoring, physical activity monitoring, market traffic surveys and other traffic surveys, security checkpoint/gate monitoring, traffic light activation and other device activation such as security cameras, and other monitoring applications. Smartmat may be portable or permanently installed.
U.S. Pat. Application No. 20070032242 of Goodman, et al. (hereinafter Goodman) disclosed methods and apparatus for providing statistics on the number, distribution and/or flow of people or devices in a geographic region based on active wireless device counts. Wireless devices may be of different types, e.g., cell phones, PDAs, etc. Wireless communications centers report the number and type of active devices in the geographic region serviced by the wireless communications center and/or indicate the number of devices entering/leaving the serviced region. The active wireless device information is correlated to one or more targeted geographical areas. Population counts are extrapolated from the device information for the targeted geographic areas. Traffic and/or flow information is generated from changes in the device counts or population estimates over time and/or from information on the number of active devices entering/leaving a region. Reports may include predictions of crowd population characteristics based on information about the types and/or number of different wireless devices being used.
Hashimoto U.S. Pat. No. 5,866,887 and Brendley use special sensors (distance measuring and pressure mat sensors, respectively) placed in a designated space to count the number of people passing and, in the case of Brendley, classify the kind of traffic. The visual sensor based technology of the present invention, in contrast, can measure not only the amount of traffic but also its direction, and over wider areas.
Goodman introduces using the tracking of active wireless devices, such as mobile phones or PDAs, so that the people carrying these devices can be detected and tracked. The crowd estimation method of the present invention can measure the crowd traffic without any requirement of the people carrying certain devices, and without introducing potential bias toward business people or bias against seniors or children.
Gonzalez-Banos, Yakobi, and Ramaswamy detect and count the number of people in a scene by processing video frames to detect people. One of the exemplary embodiments of the present invention utilizes top-down view cameras so that person detection and tracking can be carried out effectively, where each individual person in the crowd is tracked so that both the crowd density and direction can be estimated. These prior inventions do not address crowd direction.
Paragios measures the pattern of crowd motion without explicitly detecting or tracking people. One of the exemplary embodiments of the present invention also makes use of such crowd dynamics estimation, however, it is a part of the comprehensive system where the goal is to extrapolate the sampled viewership measurement based on the crowd dynamics.
There have been prior attempts for estimating a motion vector field based on video image sequences.
U.S. Pat. No. 5,574,663 of Ozcelik, et al. (hereinafter Ozcelik) disclosed a method and apparatus for regenerating a dense motion vector field, which describes the motion between two temporally adjacent frames of a video sequence, utilizing a previous dense motion vector field. In this method, a spatial DVF (dense motion vector field) and a temporal DVF are determined and summed to provide a DVF prediction. This method and apparatus enables a dense motion vector field to be used in the encoding and decoding process of a video sequence. This is very important since a dense motion vector field provides a much higher quality prediction of the current frame as compared to the standard block matching motion estimation techniques. The problem to date with utilizing a dense motion vector field is that the information contained in a dense motion field is too large to transmit. The invention eliminates the need to transmit any motion information.
U.S. Pat. No. 6,400,830 of Christian, et al. (hereinafter Christian) disclosed a technique for tracking objects through a series of images. In one embodiment, the technique is realized by obtaining at least first and second representations of a plurality of pixels, wherein at least one grouping of substantially adjacent pixels has been identified in each of the first and second representations. Each identified grouping of substantially adjacent pixels in the first representation is then matched with an identified grouping of substantially adjacent pixels in the second representation.
U.S. Pat. No. 6,944,227 of Bober, et al. (hereinafter Bober) disclosed a method and apparatus for representing motion in a sequence of digitized images, which derives a dense motion vector field and vector quantizes the motion vector field.
The present invention makes use of a motion field computed in a similar manner as described in these prior inventions. The motion field computation at each floor position can be a dense optical flow computation as disclosed in Ozcelik or Bober. It can also be an object tracking as disclosed in Christian, so that the dense motion field can be computed using the motion trajectories of multiple objects.
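A dense motion field derived from object trajectories can be sketched as follows. This is an illustrative sketch only, not the disclosed method; the function name, grid scheme, and data layout are assumptions for exposition. Each tracked trajectory contributes its per-step displacement to the floor grid cell it starts from, and each cell's contributions are averaged:

```python
import numpy as np

def motion_field_from_trajectories(trajectories, grid_shape, cell_size=1.0):
    """Accumulate a dense motion field on a floor grid from trajectories.

    trajectories: iterable of (N_i, 2) arrays of floor positions over time.
    Returns an (H, W, 2) array of mean displacement vectors per grid cell.
    (Hypothetical sketch; names and grid scheme are illustrative.)
    """
    h, w = grid_shape
    field = np.zeros((h, w, 2))
    counts = np.zeros((h, w))
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        # Each consecutive pair of positions yields one displacement vector,
        # binned into the cell containing the starting position.
        for p0, p1 in zip(traj[:-1], traj[1:]):
            i, j = int(p0[1] // cell_size), int(p0[0] // cell_size)
            if 0 <= i < h and 0 <= j < w:
                field[i, j] += p1 - p0
                counts[i, j] += 1
    nonzero = counts > 0
    field[nonzero] /= counts[nonzero][:, None]  # average per cell
    return field
```

A dense optical flow computation, as in Ozcelik or Bober, would populate the same kind of per-cell vector field directly from image gradients rather than from explicit trajectories.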
There have been prior attempts for representing the pattern of motion.
U.S. Pat. No. 6,535,620 of Wildes, et al. (hereinafter Wildes) disclosed an invention, which is embodied in a method for representing and analyzing spatiotemporal data in order to make qualitative yet semantically meaningful distinctions among various regions of the data at an early processing stage. In one embodiment of the invention, successive frames of image data are analyzed to classify spatiotemporal regions as being stationary, exhibiting coherent motion, exhibiting incoherent motion, exhibiting scintillation, or being so lacking in structure as to not support further inference. The exemplary method includes filtering the image data in a spatiotemporal plane to identify regions that exhibit various spatiotemporal characteristics. The output data provided by these filters is then used to classify the data.
U.S. Pat. No. 6,806,705 of van Muiswinkel, et al. (hereinafter van Muiswinkel) disclosed an imaging method for imaging a subject including fibrous or anisotropic structures and includes acquiring a 3-dimensional apparent diffusion tensor map of a region with some anisotropic structures. The apparent diffusion tensor at a voxel is processed to obtain Eigenvectors and Eigenvalues. A 3-dimensional fiber representation is extracted using the Eigenvectors and Eigenvalues. During the extracting, voxels are locally interpolated in at least a selected dimension in a vicinity of the fiber representation. The interpolating includes weighting the voxels by a parameter indicative of a local anisotropy. The interpolating results in a 3-dimensional fiber representation having a higher tracking accuracy and representation resolution than the acquired tensor map.
In Wildes, the image motion is estimated and represented using a plurality of spatiotemporal filter banks. In van Muiswinkel, the 3-dimensional structure is represented as a diffusion tensor map. The present invention makes use of a similar tensorial (but 2×2 tensor) representation of crowd motion, where the motion anisotropy is computed using the Eigenvalues of the motion tensor in the same way.
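The 2×2 tensorial representation and its Eigenvalue-based anisotropy can be illustrated with a minimal sketch. The second-moment tensor and the normalized anisotropy measure below are standard constructions assumed for exposition, not the specific formulation of the invention:

```python
import numpy as np

def motion_tensor(vectors):
    """2x2 second-moment tensor of the motion vectors observed at a location."""
    v = np.asarray(vectors, dtype=float)
    return v.T @ v / len(v)

def anisotropy(tensor):
    """Eigenvalue-based anisotropy in [0, 1].

    0 means isotropic motion (no dominant direction); 1 means all motion
    lies along a single axis.
    """
    lam = np.sort(np.linalg.eigvalsh(tensor))[::-1]  # lam1 >= lam2 >= 0
    if lam[0] <= 0:
        return 0.0
    return (lam[0] - lam[1]) / (lam[0] + lam[1])
```

Coherent crowd flow (all vectors aligned) yields anisotropy near 1, while milling or cross-traffic yields a value near 0, giving a compact scalar summary of the collective motion at each floor position.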
There have been prior attempts for learning a general mapping based on available training data.
U.S. Pat. No. 5,682,465 of Kil, et al. (hereinafter Kil) disclosed a function approximation method, which is based on nonparametric estimation by using a network of three layers: an input layer, an output layer, and a hidden layer. The input and the output layers have linear activation units, while the hidden layer has nonlinear activation units, which have the characteristics of bounds and locality. The whole learning sequence is divided into two phases. The first phase estimates the number of kernel functions based on a user's requirement on the desired level of accuracy of the network, and the second phase is related to parameter estimation. In the second phase, a linear learning rule is applied between output and hidden layers, and a non-linear (piecewise-linear) learning rule is applied between hidden and input layers. Accordingly, an efficient way of function approximation is provided from the viewpoint of the number of kernel functions as well as increased learning speed.
U.S. Pat. No. 5,950,146 of Vapnik, et al. (hereinafter Vapnik) disclosed a method for estimating a real function that describes a phenomenon occurring in a space of any dimensionality. The function is estimated by taking a series of measurements of the phenomenon being described and using those measurements to construct an expansion that has a manageable number of terms. A reduction in the number of terms is achieved by using an approximation that is defined as an expansion on kernel functions, the kernel functions forming an inner product in Hilbert space. By finding the support vectors for the measurements, one specifies the expansion functions. The number of terms in an estimation according to the invention is generally much less than the number of observations of the real world phenomenon that is being estimated.
The present invention makes use of a statistical learning method similar to Kil or Vapnik, where the input-output relation over a large number of data points can be used to learn a regression function. In the present invention, the regression function is used to compute the viewership extrapolation mapping.
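As a minimal illustration of learning such an input-output mapping, the sketch below uses Nadaraya-Watson kernel regression with a Gaussian kernel. This is a generic kernel-based regressor chosen for brevity, not the specific network of Kil or the support vector expansion of Vapnik; the function names and bandwidth parameter are assumptions:

```python
import numpy as np

def fit_kernel_regressor(X, y, bandwidth=1.0):
    """Return a Nadaraya-Watson predictor learned from (X, y) pairs.

    X: (n, d) array of inputs (e.g., sampled viewership features);
    y: (n,) array of targets (e.g., site-wide viewership).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def predict(x):
        # Gaussian kernel weights: nearby training inputs dominate.
        d2 = np.sum((X - np.asarray(x, dtype=float)) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        return float(np.dot(w, y) / np.sum(w))

    return predict
```

Once trained on pairs of sampled measurements and known site-wide values, such a regression function plays the role of the extrapolation mapping from sensor-sampled viewership to the site-wide estimate.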
In summary, the present invention aims to measure the media viewership using automated and unobtrusive means that employ computer vision algorithms, which is a significant departure from methods using devices that need to be carried by a potential audience. It also provides comprehensive solutions to the sensor placement issue and the data extrapolation issue, based on the site and display analysis; these features also contrast with inventions that merely introduce measurement algorithms. There have been prior inventions that address the problem of sensor placement for the purpose of monitoring people's behavior, but the present invention provides an optimal solution to the issue based on the analysis of the site and the display. The present invention utilizes crowd tracking or dense motion computation, similar to some of the prior inventions, but in such a way that the estimated crowd motion, along with its tensorial formulation, represents the collective motion of the crowd in both spatial and temporal dimensions. The present invention employs statistical machine learning approaches, similarly to some of the prior inventions, to extrapolate the sampled viewership data to estimate the site-wide viewership data; the method of the present invention utilizes the learning approach to achieve time-dependent extrapolation of the viewership data based on the insights from the crowd and viewership analysis.
The present invention is a system and method for designing a comprehensive media audience measurement platform, spanning site, display, and crowd characterization, data sampling planning, and data extrapolation.
It is one of the objectives of the first step of the processing to identify the system elements that affect the crowd and audience behaviors, so that an effective data sampling and extrapolation plan can be studied and designed around the identified parameters.
The step identifies the site, display, crowd, and audience as the four major system parameters that need to be considered when designing the media audience measurement solution. The step also identifies subparameters of each of these elements that affect other system variables. The size, location, direction, and width of the pathways in the site, the obstacles for passers-by, and the attractions for viewers are some of the relevant site parameters that affect the crowd behavior. The position and orientation of the display within the site, and the size, brightness, and content of the display are the display-related parameters that affect the viewing behavior. The crowd and audience parameters are assumed to depend on the site and display parameters. The viewership measurement algorithm is also an important element of the system; the algorithm is assumed to be already calibrated according to the site and the display, as designing or calibrating the measurement algorithm is not within the scope of the present invention.
It is one of the objectives of the second step of the processing to model the viewership in relation to the site and display for the purpose of deriving the data sampling plan. The step identifies the display parameters as one of the primary factors that directly affect how the crowd (potential viewers) responds to the display and the displayed media. The display parameters are therefore the first factors to consider when determining the specifications and placement for the sensors; the sensors should be placed to cover the locations where the most viewing is expected to occur. For a realistic data sampling scheme, sensors should also be placed so that they can capture the largest number of potential viewers. The step identifies the crowd occupancy as one of the parameters to consider when planning the sensor placement; it measures how much traffic each floor location will have. The notion of occupancy includes both the number and the moving speed of people; given a unit area in the floor space, the occupancy map represents how many people stay in that area for a given time period.
A subset of the site parameters, namely the direction of the crowd dynamics and the attractions (potential distractions from the media display) in the site, is also crucial in determining the sensor positions and orientations when the specific method of measuring the viewing depends on the direction the viewer is facing. The step identifies the viewership relevancy measure as one of the primary factors to consider when designing a sensor placement plan.
It is one of the objectives of the third step of the processing to compute the viewership sampling map after computing the visibility map, the occupancy map, and the viewership relevancy map, and to place the sensors based on the viewership sampling map.
The visibility measure is realized by the visibility map on top of the site map; it is computed based on the display parameters (the position, orientation, and size of the display) and on the characteristics of human visual perception. The occupancy measure is realized by the occupancy map on top of the site map; it can be determined based on the site parameters alone or based on the actual measured traffic density. The viewership relevancy measure is realized by the viewership relevancy map; it is determined by comparing the local viewership counts from the system with the site-wide ground truth viewership counts. If a certain floor position has a higher correlation between its measured viewership counts and the ground truth site-wide viewership counts, then it has higher viewership relevancy.
After this series of analyses, the visibility map, the occupancy map, and the viewership relevancy map are combined to determine the viewership sampling map. The sensors are placed and oriented, and the lens specifications are determined, so that the floor area covered by the sensors samples the viewership in an optimal manner. There may be other constraints to consider, such as physical constraints, the cost of a specific placement, and the requirement to hide the sensors from public view.
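As a sketch of this combination, the three maps can be represented as arrays over a discretized floor grid and multiplied elementwise; the grid size and map values below are illustrative assumptions, not data from the invention:

```python
import numpy as np

# Illustrative 3x3 floor grid; each map holds one value per floor cell.
visibility = np.array([[0.9, 0.6, 0.1],
                       [0.8, 0.5, 0.2],
                       [0.3, 0.2, 0.0]])
occupancy = np.array([[0.2, 0.7, 0.9],
                      [0.4, 0.8, 0.6],
                      [0.1, 0.5, 0.3]])
relevancy = np.array([[0.5, 0.9, 0.4],
                      [0.6, 0.7, 0.3],
                      [0.2, 0.4, 0.1]])

# Viewership sampling map: elementwise product of the three maps.
sampling_map = visibility * occupancy * relevancy

# The floor cell with the highest sampling value is the strongest
# candidate for sensor coverage.
best = np.unravel_index(np.argmax(sampling_map), sampling_map.shape)
```

In the reduced schemes, where the relevancy map (or both the relevancy and occupancy maps) is unavailable, the product simply omits the missing factors.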
In one of the exemplary embodiments of the present invention, the set of sensors for measuring the viewership can also be used to measure the crowd dynamics.
In another exemplary embodiment of the present invention, a dedicated set of sensors can be used to measure the crowd dynamics. The sensors can be ceiling mounted so that they look down and potentially provide more reliable estimates of the crowd dynamics. The sensor placement can follow the same principle as the placement of the viewership measurement sensors; the placement only needs to consider the occupancy map so that the arrangement of the sensors can achieve a maximum coverage of the crowd motion.
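One simple way to realize such occupancy-driven placement is a greedy coverage heuristic; the grid, the square sensor footprint, and the occupancy values below are illustrative assumptions rather than the invention's prescribed procedure:

```python
import numpy as np

# Illustrative occupancy map over a 4x4 floor grid.
occupancy = np.array([[0.1, 0.2, 0.1, 0.0],
                      [0.3, 0.9, 0.8, 0.2],
                      [0.2, 0.8, 0.7, 0.1],
                      [0.0, 0.1, 0.2, 0.0]])

def coverage(center, radius=1):
    """Boolean mask of the cells inside a square sensor footprint."""
    r0, c0 = center
    mask = np.zeros_like(occupancy, dtype=bool)
    mask[max(r0 - radius, 0):r0 + radius + 1,
         max(c0 - radius, 0):c0 + radius + 1] = True
    return mask

def place_sensors(n_sensors):
    """Greedily pick positions covering the most not-yet-covered occupancy."""
    remaining = occupancy.copy()
    chosen = []
    for _ in range(n_sensors):
        # Score every candidate cell by the occupancy its footprint covers.
        scores = np.array([[remaining[coverage((r, c))].sum()
                            for c in range(occupancy.shape[1])]
                           for r in range(occupancy.shape[0])])
        best = np.unravel_index(np.argmax(scores), scores.shape)
        chosen.append(best)
        remaining[coverage(best)] = 0.0  # this area is now covered
    return chosen

sensors = place_sensors(2)
```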
It is one of the objectives of the fourth step of the processing to come up with a model of the viewership in relation to the crowd behavior, to estimate the total viewership from the sampled data.
The arrangement of the sensors generates a certain sampled measurement of the viewership at the site. It is necessary to determine the parameters that affect the relation between the sampled measurement and the total viewership, so that one can design an extrapolation map based on these parameters. In the simplest scenario, where the viewing behavior depends only on the floor position, the extrapolation map is a simple function that is fixed over time; the function is just a constant factor that is multiplied by the measured viewership to estimate the total site-wide viewership.
In a more realistic scenario, the viewing behavior at a given floor position can change over time. The time-changing likelihood of viewership as a function of the floor position at a given time instance is called the viewership map. The step identifies the crowd dynamics as the single most decisive variable that affects the viewing behavior and consequently the viewership map. It is assumed that other parameters, such as the site and time elements, are all reflected in the crowd dynamics. The present system formulates the extrapolation map as parameterized by the crowd dynamics. The system measures the crowd dynamics once for each time period and determines the extrapolation map. The crowd dynamics parameter can be continuous (numerical) or discrete (categorical). Given that the crowd dynamics is the major factor in how the viewing behavior changes over time, it is important to have a proper mathematical representation of it. Because each floor position can have multiple directions in which the crowd frequently travels, a vector field representation is not general enough. The step identifies the crowd velocity tensor field as the mathematical representation of the crowd dynamics for the purpose of extrapolating sampled measurements. At each floor position, the distribution of the crowd velocity is accumulated during the time interval, and the covariance matrix is computed. The directions of the eigenvectors of the matrix represent the dominant directions of the crowd motion, and the magnitudes of the eigenvalues represent the average velocities along the corresponding directions.
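A minimal sketch of the tensor computation at a single floor position, using synthetic (vx, vy) velocity samples as an assumed stand-in for measured crowd velocities:

```python
import numpy as np

# Synthetic velocity samples at one floor position: a crowd moving mostly
# along the x axis with a small lateral spread (assumed values).
rng = np.random.default_rng(42)
velocities = rng.normal(loc=[1.2, 0.0], scale=[0.4, 0.1], size=(500, 2))

# Crowd velocity tensor: covariance matrix of the velocity
# distribution accumulated over the time interval.
tensor = np.cov(velocities, rowvar=False)

# Eigenvectors give the dominant motion directions; the eigenvalue
# magnitudes reflect the motion strength along those directions.
eigvals, eigvecs = np.linalg.eigh(tensor)  # eigenvalues in ascending order
primary_direction = eigvecs[:, -1]         # dominant motion direction
```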
To determine the extrapolation map using the estimated tensor field, it is necessary to understand how the changing crowd velocity tensor field affects the viewership map, that is, how the crowd dynamics change the viewing behavior. More specifically, one needs to identify a set of features from the crowd velocity tensor field that are most relevant to encoding the relation between the crowd dynamics and the viewership map. The step identifies the motion anisotropy and the crowd speed (the average speed at the floor position) as the most relevant features of the crowd dynamics. The motion anisotropy is the ratio of the two eigenvalues of the velocity tensor, and is computed from the crowd velocity tensor at every sampled floor position. It measures the ratio of the primary speed to the secondary speed: the degree of dominance of the primary motion direction.
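The anisotropy feature can be sketched as follows; the eigenvalue ratio follows the description above, while the speed proxy and the function name are illustrative choices:

```python
import numpy as np

def motion_features(tensor):
    """Anisotropy and a speed proxy from a 2x2 crowd velocity tensor.

    The anisotropy (primary/secondary eigenvalue ratio) follows the
    description; the speed proxy is an illustrative choice.
    """
    secondary, primary = np.linalg.eigvalsh(tensor)  # ascending order
    anisotropy = primary / max(secondary, 1e-12)     # dominance of primary motion
    speed = float(np.sqrt(primary + secondary))      # overall magnitude proxy
    return anisotropy, speed

# Strongly directional flow: the primary direction dominates.
aniso, _ = motion_features(np.diag([1.0, 0.04]))
# Isotropic flow: no dominant direction, so the anisotropy is 1.
iso, _ = motion_features(np.eye(2))
```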
It is one of the objectives of the fifth step of the processing to come up with a method to compute an extrapolation map as parameterized by the crowd dynamics, which extrapolates the viewership measurement in the sampled area to the viewership estimate for the whole site.
In a simplified scenario where the viewership occurs uniformly across the whole site, simply multiplying the sampled viewership count by a constant (the ratio of the area of the whole site to the area of the sampled region) effectively performs the extrapolation. In real scenarios, the distribution of viewership is not uniform and, by our assumption, it changes according to the crowd dynamics. Therefore, the multiplying factor should be a function of the invariant features of the crowd dynamics.
The relationship from the set of invariant features of the crowd dynamics to the extrapolation factor is estimated using the training data. The viewership data from the sampled region and the ground truth viewership measurement from the whole site are collected over some period of time. A statistical learning or regression method can be used to find the relationship from the crowd velocity anisotropy histogram to the viewership extrapolation factor.
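As a minimal sketch of this learning step, a simple linear regression can stand in for the statistical learning method; the feature here is a single mean-anisotropy value per time period, and all training numbers are fabricated for illustration:

```python
import numpy as np

# Fabricated training data: mean motion anisotropy per time period
# (the invariant feature) paired with the measured site-wide-to-sampled
# viewership ratio (the extrapolation factor) for that period.
anisotropy = np.array([1.2, 1.8, 2.5, 3.1, 4.0, 4.6])
factor = np.array([2.0, 2.4, 2.9, 3.3, 3.9, 4.3])

# Ordinary least-squares line as a stand-in for the statistical
# learning / regression step.
slope, intercept = np.polyfit(anisotropy, factor, 1)

def extrapolate(sampled_count, feature):
    """Estimate site-wide viewership from a sampled count and the
    crowd-dynamics feature measured during the same time period."""
    return sampled_count * (slope * feature + intercept)

estimate = extrapolate(40, 2.0)  # sampled count of 40, feature value 2.0
```

In practice, a richer feature such as the anisotropy histogram and a more expressive regression model could replace the scalar feature and the fitted line.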
A preferred embodiment of the present invention is illustrated in
The visibility map 244, the occupancy map 234, and the viewership relevancy map 252 are combined to determine the viewership sampling map 260. In an exemplary embodiment of the present invention, the viewership sampling map 260 can be computed by multiplying the corresponding visibility 240, occupancy 230, and viewership relevancy 250 values at each floor position. In a certain scenario, the collection of viewership measurement data 432 and the ground-truth viewership data can be so expensive that estimating the viewership relevancy map is not feasible. In this case, the viewership sampling map 260 can be determined by the visibility map 244 and the occupancy map 234. In a more limited scenario where the crowd analysis or measurement is too costly, only the visibility map 244 can be used to determine the viewership sampling map 260.
The sensors are placed and oriented, and the lens specifications are determined, so that the floor area covered by the sensors samples the viewership in an optimal manner. There may be other constraints to consider, such as physical constraints, the cost of a specific placement, and the requirement to hide the sensors from public view; these are captured through the sensor placement constraints 332.
In a simplified scenario where the viewership occurs uniformly across the whole site, simply multiplying the sampled viewership count by a constant (the ratio of the whole area to the sampled area) effectively performs the extrapolation. In real scenarios, the viewership distribution is not uniform; by our assumption, it changes according to the crowd dynamics. The relationship from the set of invariant features of the crowd dynamics to the extrapolation factor is estimated using the training data. The viewership data from the sampled region and the ground truth viewership measurement from the whole site are collected over some period of time. A statistical learning or regression method can be used to find the relationship. In general, the relation between the crowd dynamics invariant feature 293 and the viewership map 265 is learned in the feature to viewership map learning 358 step. Then the learned feature to viewership map 357 is used to compute the viewership map 265, which is in turn used to compute the viewership extrapolation map 350. In a typical scenario, the sampled to site-wide viewership ratio 448 can be the ultimately useful quantity used as an extrapolation factor; it condenses the relation between areas in the viewership map into a single scalar value. The viewership extrapolation factor 352 should be a function of the invariant features of the crowd dynamics.
Once the feature to viewership map 357 has been learned off-line, the process of determining the viewership extrapolation map 350 from the crowd velocity tensor field 285 is carried out on-line using the learned feature to viewership map 357. This on-line process is called viewership extrapolation map determination 360.
While the above description contains much specificity, these should not be construed as limitations on the scope of the invention, but as exemplifications of the presently preferred embodiments thereof. Many other ramifications and variations are possible within the teachings of the invention. Thus, the scope of the invention should be determined by the appended claims and their legal equivalents, and not by the examples given.