Imaging sensors are light sensitive devices that convert light into electrical charges. Imaging sensors used in most digital imaging systems are charge-coupled devices (CCD) or complementary metal oxide semiconductor (CMOS) devices. These devices are typically composed of an array of light sensitive diodes called pixels that convert photons into electrons when exposed to light. The electrical charge that accumulates at each pixel on the imaging sensor array is proportional to the brightness of the light that reaches the pixel during the period of exposure. An electrical signal output from the device represents the collection of charges from the pixels and is used to generate an image.
Commercially available CCD and CMOS sensors have a limited range of light levels they can linearly convert into image pixels. This linear range can be adjusted by altering combinations of image acquisition property settings of image acquisition hardware, such as a video camera. Acquisition property settings comprise those properties that alter the acquisition hardware and, thus, the linear range of the image acquisition system. The properties include, but are not limited to, exposure time, gain, offset, contrast, and brightness. Acquisition property settings also comprise properties controlling the light levels in the scene. Such properties include, but are not limited to, controlling the illumination provided by fixed lights, strobe lights, or structured lighting. Light levels outside the linear range are clamped or clipped to the minimum or maximum electrical signal.
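For example, the clamping behavior can be illustrated with the short sketch below. It is a toy model only: the exposure, gain, and offset parameters stand in for the acquisition property settings described above, and the specific numbers are illustrative assumptions rather than values from any particular sensor.

```python
import numpy as np

def sensor_response(light_levels, exposure=1.0, gain=1.0, offset=0.0, max_signal=255):
    """Toy linear sensor model: the signal is proportional to light * exposure,
    shifted by an offset and scaled by a gain, then clamped (clipped) to the
    representable range. Light levels outside the linear range saturate."""
    signal = gain * (light_levels * exposure - offset)
    return np.clip(np.round(signal), 0, max_signal).astype(np.uint8)

# A scene spanning a wide range of light levels: the darkest and brightest
# regions are clipped to 0 and 255 and become indistinguishable.
scene = np.array([-10.0, 0.0, 50.0, 200.0, 300.0, 1000.0])
print(sensor_response(scene))                  # dark/bright extremes saturate
print(sensor_response(scene, exposure=0.2))    # a shorter exposure recovers bright detail
```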
It should be noted that the conversion of photons to an electrical signal is inherently linear in silicon based sensors. The conversion of photons to an electrical signal in other sensor technologies may not be linear. The remainder of this application assumes the conversion is linear to make the description of the invention clearer.
Auto-exposure is a process for dynamically setting the acquisition property settings according to the pixel intensity values of one or more recently acquired images. In other words, auto-exposure is the automated process of choosing the correct combination of exposure setting, gain setting, offset and other property settings to yield an optimal linear range for the machine vision application. The purpose of auto-exposure is to improve the quality of acquired images and, in turn, to improve one or more aspects of the performance of the machine vision application using the acquired images. Auto-exposure often improves the accuracy, robustness, reliability, speed, and/or usability of a machine vision application. Traditional auto-exposure techniques include the various methods for performing auto-exposure known to those skilled in the art.
Traditional auto-exposure techniques are implemented directly in the image acquisition hardware or software. Such techniques involve image analysis of one or more acquired images and subsequent adjustment of the acquisition property settings of the image acquisition hardware to yield the optimal linear range.
The image analysis typically involves computation of a histogram that represents the distribution of pixel values found within fixed portions of one or more acquired images. The fixed portions may include the entire image or a set of pixel blocks containing only part of the image. The pixel blocks are fixed in dimension and location within each acquired image. The computed histogram is then compared with a model histogram. Based on the differences between the computed and model histograms, the acquisition property settings of the image acquisition hardware are adjusted so that the computed histogram matches the desired pixel distribution of the model histogram.
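A minimal sketch of this traditional, fixed-block approach follows. The comparison rule (matching the mean grey level of the computed histogram to that of the model histogram) and the Gaussian model histogram are illustrative assumptions, not the specific comparison used by any particular camera.

```python
import numpy as np

def block_histogram(image, blocks, bins=256):
    """Distribution of pixel values inside fixed pixel blocks (row, col, h, w)."""
    pixels = np.concatenate([image[r:r + h, c:c + w].ravel() for r, c, h, w in blocks])
    hist, _ = np.histogram(pixels, bins=bins, range=(0, bins))
    return hist / hist.sum()

def adjustment_direction(computed, model):
    """Compare the computed and model histograms by their mean grey level and
    report which way the linear range should shift."""
    levels = np.arange(computed.size)
    diff = (model * levels).sum() - (computed * levels).sum()
    return "increase exposure/gain" if diff > 0 else "decrease exposure/gain"

image = (np.random.rand(480, 640) * 60).astype(np.uint8)   # under-exposed frame
blocks = [(0, 0, 240, 320), (240, 320, 240, 320)]           # fixed pixel blocks
model = np.exp(-0.5 * ((np.arange(256) - 128) / 40.0) ** 2) # mid-grey model histogram
model /= model.sum()
print(adjustment_direction(block_histogram(image, blocks), model))
```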
By applying such traditional auto-exposure techniques to entire images or fixed pixel blocks, the acquisition property settings, and thus the linear range of differentiable pixel values, are adjusted to obtain an optimal level of grayscale information from subsequently acquired images. Changes in ambient lighting conditions, object reflectivity, and moving objects within the scene can cause the distribution of pixels represented in the computed histogram of pixel values to deviate from the model histogram. The auto-exposure process then re-adjusts the acquisition property settings to bring the computed histograms of subsequent acquisitions in line with the model histogram. When the histogram is computed over entire images or fixed pixel blocks, however, the acquisition property settings are optimized for those sets of pixels. As a consequence, machine vision applications that attempt to locate or detect particular objects whose imaging characteristics differ from the rest of the scene and/or that do not occupy a dominant portion of the scene can suffer from a lack of sufficient grayscale information for the objects of interest.
For example, a particular machine vision application is described in U.S. patent application Ser. No. 10/702,059, filed Nov. 5, 2003, the entire teachings of which are incorporated herein by reference. Specifically, a three-dimensional (3D) imaging system is described that detects people passing through a doorway or other portal. According to one embodiment, the system employs a pair of video cameras having image sensors that capture images about a portal, such as a revolving door, a sliding door, a swinging door, or a man trap. These captured images are then used to generate a 3D model of the scene that can be analyzed to detect people attempting to pass through the portal. A person is identified by 3D features within the 3D model that correspond to a head and shoulder profile.
This system can be deployed at a main entrance doorway or other monitored portal exit where ambient light conditions in the form of sunlight can change throughout the course of a day. Such changes in lighting conditions can adversely affect the clarity of the images captured through the image sensors. For example, the acquired images may be substantially black when captured at midnight or substantially white when captured at midday. Shadows may also appear about the portal scene. Such lighting effects can cause the distribution of pixel values within the scene to be skewed either too dark or too bright, resulting in the linear range being set too large or too small, respectively. As a consequence, the 3D model of the scene generated by the 3D imaging system from the acquired 2D images can lack sufficient grayscale information to locate the desired head and shoulder profiles of the people candidates.
In contrast, the present invention is a system and method of auto-exposure control for image acquisition hardware using 3D information to identify a region of interest within an acquired 2D image upon which to apply traditional auto-exposure techniques. Specifically, the invention includes (i) acquiring 2D images of a scene from one or more cameras having an initial image acquisition property setting; (ii) detecting 3D features from the 2D images; (iii) locating a region of interest from the 3D features; and (iv) determining a next acquisition property setting from brightness levels within the region of interest to apply to the one or more cameras. The region of interest and the acquisition property setting can be adjusted as the 3D features are tracked across further 2D images. The region of interest can include 3D features that correspond to a human or other vertebrate body part, such as a head and shoulders profile.
By performing auto-exposure analysis over the region of interest as determined from 3D analysis of the acquired images, the acquisition property settings can be assigned such that the light levels within the region of interest fall within the linear range, producing sufficient grayscale information for identifying particular objects and profiles in subsequently acquired images. For example, in a machine vision application that detects people passing through a doorway, the region of interest can be the portion of the 2D image that generates 3D features of a head and shoulders profile within a 3D model of the doorway scene. With higher quality images, more accurate detection of people candidates within the monitored scene results.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.
A description of preferred embodiments of the invention follows.
Machine vision applications are directly affected by a sensor's ability to linearly represent the light levels of the objects of interest in the scene being acquired. Ideally, the linear range is adjusted so its dark end includes the light levels from the darkest objects of interest and its bright end includes the light levels from the brightest objects of interest.
For many applications, where the light level is static, the linear range can be adjusted once by configuring the acquisition property settings of the image acquisition hardware and then used indefinitely. For other applications, where the ambient light level is not static or the reflectivity of the objects of interest is changing, the linear range must be adjusted dynamically. The most difficult applications are those with changing ambient light levels, changing reflectivity of the objects of interest, and where the objects of interest are moving in the scene.
Traditional auto-exposure techniques determine the acquisition property settings based on an image analysis of the distribution of pixel values within the entire image or a fixed set of pixel blocks that contain only a portion of the image.
The use of the entire image or fixed pixel blocks causes the acquisition property settings to optimize the linear range for those sets of pixels. As a consequence, machine vision applications that attempt to locate or detect particular objects whose imaging characteristics differ from the rest of the scene and/or that do not occupy a dominant portion of the scene can suffer from a lack of sufficient grayscale information for the objects of interest. If the linear range is set too small or too large for the objects of interest, then the grayscale information for the objects of interest is not optimal.
The present invention is a system and method of auto-exposure control for image acquisition hardware using 3D information to identify a region of interest within an acquired 2D image upon which to apply traditional auto-exposure techniques. By limiting auto-exposure analysis to the region of interest, the acquisition property settings can be assigned such that the light levels within the region of interest fall within the linear range, producing differentiable and optimal grayscale information in subsequently acquired images. For example, in a machine vision application that detects people passing through a doorway, the region of interest can be the portion of the 2D image that generates 3D features of a head and shoulders profile within a 3D model of the doorway scene.
Specifically, the invention includes (i) acquiring 2D images of a scene from one or more cameras having an initial set of image acquisition property settings; (ii) detecting 3D features from the 2D images; (iii) locating a region(s) of interest in the 2D images from the 3D features; and (iv) determining a next acquisition property setting from brightness levels within the region of interest to apply to the one or more cameras. The region of interest can include 3D features that correspond to a profile of a human or other vertebrate body part, such as a head and shoulders profile. The region of interest and the acquisition property settings can be adjusted as the 3D features of the head and shoulders profile are tracked across further 2D images.
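As a sketch only, steps (i) through (iv) can be organized into a simple control loop such as the one below. The acquisition, 3D-analysis, and update functions are hypothetical placeholders standing in for the camera hardware and stereo processing described later, and the mean-brightness update rule is likewise an illustrative assumption.

```python
import numpy as np

# Hypothetical stand-ins for the acquisition hardware and the 3D analysis;
# they are placeholders for illustration, not an API taken from the text.
def acquire_pair(exposure):
    """Step (i): acquire a pair of 2D images at the current exposure."""
    scene = np.clip(np.random.rand(240, 320) * exposure * 255.0, 0, 255)
    return scene.astype(np.uint8), scene.astype(np.uint8)

def head_and_shoulders_roi(left, right):
    """Steps (ii)-(iii): pretend 3D analysis that returns a boolean mask of the
    pixels backing the detected head-and-shoulders features (None if none found)."""
    mask = np.zeros(left.shape, dtype=bool)
    mask[60:140, 120:200] = True
    return mask

def next_exposure(image, roi_mask, exposure, target_mean=128.0):
    """Step (iv): derive the next setting from brightness inside the ROI."""
    mean = image[roi_mask].mean() if roi_mask is not None else image.mean()
    return exposure * target_mean / max(mean, 1.0)

exposure = 0.2
for _ in range(5):                                    # one iteration per acquisition
    left, right = acquire_pair(exposure)              # (i)
    roi = head_and_shoulders_roi(left, right)         # (ii)+(iii)
    exposure = next_exposure(left, roi, exposure)     # (iv)
    print(round(exposure, 3))
```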
As previously mentioned, one or more cameras may be used to acquire the 2D images of a scene from which 3D information can be extracted. According to one embodiment, multiple video cameras operating in stereo may be used to acquire 2D image captures of the scene. In another embodiment, a single camera may be used, such as a stereo camera or a so-called “time of flight” sensor camera that is able to automatically generate 3D models of a scene. In still another embodiment, a single moving camera may be used to acquire 2D images of a scene from which 3D information may be extracted. In still another embodiment, a single camera with optical elements, such as prisms and/or mirrors, may be used to generate multiple views for extraction of 3D information. Other types of cameras known to those skilled in the art may also be utilized.
The sensor device 100 generally comprises a one- or two-dimensional (1D or 2D) array of wells 102. The purpose of each well is to convert the light (i.e., photons) striking the well into an electrical signal (i.e., volts). The two most widely used and available sensor technologies today are Charge Coupled Devices (CCDs) and Complementary Metal Oxide Semiconductors (CMOS). The details of each are readily available from the sensor and/or camera manufacturers.
The exposure setting 104 on the sensor device determines the duration of time each well collects photons. There is a tradeoff in choosing an exposure setting. A longer exposure setting allows more photons to be collected in each well, thereby increasing the signal strength and, in turn, the signal to noise ratio. The downside to long exposure settings is that motion in the scene being acquired will appear blurry and that bright objects in the scene may cause the wells to overflow. This latter condition is called saturation.
The offset stage 110 allows an application to adjust the light level by an offset 112 that will correspond to a digital signal level of 0.
The gain stage 120 allows adjustment of the voltage range by a gain setting 122 applied to the A/D converter 130. The gain stage is normally adjusted such that the objects of interest in the acquired image have pixel values less than 255.
The A/D converter (block 130) is typically designed such that a 0.0 volt input results in a digital signal output of 0 and a 1.0 volt input results in the maximum digital signal output. For example, a 1.0 volt input to an 8-bit A/D converter would result in a digital signal output of 255. An 8-bit A/D converter is assumed from here on.
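Taken together, the exposure setting 104, offset 112, gain 122, and A/D converter 130 can be summarized with the small numeric sketch below. It is a toy model under assumed units (well voltage saturating at 1.0 V, an 8-bit converter); the specific numbers are illustrative only.

```python
import numpy as np

def acquire(light, exposure_s, offset_v, gain, bits=8):
    """Toy model of the 100-130 signal chain: wells integrate light over the
    exposure time (saturating at an assumed 1.0 V), the offset stage fixes the
    voltage that maps to digital 0, the gain stage scales the remaining range,
    and the A/D converter quantizes 0.0-1.0 V into 0..(2**bits - 1)."""
    volts = np.clip(light * exposure_s, 0.0, 1.0)            # exposure + saturation
    volts = np.clip((volts - offset_v) * gain, 0.0, 1.0)     # offset and gain stages
    return np.round(volts * (2 ** bits - 1)).astype(int)     # A/D conversion

light = np.array([0.05, 0.2, 0.5, 0.9])                      # relative scene light levels
print(acquire(light, exposure_s=1.0, offset_v=0.0, gain=1.0))    # baseline linear range
print(acquire(light, exposure_s=2.0, offset_v=0.1, gain=0.8))    # adjusted linear range
```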
At 210, a set of initial acquisition property settings is chosen. The initial settings can be chosen in a number of ways; for example, they can be default values determined and stored at system setup time, or they can be determined using traditional auto-exposure techniques.
At 220, the image acquisition hardware is updated with the acquisition property settings computed at 210 or 270. Details of this step depend on the specifics of the acquisition hardware used (e.g., sensors and surrounding circuitry, cameras, frame grabbers, and lights).
At 230, a new set of 2D images is acquired for 3D feature extraction. The acquisition process is hardware dependent.
At 240, 3D features are computed from the 2D images previously acquired. The process of computing 3D features from 2D images can be implemented by a number of techniques known to those skilled in the art. For example, U.S. patent application Ser. No. 10/702,059, filed Nov. 5, 2003 discloses a method for generating a 3D model of a scene from multiple, simultaneously acquired 2D images of the scene.
At 250, 3D features of interest are extracted from the complete set of 3D features. The goal in this step is to extract or segment the 3D features of interest from everything else in the 3D model of the scene. This feature extraction step can be done using a combination of 2D and 3D information. Feature extraction techniques include, but are not limited to, discrimination based on distance from the sensor(s), color, grayscale, position, shape, or topology. In an exemplary embodiment, the 3D contour of shapes is the discriminator. For example, in the U.S. patent application Ser. No. 10/702,059, filed Nov. 5, 2003, entitled “METHOD AND SYSTEM FOR ENHANCED PORTAL SECURITY THROUGH STEREOSCOPY,” the 3D features of interest include one or more combinations of 3D features that each represent a head and shoulder profile of a people candidate.
If no 3D features of interest are found at 250, the process reverts to a traditional auto-exposure technique using the entire 2D image(s) or a fixed set of pixel blocks at 280. The condition where no features are found arises when there are no 3D features of interest present in the scene, or when 3D features of interest are present but undetectable in the acquired images because of incorrect acquisition property settings.
According to one embodiment, when no 3D features of interest are found, the entire image is used as the basis for performing auto-exposure. Other embodiments could handle the lack of 3D features of interest by other means.
The best strategy depends on the application and, in particular, the variability of the ambient light, the object(s) of interest in the scene, and the robustness/aggressiveness of the feature computer/extractor.
Conversely, if 3D features of interest are found at 250, the process continues at 260. At 260, the 3D features of interest found at 250 are used to choose a region of interest (ROI), or multiple ROIs, in the two-dimensional (2D) acquired images or in the rectified images. The ROI(s) specify the set of pixels to be used for computing the acquisition property settings. The simplest technique is to define a ROI that contains the pixels corresponding to the 3D features of interest extracted at 250.
However, different machine vision libraries and applications have different ROI requirements and constraints. For example, some applications require ROIs to be rectangular and aligned with the acquired image pixel grid. For such applications, one can define a ROI to be the bounding box of the pixels corresponding to a 3D feature. Other applications may choose to use morphology operations to alter the shape or size of the area of the pixels corresponding to the extracted features.
According to one embodiment, a single region of interest (ROI) is computed at 260 based on the topography found in the 3D model of the scene in the form of a depth image. For example, the region of interest (ROI) can be in the form of a bitmap containing “care” and “don't care” pixel values. The care pixels in the ROI specify the pixels in the 2D acquired image that are used for performing auto-exposure.
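A minimal sketch of such a care/don't-care bitmap appears below. It assumes the depth image marks pixels without 3D data as zero and that the 3D features of interest occupy a known depth band; both are simplifying assumptions for illustration.

```python
import numpy as np

def roi_bitmap_from_depth(depth_image, near, far):
    """Care/don't-care bitmap: mark as 'care' (True) the pixels whose depth is
    valid (non-zero) and falls within the band occupied by the 3D features of
    interest; everything else is 'don't care' (False)."""
    valid = depth_image > 0                       # zero means no 3D data
    return valid & (depth_image >= near) & (depth_image <= far)

depth = np.zeros((240, 320), dtype=np.float32)
depth[80:160, 140:220] = 1.4                      # a head-height surface (meters)
care = roi_bitmap_from_depth(depth, near=1.2, far=1.6)
print(care.sum(), "care pixels out of", care.size)
```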
At 270, auto-exposure techniques are performed on the one or more regions of interest determined at 260 or 280. The result of step 270 is a new set of acquisition property settings yielding an optimal linear range for the objects of interest in the scene.
By applying traditional auto-exposure algorithms to a dynamic region of interest that depends on the location of 3D features of interest, the auto-exposure algorithms can determine the optimal range that will provide the best quality 2D images with respect to the portions of the image that include the 3D features of interest.
At 300, the current region of interest (ROI) is used to determine the set of pixels in the acquired image to be analyzed.
At 310, the set of pixels identified at 300 is analyzed using histogram analysis to generate a histogram of the set of pixels in the current region of interest.
At 330, the acquisition hardware is updated with the acquisition property settings computed at 320 so that the computed histogram of the next acquired image substantially matches the model histogram (assuming no changes in the scene).
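Steps 300-330 might be realized as in the sketch below. The update rule, which scales the exposure by the ratio of the model histogram's mean grey level to the computed histogram's mean, is an illustrative assumption rather than the specific rule used by the system; any acquisition property setting could be adjusted in its place.

```python
import numpy as np

def auto_expose_roi(image, care_mask, model_hist, exposure):
    # 300: the current region of interest selects the pixels to analyze
    pixels = image[care_mask]
    # 310: histogram of the pixels within the region of interest
    hist, _ = np.histogram(pixels, bins=256, range=(0, 256))
    hist = hist / max(hist.sum(), 1)
    # 320: derive the next setting from the difference between the computed
    # and model histograms (here, by matching their mean grey levels)
    levels = np.arange(256)
    computed_mean = (hist * levels).sum()
    model_mean = (model_hist * levels).sum()
    # 330: the caller applies the returned exposure to the acquisition hardware
    return exposure * model_mean / max(computed_mean, 1e-6)

image = (np.random.rand(240, 320) * 80).astype(np.uint8)   # under-exposed frame
mask = np.zeros(image.shape, dtype=bool)
mask[60:140, 100:200] = True                               # ROI from the 3D features
model = np.ones(256) / 256.0                               # flat model histogram
print(auto_expose_roi(image, mask, model, exposure=8.0))
```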
Embodiments of the invention may be applied to a number of machine vision applications in many industries, including semiconductors, electronics, pharmaceuticals, automotive, healthcare, packaging, consumer products, high speed inspection of materials such as steel, paper, and nonwovens, and security. For example, a particular application of the invention includes auto-exposure control using three dimensional information for a people-sensing door security system, which is described in U.S. patent application Ser. No. 10/702,059, filed Nov. 5, 2003, entitled “METHOD AND SYSTEM FOR ENHANCED PORTAL SECURITY THROUGH STEREOSCOPY,” the entire teachings of which are incorporated herein by reference. Although specific reference is made to a door security system, embodiments of the invention may be applied to any type of portal security system, including those without doors.
Automated and manual security portals provide controlled access to restricted areas. Security portals are usually equipped with card access systems, biometric access systems, or other systems for validating a person's authorization to enter restricted areas. Examples of automated security portals include revolving doors, mantraps, sliding doors, and swinging doors. A typical security issue associated with most access controlled portal security systems is that when one person obtains valid access, an unauthorized person may bypass the validation security by “piggybacking” or “tailgating.”
The '059 Application discloses a 3D imaging system for detecting and responding to such breaches of security. Specifically, the '059 Application discloses a portal security system that provides enhanced portal security through stereoscopy, such as a stereo door sensor. The stereo door sensor detects portal access events and optionally prevents access violations, such as piggybacking and tailgating. The stereo door sensor is a video based people sensor that generates three dimensional models from plural two dimensional images of a portal scene and further detects and tracks people candidates moving through a target volume within the model.
Embodiments of the present invention can improve the performance of the stereo door sensor by using 3D information to dynamically identify a region of interest within an acquired 2D image upon which to apply traditional auto-exposure techniques. By performing auto-exposure analysis over the region of interest, the acquisition property settings can be assigned such that the light levels within the region of interest fall within the linear range, producing differentiable grayscale information in subsequently acquired images. For example, in the stereo door sensor, the region of interest can be the portion of the 2D image that generates 3D features of a head and shoulders profile within a 3D model of the doorway scene.
Because the revolving door may be installed at a main entrance, the lighting conditions may result in the capture of bright and dark portions within the scene image. Thus, by identifying a region of interest that includes substantially the head and shoulder profile of a people candidate, auto-exposure processing can set the acquisition property settings of the acquisition hardware to provide an optimal linear range for that region and thus sufficient grayscale information for more accurate detection of people candidates.
The sensor 600 preferably includes an image rectifier 620. Ideally, the image planes of the cameras 610a, 610b are coplanar such that a common scene point can be located in a common row, or epipolar line, in both image planes. However, due to differences in camera alignment and lens distortion, the image planes are not ideally coplanar. The image rectifier 620 transforms captured images into rectified coplanar images in order to obtain virtually ideal image planes. Image rectification transforms are well known in the art for coplanar alignment of camera images in stereoscopy applications. Calibration of the image rectification transform is preferably performed during assembly of the sensor.
For information on camera calibration, refer to R. Y. Tsai, “A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses,” IEEE J. Robotics and Automation, vol. 3, no. 4, pp. 323-344 (hereinafter the “Tsai publication”), the entire contents of which are incorporated herein by reference. Also, refer to Z. Zhang, “A Flexible New Technique for Camera Calibration,” Technical Report MSR-TR-98-71, Microsoft Research, Microsoft Corporation, pp. 1-22 (Mar. 25, 1999) (hereinafter the “Zhang publication”), the entire contents of which are incorporated herein by reference.
A 3D image generator 630 generates 3D models of scenes surrounding a door from pairs of rectified images. In particular, the 3D image generator 630 can generate a three dimensional model in 3D world coordinates such that the model accurately represents the image points in a real 3D space.
A target volume filter 640 receives a 3D model of a door scene and clips all 3D image points outside the target volume. The target volume can be a fixed volume or a dynamically variable volume. According to one embodiment, the dynamic target volume depends on a door position, or angle. An encoder value indicating the door position is received by a door position transform 650 that converts the encoder value into a door angle value. This angle value is provided to the target volume filter 640, which rotates the target volume by the angle value. According to another embodiment, the target volume is a static volume and an identity transform can be used in place of the door position transform. Any image points within the 3D model that fall within the target volume are forwarded to a people candidate detector 670.
In another embodiment, the filter 640 may receive the rectified 2D images of the field of view, clip the images so as to limit the field of view, and then provide the clipped images to the 3D image generator 630 to generate a 3D model that corresponds directly to a target volume.
The people candidate detector 670 can perform multi-resolution 3D processing such that each 3D image point within the target volume is initially processed at low resolution to determine a potential set of people candidates. From that set of people candidates, further processing of the corresponding 3D image points are performed at higher resolution to confirm the initial set of people candidates within the target volume. Some of the candidates identified during low resolution processing may be discarded during high resolution processing.
The positions of the confirmed candidates are then transferred to an auto-exposure controller 680, where the locations of the candidates are used to define one or more regions of interest upon which to perform auto-exposure processing. For example, some applications require ROIs to be rectangular and aligned with the pixel grid. For such applications, one can define a ROI to be the bounding box of the pixels corresponding to a 3D feature. Other applications may choose to use morphology operations to alter the shape or size of the area of the pixels corresponding to the extracted features. The auto-exposure controller 680 can then perform traditional auto-exposure techniques using the regions of interest. Alternatively, the controller 680 may use anticipated locations of the confirmed candidates as determined by a tracking algorithm in the tracker 660.
The scoring module 690 is used in a process for determining fuzzy set membership scores, also referred to as confidence scores, that indicate a confidence level that there is zero, one or more people in the target volume.
Field Calibration of 3D World Coordinate System
In order to generate the three dimensional models from the captured two dimensional images, a 3D coordinate system in world coordinates is preferred. With a 3D world coordinate system, objects are transformed in a space relative to the door instead of the camera. For more details regarding a process for calibrating a 3D world coordinate system from a 3D camera coordinate system, refer to U.S. patent application Ser. No. 10/702,059, filed Nov. 5, 2003, entitled “METHOD AND SYSTEM FOR ENHANCED PORTAL SECURITY THROUGH STEREOSCOPY,” the entire teachings of which are incorporated herein by reference.
Defining a Target Volume
Rather than analyze the entire 3D model of the scene about the portal, a smaller version of the scene, or target volume, can be defined that excludes unnecessary elements from the analysis, such as floors, walls, and other areas that are not of interest. For more information regarding a technique for defining a target volume, refer to U.S. patent application Ser. No. 10/702,059, filed Nov. 5, 2003, entitled “METHOD AND SYSTEM FOR ENHANCED PORTAL SECURITY THROUGH STEREOSCOPY,” the entire teachings of which are incorporated herein by reference.
Portal Access Event Detection
At 700, two dimensional images (e.g., right and left images) of a door scene are captured by cameras 610a, 610b. One of these cameras is designated the reference camera, and an image from the reference camera is the reference image.
At 710, the 2D images from cameras 610a, 610b are rectified by applying an image rectification transform that corrects for alignment and lens distortion, resulting in virtually coplanar images. Rectification can be performed by using standard image rectification transforms known in the art. In a preferred embodiment, the image rectification transform is implemented as a lookup table through which pixels of a raw image are transformed into pixels of a rectified image.
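As a sketch of the lookup-table idea only: each rectified pixel stores the raw-image coordinates from which it should be filled. The small fixed shift below is a placeholder; a real map would come from the calibrated rectification transform (e.g., derived using the Tsai or Zhang methods cited above).

```python
import numpy as np

def build_lookup_table(h, w):
    """Placeholder rectification map: for each rectified pixel, the raw-image
    (row, col) it should be sampled from. A real map would be produced by the
    calibrated rectification transform; a small fixed shift stands in here."""
    rows, cols = np.mgrid[0:h, 0:w]
    return np.clip(rows + 2, 0, h - 1), np.clip(cols - 3, 0, w - 1)

def rectify(raw, lut):
    """Apply the lookup table: one gather per rectified output pixel."""
    map_rows, map_cols = lut
    return raw[map_rows, map_cols]

raw = (np.random.rand(240, 320) * 255).astype(np.uint8)
rectified = rectify(raw, build_lookup_table(*raw.shape))
print(rectified.shape)
```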
At 720, the 2D image points from the reference image (XR, YR) are matched to corresponding 2D image points in the non-reference image (XL, YL). By rectifying the images, reference image points (XR, YR) are matched to non-reference image points (XL, YL) along the same row, or epipolar line. Matching can be performed through techniques known in the art, such as in T. Kanade et al., “A Stereo Machine for Video-rate Dense Depth Mapping and Its New Applications,” Proc. IEEE Computer Vision and Pattern Recognition (CVPR), pp. 196-202 (1996), the entire contents of which are incorporated herein by reference.
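One common way to perform such matching along an epipolar row is sketched below using a sum-of-absolute-differences window cost; the cost function, window size, and sign convention for the shift are illustrative assumptions rather than the specific matcher used by the system.

```python
import numpy as np

def best_disparity(ref_row, other_row, x_ref, half_win=3, max_disp=64):
    """Match a small window around x_ref in the reference row against the
    corresponding epipolar row of the other image by minimizing the sum of
    absolute differences; the returned shift is the disparity d. The sign
    convention (shift toward smaller x) is an assumption for illustration."""
    ref = ref_row[x_ref - half_win:x_ref + half_win + 1].astype(np.int32)
    best_d, best_cost = 0, np.inf
    for d in range(0, min(max_disp, x_ref - half_win) + 1):
        cand = other_row[x_ref - d - half_win:x_ref - d + half_win + 1].astype(np.int32)
        cost = np.abs(ref - cand).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic rows: features in the other row appear 7 px to the left of where
# they appear in the reference row, so the recovered disparity should be 7.
ref_row = (np.random.rand(320) * 255).astype(np.uint8)
other_row = np.roll(ref_row, -7)
print(best_disparity(ref_row, other_row, x_ref=150))
```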
At 730, a set of disparities D corresponding to the matched image points is computed relative to the reference image points (XR, YR), resulting in a disparity map (XR, YR, D), also called the depth map or the depth image. The disparity map contains a corresponding disparity ‘d’ for each reference image point (xR, yR). By rectifying the images, each disparity ‘d’ corresponds to a shift in the x-direction.
At 740, a three dimensional model of the door scene is generated in 3D world coordinates. In one embodiment, the three dimensional scene is first generated in 3D camera coordinates (XC, YC, ZC) from the disparity map (XR, YR, D) and intrinsic parameters of the reference camera geometry. The 3D camera coordinates (XC, YC, ZC) for each image point are then converted into 3D world coordinates (XW, YW, ZW) by applying the coordinate system transform.
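A typical form of this computation is sketched below under assumed pinhole intrinsics (focal length f, stereo baseline b, principal point (cx, cy)) and a generic rigid camera-to-world transform (R, t); the exact formulas and calibration data used by the system are not reproduced here.

```python
import numpy as np

def disparity_to_world(x_r, y_r, d, f, b, cx, cy, R, t):
    """Triangulate 3D camera coordinates (XC, YC, ZC) from a rectified image
    point and its disparity, then map them into 3D world coordinates
    (XW, YW, ZW) with a rigid camera-to-world transform (R, t)."""
    z_c = f * b / d                     # depth from disparity
    x_c = (x_r - cx) * z_c / f
    y_c = (y_r - cy) * z_c / f
    return R @ np.array([x_c, y_c, z_c]) + t

# Illustrative parameters only: 600 px focal length, 12 cm baseline, and a
# world frame equal to the camera frame (identity transform).
print(disparity_to_world(400, 300, d=30.0, f=600.0, b=0.12,
                         cx=320.0, cy=240.0, R=np.eye(3), t=np.zeros(3)))
```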
At 750, the target volume can be dynamically adjusted and image points outside the target volume are clipped. For example, in the case of revolving doors, the target volume is rotated according to a door position. The 3D world coordinates of the door scene (XW, YW, ZW) that fall outside the 3D world coordinates of the target volume are clipped. In a particular embodiment, clipping can be effectively performed by setting the disparity value ‘d’ to zero for each image point (xR, yR) whose corresponding 3D world coordinates fall outside the target volume, resulting in a filtered disparity map “filtered (XR, YR, D)”. A disparity value that is equal to zero is considered invalid. The filtered disparity map is provided as input to a multi-resolution people segmentation process commencing at 760.
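The clipping step might look like the sketch below, which assumes the per-pixel 3D world coordinates have already been computed (as at 740) and, for simplicity, uses an axis-aligned box as the target volume; in practice the volume may be rotated with the door position as described above.

```python
import numpy as np

def filter_disparity_map(disparity, world_xyz, box_min, box_max):
    """Set the disparity to zero (invalid) for every pixel whose 3D world
    coordinates fall outside the target volume, here an axis-aligned box."""
    inside = np.all((world_xyz >= box_min) & (world_xyz <= box_max), axis=-1)
    filtered = disparity.copy()
    filtered[~inside] = 0.0               # zero disparity marks a clipped point
    return filtered

disparity = np.full((240, 320), 25.0, dtype=np.float32)
world = np.random.uniform(-2.0, 2.0, size=(240, 320, 3)).astype(np.float32)
filtered = filter_disparity_map(disparity, world,
                                box_min=np.array([-1.0, -1.0, 0.0]),
                                box_max=np.array([1.0, 1.0, 2.5]))
print(round(float((filtered == 0).mean()), 3), "of the pixels were clipped")
```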
At 760, coarse segmentation is performed for identifying people candidates within the target volume. According to one embodiment, coarse segmentation includes generating a topological profile of the target volume from a low resolution view of the filtered disparity map. Peaks within the topological profile are identified as potential people candidates. A particular embodiment for performing coarse segmentation is described below.
At 770, fine segmentation is performed for confirming or discarding people candidates identified during coarse segmentation. According to one embodiment, the filtered disparity map is analyzed within localized areas at full resolution. The localized areas correspond to the locations of the people candidates identified during the coarse segmentation process. In particular, the fine segmentation process attempts to detect head and shoulder profiles within three dimensional volumes generated from the localized areas of the disparity map. A particular embodiment for performing fine segmentation is described below.
At 780, the validated people candidates are tracked across multiple frames to determine access events, such as a piggyback violation, a single person event, or an ambiguous event. For example, the validated people candidates can be tracked using a fuzzy/confidence level scoring algorithm over a series of video image frames. The people candidates may also be tracked according to a trajectory tracking algorithm. For more information regarding methods for tracking validated people candidates across multiple image frames, refer to U.S. patent application Ser. No. 10/702,059, filed Nov. 5, 2003, entitled “METHOD AND SYSTEM FOR ENHANCED PORTAL SECURITY THROUGH STEREOSCOPY,” the entire teachings of which are incorporated herein by reference.
At 790, the locations of the people candidates, as determined at 660, 670 or 680, are transferred to the auto-exposure controller 680, where the locations are used to define one or more regions of interest upon which to perform auto-exposure processing as described above.
Coarse Segmentation of People Candidates
At 800, the filtered disparity map is segmented into bins.
At 810, a low resolution disparity map is generated by computing a mean disparity value for each bin.
In a particular embodiment, a mean disparity value dM for a particular bin can be calculated by generating a histogram of all of the disparities DBIN in the bin having points (XBIN, YBIN). Excluding the bin points in which the disparities are equal to zero and thus invalid, a normalized mean disparity value dM is calculated. The normalized mean disparity dM is assigned to a point in the low resolution disparity map for that bin.
At 820, peaks within the low resolution disparity map are identified as the locations of potential people candidates.
The extent of the peak is determined by traversing points in every direction, checking the disparity values at each point, and stopping in a direction when the disparity values start to rise. After determining the extent of the first peak, the process repeats for any remaining points in the low resolution map that have not been traversed.
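A sketch of this coarse segmentation stage (steps 800-820) is shown below. The bin size, the four-connected traversal, and the greedy peak-picking loop are illustrative assumptions layered on the description above, not the precise algorithm of the system.

```python
import numpy as np

def low_resolution_map(filtered_disparity, bin_size=16):
    """800-810: collapse the filtered disparity map into bins, assigning each
    bin the normalized mean of its valid (non-zero) disparities, or zero."""
    h, w = filtered_disparity.shape
    low = np.zeros((h // bin_size, w // bin_size), dtype=np.float32)
    for i in range(low.shape[0]):
        for j in range(low.shape[1]):
            block = filtered_disparity[i * bin_size:(i + 1) * bin_size,
                                       j * bin_size:(j + 1) * bin_size]
            valid = block[block > 0]
            low[i, j] = valid.mean() if valid.size else 0.0
    return low

def peak_region(values, start):
    """Traverse outward from a peak, continuing while the disparity values do
    not rise and stopping in a direction once they do (one reading of 820)."""
    stack, region = [start], set()
    while stack:
        i, j = stack.pop()
        if (i, j) in region:
            continue
        region.add((i, j))
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if (0 <= ni < values.shape[0] and 0 <= nj < values.shape[1]
                    and 0 < values[ni, nj] <= values[i, j] and (ni, nj) not in region):
                stack.append((ni, nj))
    return region

def find_candidate_peaks(low):
    """Identify peaks in the low resolution map as potential people candidates."""
    remaining = low.copy()
    peaks = []
    while remaining.max() > 0:
        i, j = np.unravel_index(np.argmax(remaining), remaining.shape)
        peaks.append((i, j, float(remaining[i, j])))
        for ri, rj in peak_region(remaining, (i, j)):
            remaining[ri, rj] = 0.0
    return peaks

disparity = np.zeros((240, 320), dtype=np.float32)
disparity[60:124, 120:184] = 30.0          # one head/shoulder-like blob
print(find_candidate_peaks(low_resolution_map(disparity)))
```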
At 830 and 840, further processing of the identified peaks yields, for each coarse people candidate, a coarse location (e.g., xR1, yR1) and a mean disparity value (e.g., dM1), which are passed to the fine segmentation process.
Fine Segmentation of People Candidates
At 900, a two dimensional head template is generated having a size relative to the disparity of one of the coarse candidates. Disparity corresponds indirectly to height such that as disparity increases, the distance from the camera decreases, and thus the height of the person increases.
The dimensions of the head template 975 are based on the coarse location of the candidate (e.g., xR1, yR1), the mean disparity value (e.g., dM1), and known dimensions of a standard head (e.g., 20 cm in diameter, 10 cm in radius). For example, to compute the dimensions of the head template, the position of the head is computed in 3D world coordinates (X, Y, Z) from the calculated coarse location and mean disparity value using the factory data (e.g., intrinsic parameters of camera geometry) and field calibration data (e.g., camera to world coordinate system transform). Next, consider another point in the world coordinate system which is (X+10 cm, Y, Z) and compute the position of that point in the rectified image space (xR2, yR2), which is the image space in which all the image coordinates are maintained. The length of the vector defined by (xR1, yR1) and (xR2, yR2) corresponds to the radius of the circular model for the head template 975.
Furthermore, each point within the area of the resulting head template 975 is assigned the mean disparity value (e.g., dM1) determined for that candidate. Points outside the head template 975 are assigned an invalid disparity value equal to zero.
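A sketch of this template construction follows. The pinhole intrinsics (focal length f, baseline b, principal point) and the simplification that the world and camera frames coincide are assumptions for illustration; the system would instead use its factory and field calibration data.

```python
import numpy as np

def head_template(x_r, d_m, f, b, cx, head_radius_m=0.10):
    """Circular head template sized from the candidate's disparity: the coarse
    location is triangulated, shifted sideways by one head radius (~10 cm),
    and reprojected; the image-space distance between the two points gives the
    template radius in pixels. Pixels inside the circle carry the candidate's
    mean disparity; pixels outside carry zero (invalid)."""
    z = f * b / d_m                                  # depth of the candidate head
    x_w = (x_r - cx) * z / f                         # world X of the coarse location
    x_r2 = f * (x_w + head_radius_m) / z + cx        # reproject the shifted point
    radius = abs(x_r2 - x_r)                         # template radius in pixels
    size = int(np.ceil(2 * radius)) + 1
    yy, xx = np.mgrid[0:size, 0:size] - size // 2
    return np.where(xx ** 2 + yy ** 2 <= radius ** 2, d_m, 0.0)

# Illustrative parameters only (600 px focal length, 12 cm baseline).
tpl = head_template(x_r=320, d_m=30.0, f=600.0, b=0.12, cx=320.0)
print(tpl.shape, int((tpl > 0).sum()), "template pixels carry the mean disparity")
```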
At 910, the head template is matched against the filtered disparity map with the center of the head template 975 positioned at the coarse location of the candidate (e.g., xR1, yR1), producing a template score for that position.
The template matching is repeated, for example, by positioning the template 970 at other areas such that the center of the head template 975 corresponds to locations about the original coarse location of the candidate (e.g., xR1, yR1). A fine location for the candidate (xF1, yF1) is obtained from the position of the head template 975 at which the best template score was obtained.
At 920, another mean disparity value dF1 is computed from the points of the filtered disparity map within the head template 975 centered at the fine candidate location (xF1, yF1). In a particular embodiment, the mean disparity value dF1 can be calculated by generating a histogram of all the disparities of the filtered disparity map that fall within the head template. Excluding the points in which the disparities are equal to zero and thus invalid, the normalized mean disparity value dF1 is calculated.
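The template matching and the subsequent mean disparity computation (910-920) might be implemented as in the sketch below. The sum-of-absolute-differences score and the fixed search radius are illustrative assumptions; the text does not prescribe a particular scoring function.

```python
import numpy as np

def match_score(disparity_map, template, cx_pix, cy_pix):
    """Sum of absolute differences between the head template and the filtered
    disparity map patch centered at (cx_pix, cy_pix), over the template's
    valid pixels (an assumed scoring function; smaller is better)."""
    r = template.shape[0] // 2
    patch = disparity_map[cy_pix - r:cy_pix + r + 1, cx_pix - r:cx_pix + r + 1]
    valid = template > 0
    return np.abs(patch[valid] - template[valid]).sum()

def refine_location(disparity_map, template, x0, y0, search=5):
    """910: slide the template about the coarse location (x0, y0) and keep the
    best-scoring position as the fine location. 920: compute the mean of the
    valid (non-zero) disparities under the template at that position."""
    best = min((match_score(disparity_map, template, x0 + dx, y0 + dy), x0 + dx, y0 + dy)
               for dy in range(-search, search + 1)
               for dx in range(-search, search + 1))
    _, xf, yf = best
    r = template.shape[0] // 2
    patch = disparity_map[yf - r:yf + r + 1, xf - r:xf + r + 1]
    under = patch[(template > 0) & (patch > 0)]      # exclude invalid disparities
    return (xf, yf), (under.mean() if under.size else 0.0)

disp = np.zeros((240, 320), dtype=np.float32)
disp[80:140, 140:200] = 30.0                         # a head-like blob of disparities
yy, xx = np.mgrid[0:51, 0:51] - 25
template = np.where(xx ** 2 + yy ** 2 <= 25 ** 2, 30.0, 0.0)
print(refine_location(disp, template, x0=170, y0=110))
```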
At 930, people candidates are discarded for lack of coverage by analyzing the disparities that fall within the head template, which is fixed at the fine head location. For example, it is known that disparity corresponds to the height of an object. Thus, a histogram of a person's head is expected to have a distribution, or coverage, of disparities that is centered at a particular disparity and tapers downward. If the histogram generated at 920 does not conform to such a distribution, it is likely that the candidate is not a person, and the candidate is discarded for lack of coverage.
At 940, the process determines whether there are more coarse candidates to process. If so, the process returns to 900 to analyze the next candidate. Otherwise, the process continues at 950.
At 950, people candidates having head locations that overlap with head locations of other people candidates are discarded. In a particular embodiment, the head locations of all of the people candidates are converted from the filtered disparity map into their corresponding 3D world coordinates. When the head locations of two people candidates overlap, at least one of the candidates is discarded. Preferably, the candidate corresponding to the shorter head location is discarded, because that candidate likely corresponds to a neck, a shoulder, or some object other than a person's head.
At 960, the one or more resulting fine head locations (e.g., xF1, yF1) of the validated people candidates and the corresponding mean disparity values (e.g., dF1) are forwarded for further processing to determine portal access events, such as a piggyback violation or a single person event. The head locations can also be sent to the auto-exposure controller where the locations are used to define one or more regions of interest upon which to perform auto-exposure processing.
While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.
This application is a continuation-in-part of U.S. application Ser. No. 10/702,059, filed Nov. 5, 2003. The entire teachings of the above application are incorporated herein by reference.
Relation | Application No. | Filing Date | Country
---|---|---|---
Parent | 10/702,059 | Nov. 5, 2003 | US
Child | 11/019,931 | | US