None.
Not applicable.
None.
BACKGROUND OF THE INVENTION
An increasing number of products, systems and solutions require either automated (using a processor and algorithms) or manual (man-in-the-loop) review of digital images or digital video of a human face as fundamental to the solution. Whether it be high technology surveillance systems, video conference systems, consumer still and video cameras, bank Automatic Teller Machines (ATMs) or even cell phones, presenting a user or system with a high quality digital image of a human face for either manual or automated review will continue to be central to many of today's products, and many more of tomorrow's. Furthermore, in the interest of improved automated face recognition, there is a need to provide the ever-increasing number of automated facial recognition engines with facial images of improved quality, particularly across widely varying environmental conditions.
The last decade has seen explosive growth in public awareness and use of biometrics in general, and in automated facial recognition in particular. Facial recognition has several important strengths relative to competing biometrics: it is the most intuitive to review manually, it is the least intrusive, it does not require physical contact to capture the biometric signal, and it can be used to good effect on the large number of existing facial databases such as passport, driver's license, and employee databases. While the use of facial recognition systems will no doubt continue to expand, and facial recognition may well emerge as the dominant biometric, the widespread adoption of these systems has been held back by a failure to demonstrate repeatable and highly accurate performance. Core to this deficiency is the dependency of the face finding and matching algorithms on very high image quality passed from the imager to the facial recognition engine. Ironically, most of the research dollars spent on the technology are spent on the software that finds and matches faces, while the source of the data that feeds these sophisticated algorithms is generally an off-the-shelf, low cost conventional video camera that is not well suited to the task.
With current art imagers, such as CCD cameras that are ubiquitous in consumer and security markets, analog signal data for each pixel is raster shifted off the imager and serially digitized to construct a digital image. A host of imager optimization parameters such as sensor integration time (electronic shutter speed), amplifier gain (contrast), amplifier DC offset (brightness), backlight compensation, gamma (amplitude compression) and many others are selected by a local processor in accordance with pre-set constraints defined by either the manufacturer or the user. But this small number of pre-set parameters cannot produce ideal face images because the camera simply cannot be sufficiently preprogrammed in a cost effective way to adapt to every conceivable combination of face and surrounding environment.
With current art, for example, the grouping of the individual sensing elements may be segregated into fixed regions (such as a band along the top [sky] and a band along the bottom [sand]) to achieve a prescribed compromise of the competing imaging requirements. By coupling preset imaging parameters with prescribed field of view segregation, a set of canned imager parameters may be made available to the user as user selectable modes. This affords the user the flexibility to manually select the imaging mode best suited to the anticipated subject and environment scene dynamics. For example, this is commonly seen on digital still and video cameras as Sports Mode (tuned for high speed), Portrait Mode (tuned for low speed), Stage Mode (tuned for strong overhead lighting), Ski Mode (tuned for strong lighting below faces) and others. While this technique has proven to yield an improvement over cameras without any presets, and is more convenient than manually computing and setting several parameters as in early model 35 mm cameras, it nevertheless represents a very small number of operational modes left to cope with an infinite number of challenging scenes. Furthermore, as this tradeoff is fixed in time and space (geometry of the imager), it is not able to adapt to a moving target (e.g., a face). Therefore a face that is optimally imaged in one location, such as the center of the field of view, may be imaged very poorly as it moves to another location within the scene, such as to the top, bottom or sides of the field of view. Furthermore, if the imager has to cope with multiple faces occupying very different locations, a pre-set approach designed to optimize a single spatial region will not produce good face images of faces outside of that region.
The invention provides for a higher quality digital image of a human's face to be captured and forwarded for display, storage or submittal to an automated facial recognition system. This invention will produce an improvement to the overall performance of systems based on manual (e.g. human) still image and video review, particularly when there are multiple faces in the imager's field of view, and will unlock the potential of automated facial recognition systems that have been held back by sensitivities to poor image quality.
FIGURE One illustrates a functional block diagram of the device consisting of the sensor, image control module, head find module and head track module.
This invention will provide greatly improved digital still and video images specifically of faces for applications requiring manual review, automated review, or a combination of both. The invention takes advantage of a new class of digital imager with individually addressable imaging elements, such as, but not limited to, Complementary Metal Oxide Semiconductor (CMOS) imagers, which are now competing with conventional Charge-Coupled Device (CCD) imagers that have become the de facto standard imager since solid state imagers supplanted tube based imagers. Those skilled in the art of image system design will appreciate that the premise for this invention is based on an imager with individually programmable imaging elements without regard to the imager's spectral sensitivity, imaging element density, imager size or the specific material and construction techniques used in fabricating such an imager. For the purposes of this application, CMOS imagers operating in the visible spectrum will be used as an example of such an imager.
There exist a number of important technical differences between CMOS and CCD imagers, several of which can be exploited for improved imaging of human faces. It will be shown that a means has been devised to capture multiple faces within the imager field of view simultaneously, with improved face image quality through more optimal settings of camera imaging parameters, and at a higher frame rate than conventional cameras.
The improvements in face imaging will produce face images with less motion induced blurring, reduced sensitivity to background lighting for more consistent and optimal brightness and contrast within the facial region, and the ability to preserve these improvements even as the faces move through the camera's field of view and through environments that historically have posed a challenge to contemporary imagers.
While the preferred embodiment is described herein, it is understood that one skilled in the art may derive variations and alternative configurations. In the spirit of this invention, it is assumed that concepts germane to this invention will be afforded protection. The preferred embodiment fundamentally brings together the functionality of a discrete imager (or sensor) with individually addressable imaging elements (pixels), such as a CMOS imager, a local processor, and local software capable of running basic algorithms for determining head locations and associated optimal imaging parameters. Together, these components comprise a purpose built camera ideally suited to finding a face or multiple faces within the field of view, and to making the necessary calculations and adjustments to ensure that each face is individually and optimally imaged for improved display, storage or automated recognition.
In the absence of a head-like object within the field of view, the camera will behave as a conventional imager (current art) and the Image Control Module will dynamically adjust the camera's imaging parameters to present the best global scene as represented by the entire field of view. This video will be forwarded to the Head Find Module, where the camera will search for face-like objects using algorithms that may be applied to either a single frame of video data, or to successive frames of data. Techniques to achieve this are well understood and well represented by prior art, and may consist of motion detection, blob detection and segmentation, edge detection, head template matching, and other image processing techniques. Furthermore, a combination of these techniques may be integrated to produce a more robust and accurate head detection.
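By way of non-limiting illustration, the simplest of the techniques named above, motion detection across successive frames, might be sketched as follows. The function name `find_motion_box`, the list-of-rows frame representation and the threshold value are assumptions made for this example only and are not taken from the specification. A practical Head Find Module would likely combine such a step with segmentation or template matching.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale frame stored as rows of pixel values
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def find_motion_box(prev: Frame, curr: Frame, threshold: int = 25) -> Optional[Box]:
    """Return the bounding box of pixels that changed between two frames.

    Hypothetical helper: plain frame differencing, not a full head detector.
    """
    xs, ys = [], []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_prev, row_curr)):
            if abs(b - a) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # nothing changed: the camera stays in global-scene mode
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# Example: an 8x6 dark scene in which a bright 2x2 blob appears.
background = [[10] * 8 for _ in range(6)]
scene = [row[:] for row in background]
for y in (2, 3):
    for x in (4, 5):
        scene[y][x] = 200

candidate = find_motion_box(background, scene)  # box around the changed pixels
```

The returned box is only a candidate region. Whether it contains a head would still have to be decided by one of the other techniques listed above.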
Once a head has been detected, the approximate size, location, and velocity of the head are passed on to the Head Tracking Module. Here a unique head region of interest (ROI) is created for each head based on the head size and location data received from the Head Find Module, and the associated ROI data is passed back to the Image Control Module so that control may be applied to the imaging parameters to produce the optimal image specifically within the aforesaid ROI. This represents an improvement over existing art, where large fixed regions within the field of view are weighted to optimize the image, without consideration for potentially smaller objects of interest (such as a head and face) whose data may not be weighted sufficiently and may overlap the fixed regions. By taking advantage of the individually addressable imaging elements, each pixel within the ROI can be optimized in accordance with the ROI's unique requirements, regardless of the ROI's size or location on the imager. Examples of imaging parameters that may be optimized in real time for the specific face ROI include, but are not limited to:
On-Chip Binning
Antiblooming
Dynamic Range Analog to Digital Converter (ADC)
Fast Focus and Display Mode
Multiple Readout Rates
Multiple Gain Settings
Programmable Offset
Sub-Windowing for Face Region of Interest
Programmable Camera Settings
Spectral Sensitivity
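As a non-limiting sketch of how two of the listed parameters, gain and programmable offset, might be chosen for a single face ROI rather than for the whole scene, consider the following. The function `roi_gain_offset`, the 8-bit full-scale value and the simple min/max stretch are assumptions for illustration. The specification does not prescribe a particular optimization rule.

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale frame stored as rows of pixel values
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def roi_gain_offset(frame: Frame, roi: Box, full_scale: int = 255) -> Tuple[float, float]:
    """Gain and offset that stretch the ROI's intensities onto [0, full_scale].

    Hypothetical rule: only pixels inside the ROI influence the result, so a
    bright background elsewhere in the frame does not affect the face.
    """
    x, y, w, h = roi
    pixels = [frame[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return 1.0, 0.0  # flat region: leave the settings unchanged
    gain = full_scale / (hi - lo)
    offset = -lo * gain
    return gain, offset


# Example: a dim face (values 50..101) in front of a bright window (250).
frame = [[250] * 6 for _ in range(4)]
frame[1][2], frame[1][3] = 50, 101
frame[2][2], frame[2][3] = 60, 90

gain, offset = roi_gain_offset(frame, (2, 1, 2, 2))
# A corrected ROI pixel would then be computed as gain * value + offset.
```

Because the bright background lies outside the ROI, it has no influence on the computed settings, which is the distinction from fixed-region weighting drawn above.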
The Head Tracking Module may produce an estimation of the probable position of the head ROI in the next frame of data based on the velocity data of the current and previous frames. Well-established techniques such as Kalman filters may be employed to this end, although designers should not limit themselves to conventional estimation methods. Furthermore, the Head Tracking Module may manage multiple head ROIs simultaneously. The anticipated ROI location and size for each ROI is in turn passed on to the Image Control Module. Allowances may be made within both the Head Tracking Module and the Image Control Module to account for obscuration of overlapping ROIs. Given this dynamic condition, each face within its unique head ROI, regardless of its size and position within the field of view, is simultaneously afforded the optimal setting of critical imaging parameters.
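The position estimate described above may be illustrated, without limitation, by a plain constant-velocity extrapolation. This is deliberately simpler than the Kalman filter mentioned in the text, since it carries no noise model and no measurement update. The `HeadROI` structure, the `predict_next` function and the optional safety margin are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass
class HeadROI:
    x: float   # top-left corner, in pixels
    y: float
    w: float   # width and height, in pixels
    h: float
    vx: float  # velocity, in pixels per frame
    vy: float


def predict_next(roi: HeadROI, width: int, height: int, margin: float = 0.0) -> HeadROI:
    """Extrapolate an ROI one frame ahead and keep it inside the imager.

    The margin (an assumption of this sketch) enlarges the ROI on every side
    to tolerate small errors in the velocity estimate.
    """
    w = min(roi.w + 2 * margin, width)
    h = min(roi.h + 2 * margin, height)
    x = min(max(roi.x + roi.vx - margin, 0.0), width - w)
    y = min(max(roi.y + roi.vy - margin, 0.0), height - h)
    return HeadROI(x, y, w, h, roi.vx, roi.vy)


# Example: two heads tracked at once on a 640x480 imager.
heads = [
    HeadROI(x=100, y=50, w=40, h=60, vx=8, vy=-2),
    HeadROI(x=620, y=10, w=40, h=60, vx=30, vy=0),  # about to leave the frame
]
predicted = [predict_next(r, 640, 480) for r in heads]
```

Each predicted ROI would then be handed to the Image Control Module for the next frame. The second head in the example is clamped at the right-hand edge of the imager rather than being allowed to extend beyond it.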
The frame rate for each ROI may exceed conventional video frame rates (30 frames/second NTSC and 25 frames/second PAL) while not exceeding standard video bandwidths. For example, if a single head ROI is instantiated that comprises 20% of the imager's pixels, the ROI can be read out at five times the standard frame rate without exceeding the original bandwidth. This has appeal in dynamic engagements where a ROI is moving with sufficient velocity to induce blurring of the detected image. Increasing the ROI frame rate will reduce facial blurring and facilitate improved imaging and subsequent recognition. This technique is also attractive for slow or non-moving faces in an underlit environment. As the amount of light reaching the imager decreases, the imager will respond by increasing the electronic amplification of the image. At this point the imager will be Johnson noise limited, which means that the electronic amplifier noise injected into the image data as a function of the ambient temperature and signal gain dominates the image. Because the noise from frame to frame is statistically uncorrelated, it can be averaged out across multiple frames. This technique may be applied to successive frames of a ROI where the facial data is relatively static, but the image data is dominated by noise. Averaging across several successive frames will suppress the noise while not tainting the facial image data, thereby producing a lower-noise image that will yield higher subsequent recognition accuracy.
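The bandwidth arithmetic and the frame averaging described above may be sketched, by way of example only, as follows. The function names, the 640x480 imager size and the simulated Gaussian noise are assumptions of the sketch. For uncorrelated noise, averaging N frames reduces the noise standard deviation by roughly the square root of N.

```python
import random
import statistics
from typing import List


def roi_frame_rate(base_fps: float, total_pixels: int, roi_pixels: int) -> float:
    """Highest ROI frame rate that keeps the pixel rate of full-frame readout."""
    return base_fps * total_pixels / roi_pixels


def average_frames(frames: List[List[float]]) -> List[float]:
    """Pixel-wise mean of successive ROI frames (each frame a flat pixel list)."""
    n = len(frames)
    return [sum(px) / n for px in zip(*frames)]


# Bandwidth example from the text: an ROI covering 20% of a 640x480 imager.
total = 640 * 480
fps = roi_frame_rate(30, total, total // 5)  # five times the NTSC rate

# Noise example: a static face signal of 100 buried in simulated amplifier noise.
rng = random.Random(0)                       # fixed seed for repeatability
truth = [100.0] * 1000
noisy = [[v + rng.gauss(0, 10) for v in truth] for _ in range(16)]

single_noise = statistics.pstdev(a - b for a, b in zip(noisy[0], truth))
averaged = average_frames(noisy)
averaged_noise = statistics.pstdev(a - b for a, b in zip(averaged, truth))
# averaged_noise is expected to be roughly a quarter of single_noise (sqrt(16) = 4).
```

The two ideas combine naturally: the additional frames made available by the higher ROI readout rate can be averaged back down to a standard display rate when the face is static.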
Finally, knowledge of the location, speed and direction of each ROI may be exploited in the subsequent recognition. For example, once an identity has been associated with a ROI with a sufficiently high accuracy, the size of the database searched may be adjusted downward in the interest of reducing processing time and improving matching accuracy.
This application is based on, claims the benefit of the filing date of, and incorporates by reference, the provisional patent application Ser. No. 60/555,063 filed on Apr. 5, 2004.
Number | Date | Country
---|---|---
60555063 | Mar 2004 | US