The invention relates generally to biometric systems, and more particularly to a system and method for biometric authentication via face recognition.
Biometrics may be defined as measurable physiological or behavioral characteristics of an individual that are useful in verifying or authenticating the identity of the individual for a particular application. Biometrics is increasingly being used as a security and authentication tool for industrial and commercial activities, such as credit card transactions, network firewalls, and perimeter security. For example, applications include authentication at restricted entries or secure systems on the Internet and in hospitals, banks, government facilities, airports, and so forth.
Existing biometric authentication techniques include fingerprint verification, hand geometry measurement, voice recognition, retinal scanning, iris scanning, signature verification, and facial recognition. Unfortunately, these authentication techniques have a variety of limitations, inaccuracies, and so forth. For example, existing fingerprint verification systems may not recognize a valid fingerprint if dirt, oils, cuts, blood, or other impurities are disposed on the finger and/or the reader. By further example, hand geometry verification systems generally require a large scanner, which may not be feasible for some applications. Implementation of voice recognition is difficult because of variables such as environmental acoustics, microphone quality, and the temperament of the individual. Furthermore, voice recognition systems have difficult and time-consuming training processes, while also requiring a large space for template storage. One drawback of retinal scanning is that the individual must look directly into the retinal reader. It is also inconvenient for an individual wearing eyeglasses, because the individual must remove the eyeglasses for a retinal scan. Another problem associated with retinal scanning is that the individual must focus on a given point for the scan to be performed; failure to focus correctly reduces the accuracy of the scan. While signature verification has proved to be relatively accurate, it is obtrusive to the individual. Regarding facial recognition systems, existing authentication techniques have primarily focused on matching two static images of the individual. Unfortunately, these facial recognition systems are relatively inconsistent and inaccurate due to variances in the facial pose or angle relative to the camera.
In addition to the various drawbacks noted above, all of these existing biometric authentication techniques require an individual to actively engage the particular system, thereby making the existing authentication systems inconvenient, time consuming, and effective only for restricted points of entry or passage. In other words, existing authentication systems are unworkable for passive monitoring or delocalized security checks, because the individual could simply walk by the authentication device. Without a means for capturing the necessary fingerprint, hand configuration (e.g., all fingers spread out and palm down), retinal scan, verbal phrase (e.g., “my name is John Smith”), signature, or facial pose (e.g., front and center), these authentication systems will be unable to perform their function.
In certain applications, it may be desirable to have passive monitoring and delocalized security checks, because these functions may detect unauthorized activities that would not otherwise be detectable by an authentication system at a point of entry or passage. For example, if an individual does not consent to being authenticated at a point of entry or passage, then the individual may simply bypass the localized authentication system and subsequently act as they desire.
Therefore, there is a need for a system and method that can passively identify individuals for purposes of monitoring, security, and so forth.
According to one aspect of the present technique, a system and method of face recognition is provided. The method includes capturing an image including a face and registering features of the image to fit with a model face to generate a registered model face. The registered model face is then transformed to a desired orientation to generate a transformed model face. The transformed model face is then compared against a plurality of stored images to identify a number of likely candidates for the face. In addition, the face recognition process may be performed passively.
In accordance with another aspect of the present technique, a surveillance system for identifying a person is provided. The system includes one or more imaging devices, each of which is operable to capture at least one image of the person including a face to generate a captured image. A face registration module included in the system fits the captured image to a model face to generate a registered model face. A face transformation module transforms the registered model face into a transformed model face with a desired orientation. A face recognition module identifies at least one likely candidate from a plurality of stored images based on the transformed model face. The imaging devices may capture the images even without any active cooperation from the person.
In accordance with another aspect of the present technique, a method of providing security is provided. The method includes providing imaging devices in a plurality of areas through which individuals pass. The imaging devices obtain facial images of each of the individuals. The method further includes providing a face recognition system, which recognizes an individual having the facial images by iteratively and cumulatively identifying candidates for each of the facial images.
These and other advantages and features will be more readily understood from the following detailed description of preferred embodiments of the invention, taken in conjunction with the accompanying drawings.
Referring generally to the figures, a facial recognition system 10 includes one or more imaging devices 14, such as still or video cameras, disposed at various locations throughout a facility 12 to capture images of individuals within the facility 12.
The illustrated facial recognition system 10 also includes one or more communication modules 16 disposed in the facility 12, and optionally at a remote location, to transmit still images or video signals to a monitoring unit 18. As discussed in further detail below, the monitoring unit 18 processes the still images or video signals to perform face recognition of individuals 20 traveling about different locations within the facility 12. In certain embodiments of the facial recognition system 10, the communication modules 16 include wired or wireless networks, which communicatively link the imaging devices 14 to the monitoring unit 18. For example, the communication modules 16 may operate via telephone lines, cable lines, Ethernet lines, optical lines, satellite communications, radio frequency (RF) communications, and so forth. Moreover, embodiments of the monitoring unit 18 may be disposed locally at the facility 12 or remotely at another facility, such as a security monitoring company or station.
The monitoring unit 18 includes a variety of software and hardware for performing facial recognition of individuals 20 entering and traveling about the facility 12. For example, the monitoring unit 18 can include file servers, application servers, web servers, disk servers, database servers, transaction servers, telnet servers, proxy servers, mail servers, list servers, groupware servers, File Transfer Protocol (FTP) servers, fax servers, audio/video servers, LAN servers, DNS servers, firewalls, and so forth. As shown, the illustrated monitoring unit 18 includes a database 22 and one or more processors 26 for storing and processing the still images or video signals received from the imaging devices 14.
In operation, each imaging device 14 may acquire a series of facial images, e.g., at different poses or facial angles, as the individual 20 approaches, leaves, or generally passes by the respective imaging device 14. Advantageously, these facial images are acquired passively or, in other words, without any active participation from the individual 20. In turn, the one or more processors 26 process the acquired facial images, register the acquired facial images to an appropriate model face, transform the acquired/registered facial images to a desired pose (e.g., a front pose), and perform facial recognition on the acquired/registered/transformed facial images to identify one or more likely individuals stored in the database 22. The foregoing process may be repeated for a series of facial images, such that each iteration narrows the list of likely individuals from all the images stored in the database 22. In one embodiment, each facial image acquired by the imaging device 14 may capture a different portion, angle, or pose of the individual 20, such that iterative processing of these facial images produces a cumulatively more accurate facial recognition of that particular individual 20. In this manner, the facial recognition system 10 can passively track and identify the individuals 20 for purposes of security, access control, and so forth. In certain embodiments, appropriate authorities can be alerted of unauthorized entry or passage by certain individuals 20 through the various portions of the facility 12 if image information for such individuals 20 is pre-stored in the database 22.
When an individual 20 is enrolled into the facial recognition system 10, a complete model face is formed and stored in the database 22 for that individual 20. During enrollment, one or more facial images of each individual 20 are recorded or acquired by an imaging device 14, such as a still or video camera. In certain embodiments, the recorded facial image is a full three-dimensional facial scan of the individual 20. For each individual 20 in the database 22, the system locates and stores a set of k fiducial points corresponding to certain facial features, such as the corners of the eyes, the tip of the nose, the outline of the nose, the ends of the lips, the beginning and end of the eyebrows, the facial outline, and so forth. Each of these k fiducial points has three-dimensional coordinates in each captured image of the individual 20. Furthermore, the system may identify and store information on the position of each fiducial point with respect to a reference point, such as a centroid, a lowest point, or a topmost point of the facial image. In addition, the system may store other information associated with each of the k fiducial points. For example, the system may store an intensity value, such as a grayscale value or an RGB (red-green-blue) value, corresponding to specific facial features and locations on the image.
In certain embodiments, the set of k fiducial points is represented as a vector Vi, which is a one-dimensional matrix of the k fiducial points for the ith image acquired. In one embodiment, the vector Vi is referenced to the centroid of the individual's facial image, where the centroid may be computed by adding the coordinates of all k fiducial points and dividing by the number of fiducial points k. For a given vector Vi, a three-dimensional mesh may be plotted by joining the k fiducial points represented by the vector Vi. Each triangular surface formed by three points of the vector Vi in the three-dimensional mesh defines a three-dimensional planar patch, such that the mesh defines the three-dimensional appearance or structure of the face as a plurality of such patches. It may be noted that the appearance of the face may include the grayscale, RGB, or color values corresponding to each location on the face. Also, each of the three-dimensional planar patches may be associated with a reference point, such as the mid-point of the planar patch, and an average grayscale, RGB, or color value.
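Merely as a non-limiting illustration, the following sketch computes the centroid-referenced vector Vi and the triangular planar patches of the mesh. The function names, the example coordinates, and the use of a Delaunay triangulation over the x-y projection are assumptions made for the sketch rather than requirements of the present technique.

```python
import numpy as np
from scipy.spatial import Delaunay

def centroid(fiducials: np.ndarray) -> np.ndarray:
    """Average of the k fiducial coordinates: sum of coordinates / k."""
    return fiducials.mean(axis=0)

def mesh_patches(fiducials: np.ndarray):
    """Triangulate the k fiducial points; each simplex is one planar patch."""
    # Triangulate on the x-y projection; each simplex indexes three points.
    tri = Delaunay(fiducials[:, :2])
    patches = []
    for simplex in tri.simplices:
        verts = fiducials[simplex]      # 3 x 3 array: one 3-D vertex per row
        midpoint = verts.mean(axis=0)   # reference point of the patch
        patches.append((verts, midpoint))
    return patches

# Example with k = 5 made-up fiducial points (x, y, z).
pts = np.array([[0.0, 0.0, 1.0], [2.0, 0.0, 1.2], [1.0, 1.5, 2.0],
                [0.2, 2.5, 1.1], [1.8, 2.4, 1.3]])
Vi = (pts - centroid(pts)).ravel()      # vector Vi referenced to the centroid
patches = mesh_patches(pts)
```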
Based on these vectors Vi for the individuals 20 entered into the database 22, the system cumulatively processes the vectors Vi to create a facial model representative of all individuals 20 in the database 22. By utilizing a suitable generative modeling technique, such as Principal Component Analysis (PCA), a set of vectors Vi is used to create a low-dimensional subspace of independent variables, principal components, or model parameters that define the features of the images. PCA is a statistical method of factor analysis that reduces the large dimensionality of the data space (observed variables) to a smaller intrinsic dimensionality of feature space (independent variables) that describes the features of the image. In other words, PCA can be utilized to predict features, remove redundant variables, extract relevant features, compress data, and so forth. For example, the independent variables or model parameters may be defined as X, which is the low-dimensional representation of the plurality of vectors Vi for the individuals 20 stored in the database 22. Thus, PCA provides the model parameters X, which define the appearance of the face of the individual 20. These model parameters X are constrained to the features of the face of the individual 20, thereby providing a focused model face. In this manner, a model face is created for all individuals 20 stored in the database 22. When a new face is found, that face can be fitted to the PCA space to generate a feature vector V that allows manipulation of the model face. Other modeling techniques that can be used include Independent Component Analysis, Hierarchical Factor Analysis, Principal Factors Analysis, Confirmatory Factor Analysis, neural networks, and so forth.
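A minimal sketch of this PCA step follows, assuming the vectors Vi have been stacked into a matrix with one row per individual 20. The helper names are illustrative; the singular value decomposition of the centered data is simply one standard way to obtain the principal components.

```python
import numpy as np

def fit_pca(V_stack: np.ndarray, n_components: int):
    """V_stack: (num_individuals, 3k) matrix with one vector Vi per row."""
    mean = V_stack.mean(axis=0)
    # SVD of the centered data yields the principal components.
    _, _, Vt = np.linalg.svd(V_stack - mean, full_matrices=False)
    return mean, Vt[:n_components]          # mean face, subspace basis

def to_model_params(v_new, mean, components):
    """Fit a new face vector to the PCA space: model parameters X."""
    return components @ (v_new - mean)

def from_model_params(X, mean, components):
    """Synthesize a face vector from model parameters X."""
    return mean + components.T @ X
```

Fitting a newly found face then amounts to projecting its vector onto the retained components, which yields the model parameters X described above.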
Referring generally to the figures, an exemplary process 48 of facial recognition is illustrated as a flowchart. The process 48 begins by capturing an image of the face of the individual 20 at a first focal distance and a first pose.
The process 48 then proceeds to register the image to an initial model face (block 52). For example, the process 48 may match positions of certain facial features of the image with corresponding positions on the model face. The process 48 continues by transforming the image to a desired location (e.g., focal distance) and a desired pose (block 54). For example, the process 48 may transform the orientation and geometry of the registered model face from the first focal distance and first pose to the desired focal distance and desired pose, e.g., a centered frontal view of the individual's face. The first focal distance and the first pose may be, respectively, the distance of the individual 20 from the imaging device 14 and the pose angle of the face of the individual 20 with respect to the imaging device 14 when the image was captured.
By further example of block 54, the captured facial image of individual 20 may be warped or twisted to produce a synthetic optimal view of the individual's face using the registered model face and the desired focal distance and pose information. Generation of the synthetic optimal view may be facilitated by suitable warping techniques. Warping produces a desired orientation in the synthetic optimal view by mapping pixel locations of the model face to a desired view, such as a frontal view. Transformation may facilitate comparison of the captured facial image with those available in the database. More specifically, the processes of registration and transformation normalize the captured image so that various parameters associated with the captured image become compatible or comparable with the images/models stored in the database 22.
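By way of a hedged illustration, one possible realization of such a pose-normalizing transformation is sketched below: the registered three-dimensional fiducials are rotated to undo an estimated head pose, and each triangular patch is carried to its frontal position by an affine map, the collection of such maps forming the piecewise warp. The yaw/pitch parameterization and the function names are assumptions of the sketch, not a prescribed implementation of the present technique.

```python
import numpy as np

def rotation_y(yaw):
    """Rotation about the vertical axis (head turn) by yaw radians."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rotation_x(pitch):
    """Rotation about the horizontal axis (head nod) by pitch radians."""
    c, s = np.cos(pitch), np.sin(pitch)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def to_frontal(points3d, yaw, pitch):
    """Undo the estimated head pose so the mesh faces the camera."""
    return points3d @ (rotation_x(-pitch) @ rotation_y(-yaw)).T

def affine_from_triangles(src, dst):
    """2x3 affine map carrying one mesh triangle (projected onto the image
    plane, a 3x2 array of vertices) onto its frontal position; one such map
    per patch yields the piecewise warp of the facial image."""
    A = np.hstack([src, np.ones((3, 1))])   # rows: [x, y, 1] of src vertices
    return np.linalg.solve(A, dst).T        # maps [x, y, 1] -> [x', y']
```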
Turning now to block 56 of the illustrated process 48, the transformed model face is compared against a plurality of stored images in the database 22. Based on this comparison, the process 48 identifies a number of likely candidates (n) for the captured face (block 58). The process 48 then creates a new model face based on the stored images of the likely candidates (n) (block 60), thereby focusing the model face on those candidates resembling the individual 20. The process 48 then evaluates whether the number of likely candidates (n) is equal to one (block 62).
If the number of likely candidates (n) is not one at block 62, an optional new image of the individual 20 may be captured and utilized for further processing (block 64). Based on the new model face and the optional new facial image, the process 48 repeats the acts of registering the image to the new model face at block 52, transforming the registered image at block 54, comparing the transformed image against the stored images at block 56 (e.g., the stored images of the likely candidates (n) from the previous iteration of the process 48), and identifying a new number of likely candidates (n) at block 58. The process 48 continues by creating another new model face based on the new number of likely candidates (n) (block 60). Preferably, the new number of likely candidates (n) is less than the previous number of likely candidates (n). Again, if the new number of likely candidates (n) is not equal to one, then the process 48 optionally proceeds by acquiring another new facial image. In turn, the process 48 repeats the acts of registering, transforming, comparing, identifying, and model creation at blocks 52, 54, 56, 58, and 60, respectively.
This iterative and cumulative improvement of the model face and reduction of the number of likely candidates (n) continues until a single likely candidate is identified at block 66. In each iteration, the process 48 improves the model face based on a smaller number of likely candidates (n), whose facial features are closer to those of the individual 20 actually having the captured face. In other words, each iteration of the process 48 eliminates unlikely candidates and focuses the model face on the most likely candidates (n), thereby making the model face resemble the individual 20 more accurately. As a result of this improvement, the comparison (block 56) between the model face and the likely candidates eliminates more unlikely candidates who no longer resemble the model face. Eventually, the process 48 converges on the single likely candidate (n=1) at block 66.
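The iterative narrowing described above may be summarized by the following sketch, in which capture_image, register, transform, compare, and the database interface are hypothetical placeholders supplied by the caller; the present technique does not prescribe these names or a particular similarity score.

```python
def identify(capture_image, register, transform, compare, database, threshold):
    """Return the single likely candidate, or None if no candidate survives."""
    candidates = list(database.all_individuals())
    model_face = database.model_from(candidates)      # model over everyone
    while len(candidates) > 1:
        image = capture_image()                       # block 64: new image
        registered = register(image, model_face)     # block 52: register
        frontal = transform(registered)              # block 54: transform
        scores = compare(frontal, candidates)        # block 56: compare
        # Block 58: keep only candidates that still resemble the face.
        candidates = [c for c, s in zip(candidates, scores) if s >= threshold]
        if len(candidates) <= 1:
            break
        # Block 60: rebuild a tighter model face from the survivors.
        model_face = database.model_from(candidates)
    return candidates[0] if candidates else None      # block 66: n = 1
```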
Turning now to an exemplary embodiment of the registration of block 52, illustrated as a process 52, the process 52 begins by assuming average model parameters X for the initial model face (block 68).
After assuming the average parameters at block 68, the process 52 continues by generating an appearance vector using the current image and the model face with the current parameters (block 70). In other words, the captured facial image is fitted onto the initial model face by adjusting the model parameters X to provide the appearance vector. The process 52 then proceeds by updating the model parameters based on an analysis of the appearance vector (block 72). The model face, which is parameterized on X, is effectively a generative structural model: for a given set of values of X, the three-dimensional structure of the face can be synthesized. Once the three-dimensional structure of the face is generated, the frontal view of the individual 20 in a normalized coordinate system is computed.
The process 52 then proceeds by evaluating whether the updated parameters differ from those of the model face for the appearance vector (block 74). In one embodiment, a residual function may be defined that is minimal for desired values of X. The residual function may be generated by computing the Euclidean distance between the appearance vectors based on the appearance model. In a different embodiment, a PCA space for normalized frontal views is computed. The synthesized frontal view is then projected onto the appearance model based on X. The differences between the projected synthesized frontal view and the synthesized frontal view are the residuals, which will be small for desirable values of X. In other words, if the set of V vectors used to generate the model space for X is restricted, the freedom of X is also restricted, which facilitates a more constrained and accurate fitting process. For example, the appearance vector of the updated model face is compared with the appearance vector of the captured facial image. If the parameters are different, then the process 52 continues by repeating the acts of generating the appearance vector at block 70 and updating the model parameters at block 72 until there is no difference between the parameters of the model face and those of the captured facial image. When no differences remain, the process 52 has successfully registered the captured image with the model face to produce a registered model face or a registered image 76.
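A hedged sketch of this registration loop (blocks 68 through 74) follows, reusing the hypothetical PCA helpers from the earlier sketch; appearance_vector is an assumed callable that samples the captured image at the fiducial locations of the synthesized face, and the convergence test is one plausible reading of block 74.

```python
import numpy as np

def register_face(image, appearance_vector, mean, components,
                  tol=1e-4, max_iters=100):
    """Iteratively fit the captured image to the model face."""
    X = np.zeros(components.shape[0])       # block 68: average parameters
    for _ in range(max_iters):
        # Block 70: appearance vector from the image and current model face.
        v = appearance_vector(image, from_model_params(X, mean, components))
        # Block 72: update the parameters by projecting into the subspace.
        X_new = to_model_params(v, mean, components)
        # Block 74: stop once the parameters no longer change.
        if np.linalg.norm(X_new - X) < tol:
            return X_new                    # registered model face
        X = X_new
    return X
```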
Referring now to
While the invention has been described in detail in connection with only a limited number of embodiments, it should be readily understood that the invention is not limited to such disclosed embodiments. Rather, the invention can be modified to incorporate any number of variations, alterations, substitutions or equivalent arrangements not heretofore described, but which are commensurate with the spirit and scope of the invention. Additionally, while various embodiments of the invention have been described, it is to be understood that aspects of the invention may include only some of the described embodiments. Accordingly, the invention is not to be seen as limited by the foregoing description, but is only limited by the scope of the appended claims.