The following relates to the biometric identification arts, object identification arts, security clearance and admittance arts, one-to-many matching arts, and related arts.
One-to-many matching refers generally to the problem of determining whether a person or object is a member of a defined set of persons or objects. Such matching problems arise in diverse applications relating to security clearance, toll parking, invitation-only events, and the like. For example, a biometric identification system acquires a biometric signature of a “query” person requesting admission to a secure area (or attempting to log onto a computer with biometric identification security, or so forth). The biometric signature may, for example, be a feature vector representation of one or more face images, or of an electronically acquired fingerprint, or of an optical eye scan, or of an electronically recorded handwritten signature, or so forth. The biometric signature of the query person is compared with stored biometric signatures of all authorized persons. If a match is found, then the query person is admitted (or logged into the computer, or so forth).
As another example, a parking lot may be reserved for only authorized vehicles. Such a situation arises in a pre-pay parking lot serving customers who pay a monthly parking fee, or in the case of an employee-only parking lot, or so forth. In this case, the object signature may suitably be a feature vector derived from an image of the vehicle license plate, which is acquired by a camera at a toll gate. The feature vector is compared with a database of feature vectors representing license plate images of authorized vehicles, and the vehicle is admitted if its plate image feature vector matches the feature vector of any plate image in the database. In a variant approach, an image of the vehicle as a whole, or a portion of the vehicle, may be the source of the feature vector that is used as the signature.
Yet another example of a one-to-many matching system is a credit card scanner, which scans a credit card for its number (its “signature”) and compares this signature of the query credit card with all credit card numbers in the database—if no match is found then the card is declined.
One difficulty with one-to-many matching systems is scalability. As the number of authorized persons or objects increases, the size of the database storing the signatures of the authorized persons or objects increases, while processing efficiency degrades. If the number of authorized persons or objects is denoted by N, then the authorized signatures database size, and hence the search time for searching that database, scales with N.
Besides scalability, privacy is another concern with one-to-many matching systems. If the signatures are considered to be sensitive data, then the storage of the authorized signatures in the signatures database presents a possible security issue. Signatures such as fingerprints, credit card numbers, and so forth are generally considered to be sensitive data.
One way to address both scalability and privacy concerns is to employ a less informative signature. For example, a feature vector can be made smaller, with fewer features extracted from the image, so that a smaller signature can be stored. Privacy is enhanced by the reduced information contained in this smaller signature, but search time continues to scale with N. Moreover, the amount of information contained in the stored signature cannot be reduced too much by this technique, as removal of too much information makes the signature ineffective for unambiguously identifying the authorized person or object.
Disclosed in the following are improved data mining techniques that provide various benefits as disclosed herein.
In some embodiments disclosed herein, an authentication system is disclosed, including an authenticator comprising an electronic data processing device configured to perform an authentication process to determine whether a person or object to be authenticated is a member of a set of authorized persons or objects. The authentication process includes the operations of: acquiring a query signature comprising a vector whose elements store values of an ordered set of features for the person or object to be authenticated; computing an inner product of the query signature and an aggregate signature comprising a vector whose elements store values of the ordered set of features for the set of authorized persons or objects; and determining whether the person or object to be authenticated is a member of the set of authorized persons or objects based on the inner product of the query signature and the aggregate signature.
The authentication system of the preceding paragraph may further include an authenticator training component comprising an electronic data processing device configured to generate the aggregate signature representing the set of authorized persons or objects by operations including: generating a set of authorized signatures by acquiring a signature for each authorized person or object comprising a vector whose elements store values of the ordered set of features for that authorized person or object; and determining the aggregate signature to set an inner product of each authorized signature and the aggregate signature to a target inner product value.
In some embodiments disclosed herein, an authentication method is disclosed for determining whether a person or object to be authenticated is a member of a set of authorized persons or objects. The authentication method comprises: acquiring a query signature comprising a vector whose elements store values of an ordered set of features for the person or object to be authenticated; comparing the query signature and an aggregate signature comprising a vector whose elements store values of the ordered set of features for the set of authorized persons or objects; and determining whether the person or object to be authenticated is a member of the set of authorized persons or objects based on the comparison. The comparing operation may comprise computing an inner product of the query signature and the aggregate signature, with the determining being based on the inner product.
In some embodiments disclosed herein, a non-transitory storage medium stores instructions readable and executable by an electronic data processing device to perform an authentication method to determine whether a person or object to be authenticated is a member of a set of authorized persons or objects. The authentication method comprises: operating a camera or biometric sensor to acquire data on the person or object to be authenticated; extracting from the acquired data a query signature comprising a vector whose elements store values of an ordered set of features for the person or object to be authenticated; computing an inner product of the query signature and an aggregate signature comprising a vector whose elements store values of the ordered set of features for the set of authorized persons or objects; and determining whether the person or object to be authenticated is a member of the set of authorized persons or objects based on the inner product of the query signature and the aggregate signature.
The term “signature” as used herein denotes a representation of a person or object, in which the signature comprises values of an ordered set of features, which may be suitably represented as a vector in which each vector element stores the value of a corresponding feature of the ordered set of features. For example, the signature may be image features of an image of a salient aspect of the person or object (for example, a portrait image of a person's face, or a license plate image of the license plate of a vehicle), or the values of the sixteen digits of a sixteen-digit credit card number of a credit card for a credit card object, or values of quantitative features of an electronically recorded human fingerprint, or so forth.
The term “authentication” as used herein denotes the operation of determining whether the signature of a (query) person or object can be matched with any signature in a set of signatures representing a set of authorized persons or objects.
Disclosed herein are authentication systems that operate by aggregating the signatures of a set of authorized persons or objects into a single signature, referred to herein as an “aggregate signature”. In illustrative approaches the aggregate signature is constructed using linear aggregations, and the generation of the aggregate signature entails learning an optimal set of weights. During the authentication phase, similarity of the signature of a query person or object with the aggregate signature is suitably computed with a single dot product (i.e. inner product) between the query signature and the aggregate signature. The use of an aggregate signature provides benefits including: reduced storage (only one signature is stored to represent the entire set of authorized persons or objects); efficiency of the authentication system (assessing the query signature reduces to computing a single dot product); and privacy (signatures of individual authorized persons or objects are not stored independently at the authentication system; rather only the aggregate signature need be stored at the authentication system).
Two illustrative approaches are disclosed for learning weights of the aggregate signature: (1) a non-discriminative approach based on Generalized Max Pooling (see Murray and Perronnin, “Generalized Max Pooling” in CVPR (2014)); and (2) a discriminative approach based on minimizing the training empirical loss on a classification task. In a variant embodiment, because the two approaches are complementary, they can be combined to yield improved results.
With reference to
With continuing reference to
The signature acquisition system 20 is applied to generate an authorized signature for each authorized person or object of the set of authorized persons or objects. These authorized signatures then form a set of authorized signatures 30. In a conventional authentication approach, this set of authorized signatures 30 would be used directly to authenticate a (query) person or object, by comparing the (query) signature of the query person or object against each authorized signature—if any match is found, the query person or object is deemed authenticated; otherwise, the query person or object is deemed not authenticated. As already discussed, this approach has some disadvantages. It requires sufficient data storage to store the entire set of authorized signatures 30. Extensive authentication processing is required as each authorized signature must be compared individually until a match is found, or until all authorized signatures have been compared and it is concluded the query signature is not authorized. Still further, storage of the set of authorized signatures 30 presents a possible privacy or data security issue if the signatures are considered to be personal information or sensitive data.
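The conventional approach described above can be sketched as a linear scan over the stored signatures. This is a minimal illustration only: the function name `matches_any`, the cosine-similarity test, and the 0.9 cutoff are assumptions made for the example and are not taken from the disclosure.

```python
import numpy as np

def matches_any(query, authorized, min_similarity=0.9):
    """Conventional one-to-many matching: compare the query with every stored signature.

    `authorized` is a (K, d) array holding one L2-normalized signature per row.
    The cosine-similarity test and the default cutoff are illustrative assumptions.
    """
    query = query / np.linalg.norm(query)
    for signature in authorized:              # up to K comparisons per query
        if float(signature @ query) >= min_similarity:
            return True
    return False

rng = np.random.default_rng(0)
authorized = rng.normal(size=(5, 64))         # synthetic stand-in for stored signatures
authorized /= np.linalg.norm(authorized, axis=1, keepdims=True)

stored_matches = matches_any(authorized[2], authorized)
opposite_matches = matches_any(-authorized[0], authorized)
```

Both the storage (all K signatures) and the worst-case number of comparisons grow with the size of the authorized set, which is the cost the aggregate signature is intended to avoid.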
In embodiments disclosed herein, the set of authorized signatures 30 is aggregated by the authenticator training component 6 in order to generate the aggregate signature 10, which is then used for subsequent authentication operations performed by the authenticator 8. This alleviates the aforementioned disadvantages: storage requirements are reduced as only the single aggregate signature 10 is stored at the authenticator 8; processing time is vastly reduced as only a single signature comparison is performed; and privacy concerns are alleviated because the aggregate signature 10 is not uniquely associated with any particular authorized person or object. More particularly, in the illustrative embodiment of
As diagrammatically indicated in
The generated query signature 12 is then compared with the aggregate signature 10 using a suitably programmed electronic data processing device 42. In illustrative
With reference to
The authentication task is, however, not a classification problem. Rather, the authentication task entails identifying whether a query feature vector (i.e. signature) of a query person or object matches any authorized signature of a set of authorized signatures. In
There is therefore no reason, in general, to expect that the set of authorized signatures of an authentication task will fall within a simply connected region analogous to illustrative class region Cn of a classification problem, and more generally it is not apparent that a single vector in the feature space might be useful in performing authentication (or, more generally, one-to-many matching) entailing matching of a query signature (or, more generally, query feature vector) with one of a set of authorized signatures (more generally, a set of feature vectors any one of which is to be matched in the one-to-many matching problem) that are widely distributed through the feature space.
With reference to
In the example of
The various electronic data processing devices 26, 32, 42 of the authentication system of
By way of further illustration, in the case of the vehicle access control system of
In the case of the computer log-in system of
The disclosed authentication techniques may also be embodied as a non-transitory storage medium storing instructions executable by one or more computers and/or other electronic data processing device(s) 26, 32, 42 to perform the disclosed data processing operations in conjunction with data acquisition components 22, 24. The non-transitory storage medium may, for example, be a hard disk or other magnetic storage medium, or a FLASH memory or other electronic storage medium, or an optical disk or other optical storage medium, various combinations thereof, or so forth.
In the following, some more specific embodiments are described as non-limiting illustrative examples. The following notation is used in these examples. The number of authorized signatures in the set of authorized signatures 30 is denoted as K, and the set of authorized signatures 30 is written as $\mathcal{S} = \{s_1, s_2, \ldots, s_K\}$. Given a query feature vector $q \in \mathbb{R}^d$ extracted from the query image (or from biometric data, or so forth) corresponding to the query signature 12, and the set $\mathcal{S} = \{s_1, s_2, \ldots, s_K\}$ corresponding to the set of authorized signatures 30, the goal is to learn a function $F: \mathbb{R}^d \to \{0,1\}$ that predicts whether the query q is relevant to (i.e. a member of) the set $\mathcal{S}$ or not. Note that F will not have access to the original set $\mathcal{S}$ after it has been learned.
A function F is defined through the composition of two functions: a real-valued function $f: \mathbb{R}^d \to \mathbb{R}$ and a quantizing function $\sigma: \mathbb{R} \to \{0,1\}$, such that $F(q) = (\sigma \circ f)(q) = \sigma(f(q))$. Function $f$ gives a measure of similarity or distance between the query signature q and the set of authorized signatures $\mathcal{S}$, while function $\sigma$ transforms that measure into a final decision, usually through a thresholding. In illustrative
In the following, two different approaches are disclosed that define and optimize an ƒ function. The first one is a non-discriminative approach based on Generalized Max Pooling. The second one is a discriminative approach based on minimizing the empirical loss on the training set. Both approaches can be complementary, and an approach for using them together is also disclosed.
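The composition of a real-valued similarity function with a quantizing function, as described above, can be sketched as follows. The helper name `make_authenticator`, the linear form of the similarity, and the "greater than threshold" rule are assumptions chosen for illustration; the two approaches described next differ in how the aggregate vector is obtained.

```python
import numpy as np

def make_authenticator(aggregate, tau):
    """Build F = sigma o f for a linear similarity f(q) = q . aggregate."""
    aggregate = np.asarray(aggregate, dtype=float)

    def f(q):
        # Real-valued similarity between the query and the aggregate signature.
        return float(np.dot(q, aggregate))

    def sigma(score):
        # Quantizer: map the score to a {0, 1} decision by thresholding (assumed form).
        return 1 if score > tau else 0

    def F(q):
        return sigma(f(q))

    return F

F = make_authenticator(aggregate=[0.5, 0.5, 0.0], tau=0.4)
accepted = F(np.array([1.0, 0.0, 0.0]))   # score 0.5 is above tau
rejected = F(np.array([0.0, 0.0, 1.0]))   # score 0.0 is below tau
```

Note that F holds only the aggregate vector and the threshold, so it does not need access to the individual authorized signatures once it has been built.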
The generalized max pooling approach is based on the following idea: it is desired for the dot-product similarity between a single signature in the set and the aggregate signature to return a constant value c. Given a new (unknown, i.e. query) signature, if its dot-product with the aggregate signature is close to c, then this indicates the signature belongs to the set (i.e. should be authorized). However, if the dot-product is significantly different from c, then the signature likely does not belong to the set (i.e., should not be authorized). Such a solution is more likely to produce false-positives than false negatives.
The Generalized Max Pooling (GMP) approach computes a set representation $s^{gmp}$ to which each member of the set is equally similar (where similarity is measured by the dot product), that is:
$$s_n^T s^{gmp} = c, \quad \text{for } n = 1, \ldots, K \qquad (1)$$
The set representation $s^{gmp}$ suitably corresponds to the aggregate signature 10, elsewhere denoted herein as $s^{agg}$. The choice of the constant c is arbitrary, and may be conveniently set to unity (1). With c=1 Expression (1) may be written in matrix form as:
$$S^T s^{gmp} = 1_K \qquad (2)$$
where S represents the matrix of column vectors $s_i$ and $1_K$ denotes the K-dimensional vector of all ones. Expression (2) is a linear system of K equations with d unknowns, where d is the dimensionality of the signatures. In general, this system might not have a solution (e.g. when d<K) or might have an infinite number of solutions (e.g. when d>K). To accommodate this, Expression (2) can be recast as a least-squares regression problem and solved according to:
$$s^{gmp} = \arg\min_{s} \|S^T s - 1_K\|^2 \qquad (3)$$
It is beneficial to add a regularization term to obtain a stable solution. Introducing $s^{gmp}_{\lambda}$, the regularized GMP becomes:
$$s^{gmp}_{\lambda} = \arg\min_{s} \|S^T s - 1_K\|^2 + \lambda \|s\|^2 \qquad (4)$$
This is a ridge regression problem whose solution is:
$$s^{gmp}_{\lambda} = (S S^T + \lambda I)^{-1} S 1_K \qquad (5)$$
where I in Expression (5) is the identity matrix. The regularization parameter λ should be cross-validated. By construction, the similarity between any $s_i$ and $s^{gmp}_{\lambda}$ is approximately equal to 1. Therefore, set membership is suitably determined by defining $f(q) = q \cdot s^{gmp}_{\lambda} = q^T s^{gmp}_{\lambda}$, and the thresholding function σ is suitably:
In this case, the decision as to whether query signature q belongs with the set of authorized signatures $\mathcal{S}$, i.e. is authenticated, is expressed as a thresholding operation, as per Expression (6). Varying the threshold τ controls the ratio between the true positive rate and the false positive rate.
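By way of illustration, the regularized GMP solution of Expression (5) and a thresholded decision can be sketched as below. The function names, the synthetic unit-norm data, and the value of the regularization parameter are hypothetical. In particular, the acceptance test used here (score within τ of the target value 1) is an assumption drawn from the description of scores being "close to c"; it is not a reproduction of Expression (6).

```python
import numpy as np

def gmp_aggregate(S, lam):
    """Regularized GMP signature per Expression (5): (S S^T + lam*I)^-1 S 1_K.

    S is a (d, K) matrix whose columns are the authorized signatures.
    """
    d, K = S.shape
    return np.linalg.solve(S @ S.T + lam * np.eye(d), S @ np.ones(K))

def is_authorized(q, s_gmp, tau):
    # Assumed decision rule: accept when the score lies within tau of the target c = 1.
    return abs(float(q @ s_gmp) - 1.0) <= tau

rng = np.random.default_rng(0)
S = rng.normal(size=(128, 8))             # synthetic signatures, one per column
S /= np.linalg.norm(S, axis=0)            # unit-norm columns (illustrative choice)
s_gmp = gmp_aggregate(S, lam=1e-3)

member_scores = S.T @ s_gmp               # each entry should be close to 1
```

Using `np.linalg.solve` rather than forming the matrix inverse explicitly is a numerical convenience and does not change the result of Expression (5).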
The GMP in this context may be considered as a weighted linear aggregation. The regularized GMP $s^{gmp}_{\lambda}$ is the solution to Expression (4). Consequently, according to the representer theorem, $s^{gmp}_{\lambda}$ can be written as a linear combination of the encodings: $s^{gmp}_{\lambda} = \sum_{i=1}^{K} \alpha_i s_i = S\alpha_{\lambda}$, where $\alpha_{\lambda}$ is the vector of weights. Therefore GMP can be viewed as an instance of weighted aggregation. By introducing $s = S\alpha$ in the GMP objective of Expression (4):
$$\alpha_{\lambda} = \arg\min_{\alpha} \|S^T S \alpha - 1_K\|^2 + \lambda \|S\alpha\|^2 \qquad (7)$$
Denoting by $\mathcal{K} = S^T S$ the K×K Gram matrix of vector-to-vector similarities, the following is obtained:
$$\alpha_{\lambda} = \arg\min_{\alpha} \|\mathcal{K}\alpha - 1_K\|^2 + \lambda\, \alpha^T \mathcal{K} \alpha \qquad (8)$$
which admits the solution $\alpha_{\lambda} = (\mathcal{K} + \lambda I_K)^{-1} 1_K$.
Note that for λ very large we have $\alpha_{\lambda} \approx 1_K/\lambda$, i.e. equal weights for all $s_i$, resulting in standard sum aggregation. (As the constant factor c is arbitrarily set to 1 in Equation (1), the set of equal weights $1_K/\lambda$ can also be arbitrarily set to $1_K$, as for sum aggregation). Note also that, if the set of signatures forms an orthonormal basis, i.e. $\mathcal{K} = I_K$, then again we are back to standard sum aggregation. In view of this, in experiments reported herein the disclosed approach is compared to the sum-aggregation baseline.
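The weighted-aggregation view can be checked numerically. The sketch below, on synthetic data with hypothetical names, computes the weights from the Gram matrix, confirms that the weighted sum of signatures coincides with the solution of Expression (5), and shows the weights approaching the uniform vector scaled by 1/λ when λ is very large.

```python
import numpy as np

def gmp_weights(S, lam):
    """Dual GMP weights: solve (G + lam*I_K) alpha = 1_K, with G = S^T S the Gram matrix."""
    K = S.shape[1]
    gram = S.T @ S
    return np.linalg.solve(gram + lam * np.eye(K), np.ones(K))

rng = np.random.default_rng(1)
S = rng.normal(size=(32, 6))              # columns are signatures (synthetic data)
lam = 0.5

alpha = gmp_weights(S, lam)
s_dual = S @ alpha                        # weighted linear aggregation of the signatures
s_primal = np.linalg.solve(S @ S.T + lam * np.eye(32), S @ np.ones(6))

# For very large lam the weights approach the uniform vector 1_K / lam.
big = 1e8
alpha_big = gmp_weights(S, big)
```

The dual form solves a K×K system instead of a d×d one, which may be preferable when the number of authorized signatures is much smaller than the signature dimensionality.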
As another illustrative example, the use of empirical loss minimization for learning ƒ is described. By way of motivation, it is noted that the GMP approach has two potential shortcomings. First, it entails computing the GMP weights for each set, and if the set changes (for example one adds or removes elements), the GMP weights need to be recomputed from scratch using the individual signatures. Second, the weights are learned using only the elements in the set. Although this can be convenient because no extra learning data is needed, if such extra negative data were available the method would not exploit it: GMP does not ensure that elements that do not belong to the set are not given scores close to 1 (in this example where c=1 is chosen; more generally, close to c where the choice of the constant c is arbitrary).
The illustrative empirical loss minimization approach to learn ƒ leverages extra training data and facilitates modifying the set contents. Assume availability of N training samples, $X \in \mathbb{R}^{d \times N}$, where $x_i$ is the i-th training sample. In general, X contains both the target samples contained in $\mathcal{S}$ as well as a set of negative samples not contained in $\mathcal{S}$. The samples that form the target set are labeled with y=1, while the remaining samples are labeled with y=0. The labels are collected into a vector $Y \in \{0,1\}^N$, with $y_i$ the i-th label.
In general, we are interested in finding an ƒ* that minimizes the training empirical loss:
$$f^{*} = \arg\min_{f} \sum_{i=1}^{N} l(f(x_i), y_i) + \lambda\, \Omega(f) \qquad (9)$$
where Ω is a regularization function and λ controls its weight. In what follows, we consider a linear function ƒ parameterized with vector w, i.e., $f(q; w) = q^T w$. In the case where $\Omega(f) = g(\|f\|)$ with $g: [0, \infty) \to \mathbb{R}$ strictly monotonically increasing, according to the representer theorem, it is known that w is a linear combination of the training samples $x_i$.
In what follows, we focus on the case of the square (i.e. quadratic) loss because it leads to an efficient closed form formula:
$$l(f(x_i), y_i) = l(x_i^T w, y_i) = (x_i^T w - y_i)^2 \qquad (10)$$
If we consider an $l_2$ regularizer over w, Expression (9) is rewritten in matrix form as:
$$w = \arg\min_{w} \|X^T w - Y\|^2 + \lambda \|w\|^2 \qquad (11)$$
This is a ridge regression problem and w has a closed form solution:
$$w = (X X^T + \lambda I)^{-1} X Y \qquad (12)$$
This has similarities with the solution of the GMP problem, although they emerge from optimizing two different problems.
In Expression (12), the solution w consists of a label-independent part $(X X^T + \lambda I)^{-1}$ and a label-dependent part $XY$. The first part, $(X X^T + \lambda I)^{-1}$, does not require the labels of the data. In fact, it can be approximated with an “external” dataset of unlabeled signatures, with no need for set labels indicating whether or not the signatures are in the set of authorized signatures 30. It can then be reused when modifying the sets. This means that one only needs to know at training time the set of “positive” signatures, that is, the set of authorized signatures 30. The second part, $XY$, is simply the sum of the elements that form the set $\mathcal{S}$:
$$XY = \sum_{i:\, y_i = 1} x_i = \sum_{s \in \mathcal{S}} s \qquad (13)$$
Therefore updating the aggregated signature when adding or removing authorized signatures of the set is straightforward.
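The split of Expression (12) into a reusable label-independent factor and a running sum of authorized signatures can be sketched as follows. The data are synthetic, and the names `P`, `aggregate_from_sum`, and `set_sum` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, N, lam = 16, 200, 1.0
X = rng.normal(size=(d, N))               # columns are training signatures (synthetic)
y = np.zeros(N)
y[:5] = 1.0                               # first five columns form the authorized set

# Label-independent part of Expression (12): computed once and reused.
P = np.linalg.inv(X @ X.T + lam * np.eye(d))

def aggregate_from_sum(P, set_sum):
    """w = P @ (sum of authorized signatures), i.e. Expression (12) with X Y = set_sum."""
    return P @ set_sum

set_sum = X[:, :5].sum(axis=1)            # equals X @ y
w = aggregate_from_sum(P, set_sum)

# Adding an authorized signature only changes the running sum; P is reused unchanged.
new_signature = X[:, 7]
w_updated = aggregate_from_sum(P, set_sum + new_signature)
```

Removing a signature works the same way with a subtraction, so in this sketch only the running sum and the precomputed factor need to be retained, not the individual signatures.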
In the quadratic loss context, whitening can also be advantageously employed. Denote by U the column eigenvectors and by D the diagonal matrix of eigenvalues of the eigendecomposition of $X X^T + \lambda I$. This means that we have $X X^T + \lambda I = U D U^T$, which amounts to a Singular Value Decomposition (SVD) analysis. Then w can be further rewritten as:
$$w = U D^{-1} U^T X Y \qquad (14)$$
The similarity between a query q and a set using the learned w can be computed as:
$$f(q) = q^T w = q^T U D^{-1} U^T X Y \qquad (15)$$
Introducing $\hat{U} = (U D^{-1/2})^T$, we have:
$$f(q) = q^T \hat{U}^T \hat{U} X Y = (\hat{U} q)^T \sum_{s \in \mathcal{S}} \hat{U} s \qquad (16)$$
Therefore, the similarity between a query signature q and the set of authorized signatures can be seen as the dot product between the query and the sum-aggregated signatures in the set after being projected in a space generated by the whitened eigenvectors of the data. In practice, the matrices U and D are suitably learned from the data and all the signatures are projected with $\hat{U} = (U D^{-1/2})^T$. Then, the similarity between a query and the set of authorized signatures is computed as the dot product of the whitened query and the sum-aggregated whitened set. In experiments, it was observed that $l_2$-normalizing the signatures after projecting them with $\hat{U}$ significantly improved the accuracy. Finally, the quantization function σ in this case is suitably a simple thresholding, i.e., $\sigma(q^T w) = 1$ if $q^T w > \tau$ and 0 otherwise. In this case the decision as to whether query signature q belongs with the set of authorized signatures $\mathcal{S}$, i.e. is authenticated, is based on whether the similarity measure (inner product $q^T w$) exceeds a threshold τ.
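The equivalence between the closed-form ridge score and the dot product of whitened vectors can be verified numerically. The sketch below uses synthetic data and hypothetical variable names; the projection is built as the transpose of the eigenvector matrix with each column scaled by the inverse square root of its eigenvalue, and no L2 normalization is applied so that the two scores agree exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
d, N, lam = 12, 300, 0.1
X = rng.normal(size=(d, N))               # columns are training signatures (synthetic)
authorized = X[:, :4]                     # illustrative authorized set
q = rng.normal(size=d)                    # illustrative query signature

# Eigendecomposition X X^T + lam*I = U D U^T (symmetric positive definite matrix).
A = X @ X.T + lam * np.eye(d)
eigvals, U = np.linalg.eigh(A)            # eigenvectors are the columns of U
U_hat = (U / np.sqrt(eigvals)).T          # scale column j by eigvals[j]**-0.5, then transpose

# Closed-form ridge weights and the equivalent whitened dot product.
w = np.linalg.solve(A, authorized.sum(axis=1))
score_direct = float(q @ w)
score_whitened = float((U_hat @ q) @ (U_hat @ authorized).sum(axis=1))
```

Applying L2 normalization after the projection, as reported to help in the experiments, would change the scores and break this exact equality, which is why it is left out of the check.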
The whitening can be viewed as a pre-processing of the data that improves sum-aggregation. This insight enables combination of whitening with GMP as follows: first the data is whitened by projecting on Û, then GMP is applied to the whitened data.
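A minimal sketch of this two-step combination is given below, assuming synthetic data and hypothetical function names. The background matrix used to learn the whitening, the L2 normalization after projection, and the regularization values are illustrative choices rather than settings taken from the disclosure.

```python
import numpy as np

def whitening_projection(X, lam):
    """Return the projection (U D^{-1/2})^T from the eigendecomposition of X X^T + lam*I."""
    eigvals, U = np.linalg.eigh(X @ X.T + lam * np.eye(X.shape[0]))
    return (U / np.sqrt(eigvals)).T

def whiten(U_hat, V):
    """Project the columns of V with U_hat, then L2-normalize each column."""
    Z = U_hat @ V
    return Z / np.linalg.norm(Z, axis=0, keepdims=True)

def gmp_aggregate(S, lam):
    """Regularized GMP on the columns of S: (S S^T + lam*I)^-1 S 1_K."""
    d, K = S.shape
    return np.linalg.solve(S @ S.T + lam * np.eye(d), S @ np.ones(K))

rng = np.random.default_rng(4)
X = rng.normal(size=(64, 500))            # background signatures used to learn the whitening
S = rng.normal(size=(64, 8))              # authorized signatures (synthetic data)

U_hat = whitening_projection(X, lam=1.0)
S_white = whiten(U_hat, S)                # step 1: whiten the authorized signatures
s_agg = gmp_aggregate(S_white, lam=1e-3)  # step 2: apply GMP to the whitened data
scores = S_white.T @ s_agg                # members should score close to 1
```

At query time the same projection and normalization would be applied to the query signature before taking its dot product with the aggregate.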
The disclosed approach can be expanded to multiple modalities. For example, in a single-modality approach we are interested in testing if an image of a license plate is in the set of authorized license plates. However, suppose that the authorized license plates set is not formed by images but by text strings. Multi-query frameworks can be used to put images and text in the same subspace, enabling this type of multimodal matching. However, since the statistics of the embedded images and the embedded text are still slightly different, it is likely that learning the whitening on only one of the modalities and applying it to the other may not yield the best results. For this case learning the whitening with a CCA-like formulation may be more fruitful.
Another consideration pertains to the size of the set of authorized signatures 30. As discussed with reference to
A way to address this is to break the set of signatures 30 into multiple sub-sets, i.e. multiple groups. Since this is a one-to-many matching problem, a query signature is then deemed as authorized if it matches any one of these sub-sets or groups of signatures. Each sub-set or group of authorized signatures is processed separately by the authenticator training component 6 to generate a corresponding aggregate signature 10. In the authenticator 8, processor 42 is separately applied to the query signature 12 for each of the aggregate signatures, and if any of these produces an output 50 indicating authentication 52 then the query signature 12 is deemed to be authenticated. The number of sub-sets or groups of authorized signatures is suitably chosen to trade off efficiency and storage (and, to a lesser degree, privacy) in favor of accuracy. In other words, as the number of sub-groups increases the storage requirements and processing time both increase, but the accuracy is also expected to increase.
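The grouping strategy described above can be sketched as follows. The contiguous split of the signatures into groups, the use of GMP for each group, and the acceptance test (score within τ of 1) are all assumptions made for this illustration, as are the function names.

```python
import numpy as np

def gmp_aggregate(S, lam):
    """Regularized GMP on the columns of S: (S S^T + lam*I)^-1 S 1_K."""
    d, K = S.shape
    return np.linalg.solve(S @ S.T + lam * np.eye(d), S @ np.ones(K))

def train_groups(S, n_groups, lam=1e-3):
    """Split the columns of S into groups and learn one aggregate signature per group."""
    return [gmp_aggregate(part, lam) for part in np.array_split(S, n_groups, axis=1)]

def authenticate(q, aggregates, tau):
    # Authorized if the query matches ANY group aggregate (assumed test: score within tau of 1).
    return any(abs(float(q @ a) - 1.0) <= tau for a in aggregates)

rng = np.random.default_rng(5)
S = rng.normal(size=(128, 32))            # 32 synthetic authorized signatures
S /= np.linalg.norm(S, axis=0)
aggregates = train_groups(S, n_groups=4)  # four aggregate signatures of eight members each
```

Storage and query time in this sketch grow with the number of groups (one dot product per aggregate), which reflects the trade-off against accuracy noted above.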
With reference now to
The evaluation procedure was performed as follows. K random elements were drawn from the database, which constituted the image set (that is, the set of authorized signatures). Then, the remaining items were tested as to whether or not they belonged to this set (that is, they served as query signatures). By varying the decision threshold, the trade-off between a high true positive rate and a low false positive rate was analyzed. This is illustrated in
As presented in
As expected, the simple sum aggregation was the worst performing method. On non-whitened data, computing the GMP weights consistently led to significant improvements over using weights equal to 1.
Whitening the data always helped significantly: sum+whitening was superior to sum and GMP+whitening was superior to GMP.
Sum aggregation on whitened data outperformed GMP on non-whitened data in these experiments. This is not surprising, since both formulations are very similar, but the whitening implicitly addresses a two-class problem while GMP addresses a one-class problem.
Whitening+GMP provided the best accuracy, showing that both approaches are complementary. This improvement was significant in some experiments. For example, with sets of size K=8, at a 0.01% false positive rate, the whitened sum obtained a true positive rate of 66% while the whitened GMP reached a true positive rate of 83%. With sets of size K=16, at a 0.1% false positive rate, the whitened sum obtained a 40% true positive rate while the whitened GMP obtained an 80% true positive rate.
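The threshold sweep used in the evaluation procedure, in which the decision threshold is varied to trade true positive rate against false positive rate, can be sketched as below. The function name `roc_points`, the toy scores and labels, and the "score greater than threshold" acceptance rule are assumptions for illustration and do not reproduce the reported experiments.

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    A query is accepted when its score exceeds the threshold (assumed rule).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    points = []
    for tau in thresholds:
        accepted = scores > tau
        tpr = np.count_nonzero(accepted & labels) / np.count_nonzero(labels)
        fpr = np.count_nonzero(accepted & ~labels) / np.count_nonzero(~labels)
        points.append((fpr, tpr))
    return points

scores = [0.9, 0.8, 0.3, 0.2, 0.1]        # toy similarity scores
labels = [1, 1, 0, 1, 0]                  # 1 = query truly belongs to the set
points = roc_points(scores, labels, thresholds=[0.0, 0.25, 0.5, 1.0])
```

Raising the threshold lowers both rates, so an operating point such as a fixed false positive rate is selected by choosing the corresponding threshold.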
It will be appreciated that various of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Also that various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.
Number | Name | Date | Kind |
---|---|---|---|
7130454 | Berube | Oct 2006 | B1 |
7647331 | Li et al. | Jan 2010 | B2 |
7801893 | Gulli' et al. | Sep 2010 | B2 |
8452106 | Ke et al. | May 2013 | B2 |
20070069921 | Sefton | Mar 2007 | A1 |
20080166026 | Huang | Jul 2008 | A1 |
20090058598 | Sanchez Sanchez | Mar 2009 | A1 |
20090208059 | Geva | Aug 2009 | A1 |
20130129151 | Rodriguez Serrano | May 2013 | A1 |
20140072185 | Dunlap | Mar 2014 | A1 |
20140114987 | Hoeng | Apr 2014 | A1 |
20140155098 | Markham | Jun 2014 | A1 |
20150086118 | Shabou | Mar 2015 | A1 |
Entry |
---|
Almazán, et al., “Word Spotting and Recognition with Embedded Attributes,” IEEE Transaction on Pattern Analysis and Machine Intelligence, pp. 1-17 (2014). |
Chum, et al., “Scalable Near Identical Image and Shot Detection,” Proc. 6th ACM International conference on Image and Video Retrieval, pp. 1-8 (2007). |
Chum, et al., “Near Duplicate Image Detection: min-Hash and tf-idf Weighting,” In BMVC, vol. 810, pp. 812-815 (2008). |
D'Angelo, et al., “Beyond Bits: Reconstructing Images from Local Binary Descriptors,” 21st International Conference on Pattern Recognition, pp. 935-938 (2012). |
Doermann, et al., “The Detection of Duplicates in Document Image Databases,” Language and Media Processing Laboratory Institute for Advanced Computer Studies, University of Maryland, pp. 1-37 (1997). |
Grauman, et al., “Efficient Image Matching with Distributions of Local Invariant Features,” Computer Science and Artificial Intelligence Laboratory, Technical Report, pp. 1-18 (2004). |
Kato, et al., “Image Reconstruction from Bag-of-Visual-Words,” Computer Vision and Pattern Recognition (CVPR), pp. 955-962 (2014). |
Ke, et al., “Efficient Near-duplicate Detection and Sub-image Retrieval,” ACM Multimedia, vol. 4, No. 1, pp. 1-8 (2004). |
Murray, et al., “Generalized Max Pooling,” arXiv preprint arXiv:1406.0312, pp. 1-8 (2014). |
Weinzaepfel, et al., “Reconstructing an image from its local descriptors,” Computer Vision and Pattern Recognition, pp. 1-8 (2011). |
Zhang, et al., “Detecting Image Near-Duplicate by Stochastic Attribute Relational Graph Matching with Learning,” Proceedings of the 12th Annual ACM International Conference on Multimedia, ACM, pp. 877-884 (2004). |
Number | Date | Country | |
---|---|---|---|
20160277190 A1 | Sep 2016 | US |