The present invention relates to machine learning, and in particular to a method, system and computer-readable medium for partial planar point cloud matching, which can be advantageously applied in technology areas such as biometrics.
There are different approaches for fingerprint matching. Depending on the data used during matching, fingerprint matching approaches can be classified into two categories: minutiae-based and image-supported. In the first category, minutiae-based matching, the matching relies entirely on the extracted minutiae information. Algorithms in this category typically involve intensive search procedures and similarity metrics, and typically do not involve machine learning (see Ravi, et al., "Fingerprint Recognition Using Minutia Score Matching," arXiv:1001.4186 (2010) and WO 2020/254857, each of which is hereby incorporated by reference herein). In the second category, image-supported matching, Nguyen, "End-to-End Latent Fingerprint Search," arXiv:1812.10213 (2018), which is hereby incorporated by reference herein, makes direct use of the extracted images, and the minutiae information is typically used as supporting knowledge. Recent works in this area typically involve machine learning models.
In an embodiment, the present invention provides a method for partial planar point cloud matching. Partial point clouds and full point clouds are collected. A graph is generated from the partial point clouds and a graph is generated from the full point clouds. A point cloud graph network is trained to predict a matching matrix using the graphs.
Subject matter of the present disclosure will be described in even greater detail below based on the exemplary figures. All features described and/or illustrated herein can be used alone or combined in different combinations. The features and advantages of various embodiments will become apparent by reading the following detailed description with reference to the attached drawings, which illustrate the following:
Embodiments of the present invention tackle the set matching problem, i.e., given a query subset Q, the goal is to retrieve the original set O from which the query subset was taken. In particular, the goal according to an embodiment of the present invention is to provide for partial latent to full ten-print palm-print/fingerprint minutiae matching, where the latent samples stand for the impressions that are left on objects after touching them. The latent samples can be partial or complete (e.g., a part of a fingerprint or palm print, or a complete single fingerprint), and are typically noisy. The ten-print stands for the "clean" and accurate data that is typically retrieved by scanners and stored in a database. These clean samples could also be taken using inkpads. For example, the clean samples for the minutiae matching could be an entire palm print including the full ten fingerprints taken by a scanner, and matching of a partial fingerprint could be a sub-problem of matching of a partial palm print. According to an embodiment, however, it is not required that the data used as a reference is clean and complete; it can also be noisy or partial (e.g., the reference database could also contain unidentified palm prints). The minutiae are the locations at which the fingerprint ridge lines join, split, or end, together with associated features such as the angle of the union, the texture, or others. This embodiment has applications in forensic or biometric identification, security and authentication, among others.
Biometrics are used for a number of purposes such as authentication for access to a device, voting, security in public spaces (e.g., in airports) and in forensics, as well as in other scenarios where authentication or identity matching is provided for by technical systems. Embodiments of the present invention provide a method, system and computer-readable medium for partial point cloud matching that takes advantage of the underlying geometric properties of the point cloud, which is generated from the point features of images, such as from partial fingerprints or palm prints. Embodiments of the present invention use machine learning to accelerate the matching process and show favorable performance on both virtual and real datasets.
Embodiments of the present invention provide a new mechanism for partial latent to ten-print matching of sets of minutiae using machine learning and data augmentation.
Although some existing approaches can find good matches, these approaches rely on heuristics. These heuristics are not trainable and require time to compute. In an embodiment, the present invention supports the use of black-box heuristics by:
According to an embodiment of the present invention, the point cloud used to represent the hand or the fingerprints is considered to lie on a planar geometry, and thus distances and angles are preserved between the partial print and the database prints. In particular, when the point cloud consists of planar points, the principle that distances and angles are preserved can be advantageously used.
In the following, a system architecture according to an embodiment of the present invention is described. The input to the system is the point cloud (minutiae) that contains: the planar coordinates of the points extracted from the object (e.g., fingerprint/hand-palm print); the tangent angle; and (optionally) the local curvature at the points. The minutiae are locations where two ridge lines join or end, e.g., as an (x, y) location, and typically include the angle α of the line. This data is determined by running a detector over an image of the object (e.g., a scan of a fingerprint). The system consists of:
The foregoing system components can comprise hardware processors configured to implement computer code stored in physical memory so as to execute the functions listed above or any method according to an embodiment of the present invention. The output of the system consists of:
For each partial print (which can also be referred to as a latent, or simply a "query" point cloud), there are multiple candidates (the fingerprints or the palm prints, or in general other point clouds). For each candidate, there are generated 1) the matching matrix (which gives the correspondence of points in the input to the points in the candidate); and 2) the score of the overall match. For example, the score can be the number of re-identified points or the (optimal) transport cost.
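As an illustrative sketch only (assuming NumPy and a hypothetical predict_matching(query, candidate) function provided by the trained model; neither name is prescribed by this disclosure), candidate scoring and ranking could proceed as follows:

    # Illustrative sketch: score each candidate with the transport cost and rank them.
    # predict_matching is a hypothetical interface returning (matching matrix Z, cost matrix C).
    import numpy as np

    def rank_candidates(query, candidates, predict_matching):
        scores = []
        for candidate in candidates:
            Z, C = predict_matching(query, candidate)
            scores.append(-np.sum(C * Z))    # lower transport cost => higher score
        # Return candidate indices ordered from best to worst match.
        return np.argsort(scores)[::-1]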
According to embodiments of the present invention, the following loss functions can be used:
An exemplary algorithm for extracting coordinates, angles and/or curvature of minutiae points collects, for each pixel of the image of the object, local image information (e.g., as in a convolution), extracts local features such as the direction of the gradient (which gives information on the tangent direction), and then filters the pixels on some criteria. Examples of local feature extraction algorithms which could be used include scale invariant feature transform (SIFT), speeded up robust features (SURF), binary robust independent elementary features (BRIEF) or oriented FAST and rotated BRIEF (ORB) (see, e.g., Karami, Ebrahim, et al., "Image Matching Using SIFT, SURF, BRIEF and ORB: Performance Comparison for Distorted Images," Computer Vision and Pattern Recognition, arXiv:1710.02726 (Oct. 7, 2017), which is hereby incorporated by reference herein), and/or a neural network could be used. The following is exemplary pseudocode for a local feature extraction algorithm which could also be used according to an embodiment:
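One illustrative possibility is the following minimal sketch, which assumes a NumPy/SciPy environment and uses an arbitrary gradient-magnitude threshold as the filtering criterion; it is not intended as the only possible implementation:

    # Illustrative sketch of a gradient-based local feature extraction step.
    # The magnitude threshold is an arbitrary example value.
    import numpy as np
    from scipy import ndimage

    def extract_point_features(image, magnitude_threshold=50.0):
        # Return an (N, 3) array of (x, y, angle) candidate points.
        image = image.astype(np.float64)
        # Collect local image information via convolution (Sobel filters).
        gx = ndimage.sobel(image, axis=1)
        gy = ndimage.sobel(image, axis=0)
        magnitude = np.hypot(gx, gy)
        angle = np.arctan2(gy, gx)     # direction of the gradient (tangent information)
        # Filter pixels on a simple criterion: keep only strong local responses.
        ys, xs = np.nonzero(magnitude > magnitude_threshold)
        return np.stack([xs, ys, angle[ys, xs]], axis=1)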
For the encoder network according to an embodiment of the present invention, the Graph Edge Convolution (see Wang, et al., "Dynamic graph cnn for learning on point clouds," ACM Transactions On Graphics, 38, no. 5, pp. 1-12 (2019), which is hereby incorporated by reference herein) can be used, where the edge feature is proportional to the difference of the input features of the nodes.
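An illustrative sketch of such an edge-convolution layer, assuming PyTorch (layer sizes and max-aggregation are merely exemplary choices), is:

    # Illustrative sketch of an edge-convolution layer in the spirit of Wang et al. (2019).
    import torch
    import torch.nn as nn

    class EdgeConv(nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            # MLP applied to [x_i, x_j - x_i]: the edge feature is built from the
            # difference of the input features of the two nodes.
            self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

        def forward(self, x, knn_idx):
            # x: (N, in_dim) node features; knn_idx: (N, k) indices of the k nearest neighbors.
            neighbors = x[knn_idx]                               # (N, k, in_dim)
            center = x.unsqueeze(1).expand_as(neighbors)         # (N, k, in_dim)
            edge_features = torch.cat([center, neighbors - center], dim=-1)
            return self.mlp(edge_features).max(dim=1).values     # aggregate over neighbors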
For graph generation from a point cloud according to an embodiment of the present invention, the k-nearest neighbors graph is constructed from the input point cloud.
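An illustrative sketch of k-nearest neighbors graph generation, assuming NumPy (the value of k is a tunable parameter), is:

    # Illustrative sketch of k-nearest-neighbor graph generation from a point cloud.
    import numpy as np

    def knn_graph(points, k=8):
        # points: (N, d) array of point features; returns (N, k) neighbor indices.
        points = np.asarray(points, dtype=float)
        # Pairwise squared Euclidean distance matrix.
        diff = points[:, None, :] - points[None, :, :]
        dist = np.einsum('ijk,ijk->ij', diff, diff)
        np.fill_diagonal(dist, np.inf)           # exclude self-loops
        # For each point, take the indices of its k closest neighbors.
        return np.argsort(dist, axis=1)[:, :k]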
For computing the distance of the matching according to an embodiment of the present invention, the transport cost is used:
d(Z) = log⟨C, Z⟩
C_ij = |c_j^y − c_i^x|²
where C is the cost matrix, whose entries C_ij are the cost of moving the i-th point to the j-th point; Z is a variable that specifies whether the i-th point has been moved to the j-th point; ⟨C, Z⟩ is the inner product of the two matrices (which can also be written as Σ_ij C_ij Z_ij); and d(Z) is the cost of moving all the variables with respect to the cost matrix C. Thus, the cost is the (squared) Euclidean distance of the features in the original space, and the distance is computed as the inner product between the cost matrix and the matching matrix.
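An illustrative sketch of this transport-cost computation, assuming NumPy (the small constant added inside the logarithm is only a numerical safeguard introduced here), is:

    # Illustrative sketch of the transport-cost distance d(Z) = log<C, Z>.
    # x_feats and y_feats are the point features of the query and candidate clouds;
    # Z is a (soft or hard) matching matrix with the same shape as the cost matrix C.
    import numpy as np

    def transport_cost(x_feats, y_feats, Z, eps=1e-12):
        # C_ij = |c_j^y - c_i^x|^2: squared Euclidean distance between feature vectors.
        diff = y_feats[None, :, :] - x_feats[:, None, :]
        C = np.einsum('ijk,ijk->ij', diff, diff)
        # d(Z) = log<C, Z>: inner product of cost and matching matrices (eps avoids log(0)).
        return np.log(np.sum(C * Z) + eps)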
For variational features and matching according to an embodiment of the present invention, graph embeddings are represented by a mean and a variance. During training, sampling takes place from a Gaussian model whose mean and variance are trainable variables, and the reparametrization trick is used to sample during training. The reparametrization trick is used to differentiate through a random variable. Here, the sample is written as x=σu+μ, where σ is the standard deviation and μ is the mean, while u is sampled from a fixed normal distribution N(0,1) of zero mean and unit variance. The corresponding sample x then has distribution N(μ, σ²), a normal distribution of mean μ and variance σ². Next, the derivatives with respect to the two variables σ and μ are computed, which are now deterministic given the sampled u (e.g., ∂x/∂σ=u and ∂x/∂μ=1).
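An illustrative sketch of the reparametrization trick for a variational graph embedding, assuming PyTorch, is:

    # Illustrative sketch of the reparametrization trick x = sigma * u + mu.
    import torch

    def sample_embedding(mu, log_var):
        # mu, log_var: (N, d) trainable outputs of the encoder.
        sigma = torch.exp(0.5 * log_var)     # standard deviation
        u = torch.randn_like(sigma)          # u ~ N(0, 1), fixed noise distribution
        # Gradients with respect to sigma and mu are deterministic given the sampled u.
        return sigma * u + mu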
Advantages of using the variational model include:
In an embodiment, the present invention can be applied for data augmentation for extended training. In various contexts, the partial point cloud is not available. If information on the procedure of creation of the partial point cloud is available, the use of a virtual partial point cloud and a virtual matching is considered. From the full point cloud, points are selected according to the creation procedure, and the virtual partial point cloud and the virtual matching (which point corresponds to which point) are generated from these points. This virtual dataset is used in addition to other training data to improve training performance.
Embodiments of the present invention can be applied for box and random virtual point clouds. In the box case, a box of predefined size around a point is selected, and all the points inside the box are used as the partial point cloud. In the random case, a predefined number of points are selected from the input point cloud and used as the partial point cloud. These points may be perturbed with some random noise in the input features.
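An illustrative sketch of the box and random virtual partial point cloud generation, assuming NumPy (box size, sample count and noise scale are merely exemplary values), is:

    # Illustrative sketch of "box" and "random" virtual partial point cloud generation.
    import numpy as np

    def box_partial(points, center_idx, box_size=50.0):
        # Keep all points whose planar coordinates lie inside a box of predefined
        # size centered on a chosen point; idx is the virtual matching (partial -> full).
        center = points[center_idx, :2]
        inside = np.all(np.abs(points[:, :2] - center) <= box_size / 2, axis=1)
        idx = np.nonzero(inside)[0]
        return points[idx], idx

    def random_partial(points, n_points=30, noise_std=1.0, rng=None):
        # Select a predefined number of points and perturb their input features
        # with random noise; idx is the virtual matching (partial -> full).
        rng = rng if rng is not None else np.random.default_rng()
        idx = rng.choice(len(points), size=n_points, replace=False)
        partial = points[idx] + rng.normal(scale=noise_std, size=points[idx].shape)
        return partial, idx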
In an embodiment, the present invention provides a transformer network with 2D/3D spatial encoding. Here, the use of a transformer network with a spatial encoding of the type (sin(wx), sin(wy), sin(wz), cos(wx), cos(wy), cos(wz)) is considered for different values of w to encode the relative spatial position.
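An illustrative sketch of such a sinusoidal spatial encoding, assuming NumPy (the geometric progression of frequencies is merely an exemplary choice), is:

    # Illustrative sketch of a sinusoidal 2D/3D spatial encoding at multiple frequencies w.
    import numpy as np

    def spatial_encoding(coords, n_frequencies=4, base=2.0):
        # coords: (N, d) planar or 3D coordinates; returns (N, 2 * d * n_frequencies).
        w = base ** np.arange(n_frequencies)               # different values of w
        scaled = coords[:, :, None] * w[None, None, :]     # (N, d, n_frequencies)
        enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
        return enc.reshape(len(coords), -1)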
In an embodiment, the present invention provides for optimal transport matching where differentiable optimal transport is used to perform the matching.
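An illustrative sketch of a differentiable, entropy-regularized optimal transport matching via Sinkhorn iterations, assuming NumPy (the regularization strength and iteration count are merely exemplary hyper-parameters), is:

    # Illustrative sketch of entropy-regularized optimal transport via Sinkhorn iterations.
    import numpy as np

    def sinkhorn_matching(C, epsilon=0.1, n_iters=100):
        # C: (N, M) cost matrix; returns an approximate matching matrix Z with uniform marginals.
        n, m = C.shape
        a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)   # uniform marginals
        K = np.exp(-C / epsilon)
        u, v = np.ones(n), np.ones(m)
        for _ in range(n_iters):
            u = a / (K @ v)              # enforce row marginals
            v = b / (K.T @ u)            # enforce column marginals
        return u[:, None] * K * v[None, :]   # Z = diag(u) K diag(v)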
In an embodiment, the present invention provides for discrete selection of the points where a differentiable discrete gate is used to select which points participate in the matching process.
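An illustrative sketch of such a differentiable discrete gate, here realized as a Gumbel-sigmoid gate with a straight-through estimator, assuming PyTorch (the temperature is merely an exemplary hyper-parameter), is:

    # Illustrative sketch of a differentiable discrete point-selection gate.
    import torch

    def discrete_gate(logits, temperature=0.5):
        # logits: (N,) per-point selection scores; returns near-binary gates with gradients.
        u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
        noise = torch.log(u) - torch.log1p(-u)                 # logistic noise
        soft = torch.sigmoid((logits + noise) / temperature)
        # Straight-through estimator: hard 0/1 forward, soft gradient backward.
        hard = (soft > 0.5).float()
        return hard + soft - soft.detach()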
In an embodiment, the present invention provides for optimization of the point cloud extraction where an embodiment of the method according to the present invention is used to improve the quality of the point cloud extraction.
Embodiments of the present invention can be applied for technical applications such as fingerprint matching or hand-palm print matching, where the partial point cloud is used for matching fingerprints or hand-palm prints. Other embodiments of the present invention can be applied for planar image registration, where the method according to an embodiment of the present invention is used to register (align) images, for example from satellites or drones.
Aspect 1: In an Aspect 1, the present invention provides a method for partial planar point cloud matching. Partial point clouds and full point clouds are collected. A graph is generated from the partial point clouds and a graph is generated from the full point clouds. A point cloud graph network is trained to predict a matching matrix using the graphs.
Aspect 2: The method according to Aspect 1, wherein the point cloud graph network is trained to predict the matching matrix by using a distance and angle preserving graph network to generate features that are rotation and translation invariant.
Aspect 3: The method according to Aspects 1 or 2, further comprising computing a matching score in each case as a transport cost between the matching matrix and a distance of an input feature from the partial point clouds with contrastive loss, wherein the input features include coordinates and angles.
Aspect 4: The method according to any of Aspects 1-3, further comprising providing statistics of the matching scores and the matching matrix using a variational approach.
Aspect 5: The method according to any of Aspects 1-4, wherein the transport cost is determined by comparing the predicted matching matrix to a true matching matrix.
Aspect 6: The method according to any of Aspects 1-5, wherein the matching scores are used to rank candidates, the candidates being a subset of the full point clouds and/or known point clouds stored in a database, the method further comprising selecting a highest scoring one of the candidates as a match.
Aspect 7: The method according to any of Aspects 1-6, further comprising predicting the matching matrix using the trained point cloud graph network, and using the predicted matching matrix to run a further point cloud matching method.
Aspect 8: The method according to any of Aspects 1-7, further comprising applying the trained point cloud graph network for matching partial fingerprints and/or hand-palm prints to a complete set.
Aspect 9: The method according to any of Aspects 1-8, further comprising applying the trained point cloud graph network for planar image registration.
Aspect 10: In an Aspect 10, the present invention provides a system for partial planar point cloud matching, the system comprising one or more hardware processors, which alone or in combination are configured to execute the following steps: collecting partial and full point clouds; generating a graph from the partial point clouds and a graph from the full point clouds; and training a point cloud graph network to predict a matching matrix using the graphs.
Aspect 11: The system according to Aspect 10, wherein the point cloud graph network is trained to predict the matching matrix by using a distance and angle preserving graph network to generate features that are rotation and translation invariant.
Aspect 12: The system according to Aspect 10 or 11, wherein the system is further configured to compute a matching score in each case as a transport cost between the matching matrix and a distance of an input feature from the partial point clouds with contrastive loss, wherein the input features include coordinates and angles.
Aspect 13: The system according to any of Aspects 10-12, wherein the system is further configured to provide statistics of the matching scores and the matching matrix using a variational approach.
Aspect 14: The system according to any of Aspects 10-13, wherein the transport cost is determined by comparing the predicted matching matrix to a true matching matrix, and/or wherein the matching scores are used to rank candidates, the candidates being a subset of the full point clouds and/or known point clouds stored in a database, the system being further configured to select a highest scoring one of the candidates as a match.
Aspect 15: In an Aspect 15, the present invention provides a tangible, non-transitory computer-readable medium having instructions thereon, which upon being executed by one or more processors, provide for execution of a method according to an embodiment of the present invention.
For the ML matching component 64, the performance can be computed by determining a composite loss on the matching matrix and on the error with respect to the true match, for example, the ranking of the candidates (a rank of one is correct, and the further the true match falls from rank one, the larger the error). For the detailed matching component 65, the performance can simply be determined by the error in the matching after the black box.
There are two types of updates possible: 1) change the algorithm that extracts the minutiae (if there are multiple algorithms); and/or 2) change the parameters of the extraction algorithm, for example the number of extracted points, the window within which points are compared with other points, thresholds for activating points, or different methods to compute the angles. To determine the degree of improvement from an update, it is possible to: 1) compute the change (e.g., iteratively) along a single dimension, for example, by changing a single parameter to see whether this change improves performance; 2) use some random or grid configuration of the extraction and select the best over all possible combinations; 3) use either a random Markov tree search or Bayesian optimization to navigate the possible configurations. This area is called hyper-parameter optimization and includes a number of existing approaches. If the extraction module is differentiable, then learning can be end-to-end.
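An illustrative sketch of a simple random search over extraction parameters, assuming Python and a hypothetical evaluate(params) function that returns the matching performance of the pipeline for a given configuration, is:

    # Illustrative sketch of random search over extraction parameters.
    # evaluate(params) is a hypothetical callback scoring the full pipeline.
    import random

    def random_search(evaluate, n_trials=50, rng=None):
        rng = rng if rng is not None else random.Random(0)
        search_space = {
            "n_points": [50, 100, 200],      # number of extracted points
            "window": [8, 16, 32],           # comparison window size
            "threshold": [0.1, 0.3, 0.5],    # activation threshold
        }
        best_params, best_score = None, float("-inf")
        for _ in range(n_trials):
            params = {k: rng.choice(v) for k, v in search_space.items()}
            score = evaluate(params)
            if score > best_score:
                best_params, best_score = params, score
        return best_params, best_score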
According to one embodiment, multiple parallel minutiae extraction components 62 are ensembled (combined), where the parameters in this case are the probabilities of getting a point cloud from one of these parallel components.
The graph is a set of points and the edges that connect points (and can also be referred to as a graph network). In some cases, the graph mainly contains the edges between nodes that are connected. The graph, for example, can be computed by the graph generation modules using the k-nearest neighbor algorithm. Spatial encoding has a similar role for the transformer and helps the transformer understand which nodes are close and which are far. It is typically defined by sine and cosine functions at multiple frequencies: sin(2πf_j x_i), cos(2πf_j x_i), where the f_j are the frequencies and x_i represents the coordinate of the point.
Embodiments of the present invention provide for the following improvements over existing computer systems and approaches:
In an embodiment, the present invention provides a method comprising the steps of:
from the distance matrix, extract the graph (e.g., using the k-nearest neighbors).
Embodiments of the present invention can be applied to improve biometric detection systems, such as those used for security in airports and Smartcity computer systems.
Embodiments of the present invention use a graph neural network to generate the matching matrix.
Embodiments of the present invention significantly reduce the time needed for matching.
Parameters of the system according to an embodiment of the present invention include, for the training phase, the k-nn setting and the dimension of the features and, for the test phase, the sensitivity for the ranking. For example, there could be parameters such as 1) the number of neighbors for building a graph, and/or 2) the size of the embedding for the encoder and decoder. At test time, there could be parameters to tune the sensitivity of the distance calculation (transport cost) or the generative model hyper-parameters (e.g., temperature of the softmax).
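An illustrative sketch collecting such parameters into a configuration object, assuming Python dataclasses (all default values are merely exemplary), is:

    # Illustrative sketch of a configuration object for the parameters mentioned above.
    from dataclasses import dataclass

    @dataclass
    class MatcherConfig:
        k_neighbors: int = 8                 # number of neighbors for building the graph
        embedding_dim: int = 128             # size of the encoder/decoder embedding
        feature_dim: int = 3                 # input feature dimension (x, y, angle)
        transport_sensitivity: float = 0.1   # sensitivity of the distance (transport cost)
        softmax_temperature: float = 1.0     # generative-model hyper-parameter at test time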
While subject matter of the present disclosure has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. Any statement made herein characterizing the invention is also to be considered illustrative or exemplary and not restrictive as the invention is defined by the claims. It will be understood that changes and modifications may be made, by those of ordinary skill in the art, within the scope of the following claims, which may include any combination of features from different embodiments described above.
The terms used in the claims should be construed to have the broadest reasonable interpretation consistent with the foregoing description. For example, the use of the article “a” or “the” in introducing an element should not be interpreted as being exclusive of a plurality of elements. Likewise, the recitation of “or” should be interpreted as being inclusive, such that the recitation of “A or B” is not exclusive of “A and B,” unless it is clear from the context or the foregoing description that only one of A and B is intended. Further, the recitation of “at least one of A, B and C” should be interpreted as one or more of a group of elements consisting of A, B and C, and should not be interpreted as requiring at least one of each of the listed elements A, B and C, regardless of whether A, B and C are related as categories or otherwise. Moreover, the recitation of “A, B and/or C” or “at least one of A, B or C” should be interpreted as including any singular entity from the listed elements, e.g., A, any subset from the listed elements, e.g., A and B, or the entire list of elements A, B and C.
Priority is claimed to U.S. Patent Application No. 63/251,081, filed on Oct. 1, 2021, the entire disclosure of which is hereby incorporated by reference herein.