The invention concerns a method for comparing data obtained from a sensor or interface to determine a rate of similarity between the data. In particular the invention concerns a data comparison method via machine learning.
Numerous tasks implemented in the field of computer vision (or digital vision) for example require the comparison of complex data such as images to obtain a similarity score between such data.
For example, in the field of biometric authentication, face images of individuals are compared to determine whether the images have been obtained from the same person.
To treat this type of problem it is known to carry out an extraction of features from the data to be compared, the extraction of features converting the data to be compared into feature vectors, and subsequently to compute a similarity function between the feature vectors.
The computed similarity function generally comprises parameters that are a priori unknown. These parameters are determined and progressively optimised by machine learning. To do so, a processing unit conducts data comparison operations on a set of data known from a database, compares the results given by the similarity function with a real result and optimises the parameters of the similarity function accordingly for more reliable results.
For example, from the publication by D. Chen, X. Cao, L. Wang, F. Wen and J. Sun, "Bayesian Face Revisited: A Joint Formulation", ECCV, 2012, a learning method is known for a similarity function between data, wherein the data are modelled as the sum of two independent Gaussian variables: the mean of the class to which a datum belongs and the variation of the datum relative to that mean.
For example, if the data are images of faces, the class corresponds to the identity of the subject, and therefore the variation relative to the mean of the class corresponds to all the changes which may occur between a mean face image of the subject and an image taken under different circumstances:
However, an improvement in the performance level of comparison resulting from machine learning is limited by the fact that data of varying quality are taken into account in the database. As a result the determined similarity function shows deteriorated performance and hence deteriorated quality of comparison. The proposed comparison method is therefore not entirely reliable.
It is the objective of the invention to propose a data comparison method having improved performance compared with the prior art.
In this respect the subject of the invention is a method to compare two computer data items, obtained from a sensor or interface, carried out by processing means of a processing unit, the method comprising the computing of a similarity function between two feature vectors of the data to be compared,
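characterized in that each feature vector of a datum is modelled as the summation of Gaussian variables, the said variables comprising the mean of the class to which the vector belongs, an intrinsic deviation relative to that mean, and an observation noise of the vector, the similarity function being, for example, expressed as:
$$LR(x,y \mid S_{\varepsilon_x},S_{\varepsilon_y}) = -\tfrac{1}{2}\Big[x^T\big(A-(S_\mu+S_\omega+S_{\varepsilon_x})^{-1}\big)x + y^T\big(C-(S_\mu+S_\omega+S_{\varepsilon_y})^{-1}\big)y + 2x^TBy - \log|S_\mu+S_\omega+S_{\varepsilon_x}| - \log|A|\Big] + \text{constant}$$
where:
$$A=\big(S_\mu+S_\omega+S_{\varepsilon_x}-S_\mu(S_\mu+S_\omega+S_{\varepsilon_y})^{-1}S_\mu\big)^{-1}$$
$$B=-AS_\mu(S_\mu+S_\omega+S_{\varepsilon_y})^{-1}$$
$$C=(S_\mu+S_\omega+S_{\varepsilon_y})^{-1}\big(I+S_\mu AS_\mu(S_\mu+S_\omega+S_{\varepsilon_y})^{-1}\big)$$
and where $S_\mu$ is the covariance matrix of the means of the classes (inter-class covariance matrix), $S_\omega$ is the covariance matrix of the deviations relative to a mean (intra-class covariance matrix), and $S_{\varepsilon_x}$ and $S_{\varepsilon_y}$ are the covariance matrices of the observation noises of vectors x and y respectively.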
The proposed method allows data quality to be taken into account when computing the similarity function between data. This makes it possible to use variable weighting between data of good quality and more uncertain data.
For example when the method of the invention is applied to a comparison of images, the shadow or blur regions of an image are not taken into account by the similarity function with as much weighting as the clearly visible, clearly distinct regions.
Increased performance of data comparison is thereby obtained.
Additionally, machine learning allows optimisation of the similarity function parameters and hence improved performance of the comparison method.
Other characteristics, objectives and advantages of the invention will become apparent from the following non-limiting description, which is given solely for illustration and is to be read in connection with the appended drawings in which:
With reference to
The processing unit 10 may be an integrated circuit for example and the processing means may be a processor.
Advantageously the system 1 further comprises an optionally remote database 20 storing in memory a plurality of data used by the processing unit 10 to carry out machine learning as described below.
Finally, the system 1 comprises a data acquisition unit 30, or if the data acquisition unit 30 is independent of the system it comprises an interface (not illustrated) adapted to communicate with such a unit. In this manner the system 1 is able to receive and process data b, in particular for comparison thereof using the method described below.
Depending on the type of data to be compared in the method described below, the data acquisition unit may be of any type e.g. an optical sensor (photographic camera, video camera, scanner), acoustic sensor, fingerprint sensor, movement sensor etc. It may also be a Man-Machine interface (keypad, tablet with touch-screen interface) to record data entered by an operator such as a text, figure, etc.
The computer data b are obtained by the acquisition unit 30, and are therefore derived from a sensor or interface e.g. a Man-Machine interface. They may be data representing a physical object e.g. an image, a schematic, a recording, a description, or representing a physical magnitude (electric, mechanical, thermal, acoustic, etc.), for example data recorded by a sensor.
The processing means 11 of the processing unit are advantageously configured to perform the data comparison method described below by executing a suitable programme.
To implement this method, the processing means 11 also advantageously comprise a feature extracting module 12 adapted to extract features from an input computer datum b communicated by a data acquisition unit 30, so as to generate a feature vector x associated with the datum and a quality vector qx of the datum associated with the feature vector.
The quality vector qx may be a vector of same size as the feature vector and each element thereof indicates a quality of the information contained in the corresponding element of the feature vector x. Alternatively, the quality vector qx may be of any size. The generation thereof is dependent on the type of datum b.
For example, feature extraction can be performed by applying to datum b one or more filters designed for this purpose, optionally followed by processing of the filtering result (e.g. computed histogram, etc.).
The generation of the quality vector is dependent on the type of datum b and on the type of features of the feature vector x, i.e. the component elements of vector x. Each element of the quality vector takes into account intrinsic datum-related information associated with the particular features of the feature vector.
For example, in the field of signal processing or image processing, when the datum is an image or acquisition of a representative signal acquired by a sensor, it is frequent to use as feature vector x a frequency representation (e.g. Fourier transform) or spatial-frequency representation (e.g. wavelet transform) of the data. Each component of the feature vector then only depends on some frequency bands.
In such cases, the high frequency components of the datum may prove to be more discriminating than low frequency components, but also more sensitive to phenomena such as the presence of noise or lack of signal resolution.
The amount of noise in the datum can be determined by analysing its energy spectrum if the datum is a signal acquired by a sensor, or its intrinsic resolution if the datum is an image. For example, a method for determining the resolution of an image is known from the article by Pfenning and Kirchner, "Spectral Methods to Determine the Exact Scaling Factor of Resampled Digital Images", ISCCP, 2012.
The quality vector qx generated as a function of the feature vector x and of the intrinsic quality of the datum can then be constructed as follows:
According to another example, the datum is a face image.
According to this example, a feature vector can be obtained as shown in the article by Chen et al., "Blessing of Dimensionality: High-dimensional Feature and Its Efficient Compression for Face Verification", CVPR, 2013, by concatenating local descriptors extracted in the vicinity of certain semantic points of the face (e.g. tip of nose, corners of the mouth, eyes, etc.).
This representation has the advantage of being more robust against variations in pose than methods which extract descriptors on a regular grid.
However, the extraction of these features comprises a step to detect these points. During this step a detector can be used which, in addition to providing the most probable position of each point of the face in the image, also provides information reflecting the confidence level of detection accuracy.
Such a measurement is known for example from the article by Rapp et al., "Multiple kernel learning SVM and statistical validation for facial landmark detection", Automatic Face & Gesture Recognition, 2011, which measures the distance to the separating hyperplane when using a detector based on Support Vector Machines (SVM).
Another example is given in the article by Dantone et al., "Real-time Facial Feature Detection using Conditional Regression Forests", CVPR, 2012, wherein a measurement of confidence is given by a number of votes determined by a detector using regression trees.
This confidence information can be used to create a quality associated with each component of the feature vector by attributing thereto the quality of detection of the facial semantic point to which it corresponds.
According to a further example, when the face image is a face image generated from an image which is not a front image of the face e.g. by applying the method described in application N° FR 2 998 402, the quality vector may be a confidence index, this index being relatively higher for the points of the face occurring in the original image and relatively lower for the points of the face not occurring in the original image and reconstructed via extrapolation.
More generally, when the datum is an image the quality vector can be obtained by local measurement of blur.
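By way of non-limiting illustration, a local measurement of blur can be sketched as follows. The variance of a discrete Laplacian over image patches is assumed here as the sharpness measure; this choice, the function name `local_sharpness` and the patch size are assumptions made for the example and are not taken from the description.

```python
import numpy as np

def local_sharpness(image, patch=8):
    """Per-patch sharpness score: variance of a 4-neighbour Laplacian.

    Hypothetical helper: a low score suggests a blurred (low-quality) region.
    """
    img = np.asarray(image, dtype=float)
    # Discrete Laplacian evaluated on interior pixels only.
    lap = (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
           - 4.0 * img[1:-1, 1:-1])
    # Crop so the response tiles exactly into non-overlapping patches.
    rows, cols = lap.shape[0] // patch, lap.shape[1] // patch
    lap = lap[:rows * patch, :cols * patch]
    blocks = lap.reshape(rows, patch, cols, patch)
    # One variance per patch.
    return blocks.var(axis=(1, 3))
```

Each score could then be attributed to the components of the feature vector extracted from the corresponding region of the image.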
Alternatively, the feature-extracting module is a module of the acquisition unit 30 enabling the acquisition unit to communicate directly with the processing means 11 a feature vector and an associated quality vector.
With reference to
This method comprises the comparison of two data items by calculating a similarity function 100 between two feature vectors x and y of same size obtained from the data respectively, and by performing machine learning 200 of the parameters of the similarity function on a database.
In this method each feature vector is modelled as the summation of three independent Gaussian variables:
x=μ+ω+ε
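where μ is the mean of the class to which the vector x belongs, ω is the intrinsic deviation of the vector x relative to that mean, and ε is the observation noise.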
A class is a set of feature vectors considered to be similar. Two feature vectors are considered similar if their comparison by the similarity function produces a higher result than a threshold, this threshold being determined empirically.
For example, if the data are face images, a class advantageously corresponds to an individual. By comparing two feature vectors of several data, the data are considered to be similar if they originate from the same individual.
To return to the model previously described, two feature vectors belonging to one same class therefore have an identical value μ, but different values of ω and ε.
If the feature vectors belong to different classes, the three variables are fully independent.
It is considered that these three variables follow a multivariate normal distribution centred at 0, and the respective covariance matrixes are written Sμ, Sω, and Sε. Sμ is called an inter-class covariance matrix, Sω an intra-class covariance matrix and Sε an observation noise covariance matrix.
Sμ, Sω are unknowns common to all the feature vectors.
Sε on the other hand is known since it is obtained from the quality vector associated with the feature vector, by the feature extracting module. It is of same size as the associated feature vector.
For example, assuming that the observation noises do not correlate with one another, Sε can be well approximated by a diagonal matrix.
The elements of this diagonal matrix, then corresponding to the variance of the components of the quality vector, can be obtained from this vector.
For example, variance can be imposed by applying to the components of the quality vector qx a sigmoid function of type $f(q_x)=1/e^{a q_x+b}$. The coefficients a and b can be chosen to associate a determined variance level with a quality level.
For example, a high quality can be associated with zero variance, a very low quality can be associated with maximum variance, the intermediate variances corresponding to intermediate qualities.
In general, since the quality vector and the feature vector depend on datum type, the transfer function which converts a quality vector to a noise covariance matrix is specific to the associated quality vector and feature vector.
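By way of non-limiting illustration, the conversion of a quality vector into a diagonal observation noise covariance matrix can be sketched as follows. The standard logistic function is assumed here as the sigmoid-type transfer function, with quality scores assumed to lie in [0, 1]; the function name `noise_covariance`, the coefficient values and the `max_var` parameter are hypothetical.

```python
import numpy as np

def noise_covariance(q, a=4.0, b=-2.0, max_var=1.0):
    """Diagonal noise covariance built from a quality vector q.

    Assumed transfer function: max_var / (1 + exp(a*q + b)), so that with
    a > 0 a higher quality yields a lower variance.
    """
    q = np.asarray(q, dtype=float)
    variances = max_var / (1.0 + np.exp(a * q + b))
    # Uncorrelated observation noises: only the diagonal is populated.
    return np.diag(variances)
```

With these coefficients the variance decreases monotonically as quality increases; other coefficients would be chosen according to the type of datum and of features.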
In the remainder hereof Sεx denotes the covariance matrix of the background noise of vector x obtained from the quality vector qx, and Sεy is the covariance matrix of the background noise of vector y obtained from the quality vector qy.
Hsim denotes the hypothesis that two feature vectors belong to one same class i.e. the corresponding data are considered to be similar, and Hdis denotes the reverse hypothesis that the feature vectors belong to different classes and the corresponding data are considered to be dissimilar.
The joint probability of generating x and y, knowing their respective covariance matrixes of background noise and considering the hypothesis Hsim, is written $P(x,y \mid H_{sim},S_{\varepsilon_x},S_{\varepsilon_y})$.
The joint probability of generating x and y, knowing their respective covariance matrixes of background noise and considering the hypothesis Hdis, is written $P(x,y \mid H_{dis},S_{\varepsilon_x},S_{\varepsilon_y})$.
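The matrixes Ssim and Sdis are the covariance matrixes of the concatenated vector $[x^T\ y^T]^T$ under the hypotheses Hsim and Hdis respectively, and are defined as follows:
$$S_{sim}=\begin{bmatrix} S_\mu+S_\omega+S_{\varepsilon_x} & S_\mu \\ S_\mu & S_\mu+S_\omega+S_{\varepsilon_y} \end{bmatrix}, \qquad S_{dis}=\begin{bmatrix} S_\mu+S_\omega+S_{\varepsilon_x} & 0 \\ 0 & S_\mu+S_\omega+S_{\varepsilon_y} \end{bmatrix}$$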
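The probability density $P(x,y \mid H_{sim},S_{\varepsilon_x},S_{\varepsilon_y})$ is written:
$$P(x,y \mid H_{sim},S_{\varepsilon_x},S_{\varepsilon_y})=\frac{1}{(2\pi)^{N}\,|S_{sim}|^{1/2}}\exp\left(-\frac{1}{2}\,[x^T\ y^T]\,S_{sim}^{-1}\,[x^T\ y^T]^T\right)$$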
where |Ssim| is the determinant of Ssim, and N is the dimension of a feature vector.
The same expression applies mutatis mutandis to the probability density $P(x,y \mid H_{dis},S_{\varepsilon_x},S_{\varepsilon_y})$.
The computed similarity function to compare the two data corresponding to the vectors x and y is the logarithm of the ratio between the probability density of the feature vectors with the vectors belonging to one same class, and the probability density of the feature vectors with the vectors belonging to two different classes.
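The similarity function is therefore expressed as follows:
$$LR(x,y \mid S_{\varepsilon_x},S_{\varepsilon_y})=\log\frac{P(x,y \mid H_{sim},S_{\varepsilon_x},S_{\varepsilon_y})}{P(x,y \mid H_{dis},S_{\varepsilon_x},S_{\varepsilon_y})}$$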
When using the expression of probability density indicated above, and when developing the function using the block inversion formula to invert the matrixes Ssim and Sdis, the similarity function obtained is expressed as follows:
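$$LR(x,y \mid S_{\varepsilon_x},S_{\varepsilon_y}) = -\tfrac{1}{2}\Big[x^T\big(A-(S_\mu+S_\omega+S_{\varepsilon_x})^{-1}\big)x + y^T\big(C-(S_\mu+S_\omega+S_{\varepsilon_y})^{-1}\big)y + 2x^TBy - \log|S_\mu+S_\omega+S_{\varepsilon_x}| - \log|A|\Big] + \text{constant}$$
In this expression, A, B and C are terms resulting from block inversion of Ssim and are respectively expressed as follows:
$$A=\big(S_\mu+S_\omega+S_{\varepsilon_x}-S_\mu(S_\mu+S_\omega+S_{\varepsilon_y})^{-1}S_\mu\big)^{-1}$$
$$B=-AS_\mu(S_\mu+S_\omega+S_{\varepsilon_y})^{-1}$$
$$C=(S_\mu+S_\omega+S_{\varepsilon_y})^{-1}\big(I+S_\mu AS_\mu(S_\mu+S_\omega+S_{\varepsilon_y})^{-1}\big)$$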
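The constant is not dependent on x, y, $S_{\varepsilon_x}$ or $S_{\varepsilon_y}$.
It is therefore found that the similarity function LR takes into account the covariance matrixes $S_{\varepsilon_x}$ and $S_{\varepsilon_y}$ of the observation noises of x and y.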
The result of comparison is therefore impacted by the quality—or confidence—associated with a feature vector, which allows lesser weighting of a feature considered to be of poor or uncertain quality and greater weighting of a feature of good quality or having greater confidence.
As will be seen in the remainder hereof, this similarity function is also parameterised by machine learning. By taking into account the quality associated with a feature vector, the impact of a datum of poor quality on the parameterising of the function can be minimised.
The comparison method is therefore more reliable.
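By way of non-limiting illustration, the computation of the similarity function from the terms A, B and C can be sketched as follows. The function name `similarity` and the argument names are hypothetical; the additive constant is taken as zero, and all covariance matrixes are assumed to make the sums invertible.

```python
import numpy as np

def similarity(x, y, S_mu, S_omega, S_eps_x, S_eps_y):
    """Log-likelihood ratio log P(x,y|Hsim) - log P(x,y|Hdis)."""
    T_x = S_mu + S_omega + S_eps_x
    T_y = S_mu + S_omega + S_eps_y
    T_x_inv = np.linalg.inv(T_x)
    T_y_inv = np.linalg.inv(T_y)
    # Blocks of the inverse of Ssim = [[T_x, S_mu], [S_mu, T_y]].
    A = np.linalg.inv(T_x - S_mu @ T_y_inv @ S_mu)
    B = -A @ S_mu @ T_y_inv
    C = T_y_inv @ (np.eye(len(y)) + S_mu @ A @ S_mu @ T_y_inv)
    # Quadratic terms of the ratio of the two Gaussian densities.
    quad = x @ (A - T_x_inv) @ x + y @ (C - T_y_inv) @ y + 2.0 * x @ B @ y
    # Log-determinant terms, computed in a numerically stable way.
    _, logdet_T_x = np.linalg.slogdet(T_x)
    _, logdet_A = np.linalg.slogdet(A)
    return -0.5 * (quad - logdet_T_x - logdet_A)
```

A positive value indicates that the same-class hypothesis is the more likely of the two under the model.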
Returning to
This threshold is advantageously determined empirically by applying a large number of comparisons to known feature vectors in a database (that are known to belong or not belong to one same class).
If the result of the similarity function applied to x and y is higher than the determined threshold, the corresponding data are considered to be similar. Otherwise the data are considered to be dissimilar.
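By way of non-limiting illustration, an empirical determination of the threshold can be sketched as follows. The selection criterion (the candidate threshold giving the highest classification accuracy on labelled pairs) and the function name `pick_threshold` are assumptions made for the example; the description does not specify the criterion.

```python
import numpy as np

def pick_threshold(scores, same_class):
    """Return the candidate threshold with the best accuracy on labelled pairs.

    scores: similarity values of known pairs; same_class: True when the
    pair is known to belong to one same class.
    """
    scores = np.asarray(scores, dtype=float)
    same_class = np.asarray(same_class, dtype=bool)
    candidates = np.sort(scores)
    # Decision rule of the method: similar when the score exceeds the threshold.
    accuracies = [np.mean((scores > t) == same_class) for t in candidates]
    return candidates[int(np.argmax(accuracies))]
```

Other criteria, such as a fixed false-acceptance rate, could equally be used depending on the application.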
The expression of the similarity function LR indicated previously shows that this function is parameterised by the covariance matrixes Sμ, Sω, which are unknown.
Therefore the method comprises a step 200 to determine said matrixes by machine learning.
This learning is advantageously conducted using an expectation-maximization (EM) algorithm and is performed on a set of data stored in the database 20, these data being called "labelled", i.e. the respective classes to which they belong are known.
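A class to which a number $m_c$ of feature vectors belongs is denoted c, $X_c=[x_{c,1}^T,\ldots,x_{c,m_c}^T]^T$ denotes the concatenation of the feature vectors of the class, and $S_{\varepsilon_{c,1}},\ldots,S_{\varepsilon_{c,m_c}}$ denote their respective covariance matrixes of observation noise.
The latent variables $Z_c=[\mu_c^T,\omega_{c,1}^T,\ldots,\omega_{c,m_c}^T]^T$ are defined for each class c, where $\mu_c$ is the mean of the class and $\omega_{c,i}$ is the deviation of the feature vector $x_{c,i}$ relative to that mean.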
The parameter to be estimated by the EM algorithm is Θ={Sμ,Sω}.
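The expectation-maximization algorithm is an iterative algorithm comprising a first step 210 to estimate the parameters of distribution of the latent variables $P(Z_c \mid X_c,\bar\Theta)$, where $\bar\Theta$ is a previous estimate of the parameters.
On initialisation of the method, $\bar\Theta$ is a first empirical estimate of the parameters.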
The initialisation of parameter Sμ is advantageously obtained by calculating, for each class c, the empirical mean of the class and then determining a covariance matrix of the mean values.
The initialisation of parameter Sω can be obtained by calculating, for each class, the covariance matrix of the feature vectors from which the class mean is subtracted (i.e. feature vector differences relative to the mean) and then calculating the mean covariance matrix on all classes.
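By way of non-limiting illustration, this initialisation can be sketched as follows. The function name `initialise`, the array layout (one feature vector per row) and the normalisation of the covariances by the number of samples are assumptions made for the example; at least two classes and feature vectors of dimension two or more are assumed.

```python
import numpy as np

def initialise(X, labels):
    """Empirical initial estimates of S_mu and S_omega from labelled data."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Empirical mean of each class.
    means = np.stack([X[labels == c].mean(axis=0) for c in classes])
    # Covariance matrix of the class means.
    S_mu = np.cov(means, rowvar=False, bias=True)
    # Covariance of the deviations from the class mean, averaged over classes.
    S_omega = np.mean(
        [np.cov(X[labels == c] - m, rowvar=False, bias=True)
         for c, m in zip(classes, means)],
        axis=0,
    )
    return S_mu, S_omega
```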
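The algorithm next comprises a maximization step 220, with respect to Θ, of the expected logarithmic likelihood on the latent variables $Z_c$:
$$Q(\Theta,\bar\Theta)=\sum_c\int P(Z_c \mid X_c,\bar\Theta)\,\log P(X_c,Z_c \mid \Theta)\,dZ_c$$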
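For the proper conducting of this step and to minimise computing time, consideration is given to the fact that the latent variables $\omega_{c,i}$ are conditionally independent when $\mu_c$ is fixed, by factorising $P(Z_c \mid X_c,\bar\Theta)$ as follows:
$$P(Z_c \mid X_c,\bar\Theta)=P(\mu_c \mid X_c,\bar\Theta)\prod_{i=1}^{m_c}P(\omega_{c,i} \mid x_{c,i},\mu_c,\bar\Theta) \qquad (1)$$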
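Optimisation of $Q(\Theta,\bar\Theta)$ at step 220 relies on the probability distributions $P(\mu_c \mid X_c,\bar\Theta)$ and $P(\omega_{c,i} \mid X_c,\bar\Theta)$ estimated at step 210.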
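The combining of equation (1) and the fact that $P(\mu_c \mid X_c,\bar\Theta)$ is a Gaussian distribution with mean $b_{\mu_c}$ and covariance $T_{\mu_c}$ implies that $P(\omega_{c,i} \mid X_c,\bar\Theta)$ is a Gaussian distribution with mean $b_{\omega_{c,i}}$ and covariance $T_{\omega_{c,i}}$,
where, with $S_\omega$ taken from $\bar\Theta$:
$$R_{c,i}=\big(S_{\varepsilon_{c,i}}^{-1}+S_\omega^{-1}\big)^{-1}$$
$$T_{\omega_{c,i}}=R_{c,i}\,S_{\varepsilon_{c,i}}^{-1}\,T_{\mu_c}\,S_{\varepsilon_{c,i}}^{-1}\,R_{c,i}+R_{c,i}$$
and
$$b_{\omega_{c,i}}=R_{c,i}\,S_{\varepsilon_{c,i}}^{-1}\,(x_{c,i}-b_{\mu_c})$$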
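For reference, the Gaussian distribution of the class mean can be written out under the model $x_{c,i}=\mu_c+\omega_{c,i}+\varepsilon_{c,i}$. The expressions below are a standard consequence of that model (each $x_{c,i}$, given $\mu_c$, is Gaussian with mean $\mu_c$ and covariance $S_\omega+S_{\varepsilon_{c,i}}$) and are given as a worked derivation rather than as a quotation; $S_\mu$ and $S_\omega$ are assumed to be taken from the current estimate $\bar\Theta$.

```latex
T_{\mu_c} = \Big( S_\mu^{-1} + \sum_{i=1}^{m_c} \big(S_\omega + S_{\varepsilon_{c,i}}\big)^{-1} \Big)^{-1},
\qquad
b_{\mu_c} = T_{\mu_c} \sum_{i=1}^{m_c} \big(S_\omega + S_{\varepsilon_{c,i}}\big)^{-1} x_{c,i}
```

A feature vector with a large observation noise covariance thus contributes less to the estimate of its class mean.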
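Step 220 therefore entails maximising, relative to $S_\mu$ and $S_\omega$ respectively:
$$\sum_c\int P(\mu_c \mid X_c,\bar\Theta)\,\log P(\mu_c \mid S_\mu)\,d\mu_c \quad\text{and}\quad \sum_c\sum_{i=1}^{m_c}\int P(\omega_{c,i} \mid X_c,\bar\Theta)\,\log P(\omega_{c,i} \mid S_\omega)\,d\omega_{c,i}$$
This is obtained by computing the gradients and solving $\partial Q/\partial S_\mu=0$ and $\partial Q/\partial S_\omega=0$.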
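The expectation-maximization algorithm is performed iteratively by successively computing at step 210 the variables $T_{\mu_c}$, $b_{\mu_c}$, $T_{\omega_{c,i}}$ and $b_{\omega_{c,i}}$, and by updating at step 220 the values of $S_\mu$ and $S_\omega$, until convergence.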
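By way of non-limiting illustration, one possible form of this iteration can be sketched as follows. The closed-form updates used at step 220 (average of the second moments of the latent variables) are assumed here as the solution of the zero-gradient conditions under the Gaussian model; they are not quoted from the description. The function names `em_step` and `fit`, the data layout and the fixed number of iterations are hypothetical, and the noise covariance matrixes are assumed to be strictly positive definite.

```python
import numpy as np

def em_step(classes, S_mu, S_omega):
    """One EM iteration.

    classes: list of (X_c, S_eps_c) with X_c of shape (m_c, N) and
    S_eps_c of shape (m_c, N, N).
    """
    N = S_mu.shape[0]
    S_mu_inv = np.linalg.inv(S_mu)
    S_omega_inv = np.linalg.inv(S_omega)
    sum_mu = np.zeros((N, N))
    sum_omega = np.zeros((N, N))
    n_vectors = 0
    for X_c, S_eps_c in classes:
        # Step 210: Gaussian distribution of the class mean mu_c.
        W = [np.linalg.inv(S_omega + S_e) for S_e in S_eps_c]
        T_mu = np.linalg.inv(S_mu_inv + sum(W))
        b_mu = T_mu @ sum(W_i @ x for W_i, x in zip(W, X_c))
        sum_mu += T_mu + np.outer(b_mu, b_mu)
        for x, S_e in zip(X_c, S_eps_c):
            # Step 210: Gaussian distribution of the deviation omega_{c,i}.
            S_e_inv = np.linalg.inv(S_e)
            R = np.linalg.inv(S_e_inv + S_omega_inv)
            G = R @ S_e_inv
            T_om = G @ T_mu @ G.T + R
            b_om = G @ (x - b_mu)
            sum_omega += T_om + np.outer(b_om, b_om)
            n_vectors += 1
    # Step 220 (assumed closed form): averaged second moments.
    return sum_mu / len(classes), sum_omega / n_vectors

def fit(classes, S_mu, S_omega, n_iter=20):
    """Repeat the EM iteration a fixed number of times."""
    for _ in range(n_iter):
        S_mu, S_omega = em_step(classes, S_mu, S_omega)
    return S_mu, S_omega
```

In practice the loop would typically be stopped on a convergence criterion, for example when the change in the estimated matrixes falls below a tolerance, rather than after a fixed number of iterations.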
Number | Date | Country | Kind |
---|---|---|---|
1460690 | Nov 2014 | FR | national |