The invention concerns the field of data encryption, and of comparing encrypted data with a candidate data item in order to assess the similarity between one of the encrypted data items and the candidate data item.
The invention is applicable in particular to the field of biometrics, for encrypting biometric data of individuals and identifying a candidate individual by comparing one of his biometric data items with the encrypted data.
A method for encrypting a data item, known as the “fuzzy vault” scheme, is known; it has been described in the following articles:
The “fuzzy vault” scheme consists of integrating, in a mathematical set called “fuzzy vault”, and referred to hereinafter as a “protected set”, information related to a data item A, as well as supplementary parasitic information that is generated randomly and is independent of the data item A. This parasitic information makes it possible to mask the information related to A.
More precisely, this encrypting applies to a data item A in the form of a list of indexed elements ai of a finite field F.
During this method, a polynomial p having certain mathematical properties not described here is generated randomly and, for each element ai of A, the image by p of the element ai is computed.
The pairs consisting of the elements ai of the data item A and their images by p are then added to the protected set.
Finally, error-inducing points are added to the protected set, these points being randomly generated pairs (xi, xi′) such that xi is not an element of A and xi′ is not the image of xi by p. Mathematically, $x_i \in F \setminus A$ and $x_i' \notin \{p(x_i)\}$.
A set of pairs (xi*, xi*′) is therefore obtained in which either the abscissas xi* belong to A and xi*′=p(xi*), or they belong to F\A, and in this case xi*′ are chosen in F\p(A).
Adding a large number of error-inducing points makes it possible to mask the points related to the data item A and to the polynomial p.
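To make the locking procedure concrete, here is a minimal Python sketch of the classic fuzzy vault construction described above. It assumes, for simplicity, that F is the prime field GF(97) (integers modulo 97) and that a polynomial is given as a list of coefficients, constant term first; the names `lock` and `poly_eval` and all parameter values are illustrative and do not come from the source.

```python
import random

P = 97  # assumed field size: F = GF(97), integers modulo a prime


def poly_eval(coeffs, x, p=P):
    """Evaluate the polynomial with the given coefficients (constant first) at x, mod p."""
    result = 0
    for c in reversed(coeffs):  # Horner's rule
        result = (result * x + c) % p
    return result


def lock(a, degree, n_chaff, p=P, rng=random):
    """Return (secret polynomial, protected set) for a list a of distinct elements of F."""
    coeffs = [rng.randrange(p) for _ in range(degree + 1)]  # random polynomial p
    vault = [(x, poly_eval(coeffs, x, p)) for x in a]       # genuine pairs (a_i, p(a_i))
    a_set = set(a)
    free_x = [x for x in range(p) if x not in a_set]        # candidate abscissas in F \ A
    for x in rng.sample(free_x, n_chaff):
        # error-inducing point: only y != p(x) is enforced here, following the condition above
        forbidden = poly_eval(coeffs, x, p)
        y = rng.choice([v for v in range(p) if v != forbidden])
        vault.append((x, y))
    rng.shuffle(vault)  # hide which points are genuine
    return coeffs, vault
```

Raising `n_chaff` hides the genuine points better at the cost of a larger protected set, which is the trade-off the paragraph above describes.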
Next, the protected set is used to compare a second data item B with the data item A, without revealing any information about A.
To do this, decrypting algorithms have been developed, making it possible to compare a data item B, in the form of a list of indexed elements bi, with the protected set, in order to determine whether the data item B corresponds to the data item A with a degree of similarity exceeding a predetermined threshold.
In particular, B corresponds to A if a large number of elements bi correspond to elements ai of A, the latter elements being situated by definition in the protected set.
These decrypting algorithms take as input the elements bi of the data item B that correspond to abscissas xi of the protected set, and output a polynomial p′. If B is sufficiently similar to A, the polynomial p′ is the polynomial p that was used for encrypting the data item A.
It is then possible to apply this polynomial p to all the elements bi of B corresponding to abscissas xi of the protected set, in order to determine which elements bi are also elements ai of A: by construction, only the pairs whose abscissa is an element of A have as ordinate the image of that abscissa by p.
An example of a suitable decrypting algorithm is a Reed-Solomon decoding algorithm.
The fuzzy vault scheme therefore makes it possible to compare two data items without revealing information about one of them.
However, it is limited to comparing two data items and cannot be applied to comparing a data item with a set of several data items held in a data bank. This type of comparison is nevertheless used more and more frequently, in particular for the biometric identification of individuals.
There is therefore a need to extend the principle of the fuzzy vault scheme to a plurality of data in a data bank, so as to enable the data in the bank to be compared with a given third-party data item without revealing information about them.
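The filtering step just described can be sketched as follows, assuming the decoder has already returned the coefficients of p′ and working in a small prime field GF(11) for readability; `matching_elements` is a hypothetical helper name and the toy protected set is made up for the example.

```python
def poly_eval(coeffs, x, p):
    """Evaluate a polynomial (constant term first) at x modulo the prime p."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result


def matching_elements(vault, b, coeffs, p):
    """Elements of B whose point in the protected set lies on the recovered polynomial."""
    b_set = set(b)
    return sorted(x for x, y in vault if x in b_set and y == poly_eval(coeffs, x, p))


# Toy protected set for A = {1, 4, 6} locked with p(x) = 2 + 3x over GF(11);
# (2, 0), (7, 5) and (9, 9) are error-inducing points that do not lie on p.
vault = [(2, 0), (4, 3), (7, 5), (1, 5), (9, 9), (6, 9)]
print(matching_elements(vault, [1, 2, 6, 8], [2, 3], 11))  # [1, 6]
```

Element 2 of B hits an error-inducing point and is rejected, and element 8 has no abscissa in the protected set at all.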
Presentation of the Invention
One aim of the invention is to overcome the problem mentioned above.
This aim is achieved in the context of the present invention by means of a method, implemented by a server, for encrypting a set of at least two indexed data items, the data being in the form of lists of elements, each element of which belongs to a finite set of indexed symbols called an alphabet,
the method being characterised in that the data is encrypted to form a protected set, the step of encrypting and creating the protected set comprising the following steps:
The invention further concerns a method for identifying an individual, in a system comprising a control server, suitable for acquiring a biometric data item of the individual to be identified, and a server managing a base containing individual biometric data of listed individuals,
in which, in order to identify the individual, his data item is compared with the n data in the base in order to identify the data item or items in the base whose degree of similarity with the data item of the individual exceeds a predetermined threshold,
the method being characterised in that, before the step of comparing the data item of the individual with the data in the base, the data in the base are encrypted by the management server using the method according to one of the preceding claims.
Advantageously, but optionally, the identification method may also comprise at least one of the following features:
Other features, aims and advantages of the invention will emerge from the following description, which is purely illustrative and non-limitative, and which must be read with regard to the accompanying drawings, in which:
The main steps of a method for encrypting a plurality of data Aj in a database DB are described with reference to
The database DB contains a number n of secret data Aj ($j = 1, \ldots, n$), each data item Aj being in the form of a list of elements, for example of t indexed elements $a_{ij}$, $i = 1, \ldots, t$, so that each Aj is written $A_j = (a_{1j}, \ldots, a_{tj})$. Alternatively, the data Aj may differ in size from one another.
The elements aij of each Aj are preferably binary elements or vectors, each coordinate of which is a binary element.
The present invention falls within coding theory, which uses certain mathematical objects whose definitions are recalled here.
The alphabet is a set containing N symbols x1, . . . , xN, such that each element of the data item Aj is a symbol of the alphabet. This alphabet is defined according to the way in which the data Aj are coded.
Thus for example, if the elements of the data Aj are values coded on a certain number of bits, the alphabet comprises all the binary codes coded on this number of bits. For data Aj coded on one byte, the alphabet comprises the two hundred and fifty six (256) possible bytes.
An evaluation function is also defined as follows:
Furthermore, if Lk is a subspace of P of dimension k, C = ev(Lk) is the evaluation code defined by Lk; C is said to be an evaluation code on Y of length N and dimension k.
Finally, a codeword is an element of the code C, that is to say the image ev(f) of a function f of Lk under the evaluation function.
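Since the precise definition of the evaluation function is not reproduced above, the sketch below assumes its most common instance: P is a space of polynomials over a prime field GF(p), Lk is the set of polynomials of degree less than k, and ev evaluates a function at N fixed points, which yields a Reed-Solomon code. The helper names are illustrative.

```python
from itertools import product


def ev(coeffs, points, p):
    """Evaluation map: f -> (f(x_1), ..., f(x_N)) for f given by its coefficients."""
    return tuple(sum(c * pow(x, e, p) for e, c in enumerate(coeffs)) % p for x in points)


def evaluation_code(k, points, p):
    """Set of all codewords ev(f) for f in L_k (polynomials of degree < k over GF(p))."""
    return {ev(coeffs, points, p) for coeffs in product(range(p), repeat=k)}


code = evaluation_code(2, [0, 1, 2, 3, 4], 5)  # length N = 5, dimension k = 2
print(len(code))  # 25 = 5**2 codewords
```

With N = 5 points and k = 2, two distinct polynomials of degree below 2 agree on at most one point, so any two codewords differ in at least N − k + 1 = 4 coordinates.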
Encrypting of the Data in the Base
The encrypting 100 of the data Aj in the base is carried out by a computer server implementing the steps identified in
Generation of the Encoding Functions
During step 110, a server randomly generates, for each data item Aj in the base, a corresponding encoding function Fj.
An encoding function is a function that associates with each element (symbol of the alphabet) a coordinate of a codeword.
In the present case, the encoding functions Fj are chosen to be associated with an evaluation code for which a list recovery algorithm exists.
Examples of such codes include Reed-Muller codes, algebraic codes such as Goppa codes, and folded Reed-Solomon codes.
In the context of the present invention, a folded Reed-Solomon code is advantageously used, which is defined as follows:
Fm where u=q−1 is divisible by m,
When the evaluation code is a folded Reed-Solomon code, the encoding functions Fj corresponding to the data Aj are then defined as follows:
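The folded Reed-Solomon definition and the formula for the functions Fj are not reproduced in this text. Purely as an illustration, the sketch below follows the standard construction of Guruswami and Rudra, which is assumed (not confirmed by the source) to be the one intended: with γ a primitive element of GF(q) and u = q − 1 divisible by m, the j-th codeword symbol is $(f(\gamma^{jm}), \ldots, f(\gamma^{jm+m-1}))$. For simplicity q is taken prime; the example uses q = 13, γ = 2 and m = 3.

```python
def poly_eval(coeffs, x, q):
    """Evaluate a polynomial (constant term first) at x modulo the prime q."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % q
    return result


def folded_rs_encode(coeffs, q, gamma, m):
    """Folded Reed-Solomon codeword of f over GF(q), q prime, gamma primitive, folding m."""
    u = q - 1
    if u % m != 0:
        raise ValueError("u = q - 1 must be divisible by m")
    # f evaluated at gamma^0, gamma^1, ..., gamma^(u-1), i.e. at every non-zero field element
    evals = [poly_eval(coeffs, pow(gamma, e, q), q) for e in range(u)]
    # each codeword symbol bundles m consecutive evaluations: an element of GF(q)^m
    return [tuple(evals[j * m:(j + 1) * m]) for j in range(u // m)]


print(folded_rs_encode([0, 1], 13, 2, 3))
# [(1, 2, 4), (8, 3, 6), (12, 11, 9), (5, 10, 7)]
```

For f(x) = x the codeword simply lists the powers of γ = 2 in GF(13), grouped three at a time, which makes the folding easy to check by hand.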
Returning to
To do this, the server generates, during a step 121, as many sets Si as there are symbols in the alphabet, each set Si corresponding to a symbol xi of the alphabet.
The server also defines two security parameters, l and r.
The first security parameter, l, is an integer associated with an indexed set Si. This integer may vary from one set Si to another, or be the same for all the sets Si.
The second security parameter, r, is also an integer. Its role is described in more detail hereinafter.
On initialisation, the sets Si are empty.
Then, for each symbol xi in the alphabet,
The error-inducing points are chosen at random from the set Y excluding the images of the symbols of the alphabet by the encoding functions Fj corresponding to the data Aj. These error-inducing points are thus independent of the encoding functions.
These error-inducing points prevent the authentic codewords from being identified. They therefore prevent the encoding functions Fj of the data Aj from being determined from the symbols of the alphabet and the codewords.
The integer l is a security parameter of the encrypting method. Its value depends on the decrypting algorithm that is to be used subsequently and on the computing time that can be tolerated. When a folded Reed-Solomon code is used, l is typically less than m, m being one of the parameters of the folded Reed-Solomon code, and also less than the number n of data Aj in the base.
Moreover, the server maintains a counter, called cpt, of the number of non-empty indexed sets Si; this counter is incremented by 1 for each symbol xi of the alphabet that is present in at least one of the data Aj.
At the end of steps 122 and 123, some indexed sets Si may remain empty, namely those whose corresponding alphabet symbol xi is not present in any data item Aj in the base.
The server then randomly chooses, during a step 124, indices $i_e = i_{cpt+1}, \ldots, i_r$, such that the indexed sets Si
Here again, the error-inducing points are chosen in Y excluding the images of the symbols of the alphabet by the encoding functions Fj corresponding to the data
At the end of step 124, N − r empty sets Si remain.
The security parameter r therefore represents the number of non-empty indexed sets Si at the end of the encrypting step 120.
r is a positive integer, at most equal to N, the number of symbols in the alphabet, and chosen according to the number of data Aj in the base. Preferably, r is chosen to be of the same order of magnitude as N. It is even possible to take r = N, so that no empty set remains at the end of the encrypting step 120.
By way of non-limitative example, N may be of the order of $10^4$, in which case r is preferably between a few thousand and the value of N, that is to say of the order of a few tens of thousands.
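Steps 121 to 124 can be sketched in Python as follows. Because the detailed rules of steps 122 and 123 are not reproduced here, this is a reconstruction under stated assumptions: symbols are the integers 0 to N − 1, each data item Aj is a Python set of symbols, each encoding function Fj is a callable mapping a symbol into Y = {0, …, |Y| − 1}, and l is taken to be the target size of every non-empty set Si. That role of l is an assumption, and `build_sets` is a hypothetical name.

```python
import random


def build_sets(n_symbols, data, encoders, y_size, l, r, rng=random):
    """Sketch of steps 121-124: return the indexed sets S_0, ..., S_{N-1}."""
    sets = [[] for _ in range(n_symbols)]  # step 121: one empty set per symbol
    cpt = 0                                # counter of non-empty sets

    def chaff_pool(i):
        # Y minus the images of symbol i by every encoding function F_j
        images = {f(i) for f in encoders}
        return [y for y in range(y_size) if y not in images]

    for i in range(n_symbols):             # steps 122-123 (as assumed here)
        genuine = {encoders[j](i) for j, a in enumerate(data) if i in a}
        if not genuine:
            continue                       # symbol absent from every A_j
        cpt += 1
        sets[i].extend(sorted(genuine))    # codeword coordinates are added first
        # assumed rule: pad with error-inducing points up to l elements
        sets[i].extend(rng.sample(chaff_pool(i), max(0, l - len(sets[i]))))

    if not cpt <= r <= n_symbols:
        raise ValueError("r must satisfy cpt <= r <= N")
    empty = [i for i in range(n_symbols) if not sets[i]]
    for i in rng.sample(empty, r - cpt):   # step 124: fill r - cpt empty sets
        sets[i].extend(rng.sample(chaff_pool(i), l))
    return sets
```

After step 124 exactly r sets are non-empty and N − r remain empty, matching the count given in the text. The codeword coordinates still sit at the front of each set at this point, which is why a scrambling step is needed.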
This step 124 of adding error-inducing points in sets Si
The mathematical algorithm of steps 121 to 124 is appended in
Then, during a step 125, the server scrambles the elements of each indexed set Si by randomly re-indexing the elements within each set Si.
Indeed, since the codewords are added to the sets Si first, their position in these sets would otherwise make it possible to identify them. Scrambling gives the codewords a random position in the sets Si.
Finally, during a step 126, for each symbol xi of the alphabet, the pair consisting of xi and the corresponding indexed set Si is added to the protected set LOCK.
For verification purposes described below, the server may also, during a step 127, compute the image Hash(Fj) of each encoding function Fj used to generate the codewords under a public hash function Hash, and integrate these images in the set LOCK, which is then written LOCK(Aj, Hash(Fj)).
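Keeping the illustrative conventions of representing symbol xi by its index i and each Si as a Python list, steps 125 and 126 amount to a per-set shuffle followed by pairing. `assemble_lock` is a hypothetical name, and Python's `random` module merely stands in for the server's randomness source; a cryptographically secure generator such as `secrets.SystemRandom` would be the natural choice in practice.

```python
import random


def assemble_lock(sets, rng=random):
    """Steps 125-126 (sketch): scramble each S_i, then store the pair (x_i, S_i)."""
    lock = []
    for i, s in enumerate(sets):
        scrambled = list(s)            # copy so the caller's list is left untouched
        rng.shuffle(scrambled)         # step 125: random re-indexing within S_i
        lock.append((i, scrambled))    # step 126: pair (symbol x_i, set S_i)
    return lock
```

Empty sets are kept in LOCK as well, since a pair is stored for every symbol of the alphabet.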
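The source specifies neither Hash nor how a function Fj is serialised before hashing. A minimal sketch, assuming SHA-256 from Python's `hashlib` and an encoding function represented by its polynomial coefficients, could look as follows. Trailing zero coefficients are stripped so that equal polynomials always hash identically, and the membership test corresponds to the comparison later performed in verification step 230. Both function names are hypothetical.

```python
import hashlib


def hash_encoding_function(coeffs):
    """Hypothetical Hash(F_j): SHA-256 of a canonical text encoding of the coefficients."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:  # canonical form: drop trailing zero coefficients
        coeffs.pop()
    return hashlib.sha256(",".join(map(str, coeffs)).encode("ascii")).hexdigest()


def is_registered(candidate_coeffs, stored_hashes):
    """Verification check: is the candidate's hash among the stored Hash(F_j) values?"""
    return hash_encoding_function(candidate_coeffs) in stored_hashes


stored = {hash_encoding_function([2, 3]), hash_encoding_function([1, 0, 4])}
print(is_registered([2, 3, 0], stored))  # True: same polynomial as [2, 3]
print(is_registered([2, 4], stored))     # False
```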
Decrypting
Once the data Aj have been encrypted in the set LOCK, this set is used to determine, from a data item B, the data item Aj most similar to B, without revealing any information on the data Aj. This step 200 is called decrypting; its steps are illustrated in
The data item B is a list of t elements {b1, . . . , bt}, each element bi being a symbol of the alphabet.
A server performing the decrypting selects, during a step 210, among the indexed sets Si stored in the set LOCK, those Si
The server next uses a list recovery algorithm having as its input all the pairs {(xi
This list recovery algorithm depends on the code chosen to encrypt the data Aj. When the code is a folded Reed-Solomon code, a suitable list recovery algorithm is Guruswami's list decoding algorithm, described in: Venkatesan Guruswami, “Linear-algebraic list decoding of folded Reed-Solomon codes”, IEEE Conference on Computational Complexity, pages 77-85, IEEE Computer Society, 2011.
The list recovery algorithm outputs a list of codewords whose degree of similarity with the indexed sets Si exceeds a predetermined threshold. From these codewords, one or more encoding functions are deduced, corresponding to the encoding function or functions Fj of the data Aj whose degree of similarity with the data item B exceeds a predetermined threshold.
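Guruswami's algorithm is too involved to reproduce here. To illustrate only what list recovery computes, the sketch below swaps in an exhaustive search over a tiny, plain (unfolded) Reed-Solomon code over a prime field GF(q): it returns every polynomial of degree below k whose value at each evaluation point lies in the corresponding input list for at least `threshold` points. Its cost grows as q^k, so it is a toy and not a substitute for a real list recovery algorithm; `brute_force_list_recovery` is a hypothetical name.

```python
from itertools import product


def poly_eval(coeffs, x, q):
    """Evaluate a polynomial (constant term first) at x modulo the prime q."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % q
    return result


def brute_force_list_recovery(points, lists, k, q, threshold):
    """Every f of degree < k over GF(q) with f(points[i]) in lists[i] for >= threshold i."""
    found = []
    for coeffs in product(range(q), repeat=k):  # all q**k candidate polynomials
        agreement = sum(1 for x, s in zip(points, lists) if poly_eval(coeffs, x, q) in s)
        if agreement >= threshold:
            found.append(list(coeffs))
    return found


# Input lists built from f1(x) = 1 + 2x and f2(x) = 3 + 5x over GF(7).
pts = list(range(7))
lists = [{1, 3}, {1, 3}, {5, 6}, {0, 4}, {2}, {0, 4}, {5, 6}]
print(brute_force_list_recovery(pts, lists, 2, 7, 7))  # [[1, 2], [3, 5]]
```

Both encoding polynomials are recovered from the same lists, which mirrors how a single decoding over the shared sets Si can yield several candidate functions Fj.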
In particular, if the data item B corresponds to one of the data Aj, the encoding function Fj corresponding to this data item Aj is obtained from the results of the list recovery algorithm.
The functions resulting from this algorithm are such that, for a proportion of the xi
To obtain proof that a function output by this algorithm is indeed the encoding function of a corresponding data item Aj, the server can compute, during a verification step 230, the image of this function by the public hash function Hash mentioned above, and compare it with the hashes Hash(Fj) stored in the protected set LOCK.
Finally, from the encoding function Fj, the server can find the data item Aj. To do this, the images of all the symbols xi by the encoding function Fj are computed, and it is determined whether Fj(xi) belongs to the indexed set Si. If so, xi belongs to the data item Aj. The data item Aj can thus be reconstructed.
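Under the same illustrative conventions as before (LOCK as a list of (symbol index, set) pairs and Fj as a Python callable), reconstructing Aj from a recovered encoding function reduces to one membership test per symbol; `recover_data_item` is a hypothetical name. The test relies on error-inducing points having been drawn outside the images Fj(xi); a coincidence Fk(xi) = Fj(xi) for another data item could still yield a false positive.

```python
def recover_data_item(encoder, lock):
    """Return the symbols x_i such that F_j(x_i) belongs to the indexed set S_i."""
    return [x for x, s in lock if encoder(x) in s]


lock = [(0, [5, 1, 9]), (1, [2, 8]), (2, []), (3, [10, 4])]
print(recover_data_item(lambda x: (3 * x + 1) % 17, lock))  # [0, 3]
```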
Application to Biometric Identification
A preferential application of this encrypting algorithm and the corresponding decrypting algorithm is that of biometric identification.
Biometric identification is illustrated schematically in
The identification of an individual consists of comparing a data item particular to this individual with similar data of referenced individuals in order to determine whether the individual to be identified corresponds to one of the referenced individuals with a degree of similarity exceeding a predetermined threshold.
The referenced individuals may for example be individuals whose access to a place is authorised, or alternatively individuals sought by the police.
For example, in
This biometric character may for example be an iris or a fingerprint.
With reference to
The number, form and position of the minutiae on a fingerprint 10 make this fingerprint unique and specific to the individual carrying it. Consequently it is the minutiae that are used to code a fingerprint.
The coding of a fingerprint 10 is a set of triplets (x, y, θ) in which x and y indicate the abscissa and the ordinate of a minutia in a normalised reference frame identified in
x, y and θ are each coded on one byte. The corresponding alphabet for the encrypting method therefore consists of all the possible triplets in which each coordinate is coded on one byte. There are 256 ($2^8$) possible bytes, so the alphabet contains $N = 256^3 = 2^{24}$ elements.
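As a worked check of the alphabet size, and assuming (hypothetically) that a triplet is mapped to a symbol index by concatenating its three bytes, a minutia can be converted to and from an alphabet symbol as follows. The function names are illustrative only.

```python
N = 256 ** 3  # alphabet size: all (x, y, theta) triplets with one byte per coordinate


def triplet_to_symbol(x, y, theta):
    """Hypothetical bijection from a byte triplet to an alphabet index in [0, N)."""
    for v in (x, y, theta):
        if not 0 <= v <= 255:
            raise ValueError("each coordinate must fit in one byte")
    return (x << 16) | (y << 8) | theta


def symbol_to_triplet(s):
    """Inverse mapping: split a 24-bit alphabet index back into its three bytes."""
    return (s >> 16) & 0xFF, (s >> 8) & 0xFF, s & 0xFF


print(N)  # 16777216
```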
Returning to
The management server SG uses the encrypting method described above on the data Aj in order to create a protected set LOCK(A1, . . . , An).
When an individual presents himself in order to be identified, the control server SC acquires a biometric data item B, either by means of a fingerprint sensor or by reading a chip embedded in an identity document.
The control server SC then uses the decrypting algorithm described above in order to determine which data item Aj, if such exists, corresponds to the data item B of the individual with a degree of similarity above a predetermined threshold.
An encrypting algorithm has therefore been developed that enables a plurality of data Aj to be encoded in a protected set. This algorithm is an extension of the fuzzy vault scheme, which makes no provision for encoding several data items, still less when these data items have elements in common.
This algorithm also reduces the storage space needed to encode the data, since the error-inducing points are added once for all the data rather than separately for each data item.
Furthermore, a single decoding suffices for all the data, which may save computing time depending on the list recovery algorithm used.
Number | Date | Country | Kind |
---|---|---|---|
12 52365 | Mar 2012 | FR | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2013/055297 | 3/14/2013 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2013/135846 | 9/19/2013 | WO | A |
Entry |
---|
International Search Report and Written Opinion issued Jun. 24, 2013 in PCT/EP2013/055297 filed Mar. 14, 2013 (with English translation of Category of Cited Documents). |
Preliminary Search Report and Written Opinion issued Dec. 17, 2012 in French Patent Application No. FR1252365 with English translation of Category of Cited Documents). |
International Search Report issued Jun. 24, 2013, in PCT/EP13/055297, filed Mar. 14, 2013. |
Juels, A. et al., “A Fuzzy Vault Scheme”, Designs, Codes and Cryptography, Kluwer Academic Publishers, Bo., vol. 38, no. 2, (18 pages), 2006 XP019205891. |
Kasahara, M., “A generalization of Secret Sharing Scheme on the Basis of Recovering Algorithm, K-RA”, International Association for Cryptologic Research, vol. 20070323: 100353, pp. 1-7, XP061002281, 2007. |
Chang, Ee-Chien et al., “Secure Sketch for Multi-Sets”, International Association for Cryptologic Research, vol. 20060315:181400, pp. 1-5, XP061001775, 2006. |
Translation of the Written Opinion of the International Search Report for PCT/EP2013/055297 dated Sep. 15, 2014. |
Translation of the International Preliminary Report on Patentability for PCT/EP2013/055297 dated Sep. 16, 2014. |
Number | Date | Country | |
---|---|---|---|
20150039899 A1 | Feb 2015 | US |