The present disclosure relates to an image recognition system, an image recognition server, and an image recognition method for image recognition utilizing edge cloud computing.
Facial recognition is theoretically and practically important, and has long been vigorously studied in a wide range of areas including virtual reality applications and IoT networks. Classification algorithms based on sparse representation, inspired by the sparse mechanism of the human visual system, have attracted much attention. For example, NPL 1 discloses employing the K-Singular Value Decomposition (K-SVD) algorithm to learn a feature dictionary, applying Orthogonal Matching Pursuit (OMP) to find a sparse representation of a testing image, and using a Support Vector Machine (SVM) for facial recognition.
Another technique commonly employed for facial recognition is deep learning, which has been demonstrated to be effective in extracting deep hierarchical features. For example, it is possible to improve facial recognition performance by using Convolutional Neural Networks (ConvNets), a deep learning technology, to extract visual features. However, training in deep learning is highly computationally intensive and requires large quantities of training data.
On the other hand, edge computing and cloud computing are promising techniques for providing cloud computing capabilities at edges proximate to mobile users, while reducing traffic bottlenecks between a cloud and edges of a core network and a backhaul network. Such edge cloud computing makes it possible not only to accommodate increased computing load, but also to provide a wide variety of services by collecting data from mobile devices. For example, edge cloud computing can be used in the facial recognition described above to allow offloading some computing tasks onto an edge and a cloud to improve computational efficiency (to reduce computational load) in the facial recognition.
NPL 1: Y. Xu, Z. Li, J. Yang, and D. Zhang, “A Survey of Dictionary Learning Algorithms for Face Recognition,” IEEE Access, vol. 5, pp. 8502-8514, April 2017.
NPL 2: J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust Face Recognition via Sparse Representation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 31, no. 2, pp. 210-227, February 2009.
NPL 3: T. Nakachi, H. Ishihara, and H. Kiya, “Privacy-preserving Network BMI Decoding of Covert Spatial Attention,” Proc. IEEE ICSPCS 2018, pp. 1-8, December 2018.
However, utilizing edge cloud computing in facial recognition has the following problems. (1) A high level of security, such as privacy protection, is required to prevent data leakage due to human error or accidents. (2) Multi-device diversity is insufficiently utilized (for improvement of recognition accuracy).
An object of the present invention is to provide an image recognition system, an image recognition server, and an image recognition method having a new high security framework that can achieve utilization of multi-device diversity, to solve the above-described problems.
To achieve the object, the image recognition system according to the present disclosure uses a random unitary matrix to ensure a high level of security from end to end, and uses ensemble learning based on results from each device (user terminal) to increase recognition accuracy.
Specifically, an image recognition system according to the present disclosure is an image recognition system, including N terminals (N is an integer not less than 2), M transfer servers (M is an integer not less than 1), and an image recognition server, and in the image recognition system,
when an image yij is expressed, by using a dictionary matrix Dij being an M×K matrix having, as elements thereof, K (K>M) bases, and a sparse coefficient Xij being a K-dimensional vector, as yij=Dij·Xij, where i is a class to which an image to be identified belongs, and j and k (k∈j) are numbers of the terminals,
the terminals are configured to encrypt a testing image using a random unitary matrix Qp generated by using a key p, to generate an encrypted testing image, and transfer the encrypted testing image to a specified one of the transfer servers,
the transfer servers are configured to downsample the encrypted testing image, transform the encrypted testing image to a one-dimensional encrypted image vector, and transfer the encrypted image vector to the image recognition server, and
the image recognition server is configured to use a plurality of encrypted dictionaries generated by encrypting, by using the random unitary matrix Qp, a plurality of dictionaries generated using different training images, to solve, by Orthogonal Matching Pursuit, an optimization problem represented by Math. C1 for each of the plurality of encrypted dictionaries, to estimate a class, to which the encrypted image vector belongs, for each of the plurality of the encrypted dictionaries, and is further configured to perform ensemble learning on the class estimated for each of the plurality of the encrypted dictionaries to determine a single class to which the encrypted image vector belongs:
[Math. C1]
min_x ‖x‖0 subject to ‖ŷ − D̂x‖2 ≤ ε
where ŷ is the encrypted image vector, D̂ is an encrypted dictionary, x is a sparse coefficient vector, and ε is an allowable approximation error.
An image recognition server according to the present disclosure includes an input unit configured to receive an encrypted image vector and a plurality of encrypted dictionaries, the encrypted image vector being generated by downsampling an encrypted testing image and transforming the encrypted testing image to a one-dimensional vector, the encrypted testing image being generated by encrypting a testing image using a random unitary matrix Qp generated by using a key p, the plurality of encrypted dictionaries being generated by encrypting, by using the random unitary matrix Qp, a plurality of dictionaries generated using different training images, and
a determination unit configured to solve, by Orthogonal Matching Pursuit, an optimization problem represented by Math. C1 for each of the plurality of encrypted dictionaries, to estimate a class, to which the encrypted image vector belongs, for each of the plurality of the encrypted dictionaries, and further configured to perform ensemble learning on the class estimated for each of the plurality of the encrypted dictionaries to determine a single class to which the encrypted image vector belongs.
An image recognition method according to the present disclosure is an image recognition method performed in an image recognition system including N terminals (N is an integer not less than 2), M transfer servers (M is an integer not less than 1), and an image recognition server, the method including,
when an image yij is expressed, by using a dictionary matrix Dij being an M×K matrix having, as elements thereof, K (K>M) bases, and a sparse coefficient Xij being a K-dimensional vector, as yij=Dij·Xij, where i is a class to which an image to be identified belongs, and j and k (k∈j) are numbers of the terminals,
encrypting, by the terminals, a testing image using a random unitary matrix Qp generated by using a key p, to generate an encrypted testing image, and transferring the encrypted testing image to a specified one of the transfer servers,
downsampling, by the transfer servers, the encrypted testing image, transforming the encrypted testing image to a one-dimensional encrypted image vector, and transferring the encrypted image vector to the image recognition server, and
using, by the image recognition server, a plurality of encrypted dictionaries generated by encrypting, by using the random unitary matrix Qp, a plurality of dictionaries generated using different training images, to solve, by Orthogonal Matching Pursuit, an optimization problem represented by Math. C1 for each of the plurality of encrypted dictionaries, to estimate a class, to which the encrypted image vector belongs, for each of the plurality of the encrypted dictionaries, and performing ensemble learning on the class estimated for each of the plurality of the encrypted dictionaries to determine a single class to which the encrypted image vector belongs.
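As a rough illustration of the transfer server's role described above, the sketch below downsamples an image and flattens it into a one-dimensional vector. The stride-based downsampling and the factor of 4 are assumptions made for illustration only; the claims do not fix a particular downsampling scheme.

```python
import numpy as np

def downsample_and_vectorize(encrypted_image: np.ndarray, factor: int = 4) -> np.ndarray:
    """Downsample an image by keeping every `factor`-th pixel in each
    dimension, then flatten it to a one-dimensional vector."""
    small = encrypted_image[::factor, ::factor]
    return small.reshape(-1)          # row-major flattening (a design choice)

img = np.arange(64.0).reshape(8, 8)   # stand-in for an encrypted testing image
vec = downsample_and_vectorize(img, factor=4)
print(vec)  # [ 0.  4. 32. 36.]
```

Because encryption by a random unitary matrix acts on the vectorized signal, the server can perform this dimensionality reduction without ever decrypting the image.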
The image recognition system includes a computationally non-intensive encryption algorithm based on random unitary transformation and achieves a high level of security by performing, in encrypted regions, entire processing from the dictionary creation stage to the recognition stage. With the image recognition system, a dictionary can be easily created because complex methods, such as K-SVD, are not used at the dictionary creation stage and machine learning is performed using only encrypted training images. In addition, the image recognition system achieves high recognition performance by using ensemble learning to integrate recognition results based on the dictionaries of different devices.
Thus, the present invention can provide an image recognition system, an image recognition server, and an image recognition method having a new high security framework that can achieve utilization of multi-device diversity.
Embodiments of the present disclosure will be described with reference to the accompanying drawings. The embodiments described below are examples of the present invention and the present invention is not limited to the embodiments described below. Note that components with the same reference signs in the specification and the drawings are assumed to be the same components.
The present embodiment describes a method and apparatus for concealment computation for sparse coding, for privacy protection.
The method is a concealment computation method for sparse coding intended for use at the edge/cloud, which allows direct use of widely used software applications while protecting the user's privacy. The method will be described in the following order.
(1) Formulation of Sparse Coding
(2) System Configuration for Concealment Computation for Sparse Coding
(3) Concealment Computation Method for Orthogonal Matching Pursuit (OMP)
(4) Specific Configuration of System for Concealment Computation for Sparse Coding
(5) Effects of the Present Concealment Computation Method
1. Formulation of Sparse Coding
In sparse coding, as illustrated in
y ∈ R^M    (0)
is assumed to be expressed as a linear combination of K bases.
y = Dx    (1)
where,
D = {d1, . . . , dK} ∈ R^(M×K)    (1-1)
is an M×K dictionary matrix having, as its elements, bases di (1 ≤ i ≤ K) which are column vectors, and
x = {x1, . . . , xK} ∈ R^K    (1-2)
represents sparse coefficients. Note that "K" represents the number of bases used for the linear combination of Equation (1).
Only a small number of the sparse coefficients are non-zero coefficients, and the other large number of coefficients are zero. Such a state in which a small number of non-zero elements exist as compared to the total number of elements is referred to as sparse. A dictionary matrix D is provided in advance or is estimated adaptively by using learning based on observation data.
Typically, an overcomplete dictionary matrix in which the relationship K>M is satisfied (i.e., the number of bases is larger than the dimensions of the observed signal) is used. For the expression y=Dx, when the number of bases is larger than the dimensions of the signal, the uniqueness of x cannot be ensured. Thus, the bases utilized to express the observed signal y are typically limited to some bases included in D. That is, when the following expression denotes an l0 norm of x, that is, the number of non-zero components of the vector x,
∥x∥0 (1-3)
sparse coding is typically formulated as the following optimization problem.
min_x ‖x‖0 subject to y = Dx    (2)
However, this problem is a combinatorial optimization problem whose optimal solution cannot be obtained without testing all basis combinations, and is known to be NP-hard. Thus, a relaxation of the problem to the l1 norm,
min_x ‖x‖1 subject to y = Dx    (3)
is typically considered. This l1 norm regularized problem can be expressed as a linear programming problem.
Sparse coding can be considered as two separate problems: the "dictionary design problem (design of D)" and the "choice of sparse coefficients (x)."
For the dictionary design problem, it is possible to use a dictionary matrix with bases prepared in advance, by using, for example, the discrete cosine transform, Fourier transform, wavelet transform, or curvelet transform, or it is possible to use a dictionary matrix obtained by learning bases from signals. Exemplary approaches of dictionary learning for sparse coding include the Method of Optimal Directions (MOD) and K-Singular Value Decomposition (K-SVD). MOD uses a pseudo-inverse matrix to minimize the squared error between y and Dx. K-SVD can be regarded as a generalized k-means method and was proposed as an iterative algorithm faster than MOD.
Orthogonal Matching Pursuit (OMP) and Iteratively Reweighted Least Squares (IRLS) are well known as selection algorithms for sparse coefficients.
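As a concrete reference for the OMP selection algorithm mentioned above, the following is a minimal NumPy sketch, not the concealed method of this disclosure: it greedily adds the basis most correlated with the current residual and re-fits the coefficients by least squares on the selected support.

```python
import numpy as np

def omp(y, D, eps=1e-6):
    """Orthogonal Matching Pursuit: greedily grow a support S, adding the
    basis most correlated with the current residual, then re-fit the
    coefficients by least squares restricted to the selected columns."""
    M, K = D.shape
    S = []                        # support: indices of selected bases
    x = np.zeros(K)
    r = y.astype(float).copy()    # residual, initialized to the observed signal
    while np.linalg.norm(r) > eps and len(S) < M:
        i = int(np.argmax(np.abs(D.T @ r)))  # basis that best reduces the error
        if i in S:                # no further progress possible; stop
            break
        S.append(i)
        xs, *_ = np.linalg.lstsq(D[:, S], y, rcond=None)  # best solution on S
        x = np.zeros(K)
        x[S] = xs
        r = y - D @ x             # update the residual
    return x, S

# Toy check: a signal built from 2 of 16 unit-norm random bases.
rng = np.random.default_rng(0)
D = rng.standard_normal((8, 16))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(16)
x_true[3], x_true[7] = 1.5, -0.8
y = D @ x_true
x_hat, S = omp(y, D)
print(np.linalg.norm(y - D @ x_hat) < 1e-6)  # True: residual below eps
```

The re-fit over the whole support at each step is what distinguishes OMP from plain Matching Pursuit.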
2. System Configuration for Concealment Computation for Sparse Coding
Architecture for performing concealment computation for sparse coding in an edge/cloud processing unit 12 is illustrated in
In the pre-preparation of
D̂    (3-1)
which is transmitted to the edge/cloud processing unit 12. Note that the concealed dictionary matrix D̂ may also be denoted herein as "D^".
In other words, in the pre-preparation, the local processing unit 11 performs a dictionary matrix transformation procedure in which the dictionary matrix D, which has been provided in advance or obtained from training using observed signals, is subjected to concealment processing by using a random unitary matrix Qp, which is an M×M matrix generated by using a key p, to transform the dictionary matrix D to a concealed dictionary matrix D̂, and the concealed dictionary matrix D̂ is stored.
In execution of the concealment computation for sparse coding in
ŷ (3-2)
which is transmitted to the cloud. Note that the concealed observed signal ŷ may also be denoted herein as "y^".
In other words, in the execution of the concealment computation for sparse coding, the local processing unit 11 performs an observed signal transformation procedure in which the observed signal vector y is subjected to concealment processing by using the random unitary matrix Qp to transform the observed signal vector y to the concealed observed signal ŷ.
Then, the edge/cloud processing unit 12 executes an OMP algorithm using the concealed dictionary matrix D̂ and the concealed observed signal ŷ, which have been transferred in advance, to estimate sparse coefficients. In other words, in the execution of the concealment computation for sparse coding, the edge/cloud processing unit 12 performs a computation procedure in which an optimization problem of Math. C1 is solved using Orthogonal Matching Pursuit by using the concealed observed signal ŷ and the stored concealed dictionary matrix D̂, to determine sparse coefficients x̂ that approximate the sparse coefficients x.
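The key property of this computation procedure, namely that OMP run on the concealed pair (ŷ, D̂) yields the same sparse coefficients as OMP on (y, D), can be checked numerically. The sketch below uses a real-valued random orthogonal matrix as a stand-in for the key-generated Qp and a simplified OMP; both are illustrative assumptions, not the disclosed implementation.

```python
import numpy as np

def omp(y, D, eps=1e-8):
    """Simplified OMP returning the sparse coefficient vector."""
    M, K = D.shape
    S, x, r = [], np.zeros(K), y.copy()
    while np.linalg.norm(r) > eps and len(S) < M:
        i = int(np.argmax(np.abs(D.T @ r)))
        if i in S:
            break
        S.append(i)
        xs, *_ = np.linalg.lstsq(D[:, S], y, rcond=None)
        x = np.zeros(K)
        x[S] = xs
        r = y - D @ x
    return x

rng = np.random.default_rng(1)
M, K = 8, 20
D = rng.standard_normal((M, K))
D /= np.linalg.norm(D, axis=0)           # unit-norm bases
x_true = np.zeros(K)
x_true[2], x_true[11] = 1.0, -0.7
y = D @ x_true

# Random orthogonal matrix standing in for the key-generated Qp.
Q, _ = np.linalg.qr(rng.standard_normal((M, M)))

x_plain = omp(y, D)          # OMP on the original signals
x_conc = omp(Q @ y, Q @ D)   # OMP on the concealed signals Qy and QD
print(np.allclose(x_plain, x_conc, atol=1e-6))  # True: identical coefficients
```

Because Q preserves inner products, every quantity OMP inspects (correlations, least-squares solutions, residual norms) is unchanged, so the two runs follow identical trajectories.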
Note that the edge/cloud processing unit 12 may perform post processing. Post processing is an application-specific necessary process performed by using the estimated sparse coefficients, examples of which include processing of media signals such as image/acoustic signals, analysis of biological signals such as brain waves, brain blood flow, or fMRI, and machine learning.
3. Concealment Computation Method for Orthogonal Matching Pursuit (OMP)
For a given observed signal y and dictionary D, a problem of finding coefficients x which give Dx approximating y is called a sparse coding problem (in a narrow sense). Here, the optimization problem of Expression (2) is considered as a problem of approximating the signal with a linear combination of as few bases as possible without reconstruction error exceeding a certain threshold.
min_x ‖x‖0 subject to ‖y − Dx‖2 ≤ ε    (4)
where, ε is a desired value of the difference between the product of the dictionary matrix D and the sparse coefficients x, and the observed signal y.
To solve this problem, a number of algorithms have been proposed, such as methods based on greedy algorithms and methods using relaxation from l0 norm constraints to l1 norm constraints.
The present disclosure proposes concealment computation for Orthogonal Matching Pursuit (OMP), which is widely used as a selection algorithm for sparse coefficients. Orthogonal Matching Pursuit is an algorithm that finds a "support" S, which is an index set of non-zero coefficients, from an index set of coefficients to be utilized to approximate the observed signal. Initially, the support is set as an empty set; then a basis is newly added to the support set one by one so as to minimize the residual error of approximation of the observed signal y by a linear combination of the bases, and if the residual error of the approximation of the signal only by the bases included in the support is equal to or less than ε, the algorithm is stopped. This algorithm is a greedy algorithm in which bases that contribute to reduction of the residual error are selected one by one, and it does not ensure optimality of the solution. However, this algorithm is known to give excellent approximations in the majority of cases.
Step S01
Parameters are initialized.
Here, k will be described. In the sparse coding, the observed signal y is approximated by a linear combination of bases di, as represented in Equation (1). In the OMP algorithm, bases selected from K bases di (i=1, 2, . . . , K) are added one by one to the support S in an order starting from a basis which gives best approximation. The variable denoting the number of bases in this context is “k”.
For example, k=1 means that a single basis is used to express the observed signal y, where di may be d1 or may be d3. In addition, k=2 means that two bases are used to express the observed signal y, where di may be d1 and d5 or may be d1 and d3.
Step S01 is a step for initialization, and thus k is set to 0, the sparse coefficient x is set to a zero vector, a residual error r is set to the observed signal y, and the support S is set to an empty set.
Step S02
k is set to k+1.
Step S03
The error when adding the k-th basis di to the support S is calculated.
Step S04
The support S is updated.
Step S05
Search for the best solution x̄k in the support S is performed.
Step S06
The residual error r is updated.
Step S07
The algorithm checks whether the updated residual error r is less than the desired value ε.
‖rk‖ < ε    (9)
If the updated residual error r is less than the desired value, the result of the search in step S05 is set as the solution.
To facilitate subsequent analysis, the basis vector di is defined as follows,
di = Dδi    (10)
where, δi is the following column vector,
δi = [0, . . . , 0, 1, 0, . . . , 0]^T    (10-1)
where the i-th element is 1 and the other elements are zero. The approximation error of Equation (5) is expressed as follows using δi.
A random unitary matrix will be explained before describing the concealment computation for Orthogonal Matching Pursuit (OMP) using a random unitary matrix according to the present disclosure.
Basic properties of the concealment computation using a random unitary matrix will be described. In previous studies, template protection using random unitary transformation has been studied as a method for cancelable biometrics. Typically, in the concealment computation using a random unitary matrix, transformation T (●) is performed using the random unitary matrix Qp generated using the key p to transform an N-dimensional signal fz (z=1, . . . , L) to the following N-dimensional concealed signal.
f̂z = T(fz, p) = Qp fz    (11)
Note that the concealed signal f̂z may also be denoted herein as "fz^".
Qp is an N×N matrix,
Qp ∈ R^(N×N)    (11-1)
and satisfies the following equation,
Qp*Qp = I    (12)
where [●]* represents Hermitian transpose and I represents an identity matrix. N is any natural number, and in the present embodiment, N=M. "L" is the number of signals (the number of samples). For example, in the case of audio signals, z corresponds to time, and the signals fz (each having N elements) are L sample signals for time z=1 to L (there are L signals fz having N elements). In the case of images, for example, z can be defined as an index of each image, and there are L image signals from f1 to fL. Alternatively, a single image may be divided into small blocks and the small blocks may be indexed.
Regarding generation of the random unitary matrix Qp, the use of Gram-Schmidt orthogonalization, and a method in which Qp is generated by combining a plurality of unitary matrices, have been studied. Vectors a and b, which are two observed signals fz and fw, and the vectors â and b̂ obtained by transforming them with the random unitary matrix Qp, satisfy the following relationships.
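A common way to realize the Gram-Schmidt-based generation mentioned above is QR factorization of a key-seeded Gaussian matrix. The sketch below is a minimal real-valued (orthogonal) illustration; seeding the generator directly with the key p is an assumption made for simplicity.

```python
import numpy as np

def random_unitary(n: int, key: int) -> np.ndarray:
    """Generate a random orthogonal (real-valued unitary) matrix by
    QR-factorizing a Gaussian matrix seeded from the key; QR performs the
    Gram-Schmidt-style orthogonalization of the columns."""
    rng = np.random.default_rng(key)          # the key p seeds the generator
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))            # fix column signs for uniqueness

Qp = random_unitary(6, key=42)
print(np.allclose(Qp.T @ Qp, np.eye(6)))      # Qp*Qp = I, as in Equation (12)
```

Because the matrix is derived deterministically from the key, the local and edge/cloud sides can reproduce the same Qp without ever transmitting it.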
Property 1: Preservation of Euclidean distance
‖a − b‖2 = ‖â − b̂‖2    (13)
Property 2: Preservation of inner product
a*b = â*b̂    (14)
Property 3: Preservation of correlation coefficient
Property 4: Norm invariance
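Properties 1 through 4 can be verified numerically for the real-valued (orthogonal) case:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 8
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal matrix
a = rng.standard_normal(n)
b = rng.standard_normal(n)
ah, bh = Q @ a, Q @ b                             # concealed vectors

corr = lambda u, v: (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(np.isclose(np.linalg.norm(a - b), np.linalg.norm(ah - bh)))  # Property 1
print(np.isclose(a @ b, ah @ bh))                                   # Property 2
print(np.isclose(corr(a, b), corr(ah, bh)))                         # Property 3
print(np.isclose(np.linalg.norm(a), np.linalg.norm(ah)))            # Property 4
```

All four checks print True; each follows directly from Q*Q = I.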
Concealment Computation for Orthogonal Matching Pursuit (OMP)
In the concealment computation for sparse coding according to the present embodiment, the concealed observed signal ŷ and the concealed dictionary matrix D̂ are generated as follows.
ŷ=T(y, p)=Qpy (16)
D̂ = T(D, p) = QpD    (17)
For a given ŷ and D̂, an optimization problem of the following equation is considered, instead of Equation (4):
min_x ‖x‖0 subject to ‖ŷ − D̂x‖2 ≤ ε    (18)
By solving this problem using Orthogonal Matching Pursuit, the sparse coefficients
x̂    (18-1)
can be obtained. Note that the sparse coefficients of Expression (18-1) may also be denoted herein as "x^".
Here, proof that the sparse coefficients x̂ are identical to the sparse coefficients x obtained from the unconcealed observed signal y and the unconcealed dictionary matrix D will be described. The concealment computation algorithm of Orthogonal Matching Pursuit is as shown in
Step S01
Parameters are initialized.
Step S01 is a step for initialization, and thus k is set to 0, the sparse coefficient x is set to a zero vector, a residual error r̂ is set to the concealed observed signal ŷ, and the support S is set to an empty set.
Step S02
k is set to k+1.
Step S03
The error when adding the k-th basis di to the support S is calculated. Here, by replacing, in Equation (5A), the dictionary matrix D and the residual error rk−1 with the concealed D̂ and r̂k−1, and using the relational equations (16) and (17), the approximation error can be expressed by the following equation,
According to the above-described properties of random unitary matrix, norm is invariant and thus,
and by the preservation of inner product, the following relationship holds.
D̂*D̂ = D*D    (19-3)
Thus, Equation (19) can be rewritten as follows.
Equation (20) is the same as Equation (5A). In other words, the approximation error ε̂(i) calculated using the concealed signals ŷ and D̂ is identical to the approximation error ε(i) calculated using the original signals (y and D).
Step S04
The support S is updated. By ε̂(i) = ε(i), the following equation holds.
Step S05
Search for the best solution x̂k in the support S is performed.
E^2 = ‖ŷ − D̂S xS‖2^2    (22)
By solving minimization for Expression (22-1) included in Equation (22),
the following equation is given.
x̂k = (D̂S*D̂S)^(−1) D̂S*ŷ    (23)
By preservation of inner product of Equation (14), the following relationships hold,
D̂S*D̂S = DS*DS, D̂S*ŷ = DS*y
and thus Equation (23) can be rewritten as follows.
x̂k = (DS*DS)^(−1) DS*y    (24)
Equation (24) is identical to Equation (7). In other words, the best solution x̂k in the support obtained using the concealed signals ŷ and D̂ is equal to the best solution x̄k obtained when the original signals (y and D) are used.
Step S06
The residual error r̂ is updated. Replacing Equation (8) using the concealed signals results in the following equation.
r̂k = ŷ − D̂S x̂k    (25)
By the definitional Equations (16) and (17) and the best solution in the support, x̂k = x̄k, the following equation holds.
Here, as Equation (8) holds, Equation (25) can be expressed as the following equation by using the error rk when using the original signal.
r̂k = Qp rk    (25-1)
Step S07
The algorithm checks whether the updated residual error r̂ is less than the desired value ε.
‖r̂k‖2 < ε    (25-2)
The algorithm terminates if Equation (25-2) is satisfied. By Equation (25-1) and the property of norm invariance, the following relationship holds.
‖r̂k‖2 = ‖Qp rk‖2 = ‖rk‖2
That is, the stopping rule for the concealed signals ŷ and D̂ is the same as the stopping rule for the original signals (y and D).
As described above, the sparse coefficients x̂ calculated using the concealed signals are proven to be equal to the sparse coefficients x calculated using the original signals.
4. Specific Configuration of System for Concealment Computation for Sparse Coding
The local processing unit 11 includes a dictionary learning unit 21, a random unitary transformation unit 22, a cache unit 23, and a transmission unit 24. The dictionary learning unit 21 receives the observed signal y for training, and performs learning using a K-SVD method or the like to generate the dictionary matrix D. Note that, if the dictionary matrix D is provided in advance, the dictionary learning unit 21 is not required. The random unitary transformation unit 22 transforms the dictionary matrix D to the concealed dictionary matrix D̂ by subjecting the dictionary matrix D to concealment processing by using the random unitary matrix Qp, which is an M×M matrix generated using the key p. The cache unit 23 temporarily stores the concealed dictionary matrix D̂ generated by the random unitary transformation unit 22. The transmission unit 24 transmits the concealed dictionary matrix D̂ to the edge/cloud processing unit 12.
The edge/cloud processing unit 12 includes a reception unit 25 and a database unit 26. The reception unit 25 receives the concealed dictionary matrix D̂ that has been transmitted from the local processing unit 11. The database unit 26 stores the concealed dictionary matrix D̂.
In these procedures, the dictionary learning unit 21 is not required in the local processing unit 11. The random unitary transformation unit 22 transforms the observed signal vector y to the concealed observed signal ŷ by subjecting the observed signal vector y to concealment processing by using the random unitary matrix Qp. The cache unit 23 temporarily stores the concealed observed signal ŷ generated by the random unitary transformation unit 22. The transmission unit 24 transmits the concealed observed signal ŷ to the edge/cloud processing unit 12.
In addition to the reception unit 25 and the database unit 26, the edge/cloud processing unit 12 includes a main loop unit 13, an initialization unit 31, and a sparse coefficient output unit 37. The reception unit 25 receives the concealed observed signal ŷ that has been transmitted from the local processing unit 11. When the reception unit 25 receives the concealed observed signal ŷ for the first time, the reception unit 25 transfers the concealed observed signal ŷ to the initialization unit 31, and causes the initialization unit 31 to perform step S01 of the concealment computation algorithm for Orthogonal Matching Pursuit of
The main loop unit 13 calculates the sparse coefficients x̂ using the concealed observed signal ŷ and the concealed dictionary matrix D̂ stored in the database unit 26. The main loop unit 13 includes an approximation error calculation unit 32, a support update unit 33, a best solution search unit 34, a residual error update unit 35, and a computation stop unit 36.
The approximation error calculation unit 32 calculates the approximation error ε̂(i) (step S03 in
The best solution search unit 34 searches for the best solution x̂k in the support S using the support S updated by the support update unit 33 and the concealed dictionary matrix D̂ stored in the database unit 26 (step S05 in
That is, the concealment computation method for sparse coding described in the present embodiment is a method of calculating an approximation of the sparse coefficients x by solving the optimization problem with Orthogonal Matching Pursuit (OMP), using the concealed observed signal ŷ obtained by subjecting the observed signal y to concealment processing by using the random unitary matrix Qp, and the concealed dictionary matrix D̂ obtained by subjecting the dictionary matrix D to concealment processing by using Qp.
5. Effects of the Present Concealment Computation Method
With the concealment computation method for sparse coding described in the present embodiment, it is possible to perform sparse coding by utilizing edge/cloud computational resources while protecting privacy. The concealment computation method for sparse coding described in the present embodiment can estimate sparse coefficients by using observed signals and a dictionary matrix which are concealed for privacy protection, and can be used with widely used applications.
The present embodiment is an image recognition system achieved by applying the concealment computation method for sparse coding described in the first embodiment, to image recognition, in particular to facial image recognition for identifying the face of a person.
A face image recognition technology that uses sparse coding to classify facial images is disclosed in, for example, NPL 2.
An equation shown in
[1] System Model
In this section, architecture of a system using edges and a cloud will be described first.
An image yij is assumed to be expressed, by using a dictionary matrix Dij being an M×K matrix having, as elements thereof, K (K>M) bases, and a sparse coefficient Xij being a K-dimensional vector, as yij=Dij·Xij, where i is a class to which an image to be identified belongs, and j and k (k∈j) are numbers of terminals 111.
The terminal 111 encrypts a testing image using a random unitary matrix Qp generated by using a key p, to generate an encrypted testing image, and transfers the encrypted testing image to a specified one of the transfer servers 112.
The transfer server 112 downsamples the encrypted testing image, transforms the encrypted testing image to a one-dimensional encrypted image vector, and transfers the encrypted image vector to the image recognition server 113.
The image recognition server 113 uses a plurality of encrypted dictionaries generated by encrypting, by using the random unitary matrix Qp, a plurality of dictionaries generated using different training images, to solve, by Orthogonal Matching Pursuit, an optimization problem represented by Math. C1 for each of the plurality of encrypted dictionaries, to estimate a class, to which the encrypted image vector belongs, for each of the plurality of the encrypted dictionaries, and performs ensemble learning on the classes estimated for each of the plurality of the encrypted dictionaries to determine a single class to which the encrypted image vector belongs.
[Math. C1a]
min_x ‖x‖0 subject to ‖ŷ − D̂x‖2 ≤ ε
where ŷ is the encrypted image vector, D̂ is an encrypted dictionary, x is a sparse coefficient vector, and ε is an allowable approximation error.
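The ensemble step described above can be realized in several ways; one minimal, assumed form is a majority vote over the per-dictionary class estimates (the claims do not fix a particular ensemble rule, so this is an illustration only):

```python
from collections import Counter

def ensemble_vote(class_estimates):
    """Combine per-dictionary class estimates into a single class by
    majority vote (ties broken by first occurrence)."""
    return Counter(class_estimates).most_common(1)[0][0]

# Three encrypted dictionaries estimated class 2, one estimated class 5.
print(ensemble_vote([2, 2, 5, 2]))  # 2
```

Because each dictionary was trained on a different device's images, the vote exploits multi-device diversity: errors of one dictionary are outvoted by the others.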
First, a method for facial recognition based on a sparse representation will be described, and an optimization problem under privacy constraints (encrypted state) is formulated. Note that in the following description, “terminal” may be referred to as “device”, “transfer server” may be referred to as “edge server”, and “image recognition server” may be referred to as “cloud.”
A. Edge and Cloud System
In the system 301, the N mobile devices 111 are connected to the single remote cloud server 113 via the M edge servers 112. The mobile device 111 executes an application that includes facial recognition, such as an interactive game or a virtual reality application.
Each mobile device j has Bij training samples for class i, where i is one of L classes of people (see NPL 2). Each of the edge servers 112 is a lightweight computing node at a wireless access point. On the other hand, the remote cloud 113 has a more powerful processor and is connected to the edge servers 112 via a backbone network.
In computing utilizing edges and a cloud, the mobile device 111 offloads its computing tasks, via the wireless channel, onto a proximate edge server 112. Instead of the mobile device 111, the edge server 112 performs the computational tasks together with the cloud server 113.
B. Sparse Representation of Facial Image
Facial recognition is to determine the class to which a new test sample belongs, using labeled training samples for L different classes (face images of L persons). In the present embodiment, the subspace model defined in NPL 2 is used. The definition is as follows.
The training sample Bij can be expressed as the sum of b column vectors d(i, n)j (more precisely, for the device j of the class i (the i-th person), as a weighted linear sum of the b column vectors). The dictionary Dij can be formulated as represented in Math. 2-0. Almost all test samples from the same class, which can be represented by Math. 2-1, lie in the subspace spanned by Bij.
where Xij is the weight of each element. —Definition 1 End—
Note that the term “subspace” refers to a space represented by a linear weighted sum based on the dictionary Dij represented in Math. 2-0 and the corresponding coefficients Xij. In other words, a “subspace” is the space defined by Math. 2-1, which expresses the device j of the class i (the i-th person).
The dictionary Dj of the device j is defined as follows.
According to Definition 1, any testing image y can be sparsely represented over the dictionary Dj.
where Xj is the sparse coefficients.
[Math. 2-3a]
If Math. 2-3a holds and Dj is an overcomplete matrix, the solution of Math. 2-3 cannot be determined uniquely. This problem can be addressed by solving the following l0 minimization problem.
where ε represents the sparsity constraint (a desired error tolerance). The above optimization problem can be solved efficiently by using Orthogonal Matching Pursuit (OMP).
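By way of non-limiting illustration, the l0 minimization solved by OMP can be sketched in Python as follows; the function name omp, the tolerance eps, and the atom limit are choices of this sketch, not of the present disclosure.

```python
import numpy as np

def omp(D, y, eps=1e-6, max_atoms=None):
    """Greedy sketch of OMP for:  min ||X||_0  subject to  ||y - D X||_2 <= eps."""
    M, K = D.shape
    max_atoms = max_atoms or M
    support = []                      # indices of dictionary atoms selected so far
    x = np.zeros(K)
    coef = np.zeros(0)
    residual = y.astype(float).copy()
    while np.linalg.norm(residual) > eps and len(support) < max_atoms:
        # greedy step: pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:            # no further progress is possible
            break
        support.append(idx)
        # projection step: least-squares fit of y on the selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x
```

Given an exactly sparse y, the greedy loop typically terminates once the residual falls below eps, returning a coefficient vector supported on a few atoms.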
C. Formulation of the Problem
When the sample data yj, j∈N, are given, the sparse coefficients X{circumflex over ( )}j are calculated using Math. 2-4. Ideally, the non-zero elements of X{circumflex over ( )}j are all associated with the columns of the dictionary Dj of a single class. For example, if the following specification is provided,
yj can be assigned to the class i. However, due to noise and modeling errors, there are small non-zero entries associated with other classes. To address this problem, the following definition is used.
[Math. 2-4b]
δlj=[0, . . . , 0, 1, . . . , 1, 0, . . . , 0], j∈N, l∈L (2-4b)
The non-zero entries of δlj select, from among the entries of Xj, only those related to the l-th class. The test sample yj can thus be approximated by Math. 2-4c by using only the coefficients corresponding to the l-th class, and yj can be classified according to the optimization problem of Math. 2-5.
where rl(yj) represents the approximation error specific to each class.
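By way of non-limiting illustration, the class-selective residual test of Math. 2-4b through Math. 2-5 can be sketched as follows; the bookkeeping array class_of_atom, which labels each dictionary column with its class, is a hypothetical structure of this sketch.

```python
import numpy as np

def classify_by_residual(D, x, y, class_of_atom):
    """Assign y to the class whose atoms alone best reconstruct it.

    class_of_atom[k] gives the class label of the k-th column of D
    (an assumed bookkeeping array for this sketch).
    """
    best_class, best_err = None, np.inf
    for l in np.unique(class_of_atom):
        delta = (class_of_atom == l).astype(float)   # the selector of Math. 2-4b
        err = np.linalg.norm(y - D @ (delta * x))    # class-l approximation error
        if err < best_err:
            best_class, best_err = int(l), err
    return best_class, best_err
```

Multiplying the coefficient vector elementwise by the selector zeroes out every coefficient not belonging to the candidate class, so the residual measures how well that class alone explains the sample.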
NPL 2 states that such a method is effective in facial recognition. However, this classification is performed based only on a dictionary of each device, and is thus vulnerable to noise and modeling errors.
An object of the present invention is to construct a framework for minimizing reconstruction errors by utilizing multi-device diversity while ensuring security in an edge and cloud system, as described above. This can be formally formulated as follows.
where f(●) is an encryption function, p is a key for the encryption, and X{circumflex over ( )}(i, k) represents the sparse coefficients of y{circumflex over ( )}k in Di.
In Math. 2-6, the first equation expresses the framework for minimizing a reconstruction error, the second and third equations ensure security (encryption), and the fourth equation requires that there be no difference in computational accuracy between the plaintext region and the encrypted region.
[2] Safe Sparse Representation
This section describes a high-security framework for sparse representation using edges and a cloud for facial recognition. A random unitary transformation that satisfies the privacy protection constraint of Math. 2-6 will be introduced, and three important properties of the random unitary transformation will be described. Based on these properties, it will be shown that the result of facial recognition is not affected by the random unitary transformation. Furthermore, a framework of ensemble learning that utilizes multi-device diversity will be described. The sparse representation and the related reconstruction errors are calculated according to each dictionary in the cloud, each dictionary serving as a member classifier. These member classifiers solve Math. 2-6 and inform a determiner of the results, to improve the accuracy of classification.
A. Random Unitary Transformation
The random unitary transformation is one promising method, not only for achieving privacy protection in the system, but also for realizing algorithms that operate in the encrypted region. The random unitary transformation has proven effective for biometric template protection and network BMI coding (see, e.g., NPL 3).
Any vector v∈Rm×1 encrypted by using a random unitary matrix Qp∈Cm×m having a private key p can be expressed as follows:
where v{circumflex over ( )} (which may also be denoted herein as v*) is the encrypted vector, and the unitary matrix Qp is defined as follows,
where [●]* represents a Hermitian transpose matrix and I represents an identity matrix. Gram-Schmidt orthogonalization can be used to generate Qp. The encrypted vector has the following three properties:
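By way of non-limiting illustration, the properties referred to above can be checked numerically as follows; preservation of norms, inner products, and distances are standard properties of unitary matrices, and modeling the key p as a random seed is an assumption of this sketch only.

```python
import numpy as np

rng = np.random.default_rng(0)                 # the key p, modeled here as an RNG seed
A = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
Qp, _ = np.linalg.qr(A)                        # QR factorization performs Gram-Schmidt

# Unitarity: Qp^* Qp = I
assert np.allclose(Qp.conj().T @ Qp, np.eye(8))

v1, v2 = rng.normal(size=8), rng.normal(size=8)
e1, e2 = Qp @ v1, Qp @ v2                      # encrypted vectors

# Norms are preserved
assert np.isclose(np.linalg.norm(e1), np.linalg.norm(v1))
# Inner products are preserved
assert np.isclose((e1.conj() @ e2).real, v1 @ v2)
# Distances are preserved
assert np.isclose(np.linalg.norm(e1 - e2), np.linalg.norm(v1 - v2))
```

Because these quantities are exactly the ones that sparse coding and residual-based classification depend on, their preservation is what allows recognition to proceed in the encrypted region.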
B. Safe Sparse Representation and Recognition
A training sample
[Math. 2-11a]
d(i, n)j, j∈N (2-11a)
and a sample for testing
[Math. 2-11b]
yj, j∈N (2-11b)
are encrypted as follows by using the random unitary transformation.
Math. 2-12 means encrypting an image as shown in
In addition, by Math. 2-2, the dictionary is encrypted as follows.
To obtain a safe sparse representation, an optimization problem for an encrypted region is formulated.
Note that the solution X{circumflex over ( )}(j, k) of Math. 2-14 is equal to the solution X(j, k) of a case where no encryption is performed.
Here, the following theorem proves that the encryption does not affect the result of the facial recognition.
[Theorem 1] The result (Math. 2-14a) of solving Math. 2-6 is the same as the result (Math. 2-14b) of solving Math. 2-15.
Proof: When (Math. 2-14a) is observed to be usually small, Math. 2-16 is obtained.
By the properties of the unitary transformation, the following equations can be obtained.
Thus, Math. 2-16 can be transformed as follows.
This is the same as Math. 2-14b.
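By way of non-limiting illustration, the invariance underlying the proof can be checked numerically: the residual norm of any candidate X is unchanged by the unitary encryption, so the minimizer of the optimization problem is identical in the plaintext and encrypted regions. The dimensions and the sparse candidate below are arbitrary choices of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K = 6, 12
D = rng.normal(size=(M, K))                   # a plaintext dictionary
x = np.zeros(K)
x[[2, 7]] = [1.5, -0.8]                       # an arbitrary sparse candidate
y = D @ x + 0.01 * rng.normal(size=M)         # a noisy observation

Qp, _ = np.linalg.qr(rng.normal(size=(M, M))) # random unitary (real orthogonal here)
y_enc, D_enc = Qp @ y, Qp @ D                 # encrypted sample and dictionary

# The objective of the encrypted problem equals that of the plaintext problem
# for every candidate x, so the OMP solution is the same in both domains.
assert np.isclose(np.linalg.norm(y_enc - D_enc @ x),
                  np.linalg.norm(y - D @ x))
```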
C. Ensemble Learning Framework
The term “ensemble learning” as used herein refers to learning that improves prediction capability for novel data by combining training results obtained individually by separate member classifiers. In one specific example of ensemble learning, the plurality of encrypted dictionaries D{circumflex over ( )}j are separately used to perform class estimation, and the class that receives the most estimation results may be determined as the final class (majority vote). Alternatively, the plurality of encrypted dictionaries D{circumflex over ( )}j may be separately used to solve the optimization problem of Math. 2-6 to determine the reconstruction error (Math. 2-18) of y{circumflex over ( )}k, and the class that gives the smallest value for Math. 2-6, among the classes estimated by separately using the plurality of encrypted dictionaries D{circumflex over ( )}j, may be determined as the final class.
[Math. 2-18]
rlj(y{circumflex over ( )}k)=∥y{circumflex over ( )}k−D{circumflex over ( )}j·δlj·X{circumflex over ( )}(j,k)∥2 (2-18)
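By way of non-limiting illustration, the two combination rules described above (majority vote and smallest reconstruction error) can be sketched as follows; the tuple layout (estimated class, reconstruction error), one tuple per member classifier, is an assumption of this sketch.

```python
def ensemble_decide(member_outputs):
    """Determiner rule: keep the class whose member classifier reports the
    smallest reconstruction error (in the spirit of Math. 2-6)."""
    best_class, _ = min(member_outputs, key=lambda t: t[1])
    return best_class

def majority_vote(member_outputs):
    """Alternative determiner rule: the class estimated most often wins."""
    votes = {}
    for cls, _ in member_outputs:
        votes[cls] = votes.get(cls, 0) + 1
    return max(votes, key=votes.get)
```

For example, with member outputs [(3, 0.2), (1, 0.05), (3, 0.4)], the smallest-error rule selects class 1 while the majority vote selects class 3, illustrating that the two rules can disagree.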
The ensemble learning framework includes algorithm 1 of
d(i, n)j is any training image therein.
Dictionary Learning Stage
The dictionary learning stage includes three steps, as illustrated in
The encrypted dictionary has a smaller data size than the training samples. The training images themselves are not transmitted to the cloud, and thus the network bandwidth required between the edge server and the cloud can be reduced.
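By way of non-limiting illustration, the device-side portion of the dictionary learning stage can be sketched as follows; note that the K-SVD learning step is replaced here by a trivial placeholder (normalized training columns), and all function names are choices of this sketch.

```python
import numpy as np

def learn_dictionary(training_images, n_atoms):
    """Placeholder for local dictionary learning (the embodiment would use K-SVD;
    here, normalized training columns simply stand in for learned atoms)."""
    D = np.stack([img.ravel() for img in training_images[:n_atoms]], axis=1).astype(float)
    return D / np.linalg.norm(D, axis=0)

def encrypt_dictionary(D, Qp):
    """Device-side encryption step: only Qp @ D leaves the device, never D itself."""
    return Qp @ D

rng = np.random.default_rng(2)
images = [rng.random((8, 8)) for _ in range(5)]     # toy 8x8 training images
D = learn_dictionary(images, n_atoms=4)
Qp, _ = np.linalg.qr(rng.normal(size=(64, 64)))     # random unitary from the key
D_enc = encrypt_dictionary(D, Qp)                   # what is sent to the cloud
```

Since Qp is unitary, the encrypted dictionary has exactly the same size and column norms as the plaintext dictionary, so no extra bandwidth is consumed by the encryption itself.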
Recognition Stage
First, a device k encrypts the testing image and transmits the encrypted testing image to the specified edge server 112. Then, the edge server 112 downsamples and shrinks the encrypted testing image, and creates a one-dimensional encrypted testing vector y{circumflex over ( )}k. The edge server 112 transmits the encrypted testing vector y{circumflex over ( )}k to the cloud server 113.
Each of the encrypted dictionaries functions as a member classifier in the ensemble learning framework. When the cloud server 113 receives the encrypted testing vector y{circumflex over ( )}k, the member classifier 121-j (j∈N) solves Math. 2-14 to calculate the sparse representation X{circumflex over ( )}(j, k). Furthermore, the member classifier 121-j uses OMP to solve Math. 2-5 (or Math. 2-17) to determine a class to which the encrypted testing vector belongs, and calculates a reconstruction error (Math. 2-14a). The classification result and the reconstruction error are passed to the determiner 122. Finally, the determiner 122 solves Math. 2-6 using the classification results and the reconstruction errors from the member classifiers, thereby combining the results of the member classifiers (ensemble learning), and determines the class that gives the smallest reconstruction error.
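By way of non-limiting illustration, one member classifier of the recognition stage can be sketched as follows; a per-class least-squares fit stands in for the OMP step of Math. 2-14, and the array class_of_atom labeling each encrypted atom with its class is a hypothetical bookkeeping structure of this sketch.

```python
import numpy as np

def member_classify(D_enc, y_enc, class_of_atom):
    """One member classifier: per class, reconstruct y_enc from that class's
    (encrypted) atoms only, and report the best class and its residual."""
    best = (None, np.inf)
    for l in np.unique(class_of_atom):
        cols = D_enc[:, class_of_atom == l]
        coef, *_ = np.linalg.lstsq(cols, y_enc, rcond=None)
        err = float(np.linalg.norm(y_enc - cols @ coef))
        if err < best[1]:
            best = (int(l), err)
    return best

rng = np.random.default_rng(3)
M = 16
class_of_atom = np.array([0] * 3 + [1] * 3)
D = rng.normal(size=(M, 6))                               # toy plaintext dictionary
y = D[:, class_of_atom == 1] @ np.array([1.0, 0.5, -0.2]) # a class-1 sample
Qp, _ = np.linalg.qr(rng.normal(size=(M, M)))             # random unitary from the key
label, err = member_classify(Qp @ D, Qp @ y, class_of_atom)
```

Because the sample lies in the span of the class-1 atoms and the unitary encryption preserves residual norms, the classifier recovers class 1 with a near-zero residual even though it only ever sees encrypted quantities.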
Simulation results of comparing the image recognition method described in the embodiments with other image identification methods are shown in
With the image recognition system according to the present disclosure, it is possible to make the most of the advantages of edge and cloud computing by utilizing multi-device diversity: results based on dictionaries from different devices are combined to improve recognition performance.
It is very important to prevent privacy leakage, especially when sharing of calculation results at a cloud is allowed. The image recognition system according to the present disclosure has a framework for facial recognition that utilizes edges and cloud and that is based on sparse representation in which privacy protection is ensured.
(1) Privacy Protection by Random Unitary Transformation
As one of privacy protection methods, so-called secure computing, which is a method for computing on encrypted data without decrypting it, has been actively studied. Secure computing is typically performed based on a multiparty protocol or homomorphic encryption. However, these methods have problems with division operations, computational efficiency, and computational accuracy, and are thus used only in limited applications, such as sorting processes and some statistical analyses. To address these problems, it is possible to employ a computationally non-intensive encryption algorithm based on random unitary transformation. The present embodiment provides both a theoretical proof and a demonstration by simulation that such encryption does not affect the results of facial recognition.
The performance of a dictionary-based facial recognition algorithm depends significantly on the number of training samples. However, it is difficult to gather all training samples at the cloud, due to bandwidth and storage costs. In the present embodiment, the diversity provided by the cloud is utilized to integrate, through ensemble learning, only the recognition results based on the dictionaries generated at the different devices. Furthermore, the simulation results show that the present embodiment is robust to noise and can achieve a high recognition rate.
11: Local processing unit
12: Edge/cloud processing unit
13: Main loop unit
21: Dictionary learning unit
22: Random unitary transformation unit
23: Cache unit
24: Transmission unit
25: Reception unit
26: Database unit
31: Initialization unit
32: Approximation error calculation unit
33: Support update unit
34: Best solution search unit
35: Residual error update unit
36: Computation stop unit
37: Sparse coefficient output unit
111: Terminal (device)
112: Transfer server (edge server)
113: Image recognition server (cloud server)
121: Member classifier
301: Image recognition system
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/022641 | 6/6/2019 | WO |