This patent application claims the benefit and priority of Chinese Patent Application No. 202211571546.3, filed with the China National Intellectual Property Administration on Dec. 08, 2022, the disclosure of which is hereby incorporated herein by reference in its entirety as part of the present application.
The present disclosure belongs to the field of electroencephalogram (EEG) signal recognition within biometric feature recognition, and particularly relates to a task-independent brainprint recognition method based on feature disentanglement by decorrelation, which is intended to obtain identity information independent of task information through feature disentanglement by decorrelation and to utilize the brainprint feature for identity recognition by adversarial self-supervision.
Biometric recognition depends on individual features and plays a vital role in authentication systems. While physical biometric recognition techniques such as face recognition and fingerprint recognition have been extensively applied in real life, potential hazards of elaborate forgery or secret reproduction remain inevitable. In addition to physical biometric recognition, brain activity recorded by an electroencephalogram (EEG) has been proposed as a new cognitive biometric feature, which meets the basic recognition requirements and is referred to as a "brainprint". Moreover, only a living individual can provide brain activity signals, and these signals are not consciously controlled by a user, which means that the identity information of the user cannot easily be leaked or intentionally stolen. Accordingly, an EEG-based biometric recognition technique can be used in applications with high security requirements.
In recent years, according to the type of task stimulus received by a subject, brainprint recognition may be roughly divided into four major categories: brainprint recognition based on a resting potential (RP), brainprint recognition based on a visual evoked potential (VEP), brainprint recognition based on motor imagery (MI), and brainprint recognition based on an event-related potential (ERP). These categories still have some problems. Brainprint recognition based on an external stimulus requires that a subject has no corresponding physiological defects and can receive the external stimulus. Moreover, these categories are directed at particular task stimuli, are difficult to generalize and deploy in practice, and therefore have limitations. Compared with existing methods, a task-independent brainprint recognition method based on feature disentanglement by decorrelation is proposed, which is intended to disentangle identity information and task information in an EEG and fully utilize the brainprint feature to realize highly robust brainprint recognition across tasks.
In view of the shortcomings of the prior art, an objective of the present disclosure is to provide a task-independent brainprint recognition method based on feature disentanglement by decorrelation. The task-independent brainprint recognition method based on feature disentanglement by decorrelation is mainly intended to disentangle identity information and task information in an electroencephalogram, extract independent identity information, and fully utilize the identity information by adversarial self-supervision.
In a first aspect, the present disclosure provides a task-independent brainprint recognition method based on feature disentanglement by decorrelation, specifically including the following steps:
step 1, preprocessing original EEG data and establishing a data set;
step 2, establishing an EEG feature extraction network for extracting a multi-scale time-frequency-space feature of a brainprint, specifically including:
step 2-1, dividing each EEG sample into low-frequency, high-frequency, and full-frequency subsamples by a low frequency band, a high frequency band, and a full frequency band;
step 2-2, separately subjecting low-frequency, high-frequency, and full-frequency time-frequency features to two layers of one-dimensional time-domain convolutions and one layer of one-dimensional frequency-domain convolution having different kernel sizes to extract advanced time-frequency brainprint features of low-frequency, high-frequency, and full-frequency EEGs;
step 2-3, obtaining a time-frequency brainprint feature {f_i ∈ ℝ^{n×c}}_{i=1}^{3} of each EEG sample by extracting a time-domain brainprint feature and a frequency-domain brainprint feature, where n represents a number of hidden layers, and c represents a number of EEG channels; and
step 2-4, splicing the time-frequency brainprint features by frequency-domain dimensions to obtain a time-frequency feature f_ts ∈ ℝ^{n×c×3}, then performing channel-wise and spatial convolutions on the time-frequency feature f_ts with a two-dimensional convolution kernel of (c×3), and outputting the multi-scale time-frequency-space feature f ∈ ℝ^n of the brainprint;
step 3, establishing a primary brainprint and task disentangling neural network model;
where the primary brainprint and task disentangling neural network model includes a primary brainprint disentangling neural network and a primary task disentangling neural network that are concurrent;
the primary brainprint disentangling neural network includes an EEG feature extraction network Gf and an identity discriminator Cf; the EEG feature extraction network Gf is configured to extract identity information, and is the EEG feature extraction network established in step 2; the identity discriminator Cf is configured for identity recognition, and includes a main classifier C and an auxiliary classifier Cs that are concurrent; and the main classifier C and the auxiliary classifier Cs are both fully connected layers;
a loss function of the EEG feature extraction network G_f is as follows:

ℒ_s = −(1/n) Σ_{i=1}^{n} Σ_{m} y_{i,m} log P̂(x_i ∈ m)  (2)

where ℒ_s represents the loss function; n represents a number of samples; m indexes the subjects; y_{i,m} represents the one-hot identity tag of the input data x_i; and P̂(x_i ∈ m) represents a probability that the input data x_i belongs to subject m;
the primary task disentangling neural network includes an EEG feature extraction network Gt and a task classifier Ct; the EEG feature extraction network Gt is configured to extract task information, and is the EEG feature extraction network established in step 2; and the task classifier Ct is configured for task discrimination, and is a fully connected layer;
step 4, training the primary brainprint and task disentangling neural network model, specifically including:
step 4-1, establishing constraint conditions for the EEG feature extraction networks Gf and Gt: decorrelating the identity information S=Gƒ(x) and the task information A=Gt(x), specifically including:
(1) splicing an original identity feature matrix S and an original task feature matrix A output by the EEG feature extraction networks Gf and Gt to obtain a new matrix Q;
acquiring an independent representation for a kernel function by mapping original data to the reproducing kernel Hilbert space (RKHS), as shown below:

K(x,·) = Σ_i α_i φ_i(x) φ_i(·), with the feature map ψ(x) = (√α_1 φ_1(x), √α_2 φ_2(x), …)  (3)

where K(·,·) represents a measurable symmetric positive definite kernel function; φ represents a mapping function; H represents a Hilbert space; and α_i represents an eigenvalue;
(2) detecting independence of the vectors q_i, q_j by the Hilbert-Schmidt independence criterion (HSIC) after acquiring the kernel function;
for random variables q_i, q_j and kernel functions k_1, k_2, defining the HSIC as:

HSIC_{k_1,k_2}(q_i, q_j) := ‖C_{k_1,k_2}‖_F²  (4)

where C_{k_1,k_2} represents the cross-covariance operator regarding the kernel functions k_1 and k_2 in the RKHS; ‖·‖_F is the Frobenius norm; and there exists HSIC_{k_1,k_2}(q_i, q_j) = 0 ⇔ q_i ⊥ q_j, with q_i being independent of q_j;
(3) optimizing the kernel function K(x,·);
approximating the kernel function using random Fourier features (RFFs), because directly calculating the kernel function K(x,·) in the Hilbert space has a high complexity; acquiring a dimension-reduced function by Fourier transform sampling to approximate the original kernel function, and capturing a nonlinear correlation of the two vectors q_i, q_j, specifically including:
mapping the vectors q_i, q_j to a low-dimensional Euclidean space using the RFFs by the following formula (5), and taking the inner product after the mapping as an estimated value of the kernel function; and performing linear calculation using the RFFs to remove the nonlinear correlation and realize statistical independence of the features;
expressing a random Fourier function space H_RFF as:

H_RFF = {h : x → √2 cos(ωx + ϕ) | ω ~ N(0,1), ϕ ~ U(0,2π)}  (5)

where ω represents sampling from a standard normal distribution N(0,1); ϕ represents sampling from a uniform distribution U(0,2π); and x represents the vectors q_i, q_j; and transforming the matrix Q into the RFFs by the formula (5) to approximate the kernel functions K(x,·) of an identity feature and a task feature;
(4) detecting the independence;
assuming that there exist measurable spaces Ω_1 and Ω_2, where (H_1, k_1) and (H_2, k_2) represent the RKHSs in Ω_1 and Ω_2, and correspondingly, k_1 and k_2 are also measurable, and a unique cross-covariance operator Σ_XY exists in a space from H_1 to H_2, deriving:

⟨g, Σ_XY f⟩ = Cov(f(X), g(Y)) = E_XY[f(X)g(Y)] − E_X[f(X)] E_Y[g(Y)]  (6)

where f(X) ∈ H_1, g(Y) ∈ H_2, and Cov(·) represents a covariance matrix;
as shown in the formula (6), the calculation of Σ_XY is expanded to the calculation of the covariance matrix over the Euclidean space, and f(X) and g(Y) represent nonlinear kernel functions; due to Σ_XY = 0 ⇔ X ⊥ Y, if a Hilbert-Schmidt norm of Σ_XY is zero, X and Y are considered independent; since the calculation is difficult for a kernel method, the RFFs are capable of providing a function space H_RFF to achieve the objective, and an empirical cross-covariance matrix Σ̂_XY is expressed as:

Σ̂_XY = (1/(n−1)) Σ_{i=1}^{n} (u(X_i) − ū)(v(Y_i) − v̄)^T  (7)

where ū and v̄ represent the sample means of u(X) and v(Y), and u and v are elements in a random Fourier space;
theoretically detecting the independence between the two vectors q_i, q_j by substituting the vectors q_i, q_j respectively as X and Y into the formulas (6) and (7), where whether the cross-covariance operator Σ_ST regarding u(q_i) and v(q_j) tends to 0 needs to be determined, and the elements u and v in the random Fourier space are each composed of n_q functions sampled from H_RFF; and
establishing a cross-covariance matrix, minimizing the Frobenius norm of the cross-covariance matrix to achieve an objective of uncorrelation, and defining a loss function as:

ℒ_dec = λ Σ_{1≤i,j≤m} ‖Σ̂_{q_i q_j}‖_F²

where the hyper-parameter λ represents a sigmoid ramp-up weight, which increases with the number of training epochs in the following form:

λ(t) = e^{−5(1−t)²}

where t ∈ [0, 1] represents the normalized training progress;
step 4-2, fully mining the identity information output by the primary brainprint disentangling neural network using an adversarial self-supervised network, specifically including:
step 4-2-1, inputting the identity feature S output by the EEG feature extraction network G_f to the adversarial self-supervised network H to obtain a mask representation; taking each dimension of the mask representation as a discrete random variable, and sampling each dimension to obtain an approximate κ-hot vector; and defining β, representing approximate sampling of a κ-hot vector by the Gumbel-Softmax trick, as:

β = Gumbel-Softmax(H(S), κN) ∈ ℝ^N  (12)

where κ ∈ (0,1); N represents the number of dimensions output by the adversarial self-supervised network; the κN masks closest to 1 after sampling are taken as important features, and the other masks are taken as secondary features; and
defining the Gumbel-Softmax: for pre-defined τ > 0, i ∈ {1, …, N}, p ∈ {1, …, κ}, deriving:

where π = H(S) ∈ ℝ^N represents a probability vector; ε_1^p, …, ε_K^p represent samples complying with a Gumbel distribution; and π_i ≥ 0, i ∈ {1, …, N}, Σ_i π_i = 1; and
step 4-2-2, multiplying the identity feature S output by the EEG feature extraction network G_f by masks β and 1−β to obtain an important dimension feature and a secondary dimension feature, and utilizing the main classifier C and the auxiliary classifier C_S to train the adversarial self-supervised network H to fully utilize a discriminable identity feature;
redefining ℒ_s of the formula (2) as a loss function of the main classifier C:

ℒ_s^{sup} = ℓ(C(S∘β), y)  (14)

and as a loss function of the auxiliary classifier C_S:

ℒ_s^{inf} = ℓ(C_S(S∘(1−β)), y)  (15)

where ∘ represents point multiplication, and y represents the identity tag; and
firstly minimizing the loss functions to train and optimize the EEG feature extraction networks G_f and G_t, the main classifier C, and the auxiliary classifier C_S; and secondly minimizing ℒ_s^{sup} and maximizing ℒ_s^{inf} to train the adversarial self-supervised network H;
step 5, verifying and testing the trained primary brainprint and task disentangling neural network model; and
step 6, performing brainprint recognition on an EEG using the trained, verified, and tested primary brainprint and task disentangling neural network model.
Preferably, step 1 may specifically include:
step 1-1, downsampling EEG data to 250 Hz, and filtering the original EEG data at 0 to 75 Hz using a Butterworth filter;
step 1-2, performing short-time Fourier transform on the EEG data x processed in step 1-1 to extract time-frequency features;
step 1-3, intercepting the time-frequency features obtained in step 1-2 using a time window, adding to each time-frequency feature a tag of a subject to which the time-frequency feature belongs and a corresponding task tag; and
step 1-4, using one task of the EEG sample data obtained in step 1-3 as a test set {X_t, Y_t, Y_t^l}, and proportionally dividing remaining samples into a training set {X_s^j, Y_s^j, Y_s^{l,j}}_{j=1}^{K} and a validation set {X_v, Y_v, Y_v^l}, where X, Y, Y^l, and K represent a sample, an identity tag, a task tag, and a number of tasks, respectively.
Preferably, step 1-2 may specifically include:
using a time-limited window function h(t), assuming that a non-stationary signal x is stationary within one time window, performing piecewise analysis on the signal x by moving the window function h(t) along a time axis to obtain a set of local "frequency spectra" of the signal, and defining the short-time Fourier transform of the signal x(τ) as:

STFT(t, f) = ∫_{−∞}^{+∞} x(τ) h(τ−t) e^{−j2πfτ} dτ  (1)

where STFT(t, f) represents the short-time Fourier transform of the signal x(τ) at time t and frequency f, and h(τ−t) represents the window function.
Preferably, in step 2-1, the low frequency may be 1 to 12 Hz, and the high frequency may be 12 to 30 Hz.
Preferably, in step 2-2, the kernel sizes of the two layers of one-dimensional time-domain convolutions may be [(6×1), (7×1)], [(7×1), (11×1)], and [(15×1), (15×1)], respectively, and the kernel sizes of the one layer of one-dimensional frequency-domain convolution may be (12×1), (18×1), and (30×1), respectively.
Preferably, the loss functions (16) and (17) may be optimized by gradient back propagation.
In a second aspect, the present disclosure provides a task-independent brainprint recognition apparatus, including:
a data acquisition module configured to acquire EEG data;
a data preprocessing module configured to preprocess the EEG data; and
a brainprint recognition module configured to use a trained, verified, and tested primary brainprint and task disentangling neural network model.
In a third aspect, the present disclosure provides a computer-readable storage medium, storing a computer program, where the computer program, when executed in a computer, causes the computer to perform the task-independent brainprint recognition method based on feature disentanglement by decorrelation described above.
In a fourth aspect, the present disclosure provides a computing device, including a memory and a processor, where an executable code is stored in the memory, and the processor, when executing the executable code, performs the task-independent brainprint recognition method based on feature disentanglement by decorrelation described above.
The present disclosure has the following beneficial effects: for EEG data under different task stimuli, the present disclosure firstly allows for coarse-grained decomposition of identity information and task-related information in the electroencephalogram. Further, in consideration of the non-stationarity of the EEG, a brainprint feature and a task feature are disentangled by decorrelation. Finally, by adversarial self-supervision, all identity-related features are utilized as fully as possible, rendering the model stable across time and across tasks.
To make the objective, technical solutions, and advantages of the present disclosure clearer, the present disclosure will be further described in detail with reference to the technical solutions of the present disclosure and the accompanying drawings.
The present disclosure provides a task-independent brainprint recognition method based on feature disentanglement by decorrelation, and a flowchart thereof is shown in the accompanying drawing.
Step 1, original EEG data is preprocessed.
1) An original EEG signal typically contains noise components below 0.5 Hz or above 50 Hz. To remove power frequency interference caused by an EEG acquisition device and myoelectric interference of a subject, the EEG data is downsampled to 250 Hz, and the original EEG data is filtered at 0 to 75 Hz using a Butterworth filter.
2) Short-time Fourier transform is performed on the signal x output in operation 1) to extract time-frequency features. A time-limited window function h(t) is used. Assuming that a non-stationary signal x is stationary within one time window, piecewise analysis is performed on the signal x by moving the window function h(t) along a time axis to obtain a set of local "frequency spectra" of the signal. A specific window size of the present disclosure is 0.5 s. The short-time Fourier transform of the signal x(τ) is defined as:

STFT(t, f) = ∫_{−∞}^{+∞} x(τ) h(τ−t) e^{−j2πfτ} dτ  (1)

where STFT(t, f) represents the short-time Fourier transform of the signal x(τ) at time t and frequency f, and h(τ−t) represents the window function.
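As an illustrative discrete counterpart of the transform above, the windowed DFT below recovers the dominant frequency of a test tone. The Hann window, the 10 Hz tone, and the hop length are assumptions of this sketch; only the 0.5 s window at 250 Hz (125 samples) is taken from the present step.

```python
import numpy as np

fs = 250.0                          # sampling rate from the preprocessing step
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 10.0 * t)    # an illustrative 10 Hz test tone

win = 125                           # 0.5 s window at 250 Hz
h = np.hanning(win)                 # an assumed window shape h(t)
hop = win // 2

# Windowed DFT: each frame is x(tau) * h(tau - t), transformed over frequency.
frames = [x[i:i + win] * h for i in range(0, len(x) - win + 1, hop)]
stft = np.array([np.fft.rfft(frame) for frame in frames])   # (time, frequency)

peak_bin = int(np.abs(stft[1]).argmax())
print(peak_bin * fs / win)          # dominant frequency of the middle frame
```

With a 2 Hz bin spacing (250 Hz / 125 samples), the 10 Hz tone falls exactly on bin 5, so the printed dominant frequency is 10.0 Hz.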
3) The EEG data obtained in 2) is intercepted using a time window of 15 s, and corresponding EEG sample data is added with a tag of a subject to which the EEG sample data belongs and a corresponding task tag.
4) One task of the EEG sample data obtained in 3) is used as a test set {X_t, Y_t, Y_t^l}, and remaining samples are proportionally divided into a training set {X_s^j, Y_s^j, Y_s^{l,j}}_{j=1}^{K} and a validation set {X_v, Y_v, Y_v^l}, where X, Y, Y^l, and K represent a sample, an identity tag, a task tag, and a number of tasks, respectively. The EEG sample is expressed as x ∈ ℝ^{c×s×t}, where c represents a number of EEG channels; s represents a number of frequency-domain dimensions; and t represents a number of time-domain dimensions. Specifically, nine channels Fz, F7, F8, C3, C4, P7, P8, O1, and O2 are selected in the present disclosure; a frequency range of 1 to 30 Hz and a sampling rate of 250 Hz are adopted. That is, c=9, s=30, and t=30.
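The leave-one-task-out division of operation 4) can be sketched as follows; the task count, the 80/20 proportion, and the array names are illustrative assumptions, while the sample shape (c=9, s=30, t=30) follows the text above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 120, 4                             # number of samples and of tasks
X = rng.standard_normal((n, 9, 30, 30))   # x in R^{c x s x t} with c=9, s=30, t=30
y_id = rng.integers(0, 30, n)             # identity tags (30 subjects)
y_task = rng.integers(0, K, n)            # task tags

held_out = K - 1                          # one task is reserved as the test set
test = y_task == held_out

# Remaining samples are divided proportionally into training and validation sets.
idx = np.flatnonzero(~test)
rng.shuffle(idx)
cut = int(0.8 * len(idx))                 # an illustrative 80/20 proportion
train_idx, val_idx = idx[:cut], idx[cut:]

print(sorted(set(y_task[test].tolist())))   # only the held-out task remains
```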
Step 2, a neural network model for extracting multi-scale time-frequency brainprint features is established.
1) Each EEG sample is divided into three subsamples by three frequency bands: 1 to 12 Hz, 12 to 30 Hz, and a full frequency band, and time-domain brainprint features are extracted using one-dimensional time-domain convolution kernels of (1×21), (1×5), and (1×11), respectively.
2) Frequency-domain brainprint features of low-frequency, high-frequency, and full-frequency EEGs are extracted using two layers of one-dimensional frequency-domain convolution kernels of [(6×1), (7×1)], [(7×1), (11×1)], and [(15×1), (15×1)], respectively.
3) By the extraction of the time-domain and frequency-domain features, a time-frequency feature {f_i ∈ ℝ^{n×c}}_{i=1}^{3} is obtained for each sample, where n represents a number of hidden layers and c represents a number of EEG channels.
4) The time-frequency brainprint features are spliced by frequency-domain dimensions to obtain a time-frequency feature f_ts ∈ ℝ^{n×c×3}; channel-wise and spatial convolutions are then performed on the time-frequency feature f_ts with a two-dimensional convolution kernel of (9×3), and a multi-scale time-frequency-space feature f ∈ ℝ^n of the brainprint is output.
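The shape bookkeeping of the multi-scale extraction above can be sketched as follows; a mean over the frequency axis stands in for the band-specific convolutions, so only the tensor shapes, not the learned features, are reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((9, 30, 30))   # one STFT sample: (c=9, s=30 bins of 1-30 Hz, t=30)

# Divide into low- (1-12 Hz), high- (12-30 Hz), and full-band subsamples.
low, high, full = x[:, 0:12, :], x[:, 12:30, :], x

# Stand-in for the band-specific time- and frequency-domain convolutions:
# each band is reduced to an (n x c) map, here with n equal to the time length.
feats = [band.mean(axis=1).T for band in (low, high, full)]   # each (30, 9)

# Splice along a new band axis to obtain f_ts in R^{n x c x 3}.
f_ts = np.stack(feats, axis=-1)
print(f_ts.shape)   # (30, 9, 3)
```

A (9×3) convolution over the channel and band axes of `f_ts` would then collapse it to the final f ∈ ℝ^n.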
Step 3, a primary brainprint and task disentangling neural network model is established.
1) An EEG feature extraction network G_f described in step 2 is established to extract identity information, and a fully connected layer is used to establish an identity discriminator C_f for identity recognition, where the identity discriminator C_f includes a main classifier C and an auxiliary classifier C_s, with the loss function:

ℒ_s = −(1/n) Σ_{i=1}^{n} Σ_{m} y_{i,m} log P̂(x_i ∈ m)  (2)

where ℒ_s represents the loss function; n represents a number of samples; m indexes the subjects; y_{i,m} represents the one-hot identity tag of the input data x_i; and P̂(x_i ∈ m) represents a probability that the input data x_i belongs to subject m.
2) An EEG feature extraction network Gt described in step 2 is established to extract task information, and a fully connected layer is used to establish a task discriminator Ct for task discrimination.
Step 4, constraint conditions for the EEG feature extraction networks Gf and Gt during training are established: the identity information S=Gƒ(x) and the task information A=Gt(x) are decorrelated.
1) An original identity feature matrix S and an original task feature matrix A output by the EEG feature extraction networks Gf and Gt are spliced to obtain a new matrix Q. The matrix Q is then mapped to a high-dimensional reproducing kernel Hilbert space (RKHS), and a dependency relationship between any two vectors q_i, q_j in the matrix Q is determined by a Hilbert-Schmidt independence criterion (HSIC). A Frobenius norm of a cross-covariance operator in the RKHS is calculated. An independent representation for a kernel function is acquired by mapping original data to the RKHS, as shown below:

K(x,·) = Σ_i α_i φ_i(x) φ_i(·), with the feature map ψ(x) = (√α_1 φ_1(x), √α_2 φ_2(x), …)  (3)

where K(·,·) represents a measurable symmetric positive definite kernel function; φ represents a mapping function; H represents a Hilbert space; and α_i represents an eigenvalue. The function φ maps the original data to a high-dimensional space, and the kernel function is derived from an inner product of the mapping function in the high-dimensional space.
2) The independence of the vectors q_i, q_j is detected by the HSIC after acquiring the kernel function. For random variables q_i, q_j and kernel functions k_1, k_2, the HSIC is defined as:

HSIC_{k_1,k_2}(q_i, q_j) := ‖C_{k_1,k_2}‖_F²  (4)

where C_{k_1,k_2} represents the cross-covariance operator regarding the kernel functions k_1 and k_2 in the RKHS; and ‖·‖_F is the Frobenius norm. Moreover, there exists HSIC_{k_1,k_2}(q_i, q_j) = 0 ⇔ q_i ⊥ q_j, with q_i being independent of q_j.
3) The kernel function K(x,·) is optimized.
The present disclosure proposes approximating the kernel function using random Fourier features (RFFs), because directly calculating the kernel function K(x,·) in the Hilbert space has a high complexity. A dimension-reduced function is acquired by Fourier transform sampling to approximate the original kernel function, and a nonlinear correlation of the two vectors q_i, q_j is captured. Specific steps are as follows:
The vectors qi, qj are mapped to a low-dimensional Euclidean space using the RFFs by the following formula (5), and an inner product after the mapping is an estimated value of the kernel function. Linear calculation is performed using the RFFs to remove the nonlinear correlation to realize statistical independence of the features.
A random Fourier function space H_RFF is expressed as:

H_RFF = {h : x → √2 cos(ωx + ϕ) | ω ~ N(0,1), ϕ ~ U(0,2π)}  (5)

where ω represents sampling from a standard normal distribution, and ϕ represents sampling from a uniform distribution.
The matrix Q is transformed into the RFFs by the formula (5) to approximate the kernel functions K(x,·) of an identity feature and a task feature.
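The approximation property of the space H_RFF in the formula (5) can be checked numerically: for a scalar input and ω ~ N(0,1), the inner product of the mapped features converges to a Gaussian kernel (the spectral pair of the standard normal distribution). This is an illustrative check, not part of the disclosed model; the variable names and test points are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 20000                                # number of random Fourier features
omega = rng.standard_normal(D)           # omega ~ N(0, 1)
phi = rng.uniform(0.0, 2.0 * np.pi, D)   # phi ~ U(0, 2*pi)

def h(x):
    """Map a scalar x to the feature vector sqrt(2)*cos(omega*x + phi), scaled for averaging."""
    return np.sqrt(2.0 / D) * np.cos(omega * x + phi)

x, y = 0.3, 1.1
approx = float(h(x) @ h(y))                    # inner product of the mappings
exact = float(np.exp(-0.5 * (x - y) ** 2))     # Gaussian kernel value
print(abs(approx - exact) < 0.05)
```

The estimate tightens at a Monte Carlo rate of roughly 1/√D, which is why a fairly large D is used here.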
4) The independence is detected.
Assume that there exist measurable spaces Ω_1 and Ω_2, where (H_1, k_1) and (H_2, k_2) represent the RKHSs in Ω_1 and Ω_2, and correspondingly, k_1 and k_2 are also measurable. There exists a unique cross-covariance operator Σ_XY in a space from H_1 to H_2, deriving:

⟨g, Σ_XY f⟩ = Cov(f(X), g(Y)) = E_XY[f(X)g(Y)] − E_X[f(X)] E_Y[g(Y)]  (6)

where f(X) ∈ H_1, g(Y) ∈ H_2, and Cov(·) represents a covariance matrix.
As shown in the formula (6), the calculation of Σ_XY is expanded to the calculation of the covariance matrix over the Euclidean space, and f(X) and g(Y) represent nonlinear kernel functions. Due to Σ_XY = 0 ⇔ X ⊥ Y, if a Hilbert-Schmidt norm of Σ_XY is zero, X and Y are considered independent. Since the calculation is difficult for a kernel method, the RFFs are capable of providing a function space H_RFF to achieve the objective, and an empirical cross-covariance matrix Σ̂_XY may be expressed as:

Σ̂_XY = (1/(n−1)) Σ_{i=1}^{n} (u(X_i) − ū)(v(Y_i) − v̄)^T  (7)

where ū and v̄ represent the sample means of u(X) and v(Y), and u and v are elements in a random Fourier space.
Theoretically, the independence between the two vectors q_i, q_j (replaced by X and Y as described above) is detected. Whether the cross-covariance operator Σ_ST regarding u(q_i) and v(q_j) tends to 0 needs to be determined, and the elements u and v are each composed of n_q functions sampled from the random Fourier space H_RFF.
A cross-covariance matrix is established, and the Frobenius norm of the cross-covariance matrix is minimized to achieve an objective of uncorrelation, with a loss function defined as:

ℒ_dec = λ Σ_{1≤i,j≤m} ‖Σ̂_{q_i q_j}‖_F²

where the hyper-parameter λ represents a sigmoid ramp-up weight, which increases with the number of training epochs in the following form:

λ(t) = e^{−5(1−t)²}

where t ∈ [0, 1] represents the normalized training progress.
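A minimal numerical sketch of the decorrelation objective: two variables are passed through n_q random Fourier functions, the empirical cross-covariance matrix of the formula (7) is formed, and its squared Frobenius norm is compared for an independent pair and a nonlinearly dependent pair. The helper names and the tanh dependence are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_q = 2000, 32   # samples, and number of functions sampled from H_RFF

def rff_map(q, rng):
    """Map a 1-D variable through n_q random Fourier functions sqrt(2)*cos(omega*x + phi)."""
    omega = rng.standard_normal(n_q)
    phi = rng.uniform(0.0, 2.0 * np.pi, n_q)
    return np.sqrt(2.0) * np.cos(np.outer(q, omega) + phi)   # shape (n, n_q)

def dec_term(qi, qj, rng):
    """Squared Frobenius norm of the empirical cross-covariance of u(qi), v(qj)."""
    u, v = rff_map(qi, rng), rff_map(qj, rng)
    u = u - u.mean(axis=0)
    v = v - v.mean(axis=0)
    cov = u.T @ v / (len(qi) - 1)
    return float(np.linalg.norm(cov, "fro") ** 2)

a = rng.standard_normal(n)
term_indep = dec_term(a, rng.standard_normal(n), rng)   # independent pair
term_dep = dec_term(a, np.tanh(a), rng)                 # nonlinearly dependent pair
print(term_indep < term_dep)

# Sigmoid ramp-up weight, with t the normalized training progress.
ramp = lambda t: float(np.exp(-5.0 * (1.0 - t) ** 2))
print(ramp(0.0), ramp(1.0))   # small at the start, 1.0 at the end
```

Minimizing such terms for all feature pairs drives the nonlinear correlation between identity and task features toward zero.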
Step 5, an adversarial self-supervised module is established to fully utilize the identity information.
1) The identity feature S output by Gf is input to the adversarial self-supervised network H to obtain a mask representation. Each dimension of the mask representation is taken as a discrete random variable, and each dimension is sampled to obtain an approximate κ-hot vector. β, representing approximate sampling of a κ-hot vector by the Gumbel-Softmax trick, is defined as:

β = Gumbel-Softmax(H(S), κN) ∈ ℝ^N  (12)

where κ ∈ (0,1); N represents the number of dimensions output by the adversarial self-supervised network; the κN masks closest to 1 after sampling are taken as important features, and the other masks are taken as secondary features.
The Gumbel-Softmax is defined: for pre-defined τ > 0, i ∈ {1, …, N}, p ∈ {1, …, κ}, the following formula is derived:

where π = H(S) ∈ ℝ^N represents a probability vector; ε_1^p, …, ε_K^p represent samples complying with a Gumbel distribution; and π_i ≥ 0, i ∈ {1, …, N}, Σ_i π_i = 1. It is set that τ = 0.8.
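One way to realize the approximate κ-hot sampling of the formula (12) is to draw several temperature-τ relaxed one-hot samples with Gumbel noise and combine them elementwise by maximum; the combination-by-maximum step and the stand-in probability vector are assumptions of this sketch, not details fixed by the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)
N, tau = 16, 0.8
kappa = 0.25
K = int(kappa * N)   # kappa*N: the number of dimensions to mark as important

pi = rng.random(N)
pi = pi / pi.sum()   # stand-in for the probability vector pi = H(S)

def gumbel_softmax_khot(pi, K, tau, rng):
    """Approximate kappa-hot sampling: K relaxed one-hot Gumbel-Softmax draws, combined by max."""
    logits = np.log(pi)
    draws = []
    for _ in range(K):
        eps = -np.log(-np.log(rng.random(len(pi))))   # Gumbel(0, 1) noise
        z = (logits + eps) / tau
        z = z - z.max()                               # numerically stable softmax
        draws.append(np.exp(z) / np.exp(z).sum())
    return np.max(np.stack(draws), axis=0)            # beta in (0, 1)^N

beta = gumbel_softmax_khot(pi, K, tau, rng)
print(beta.shape)   # (16,)
```

Each draw is differentiable in π, which is what lets the mask network H be trained by back propagation.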
2) The identity feature S output by Gf is multiplied by masks β and 1−β to obtain an important dimension feature and a secondary dimension feature, and the main classifier C and the auxiliary classifier C_S are utilized. ℒ_s of the formula (2) may be redefined as:
a loss function of the main classifier: ℒ_s^{sup} = ℓ(C(S∘β), y);  (14)
and a loss function of the auxiliary classifier: ℒ_s^{inf} = ℓ(C_S(S∘(1−β)), y),  (15)
where y represents the identity tag.
Firstly, the loss functions are minimized to train and optimize the EEG feature extraction networks G_f and G_t, the main classifier C, and the auxiliary classifier C_S. Secondly, ℒ_s^{sup} is minimized and ℒ_s^{inf} is maximized to train the adversarial self-supervised network H.
The proposed adversarial self-supervised module allows the secondary dimensions to play a role. The optimized auxiliary classifier C_S uses the secondary dimensions to classify tags, thereby minimizing the loss function ℒ_s^{inf}, while the self-supervised network learns β to select a favorable dimension, thereby maximizing the loss function ℒ_s^{inf}. Therefore, a secondary dimension having a low contribution can be found, and the classifier is adversarial to the mask network. By optimizing G_f and G_t to minimize ℒ_s^{inf} and ℒ_dec, a disadvantageous dimension is forced to carry more identity features that are independent of the task features. Finally, low-level representations are repeatedly removed and turned into new advanced representations, and the learned representations tend toward clean identity features.
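The two masked losses of the formulas (14) and (15) can be evaluated as follows with linear stand-in classifiers; the alternating minimization and maximization would be carried out by a deep-learning framework, and are only indicated in the comments. All shapes, weights, and the hard mask are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 8, 16, 4                       # samples, feature dimensions, subjects
S = rng.standard_normal((n, d))          # stand-in identity features from G_f
y = rng.integers(0, m, n)                # identity tags
beta = (rng.random(d) > 0.5).astype(float)   # a hard illustrative mask

def xent(W, X, y):
    """Cross-entropy of a linear stand-in classifier W applied to features X."""
    z = X @ W
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return float(-np.mean(np.log(p[np.arange(len(y)), y] + 1e-12)))

W_main = rng.standard_normal((d, m))     # main classifier C
W_aux = rng.standard_normal((d, m))      # auxiliary classifier C_S

loss_sup = xent(W_main, S * beta, y)         # formula (14): minimized w.r.t. C (and H)
loss_inf = xent(W_aux, S * (1 - beta), y)    # formula (15): C_S minimizes it, H maximizes it
print(loss_sup > 0.0 and loss_inf > 0.0)
```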
Step 6, the network model is trained.
The training set obtained in operation 4) of step 1 is used to optimize the loss functions by gradient back propagation for the model established in step 2 to step 5, and the best model is saved by using the validation set obtained in operation 4) of step 1 for testing.
A stochastic gradient descent (SGD) optimizer is used; a learning rate is 0.025; and batch_size is 64.
Step 7, the effectiveness of the present disclosure is verified on a multi-task identity recognition data set including 30 subjects (N=30). A contrast experiment is conducted with existing methods, and the results are as shown in Table 1. The verification results indicate that the model proposed in the present disclosure is capable of effectively extracting brainprint features under different cognitive tasks, is not limited to particular cognitive tasks, and has good robustness.