This patent application claims the benefit of and priority to Chinese Patent Application No. 202310802378.2 filed with the Chinese Patent Office on Jun. 30, 2023, which is hereby incorporated by reference herein in its entirety.
The present disclosure belongs to the field of emotional state recognition of electroencephalogram (EEG) in the field of biometric recognition, and particularly relates to a multi-source domain adaptive EEG emotional state classification method based on knowledge distillation.
Emotion recognition plays an important role in human-computer interaction. In recent years, with the improvement of computing power, emotion recognition methods based on deep learning have attracted more and more attention. These methods make decisions that reflect human emotions by deeply mining the latent objective emotional features of users.
Affective brain-computer interfaces (aBCIs) are an important application of emotion recognition. By measuring the signals of the peripheral and central nervous system, features related to the emotional state of users are extracted, and these features are used to adjust human-computer interaction (HCI). The aBCIs show potential in rehabilitation and communication.
Generally speaking, emotion recognition methods may be classified into two categories: methods based on non-physiological signals, such as facial expression images, body gestures, and voice signals; and methods based on physiological signals, such as electroencephalography (EEG), electromyography (EMG) and electrocardiogram (ECG). However, compared with non-physiological signals, physiological signals directly reflect the internal emotional state of individuals, making that state less susceptible to conscious or unconscious manipulation. Among various emotion recognition methods based on physiological signals, the EEG is one of the most commonly used, because the EEG is collected directly from the cerebral cortex and is very valuable for reflecting the psychological state of people. With the rapid development of EEG collecting technology and processing methods, emotion recognition based on the EEG has attracted more and more attention in recent years.
However, due to the low signal-to-noise ratio (SNR) and the significant individual differences between different subjects at different times, it is still a huge challenge to construct an efficient and robust emotion recognition deep learning model based on the EEG. In addition, it is very important to use existing labeled data to analyze new unlabeled data in the brain-computer interface (BCI) based on the EEG. Therefore, domain adaptation is widely used in research work: by learning from the source data distribution, a model that performs well on a related but different target data distribution is trained. However, in practice, there is usually a plurality of source domains, so that multi-source domain adaptation becomes a powerful extension of domain adaptation. Nevertheless, the technique used for domain alignment in multi-source domain adaptation is usually the maximum mean discrepancy (MMD), which only takes into account adaptation at the domain level but lacks adaptation at the data pair level. Such a limitation may lead to a lack of discriminative ability. In addition, in most multi-source domain adaptation frameworks, only the average prediction result of a plurality of single-source domain models is used as the final result, and these single-source domain models are not fully utilized.
In order to solve the defects of the prior art and make better use of the advantages of a plurality of single-source models, the present disclosure proposes a multi-source domain adaptive EEG emotional state classification method based on knowledge distillation (MS-KTF).
The technical scheme used by the present disclosure is as follows.
According to the present disclosure, the differential entropy (DE) features are used as frequency domain features of the used EEG signals, the EEGNet model is slightly modified as the feature extractor, and a single linear layer is used as a classifier to analyze the EEG signals, so as to implement the task of emotional state recognition in a cross-subject scenario and a cross-session scenario.
According to the present disclosure, the training process is divided into three steps: (1) pre-training each teacher model based on each labeled source domain; (2) based on the corresponding labeled source domain and the unlabeled target domain, performing domain adaptation for each teacher model by using a source domain classification loss (SCL), a target domain classification loss (TCL), a maximum mean discrepancy (MMD) and a pseudo-label triplet loss; (3) transferring knowledge of teachers from a plurality of single-source domains to a student model. In addition, in step (2), in order to improve the effectiveness of the pseudo-label triplet loss, a margin-based sampling strategy is used to filter the original features, and only those features whose marginal scores are higher than the preset threshold are selected as embedded features for calculating the pseudo-label triplet loss.
The embodiment of the present disclosure includes the following steps.
In Step S1, data processing is performed.
The emotional data set SEED is taken as an example for analysis, and the processing steps of the original EEG data collected by an EEG collection device are as follows.
In Step S1-1, data denoising is performed.
The data set used by the present disclosure to verify the model performance comes from SEED. First, the original EEG signal collected in the data set is down-sampled to 200 Hz, then band-pass filtering of 0-75 Hz is performed, ocular artifacts in the signals are removed by an independent component analysis (ICA) technology, and finally the traditional moving average and linear dynamic system (LDS) methods are used to further smooth the features.
In Step S1-2, differential entropy (DE) feature extraction is performed.
DE features are extracted from the EEG data after removing artifacts, and the data is segmented with a 1 s non-overlapping sliding window for each subject to obtain 3394 data samples. For each data sample x_i, the number of EEG data collecting channels is 62; and the frequency domain features of five frequency bands including δ (1-3 Hz), θ (4-7 Hz), α (8-13 Hz), β (14-30 Hz) and γ (31-50 Hz) are extracted.
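The DE extraction described above can be sketched in Python. This is an illustrative sketch only: it isolates each band with an FFT mask rather than the disclosure's exact filtering pipeline, and the function names are assumptions. Under the Gaussian assumption commonly made for band-filtered EEG, the DE of a segment equals 0.5·log(2πe·σ²).

```python
import numpy as np

def differential_entropy(segment):
    # DE of a band-filtered EEG segment, assuming it is approximately Gaussian:
    # DE = 0.5 * log(2 * pi * e * variance)
    var = np.var(segment)
    return 0.5 * np.log(2 * np.pi * np.e * var)

# the five frequency bands used in the disclosure (Hz)
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}

def de_features(window, fs=200):
    # window: (n_channels, n_samples) 1-s EEG window; returns (n_channels, 5)
    n_ch, n = window.shape
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(window, axis=1)
    feats = np.zeros((n_ch, len(BANDS)))
    for b, (lo, hi) in enumerate(BANDS.values()):
        mask = (freqs >= lo) & (freqs <= hi)
        # inverse FFT of the band-limited spectrum approximates band-pass filtering
        band = np.fft.irfft(spectrum * mask, n=n, axis=1)
        feats[:, b] = [differential_entropy(band[c]) for c in range(n_ch)]
    return feats
```

Applied to a 62-channel, 1 s window sampled at 200 Hz, this yields one 62 x 5 feature matrix per sample, matching the sample layout described above.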
In Step S2, data definitions and data set divisions are performed.
There are two test scenarios for emotional state classification in the method: the cross-subject scenario and the cross-session scenario, and the model tests in the two scenarios have their own different data definitions and data set divisions, which are explained in detail hereinafter.
It is assumed that there are N subjects, and each subject has D different session (period) tests. The whole sample set is expressed as U = {{(X_i, Y_i)}_{j=1}^{D}}_{i=1}^{N}, where i indicates the serial number of the subject, j indicates the serial number of a session (period), X_i indicates the sample set of subject i, and the corresponding label set is Y_i.
For a task of emotional state classification in the cross-session scenario, the data set is cross-verified by using a leave-one-out method; specifically, for each subject i, the data of the 15 emotional tests in the latest session is taken as the test set; each of the remaining D−1 sessions, in a unit of session, is deemed a source domain in the training set, and finally D−1 source domains are obtained as the training set; a total of N tests are conducted and the average accuracy is calculated.
For a task of emotional state classification in the cross-subject scenario, the data set is cross-verified by using the leave-one-out method; specifically, in a session (period), the data of all 15 emotional tests of one subject is iteratively extracted, with the assumption that its emotional state labels are unknown, as the test set; from the remaining N−1 subjects, R subjects at a time are randomly and unrepeatably grouped together, each group serving as a source domain in the training set; finally, ⌊(N−1)/R⌋ (round down) source domains are obtained as the training set, a total of D×N tests are conducted, and the average accuracy is calculated.
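The cross-subject leave-one-out grouping can be sketched as follows (a hypothetical helper, not part of the disclosure; it returns the ⌊(N−1)/R⌋ disjoint source-domain groups described above):

```python
import numpy as np

def cross_subject_split(n_subjects, test_subject, group_size, seed=0):
    # Leave one subject out as the target; randomly partition the remaining
    # N-1 subjects into floor((N-1)/R) disjoint groups, one per source domain.
    rng = np.random.default_rng(seed)
    rest = [s for s in range(n_subjects) if s != test_subject]
    rng.shuffle(rest)
    n_groups = len(rest) // group_size
    return [rest[g * group_size:(g + 1) * group_size] for g in range(n_groups)]
```

For the SEED setting of N = 15 subjects with R = 3, this produces ⌊14/3⌋ = 4 source domains per held-out subject.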
In Step S3, the construction and training of the MS-KTF model is performed.
The main parameters in the neural network MS-KTF model include:
The MS-KTF model consists of two parts: teacher models, each based on a corresponding single source domain, and a student model acting on the target domain. Both the teacher models and the student model consist of two modules: a domain-specific feature extractor Nf and a label classifier Ny. The plurality of single-source domain teacher models based on the multi-source domains and the parameters of the target domain student model are initialized.
In Step S3-2, a plurality of single-source domain teacher models are pre-trained.
Based on a multi-source domain sample set, a feature extractor Nf and a label classifier Ny of each domain-specific teacher model are pre-trained using the corresponding labeled single source domain sample set, such that each domain-specific teacher model has a certain pattern recognition ability in its respective source domain.
In Step S3-3, domain adaptation is performed on feature extractors of a plurality of single-source domain teacher models.
One labeled source domain sample set and one unlabeled target domain sample set are formed into a branch. In each branch, the feature extractor Nf of the corresponding domain-specific teacher model is used to extract features from the respective source domain samples and target domain samples, mapping them from the original feature space into the embedded space.
Thereafter, the embedded features are aligned at a domain level based on the maximum mean discrepancy in the feature space; the embedded features are aligned at a data pair level based on the pseudo-label triplet loss of margin-based sampling.
By minimizing the maximum mean discrepancy and the pseudo-label triplet loss, the feature extractors Nf of a plurality of single-source domain teacher models are trained to extract domain-invariant features in the source domain and the target domain.
In Step S3-4, label classifiers Ny of a plurality of single-source domain teacher models are trained.
In each single-source domain teacher model, the extracted source domain feature information is passed through the label classifier Ny to obtain the predicted emotion Ŷ^S, and a cross-entropy between the predicted emotion Ŷ^S and the corresponding label Y^S of the actual sample is calculated; similarly, a cross-entropy between the predicted emotion Ŷ^T of the target domain feature information and the generated pseudo-label Ỹ^T is calculated.
By minimizing the two obtained cross-entropies, the label classifiers Ny of a plurality of single-source domain teacher models are trained to have good emotion classification ability in their respective source domain and respective target domain.
In Step S3-5, the knowledge of a plurality of single-source domain teacher models is merged.
Two different merging strategies are used to balance performances of the teacher models.
The first strategy, voting-based merging, is more suitable for a case in which the teacher models have poor performance balance. Based on the emotion prediction result Ŷ_teacher^T obtained by the feature extractor and the label classifier of each teacher model on an unlabeled target domain sample, a corresponding one-hot coding result Ô_teacher^T is generated; voting is performed based on the one-hot coding results Ô_teacher^T generated by the teacher models, and the voting result is deemed the decision variable D̂^T; in a case that the emotion prediction result Ŷ_teacher^T of a teacher model is the same as the decision variable D̂^T, that teacher model is selected for knowledge merging.
A mean value
The second strategy, mean-based merging, is more suitable for a case in which the teacher models have strong performance balance; in this case, all the teacher models have the same weight, and the mean value
In Step S3-6, the merged knowledge of the teacher models is taught to the student model.
With unlabeled target domain sample data, a prediction result ŶstudentT of the student model is obtained through the feature extractor and the label classifier of the student model;
based on a predetermined distillation temperature, smoothing processing is performed on the merged knowledge
by minimizing the KL divergence between the outputs of the teacher models and the student model, the student model learns the knowledge of the teacher models and obtains feature extraction and label classification abilities in the target domain that are more extensive than those of the teacher models.
In Step S4, model performance evaluation in two scenarios including a cross-session scenario and a cross-subject scenario is performed.
The present disclosure specifically verifies the performance of the model on the SEED data set.
The emotional state Ŷ_student^T predicted on the target domain sample set by the training-converged student model is compared with the real state Y^T, the accuracy result is obtained, and the model performance is evaluated. The accuracy rate is the ratio of the number of correctly classified samples during model testing to the total number of test samples, and the calculation formula of the model accuracy is as follows:
Where TP is the number of positive samples predicted as positive by the model, TN is the number of negative samples predicted as negative, FP is the number of negative samples predicted as positive, and FN is the number of positive samples predicted as negative.
A multi-source domain adaptive EEG emotional state classification system based on knowledge distillation includes a pre-training module, a teacher model domain adaptation module and a student model training module; where the pre-training module pre-trains each teacher model based on each labeled source domain; the teacher model domain adaptation module performs domain adaptation for each teacher model by using a source domain classification loss (SCL), a target domain classification loss (TCL), a maximum mean discrepancy (MMD) and a pseudo-label triplet loss based on the corresponding labeled source domain and an unlabeled target domain; and the student model training module transfers knowledge of teachers from a plurality of single-source domains to a student model. The pre-training module, the teacher model domain adaptation module and the student model training module are computer-implemented modules.
In addition, in the teacher model domain adaptation module, in order to improve effectiveness of the pseudo-label triplet loss, a margin-based sampling strategy is used to filter original features, and only features with marginal scores higher than a preset threshold are selected as embedded features for calculating the pseudo-label triplet loss.
The beneficial effects of the present disclosure are as follows.
The present disclosure solves the blind estimation problem of the maximum mean discrepancy (MMD) technology in multi-source domain adaptation by using the pseudo-label triplet loss. In addition, the margin-based sampling strategy based on uncertainty measurement is used to improve its effectiveness, while knowledge distillation technology is introduced to train a more robust student model by teaching it the knowledge of a plurality of teacher models, so as to make maximum use of multi-source domain knowledge. Through experimental verification on the public emotion data set SEED, the present disclosure achieves significant improvement compared with previous methods.
The preferred embodiments of the present disclosure will be described in detail with reference to the attached drawings hereinafter, so that the advantages and features of the present disclosure can be more easily understood by those skilled in the art, and the scope of protection of the present disclosure can be more clearly defined.
Multi-source domain adaptation (MSDA) aims to transfer knowledge from a plurality of source domains to an unlabeled target domain, which is very suitable for cross-session and cross-subject EEG emotion recognition. However, existing MSDA models only take into account the domain-level relationship between each pair of source and target feature distributions, but rarely take into account the data-pair-level correlation between the two domains, resulting in poor robustness.
The present disclosure discloses a multi-source domain knowledge transfer framework (MS-KTF) for EEG emotional recognition. First, the obtained data is band-pass filtered, and artifacts are removed by an independent component analysis (ICA) technology. Second, EEG features are extracted by using a differential entropy (DE) method, and a three-dimensional EEG time series is converted into a two-dimensional sample matrix. Then, a training set and a test set are defined in two task scenarios, respectively, so as to ensure that the training set and the test set do not overlap. For these samples, MS-KTF combines a pseudo-label triplet loss based on margin-based sampling with a maximum mean discrepancy (MMD). According to the method, unbiased alignment between each pair of source and target domains can be implemented at the domain level and the data pair level. Specifically, the framework learns knowledge from different source domains, so that a plurality of single-source models are utilized to the greatest extent, and a more powerful model is implemented with less time consumption. Finally, the classification accuracy is used to evaluate the performance of the model in the two task scenarios. According to the present disclosure, the triplet loss and the maximum mean discrepancy are combined, so that the problem of insufficient alignment of EEG signal distribution differences is solved to a certain extent, and a high-precision cross-session and cross-subject emotional state classifier is trained, which has the advantages of low time complexity, high calculation efficiency, strong generalization ability and the like, so as to have a wide application prospect in actual brain-computer interaction.
Refer to
In Step S1, data processing is performed.
The emotional data set is taken as an example for analysis, and the processing steps of the original EEG data collected by an EEG collection device are as follows.
In Step S1-1, data denoising is performed.
The data set used by the present disclosure to verify the model performance comes from SEED. Refer to the paper "Investigating Critical Frequency Bands and Channels for EEG-Based Emotion Recognition with Deep Neural Networks" for details. First, the original EEG signal collected in the data set is down-sampled to 200 Hz, then band-pass filtering of 0.3-50 Hz is performed, and finally ocular artifacts in the signals are removed by an independent component analysis (ICA) technology.
In Step S1-2, differential entropy (DE) feature extraction is performed.
DE features are extracted from the EEG data after removing artifacts. Each subject watches 15 videos that can cause obvious emotional changes, and the EEG data collected within the same video playing duration is regarded as one emotional test; each subject thus has 15 emotional tests. The data is segmented with a 1 s non-overlapping sliding window for each subject to obtain 3394 data samples. For each data sample x_i, the number of EEG data collecting channels is 62; and the frequency domain features of five frequency bands including δ (1-3 Hz), θ (4-7 Hz), α (8-13 Hz), β (14-30 Hz) and γ (31-50 Hz) are extracted.
In Step S2, data definitions and data set divisions are performed.
There are two test scenarios for emotional state classification: the cross-subject scenario and the cross-session scenario, and the model tests in the two scenarios have their own different data definitions and data set divisions, which are explained in detail hereinafter.
It is assumed that there are N subjects, and each subject has D different session (period) tests. The whole sample set is expressed as U = {{(X_i, Y_i)}_{j=1}^{D}}_{i=1}^{N}, where i indicates the serial number of the subject, j indicates the serial number of a session (period), X_i indicates the sample set of subject i, and the corresponding label set is Y_i.
For a task of emotional state classification in the cross-session scenario, the data set is cross-verified by using a leave-one-out method; specifically, for each subject i, the data of the 15 emotional tests in the latest session is taken as the test set; each of the remaining D−1 sessions, in a unit of session, is deemed a source domain in the training set, and finally D−1 source domains are obtained as the training set; a total of N tests are conducted and the average accuracy is calculated.
For a task of emotional state classification in the cross-subject scenario, the data set is cross-verified by using the leave-one-out method; specifically, in a session (period), the data of all 15 emotional tests of one subject is iteratively extracted, with the assumption that its emotional state labels are unknown, as the test set; from the remaining N−1 subjects, R subjects at a time are randomly and unrepeatably grouped together, each group serving as a source domain in the training set; and ⌊(N−1)/R⌋ (round down) source domains are obtained as the training set. Finally, the performance of the model is verified on the test sets of the N subjects, a total of D×N tests are conducted, and the average accuracy is calculated.
In Step S3, the construction and training of the MS-KTF model is performed.
The main parameters in the neural network MS-KTF model include:
In Step S3-1-1, the data division of the model is performed.
The data division and construction of the model are shown in
For the cross-subject scenario: the target domain sample set of the model is U^T = {X_i}, where X_i indicates the feature data set of the i-th subject; X_i = {x_j}_{j=1}^{n}, where x_j indicates the j-th sample in X_i and n indicates the number of samples in X_i; the multi-source domain sample set of the model is
P_j ⊆ [N]\{i}, P_j ∩ P_k = Ø, ∀j, ∀k, j ≠ k, where [N]\{i} indicates the serial number set of all the subjects after the i-th subject's data is removed, and P_j indicates the serial number set of the subjects included in the j-th source domain (all the data in the cross-subject scenario comes from the same session).
For the cross-session scenario: the target domain sample set of the model is U^T = {{X_i}_{i=1}^{N}}_j, where X_i indicates the feature data set of the i-th subject and j indicates the j-th session (period); the multi-source domain sample set of the model is U^S = {{(X_i, Y_i)}_{i=1}^{N}}_k, k ∈ [D]\{j}, where [D]\{j} indicates the serial number set of all the sessions after the j-th session is removed.
In Step S3-1-2, data input of the model is performed.
As shown in the left half of
In Step S3-2, initialization of the model is performed.
As shown in the right half of
In Step S3-3, pre-training of the single-source domain teacher models is performed.
As shown in
Based on the multi-source domain sample set U^S, a feature extractor Nf and a label classifier Ny of each domain-specific teacher model are pre-trained using the corresponding labeled single source domain sample set, such that each domain-specific teacher model has a certain pattern recognition ability in its respective source domain (the optimization goal is the same as the SCL in formula (5) below, which will not be described in detail here).
In Step S3-4, training of the feature extractor of single-source domain teacher models is performed.
After passing through the feature extractor Nf of the domain-specific teacher model, the respective low-dimensional features F^S and F^T of the corresponding source domain data U^S and target domain data U^T are extracted. In order to ensure the unbiased adaptability of the extracted features, two methods including domain-level distribution alignment and data-pair-level distribution alignment are used in this patent.
In Step S3-4-1, domain-level distribution alignment is performed.
Corresponding to the unbiased distribution alignment in
The maximum mean discrepancy (MMD) is a distance metric in probabilistic metric space, which is widely used in machine learning and nonparametric testing. The distance metric is based on the idea of embedding the probability into reproducing kernel Hilbert space (RKHS), which aims to reduce the distribution difference between the source domain and the target domain while retaining their specific discriminant information. In the training process, the distance between the source domain and the target domain in the feature space is reduced by minimizing the maximum mean discrepancy (MMD) loss, so as to achieve domain-level alignment, and a specific formula is as follows:
Where F_i^S and F_j^T indicate the extracted low-dimensional features of the i-th sample in the source domain and the j-th sample in the target domain, respectively; N_S and N_T indicate the number of source domain samples and the number of target domain samples, respectively.
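A minimal NumPy sketch of the MMD loss, assuming an RBF kernel as the RKHS embedding (the kernel choice and bandwidth here are illustrative, and the disclosure's exact kernel may differ):

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd_loss(fs, ft, gamma=1.0):
    # squared MMD between source features fs (Ns, d) and target features ft (Nt, d):
    # mean within-source similarity + mean within-target similarity
    # - 2 * mean cross-domain similarity
    return (rbf_kernel(fs, fs, gamma).mean()
            + rbf_kernel(ft, ft, gamma).mean()
            - 2 * rbf_kernel(fs, ft, gamma).mean())
```

The loss is exactly zero when the two feature sets coincide and grows as their distributions in the kernel space drift apart, which is the quantity minimized for domain-level alignment.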
In Step S3-4-2: data-pair-level distribution alignment is performed.
Because the MMD blindly estimates parameters while taking into account only domain-level statistics and their relationships, feature distinguishability may be reduced, and the relationship between the intra-class distance and the inter-class distance may be affected, because one of the distance values decreases while the other increases. The triplet loss, which reduces the intra-class distance and increases the inter-class distance, is one way to solve this problem. However, in domain adaptation, the target domain is usually unlabeled. Therefore, the triplet loss with margin-based sampling is used to perform data-pair-level distribution alignment in this patent.
A margin-based score of the prediction result of each sample is used in this patent as a basis for determining whether the sample is sampled, and this method may be expressed by the following formula:
Where x is an input sample, g_θ is an abstract function of the label classifier, i* is the category with the highest prediction probability in the prediction result, k is the number of all categories, [k]\{i*} indicates the set of all the categories except i*, and Threshold is a predetermined threshold for margin-based sampling.
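The margin-based sampling described above can be sketched as follows (helper names and the threshold value are illustrative assumptions): the margin score is the gap between the top-1 and top-2 predicted probabilities, and only samples whose score exceeds the threshold are kept as embedded features.

```python
import numpy as np

def margin_scores(probs):
    # probs: (n, k) softmax outputs; margin = top-1 prob minus top-2 prob
    sorted_p = np.sort(probs, axis=1)
    return sorted_p[:, -1] - sorted_p[:, -2]

def select_confident(features, probs, threshold=0.3):
    # keep only samples whose margin score exceeds the threshold,
    # returning the filtered features and their pseudo-labels (argmax class)
    keep = margin_scores(probs) > threshold
    return features[keep], probs[keep].argmax(1)
```

A confidently classified sample (e.g. probabilities [0.9, 0.05, 0.05], margin 0.85) passes the filter, while an ambiguous one (e.g. [0.4, 0.35, 0.25], margin 0.05) is discarded before the triplet loss is computed.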
The triplet loss requires sampling in the form of triplets {(x_i^a, x_i^p, x_i^n)}_{i=1}^{N}
Where N is the number of samples contained in X_selected, α is a predetermined margin value for guiding separability, d(⋅) is a function for calculating the Euclidean distance between regularized embedded feature pairs, and f_θ(⋅) is an abstract function for feature extraction.
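A sketch of the triplet loss on regularized (L2-normalized) embeddings, assuming the standard hinge form max(d(a, p) − d(a, n) + α, 0); the actual mining of triplets from X_selected is omitted here:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # hinge-style triplet loss on L2-normalized embeddings:
    # pulls the anchor-positive pair together and pushes the
    # anchor-negative pair apart by at least the margin
    def norm(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    a, p, n = norm(anchor), norm(positive), norm(negative)
    d_ap = np.linalg.norm(a - p, axis=1)   # intra-class distance
    d_an = np.linalg.norm(a - n, axis=1)   # inter-class distance
    return np.maximum(d_ap - d_an + margin, 0.0).mean()
```

When the negative already lies farther from the anchor than the positive by more than the margin, the hinge clamps the loss to zero and the triplet contributes no gradient.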
In Step S3-5, training of label classifiers of single-source domain teacher models is performed.
The cross-entropy (CE) loss is used as the evaluation index of the classification result of the label classifier in the source domain and the target domain in this patent; a source classification loss (SCL) is used as the classification loss in the source domain, and a target classification loss (TCL) is used as the classification loss in the target domain.
In the source domain, there is a real label, so the SCL uses the real label and the classification result of the label classifier as the comparison object, and the specific formula is as follows:
Where x_i is the i-th source domain input sample, y_i^S is the real label of the i-th source domain input sample, ŷ_i^S is the prediction result of the label classifier for the i-th source domain input sample, f_θ(⋅) is an abstract function for feature extraction, and g_θ(⋅) is an abstract function of the label classifier.
In the target domain, the sample lacks a real label, and the corresponding TCL uses the generated pseudo label and the classification result of the label classifier as comparison objects, and a specific formula is as follows:
Where x_i is the i-th target domain input sample, ỹ_i^T is the pseudo label generated for the i-th target domain input sample, ŷ_i^T is the prediction result of the label classifier for the i-th target domain input sample, f_θ(⋅) is an abstract function for feature extraction, and g_θ(⋅) is an abstract function of the label classifier.
In Step S3-6, the goal optimization and the training of the single-source domain teacher model is performed.
Summarizing Step S3-4 and Step S3-5, in the domain adaptation stage of the teacher model, the final optimization goal is shown in the following formula:
Where β, γ and σ are weighting factors for balancing the loss function.
By using a stochastic gradient optimizer combined with a mini-batch training mode, domain-invariant features are obtained for each pair of source and target domains at the domain level and the data pair level by minimizing the MMD loss and the triplet loss (L_MMD, L_trip) in formula (7); by minimizing the classification losses (L_SCL, L_TCL) in the source domain and the target domain, a better classifier is obtained, which accurately predicts the source domain samples without sacrificing the ability to discriminate the target domain samples.
In Step S3-7, training of the student model is performed.
The structure of the specific student model can be seen in
In order to better merge the knowledge of the teacher models, a voting-based method is used to select the knowledge of the teacher models to be merged, which is expressed as the following formula in the patent:
Where x_i is the i-th input sample, N_t is the number of teacher models, mode(⋅) is a function for finding the mode (or multiple modes), * denotes element-wise multiplication, and ŷ_i^T
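The voting-based merging can be sketched as follows. This is an illustrative reading of the formula: each teacher votes with its one-hot (argmax) prediction, and only the teachers that agree with the per-sample majority contribute to the merged knowledge; tie-breaking toward the lowest class index is an assumption of this sketch.

```python
import numpy as np

def merge_teacher_predictions(teacher_probs):
    # teacher_probs: (n_teachers, n_samples, k) softmax outputs of each teacher.
    # Per sample: take the majority-voted class, then average the probability
    # vectors of only those teachers whose prediction matches the majority.
    votes = teacher_probs.argmax(axis=2)          # (n_teachers, n_samples)
    merged = np.empty(teacher_probs.shape[1:])
    for s in range(teacher_probs.shape[1]):
        classes, counts = np.unique(votes[:, s], return_counts=True)
        majority = classes[counts.argmax()]       # ties resolve to lowest class
        agree = votes[:, s] == majority
        merged[s] = teacher_probs[agree, s].mean(axis=0)
    return merged
```

A teacher that disagrees with the majority is excluded for that sample only, so a teacher with uneven per-class performance can still contribute where it agrees with its peers.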
After obtaining the merged knowledge of a plurality of single-source domain teacher models, Kullback-Leibler (KL) divergence is used to evaluate a difference between the prediction result of the teacher model and the prediction result of the student model, and the formula is as follows:
Where X is the input sample set, Ŷ_merge is the merged teacher knowledge set, T is a predetermined temperature coefficient for controlling the smoothness of the softmax function, and KLD[p, q] is an evaluation function measuring the KL divergence between a distribution p and a distribution q.
By using an Adam optimizer combined with a mini-batch training mode, the KL loss in formula (9) is minimized, so that the student model fully learns the merged knowledge of the teacher models and obtains better performance in the target domain.
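The temperature-smoothed distillation loss can be sketched as follows (KL(teacher ‖ student) on temperature-softened softmax distributions; the default temperature value is illustrative):

```python
import numpy as np

def softmax(z, t=1.0):
    # temperature-scaled, numerically stable softmax
    z = z / t
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kd_kl_loss(teacher_logits, student_logits, temperature=4.0):
    # KL divergence from the softened teacher distribution to the softened
    # student distribution, averaged over the batch
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=1).mean()
```

A temperature above 1 flattens both distributions, exposing the teachers' inter-class similarity structure ("dark knowledge") that a hard argmax label would discard.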
In Step S4: model performance evaluation in two scenarios including a cross-session scenario and a cross-subject scenario is performed.
The present disclosure specifically verifies the performance of the model on the SEED data set and the SEED-IV data set.
The prediction result y_pred obtained by the converged student model in the target domain is compared with the real label y^T in the target domain by using a confusion matrix, and the comparison result is used to evaluate the model performance. The accuracy rate is the ratio of the number of correctly classified samples during model testing to the total number of test samples, and the calculation formula of the model accuracy is as follows:
Where TP is the number of positive samples predicted as positive by the model, TN is the number of negative samples predicted as negative, FP is the number of negative samples predicted as positive, and FN is the number of positive samples predicted as negative. SEED data includes 15 subjects, and each subject has three tests, with a total of 45 tests. The average accuracy of the first two tests of the 15 subjects is as follows:
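The accuracy computation above reduces to a one-line formula over the confusion-matrix counts (the helper name is illustrative):

```python
def accuracy(tp, tn, fp, fn):
    # accuracy = (TP + TN) / (TP + TN + FP + FN)
    return (tp + tn) / (tp + tn + fp + fn)
```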
The mean square error formula of the result is as follows:
Refer to Step S3-1-1 for the data set division in the two scenarios including the cross-subject scenario and the cross-session scenario. For the cross-subject scenario, the model proposed by the present disclosure is tested on EEG data of 15 subjects in one test. For the cross-session scenario, the model proposed by the present disclosure is tested on EEG data of 15 subjects in one test. The comparison between the final test results and the existing technologies (SVM, DGCNN and RGNN) is shown in the following table:
As can be seen from the results in the above tables, the method proposed by the present disclosure has higher accuracy than those of DDC, DAN and MS-MDA in the cross-session scenario and the cross-subject scenario. The present disclosure is not only suitable for the research of emotional state recognition, but also suitable for any EEG-based cross-session and cross-subject classification prediction task, which solves the problem of individual differences of the EEG to some extent.
Number | Date | Country | Kind
202310802378.2 | Jun 2023 | CN | national