This application claims the benefit of priority from Chinese Patent Application No. 202410634652.4, filed on May 22, 2024. The content of the aforementioned application, including any intervening amendments thereto, is incorporated herein by reference in its entirety.
This application relates to data security mining, more particularly to a federated mining method and system for multimodal data based on multiple security policies.
Edge computing has become an important technological tool for multimodal data mining. Some existing studies show that completeness and availability of multimodal data provide data support for data mining in various industries. However, some studies have also found difficulties and problems in edge computing. For example, model robustness has been a key challenge to be addressed in edge computing. There are risks such as data theft, illegal access, tampering, lack of transparency of edge nodes, and other privacy disclosures during edge computing.
In order to cope with various security risks, security solutions based on anonymous authentication, differential privacy, encryption, access control, and identity authentication suitable for edge computing have been proposed. Although these privacy protection methods have specific advantages for edge computing, they still face security threats in practical applications. For example, distributed anonymization methods usually create an anonymous region in a virtual location to protect the data of edge nodes. These methods are vulnerable to privacy attacks by attackers incorporating patients' background knowledge, leading to privacy disclosure.
In edge computing research that integrates differential privacy theory, differential privacy provides a general privacy protection framework for distributed machine learning, in which noise is mainly added to the stochastic gradient descent process. However, the objective functions in these studies are mainly non-convex, which increases the computational burden of the model, yields only local optima, and affects data privacy, security, and utility. Meanwhile, in edge computing that incorporates encryption methods, the models usually incur a large computational cost, which reduces the efficiency of model training and usage. In addition, when access control and authentication are directly applied to edge computing, excessive reliance on trusted third parties increases the computation and storage costs of edge nodes as well as the risk of data leakage and the difficulty of management. Moreover, existing research ignores the quality and security screening and verification of edge nodes, which leaves edge nodes vulnerable to threats such as cloning attacks and key theft; this is a key obstacle to further research on edge computing applications.
Therefore, verifying the security of edge nodes and realizing secure access to the nodes is the key to improving security, as well as a hot issue in current research.
In view of the deficiencies in the prior art, this application provides a federated mining method and system for multimodal data based on multiple security policies. Technical solutions of this application are described as follows.
In a first aspect, this application provides a federated mining method for multimodal data based on a multi-security policy, comprising:
In an embodiment, the multiple authentication mechanism is designed as follows.
First verification: before performing the federated computation, distributed edge nodes are credibly verified by using a lightweight verification method based on random forest; and nodes that pass verification can carry out local computation, and nodes that do not pass verification cannot carry out local computation; and
Second validation: after the local federated computation is finished, local model reputation evaluation S is computed, which is expressed as:
S=|F−E|
F represents F1_score, which is a commonly-used evaluation index for machine learning models; F1_score is a harmonic average of precision and recall rates of the model; E represents the error rate; and S∈[0,1]; the higher S value is, the better the performance of the model is; and local nodes are ranked in accordance with S value from the highest to the lowest, and the top 50% of local nodes are selected to participate in the model aggregation process.
In an embodiment, the nodes participating in the federated computation first undergo trusted verification. It is then judged whether the number of nodes passing the trusted verification exceeds the current maximum carrying capacity of the network: if yes, the second node selection verification is carried out; otherwise, the second node selection verification is not carried out.
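The second validation and the capacity condition described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the names `NodeReport`, `reputation`, and `select_nodes` are hypothetical, the first (random-forest) verification is assumed to have already been applied, and rounding the top 50% down for an odd number of nodes is an assumption.

```python
from dataclasses import dataclass


@dataclass
class NodeReport:
    """Metrics reported by one verified local node (hypothetical container)."""
    node_id: str
    precision: float
    recall: float
    error_rate: float


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


def reputation(report: NodeReport) -> float:
    """Reputation S = |F - E|, with F the F1 score and E the error rate."""
    return abs(f1_score(report.precision, report.recall) - report.error_rate)


def select_nodes(reports, max_capacity):
    """Return the IDs of the nodes admitted to model aggregation.

    The second check runs only when more nodes passed the first check than
    the network can carry; otherwise every verified node is kept.
    """
    if len(reports) <= max_capacity:
        return [r.node_id for r in reports]
    ranked = sorted(reports, key=reputation, reverse=True)
    keep = max(1, len(ranked) // 2)  # top 50%; rounding down is an assumption
    return [r.node_id for r in ranked[:keep]]


reports = [
    NodeReport("a", 0.9, 0.9, 0.1),
    NodeReport("b", 0.6, 0.6, 0.4),
    NodeReport("c", 0.8, 0.8, 0.2),
    NodeReport("d", 0.5, 0.5, 0.5),
]
selected = select_nodes(reports, max_capacity=3)
```

With four verified nodes and a capacity of three, the ranking by S is applied and the two highest-scoring nodes are retained; with a capacity of four or more, all nodes would be kept.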
In an embodiment, the generalized multimodal data feature fusion model based on the multi-head attention mechanism comprises three parts: feature extraction, multimodal feature fusion based on a self-attention mechanism, and multimodal data classification based on the self-attention mechanism.
In an embodiment, the feature extraction comprises:
(1) Image features: key features of image data are extracted using a three-dimensional Convolutional Neural Network (3D-CNN) model. Firstly, the image is preprocessed by scaling, cropping, and normalization to meet input requirements of the 3D-CNN model. Secondly, a model comprising a 3D convolutional layer, a 3D pooling layer, a normalization layer, an activation layer and a fully connected layer is constructed. Then, the image sequence is taken as the input, and the spatio-temporal features of the image are acquired by sliding the 3D convolution kernel. Then, the pooling layer is introduced to reduce the size of the feature map and the number of parameters, enhance the position invariance of the model, and improve the generalization ability of the model. Finally, through a series of convolution, pooling and other operations, the target feature information that meets the research needs is extracted to express the basic structure and changes of the image, providing more accurate results for model prediction.
(2) Audio features: audio signal features are extracted by using an OpenSmile model. Firstly, the audio signal is preprocessed to meet the requirements of OpenSmile input data. Secondly, the configuration file is loaded to describe the time-domain features, frequency-domain features, filter bank features, advanced frequency-domain features, spectral correlation features, and to specify the set of to-be-extracted audio features and related parameters. Then, OpenSmile is used to automatically extract the audio features based on the relevant settings of the profile. Finally, the extracted audio features are stored in the specified format, where one row represents one sample, and one column represents one feature.
(3) Text features: text data features are extracted by Word2Vec. Firstly, a text corpus capable of storing large-scale text data is constructed. Secondly, the text data in the corpus is segmented using the Natural Language Toolkit (NLTK), which segments the text into individual words or phrases, and a vocabulary list is then constructed. The vocabulary list contains all distinct words, and each word is assigned a unique identifier. Then, the Word2Vec model is trained to learn word vectors using the prepared segmented data and the vocabulary list. During training, the model predicts the target word from nearby context words or predicts the context words from the target word. After training, the trained Word2Vec model is used to extract word vector features from the text data. Finally, the word vectors of all words in the text data set are averaged or weighted to obtain feature representations of the entire text.
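The vocabulary construction and the final averaging or weighting step of the text branch can be sketched as follows. This is a simplified illustration under stated assumptions: a whitespace tokenizer stands in for NLTK segmentation, and `word_vectors` is a hypothetical dictionary of already trained word vectors (for example, exported from a Word2Vec model); the function names are illustrative only.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for NLTK segmentation."""
    return text.lower().split()


def build_vocabulary(corpus):
    """Assign each distinct word a unique integer identifier."""
    vocab = {}
    for sentence in corpus:
        for word in tokenize(sentence):
            vocab.setdefault(word, len(vocab))
    return vocab


def text_feature(text, word_vectors, weights=None):
    """Weighted average of the vectors of the known words in `text`.

    `word_vectors` maps word -> list of floats. With `weights=None` every
    word counts equally; otherwise `weights` maps word -> weight (default 1).
    Returns None when no word of the text has a usable vector.
    """
    tokens = [t for t in tokenize(text) if t in word_vectors]
    if not tokens:
        return None
    total = [0.0] * len(word_vectors[tokens[0]])
    weight_sum = 0.0
    for token in tokens:
        w = 1.0 if weights is None else weights.get(token, 1.0)
        weight_sum += w
        for k, value in enumerate(word_vectors[token]):
            total[k] += w * value
    if weight_sum == 0:
        return None
    return [value / weight_sum for value in total]


vocab = build_vocabulary(["edge node", "node data"])
word_vectors = {"edge": [1.0, 0.0], "node": [0.0, 1.0]}
plain = text_feature("Edge node unknown", word_vectors)
weighted = text_feature("edge node", word_vectors, weights={"edge": 3.0})
```

Words without a vector are skipped rather than mapped to zeros, so out-of-vocabulary tokens do not dilute the average; this is a design choice of the sketch, not something stated in the embodiment.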
In an embodiment, the multimodal feature fusion method based on the self-attention mechanism is as follows.
(A) After the feature extraction is completed, the input N-dimensional modal data is concatenated (simply spliced) so that the multimodal data forms one piece of data.
The input N-dimensional modal data, after the completion of feature extraction, {XA, XB, . . . , XN} corresponds to different modal data, and {d1, d2, . . . , dN} is used to represent different modal data embeddings:
After splicing:
In the fully connected layer, d_fusion = d_1 + d_2 + . . . + d_N; F_fusion ∈ R^(n·d)
In the above formula, F represents output; W represents weight; X represents input, and b represents bias.
Q, K, and V represent the parameters of the linear projection layer. It should be noted that Q, K, and V represent the parameter matrices Query, Key, and Value within the attention mechanism. The input sequences are passed through three different linear transformation layers to obtain the Query, Key, and Value matrices, respectively. Q, K, and V are expressed in terms of the self-attention mechanism as follows:
(B) Computation of correlation scores: for each position in the data sequence, the correlation score between one position and other positions in the data sequence is computed. The correlation is usually calculated using dot product, scaled dot product or bilinear function.
The similarity relationship between the data is defined as r. The similarity between the data can be calculated by parameters Q and K, which is expressed as:
(C) Weight assignment: the correlation scores are normalized by Softmax to obtain the attention weights of each position relative to the other positions. These weights indicate the dependence degree of the model on other positions when generating the current position representation.
Defining the correlation weights of the Q and K parameters as wij, the correlation weights of these two features can be calculated using the softmax function as follows:
Where Q and K are the parameter matrices; and dx represents the dimension of a matrix K.
(D) Weighted summation: the calculated attention weights are used to weight and sum the representations of all positions to obtain the final self-attention representation. This representation will take into account the information of each position in the entire input sequence and assign different weights according to the importance.
The final feature of the multimodal data is expressed as:
(E) Dimensionality reduction is performed on the data features according to the calculated weights, and the primary features are retained to complete the multimodal data fusion.
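Steps (A) to (D) can be illustrated with a small scaled dot-product self-attention sketch. Several points are assumptions rather than details of the embodiment: each modality embedding is treated as one position of the sequence, the scaled dot product is chosen among the correlation functions mentioned in step (B), the projection matrices `w_q`, `w_k`, and `w_v` are hypothetical inputs, and the dimensionality reduction of step (E) is omitted.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the rows (positions) of x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])  # dimension of the Key vectors
    k_t = [list(col) for col in zip(*k)]
    # Step (B): correlation score between every pair of positions.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    # Step (C): normalize each row of scores into attention weights.
    weights = [softmax(row) for row in scores]
    # Step (D): weighted sum of the Value vectors.
    return matmul(weights, v), weights


# Step (A): stack the per-modality embeddings into one sequence (assumption).
image, audio, text = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
x = [image, audio, text]
identity = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = self_attention(x, identity, identity, identity)
```

Identity projections are used here only so that the weights can be checked by hand: the image position attends equally to itself and to the text position, and less to the audio position.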
In an embodiment, the multimodal data classification is designed as follows: data from edge nodes are classified using a multilayer perceptron (MLP).
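A forward pass of a multilayer perceptron over a fused feature vector can be sketched as below. The layer sizes, the ReLU activation, and the argmax decision rule are assumptions chosen for illustration; the embodiment only states that an MLP is used.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def mlp_predict(features, layers):
    """Return the index of the highest-scoring class.

    `layers` is a list of (weights, biases) pairs; ReLU is applied after
    every layer except the last one.
    """
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    logits = dense(hidden, *layers[-1])
    return max(range(len(logits)), key=logits.__getitem__)


# Hand-picked toy weights, not trained parameters.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
first = mlp_predict([2.0, 1.0], layers)
second = mlp_predict([1.0, 3.0], layers)
```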
In an embodiment, the adaptive perturbation mechanism based on cyclic correlation analysis and differential privacy is designed as follows:
A set of random training samples Di in D is used for training. The total correlation value between i parameters during parameter download is expressed as:
The average value of the correlation analysis results is expressed as:
It should be noted that D represents the total dataset; Di represents the subset of random samples in D; j represents the number of training rounds in the range [1, 1]; i represents the parameter in the range [0, N]; N represents the total number of parameters; Rel represents the correlation computation function, including but not limited to the Pearson correlation analysis function.
Differential privacy protection is applied to the current parameter set according to the correlation, with Gaussian noise being added; the stronger the correlation is, the smaller the Gaussian noise is.
The correlation coefficient ρ is expressed as:
The noise is expressed as:
ε is a noise value, and ε∈(0,1).
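The idea of correlation-dependent Gaussian perturbation can be sketched as follows. The mapping from correlation to noise scale used here (a standard deviation that shrinks linearly with the absolute correlation) is a hypothetical placeholder and is not the formula of this application; `base_sigma`, `histories`, and `reference` are likewise illustrative names. The sketch does not by itself establish a formal differential privacy guarantee.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_x * var_y)


def noise_scale(rho, base_sigma):
    """Hypothetical mapping: sigma shrinks linearly as |rho| grows."""
    return base_sigma * (1.0 - min(1.0, abs(rho)))


def perturb(params, histories, reference, base_sigma, rng):
    """Add Gaussian noise to each parameter according to its correlation.

    histories[i] holds the values of parameter i over past rounds, and
    `reference` holds the per-round results they are correlated against.
    """
    noisy = []
    for value, history in zip(params, histories):
        sigma = noise_scale(pearson(history, reference), base_sigma)
        noisy.append(value + rng.gauss(0.0, sigma))
    return noisy


rng = random.Random(0)
reference = [2.0, 4.0, 6.0]
histories = [[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]
noisy = perturb([1.0, 1.0], histories, reference, base_sigma=0.5, rng=rng)
```

In this toy run the first parameter is perfectly correlated with the reference and is left essentially unchanged, while the second, uncorrelated parameter receives noise with the full `base_sigma`.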
In a second aspect, this application further provides a federated mining system for multimodal data comprising:
Compared to the prior art, this application has the following beneficial effects.
This application reduces the training cost by designing a multiple authentication mechanism for federated learning edge nodes to select secure sub-models.
This application proposes a multimodal data feature fusion method based on a multi-head attention mechanism as a generalized model for multimodal data computation, thereby reducing the computational burden.
This application introduces an adaptive perturbation mechanism based on cyclic correlation analysis, which can dynamically adjust the range of added noise by adding a small amount of noise to model parameters with high correlation and a larger amount of noise to model parameters with low correlation.
Of course, technical solutions of this application do not necessarily need to achieve all the advantages described above at the same time.
In order to illustrate the technical solutions in the embodiments of the present disclosure more clearly, the drawings required in the description of the embodiments will be briefly described below. Obviously, presented in the drawings are merely some embodiments of the present disclosure, which are not intended to limit the disclosure. For those skilled in the art, other drawings may also be obtained according to the drawings provided herein without paying creative efforts.
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below in conjunction with the accompanying drawings of the present disclosure. Described below are merely some embodiments of the disclosure, which are not intended to limit the disclosure. For those skilled in the art, other embodiments obtained based on these embodiments without paying creative efforts should fall within the scope of the disclosure.
Reducing the computational burden of the model, verifying the security of the edge nodes, and realizing secure access to the edge nodes are hot issues in current research. In order to solve the above-mentioned technical problems, referring to
Referring to
(S1) A federated learning framework is used as a data mining model for distributed data mining.
(S2) The multiple authentication mechanism is designed, and local nodes are selected to participate in the federated computation to obtain authenticated local edge nodes and aggregate the dataset.
The multiple authentication mechanism is designed as follows.
First verification: before the start of the federated computation, distributed edge nodes are credibly verified by using a lightweight verification method based on random forest; and nodes that pass the verification can perform local computation, and nodes that do not pass the verification cannot perform local computation.
Second validation: after the local federated computation is finished, the value of local model reputation evaluation S is computed, which is expressed as:
S=|F−E|
F represents F1_score, which is a commonly-used evaluation index for machine learning models; F1_score is a harmonic average of precision and recall rates of the model; E represents the error rate; and S∈[0,1]; the higher S value is, the better the performance of the model is; and local nodes are ranked in accordance with S value from the highest to the lowest, and the top 50% of local nodes are selected to participate in the model aggregation process.
The nodes participating in the federated computation first undergo trusted verification. It is then judged whether the number of nodes passing the trusted verification exceeds the current maximum carrying capacity of the network: if yes, the second node selection verification is carried out; otherwise, the second node selection verification is not carried out.
(S3) Multimodal data fusion and classification are performed on the local nodes passing the validation by using the designed generalized multimodal data feature fusion model based on the multi-head attention mechanism.
The generalized multimodal data feature fusion model based on the multi-head attention mechanism includes three parts: feature extraction, multimodal feature fusion based on the self-attention mechanism, and multimodal data classification.
Specifically, feature extraction includes:
Image features: key features of image data are extracted using the 3D-CNN model.
Audio features: audio signal features are extracted using the OpenSmile model; and the extracted audio signal features are stored in a specified format.
Text features: text data features are extracted by Word2Vec.
In this embodiment, the multimodal feature fusion method based on the self-attention mechanism is as follows.
(a) After the feature extraction is completed, the input N-dimensional modal data is concatenated (simply spliced) so that the multimodal data forms one piece of data.
(b) Computation of correlation scores: for each position in the data sequence, the correlation score between one position and other positions in the data sequence is computed.
(c) Weight assignment: the correlation scores are normalized by Softmax to obtain the attention weights of each position relative to the other positions.
(d) Weighted summation: the calculated attention weights are used to weight and sum the representations of all positions to obtain the final self-attention representation.
(e) Dimensionality reduction is performed on the data features according to the calculated weights, and the primary features are retained to complete the multimodal data fusion.
Specifically, the multimodal data classification is designed as follows: data from edge nodes are classified using the multilayer perceptron.
(S4) In addition, the adaptive perturbation mechanism based on cyclic correlation analysis and differential privacy is designed to add noise round by round and dynamically.
The adaptive perturbation mechanism based on cyclic correlation analysis and differential privacy is designed as follows.
The correlation between the upload and download parameters and the calculation results in each round is calculated. The parameters in the current parameter set are protected with differential privacy according to the correlation, and Gaussian noise is added.
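The aggregation of the parameters uploaded by the selected nodes can be illustrated with a sample-weighted average. The embodiment does not state the aggregation rule, so the FedAvg-style weighting below, the function name `federated_average`, and the use of flat parameter lists are all assumptions made for the sake of a concrete sketch.

```python
def federated_average(updates):
    """Sample-weighted average of parameter vectors.

    `updates` is a list of (parameters, num_samples) pairs from the nodes
    that passed both verification steps; all parameter lists must have the
    same length.
    """
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no training samples reported")
    dim = len(updates[0][0])
    return [sum(p[k] * n for p, n in updates) / total for k in range(dim)]


# Two nodes: the second trained on three times as many samples.
global_params = federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)])
```

In a full round, the noisy parameters produced by the perturbation step would be the inputs to this aggregation.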
In an embodiment, based on the same inventive conception as the above-described method, a federated mining system for multimodal data based on a multi-security policy is also provided. The federated mining system includes a memory, a processor, and a computer program. The computer program is stored in the memory and runs on the processor. When executing the computer program, the processor implements the above-described federated mining method for multimodal data.
Ablation Experiment with and without the Attention Mechanism
In order to verify the performance of the proposed federated mining method for multimodal data based on multiple security policies, ablation experiments were performed on bimodal and trimodal datasets.
The comparison experimental results of the proposed method on the trimodal CMU-MOSEI dataset were shown in
Ablation Experiments with and without Noise
In order to verify the performance of the proposed method under noise perturbation, bimodal and trimodal ablation experiments with and without noise perturbation were performed.
In order to evaluate the performance of the multimodal feature fusion model with the attention mechanism, the multimodal feature fusion model with the attention mechanism was compared to the Low-rank Multimodal Fusion (LWF) that added the attention mechanism. The comparison results were shown in
The results showed that the overall performance of the proposed method was more stable, and the training loss value decreased and converged rapidly, which was significantly better than MMTM and NHF; the proposed method also achieved better model prediction and data fitting. The above comparative analysis showed that the proposed method could effectively and accurately realize multimodal feature fusion on the bimodal Parkinson's disease dataset and the trimodal CMU-MOSEI dataset.
In order to verify the performance of the proposed method under sub-model screening, experiments were conducted on bimodal and trimodal datasets. The experimental results were shown in
In order to ensure the quality of large-scale edge nodes under the federated learning framework and to reduce the security threat of low-quality edge nodes, this disclosure adopted the round-by-round correlation analysis method to screen the sub-models. Compared with the traditional method, the round-by-round iterative sub-model selection and parameter updating method were more suitable for the multimodal federated computing framework.
In the traditional federated learning sub-model filtering method, C was the key parameter indicating that the edge nodes were filtered according to a certain probability. Experiments were conducted to compare the filtering probabilities of 0.2, 0.5, and 1 in common sub-models with the proposed bimodal and trimodal sub-model filtering methods based on round-by-round correlation analysis.
This disclosure analyzed the commonly used sub-models with client screening probabilities C=0.2, C=0.5 and C=1 with the proposed bimodal and trimodal sub-model screening methods.
The experimental results showed that the higher the value of C was, the higher the accuracy was. However, the number of edge nodes that needed to upload parameters also increased, and the communication loss between the local model and central model also increased. Therefore, selecting an appropriate C was a practical model optimization scheme. However, the accuracy of the proposed method from the beginning of the iteration was higher than that of the conventional method, because the method could select the appropriate model to participate in uploading the parameters by the round-by-round iteration accuracy and response delay function.
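For reference, the conventional screening by the parameter C that serves as the baseline in this comparison can be sketched as random sampling of a fraction of the clients in each round. Interpreting C as a sampling fraction, and rounding the resulting count with a minimum of one client, are assumptions of this sketch; `sample_clients` is a hypothetical name.

```python
import random


def sample_clients(client_ids, fraction, rng):
    """Pick max(1, round(fraction * K)) of the K clients uniformly at random."""
    count = max(1, round(fraction * len(client_ids)))
    return rng.sample(client_ids, count)


clients = list(range(10))
rng = random.Random(0)
picked = {c: sample_clients(clients, c, rng) for c in (0.2, 0.5, 1)}
```

A larger C uploads parameters from more edge nodes per round, which matches the trade-off between accuracy and communication cost discussed above.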
Described above are merely preferred embodiments of the disclosure, which are not intended to limit the disclosure. It should be understood that any modifications and replacements made by those skilled in the art without departing from the spirit of the disclosure should fall within the scope of the disclosure defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
202410634652.4 | May 2024 | CN | national |