The present invention relates to federated learning, evolutionary algorithms and adaptive intelligent algorithms, and in particular to a federated large model adaptive learning system.
It is difficult for many companies to share big data directly, for reasons such as the confidentiality of industry data. To meet this need, federated learning, with its privacy protection ability, has been highly valued by scholars at home and abroad. Qiang Yang has proposed secure federated learning in “Federated Learning with Privacy-preserving and Model IP-right protection”, using encryption technology to protect data and ensure its security during transmission and storage. Guoyi Shi has reviewed privacy protection research under the federated learning framework in detail and explored the advantages, disadvantages and potential solutions of the existing privacy protection technologies in “Privacy preservation in federated learning: An insightful survey from the GDPR perspective”. Bin Cao has proposed a number of federated learning algorithms and their evolutionary strategies in “Federated neural architecture search for medical data security”. Weifeng Lv has designed a cross-platform federated learning framework for order scheduling in “Fed-LTD: Towards Cross-Platform Ride Hailing via Federated Learning to Dispatch”, in which a plurality of platforms collaborate to make scheduling decisions without sharing their local data, to explore the challenges of privacy and efficiency.
In another aspect, degradation of equipment performance, such as aging and faults, may lead to the failure of AI models, which makes it necessary to design adaptive intelligent algorithms. Research on adaptive intelligent algorithms has already appeared. In the field of evolutionary computation, Yun Li et al. have proposed an adaptive evolutionary algorithm in “Adaptive particle swarm optimization” to automatically control algorithm parameters during the evolutionary process. In the field of adaptive deep learning, Yining Dong has proposed a deep dynamic adaptive transfer network in “Deep dynamic adaptive transfer network for rolling bearing fault diagnosis with considering cross-machine instance”, which realizes dynamic adaptive model update. Islam has proposed an automatic machine learning co-exploration framework, EVE, in “EVE: Environmental Adaptive Neural Network Models for Low-power Energy Harvesting System”, which adaptively selects candidate models.
There is comparatively little work that combines federated learning with adaptive and evolutionary ideas. Bin Cao has proposed a heterogeneous large-scale multiobjective federated neuroevolutionary strategy that combines federated learning with evolutionary ideas in “Large-Scale Multiobjective Federated Neuroevolution for Privacy and Security in the Internet of Things”. Chunhua Xiao has proposed a sparse network model evolutionary algorithm for federated learning in “CBFL: A Communication-Efficient Federated Learning Framework From Data Redundancy Perspective”. Zehui Zhang has proposed an adaptive model aggregation solution for federated learning in “An adaptive federated deep learning algorithm for non-independent identically distributed data”.
With respect to the difficulty of combining the federated learning algorithm, the evolutionary algorithm and the adaptive intelligent algorithm, the present invention forms a unified federated large model adaptive learning system by sorting out the relationship among the three and combining them organically. The present invention analyzes the task characteristics of large and mini models to design multiple optimization objectives, such as generalization ability and model accuracy; combines evolutionary ideas with adaptive intelligent algorithms; and designs a gradient scaling method to further unify federated learning, evolutionary ideas and adaptive intelligent algorithms into the federated large model adaptive learning system.
The purpose of the present invention is to construct a federated large model adaptive learning system, which generates an AI model on the premise of reducing the risk of data privacy leakage, and achieves the accurate adaptive update of the AI model through a small amount of the latest data. The contents of the present invention comprise: constructing an adaptive mini model for incremental learning; proposing a gradient scaling method for data privacy protection under federated learning; revealing a correlation between the generalization ability of the model and training data, and proposing a generalization ability evaluation function; designing multiple optimization objectives, updating and repairing the model adaptively through multiobjective evolutionary learning, and improving the usability of a large model.
The technical solution of the present invention is as follows:
A federated large model adaptive learning system can be widely applied to many pre-trained large models, such as BERT, ChatGPT, etc., and effectively improves the universality, functionality and efficiency of the pre-trained large models. The federated adaptive learning system is mainly composed of a mini model adaptive update module, a BERT large model and mini model normalization module, a BERT large model adaptive update module and a system privacy protection module; the application of the federated adaptive learning system to pre-trained large models is described below module by module (taking BERT as an example; the object to which the learning system is applied is not limited to BERT).
For the problem of difficulty in interaction between BERT large models and mini models, the mini model adaptive update module is designed. Through the adaptive update of the mini models, the performance of the BERT large models can be improved, e.g., improving model accuracy and reducing calculation overhead. Considering three optimization directions of mini model accuracy, mini model forgetting rate and mini model error, the mini model adaptive update module establishes adaptive criteria through the above optimization directions:
From the perspective of universality, in the mini model adaptive update, the accuracy of the mini models will determine the universality of the BERT large models. Therefore, a mini model accuracy submodule is proposed, expressed as follows:
wherein Cm represents the average accuracy value of the mini model after m incremental stages, and ti represents the accuracy value corresponding to the i-th stage.
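The formula itself is not reproduced above. Under the stated definitions, a plausible reconstruction, offered as an assumption rather than the invention's verbatim expression, is the running average of the per-stage accuracy values:

```latex
C_m = \frac{1}{m} \sum_{i=1}^{m} t_i
```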
From the perspective of functionality, in the mini model adaptive update, the mini model forgetting rate determines the convergence property of the mini models, and further determines the convergence of the BERT large models. Therefore, the mini model forgetting rate submodule is designed, expressed as follows:
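The forgetting rate formula is likewise omitted. One common instantiation in incremental learning, given here purely as an illustrative assumption, measures how far the accuracy on each earlier stage has dropped from its historical best, with t_{j,i} denoting the (hypothetical) accuracy on stage i after training through stage j:

```latex
F_m = \frac{1}{m-1} \sum_{i=1}^{m-1} \left( \max_{j \in \{1,\dots,m-1\}} t_{j,i} \;-\; t_{m,i} \right)
```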
From the perspective of high efficiency, in the mini model adaptive update, the error gradient directly determines the efficiency of the BERT large models. Therefore, a mini model error gradient submodule is designed, expressed as follows:
In the process of the mini model adaptive update, mini model gradient information is generated continuously. Therefore, the BERT large model and mini model normalization module realizes the normalization of the gradient information between the BERT large models and the mini models, establishing a basis for the implementation of the BERT large model adaptive update module. The gradient information has two properties: magnitude and direction. The privacy protection principle of federated learning allows the mini models to transmit the gradient information to the BERT large models, and the gradient information generated by the mini models is used for feedback learning of the BERT large models. However, there is a huge difference in the number of parameters between the BERT large models and the mini models, so the gradient of the mini models cannot be used by the BERT large models directly. Therefore, a method based on gradient scaling is proposed, which uses the difference in the number of parameters between the mini models and the BERT large models, together with priori knowledge, to establish the corresponding relationship between the gradient values of the mini models and the gradient values of the BERT large models. The gradient scaling method is expressed as follows:
wherein Tgrad′ represents the corresponding gradient value of the mini models on the large models, tgrad′ represents the current gradient value of the mini models, Tgrad is the prior gradient value of the large models, tgrad is the prior gradient value of the mini models, Tn is the number of parameters of the large models, and tn is the number of parameters of the mini models.
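The scaling expression itself does not survive in the text. A minimal Python sketch of one plausible form, in which the current mini model gradient is rescaled by the ratio of the prior gradient values and the ratio of the parameter counts, is given below; the function name and the exact combination of the two ratios are assumptions, not the invention's verbatim method:

```python
def scale_gradient(t_grad_new: float, t_grad_prior: float,
                   T_grad_prior: float, t_n: int, T_n: int) -> float:
    """Map a mini model gradient value onto the large model scale.

    t_grad_new   -- current gradient value of the mini model (tgrad')
    t_grad_prior -- prior gradient value of the mini model (tgrad)
    T_grad_prior -- prior gradient value of the large model (Tgrad)
    t_n, T_n     -- parameter counts of the mini and large models
    """
    # The ratio of prior gradient values captures how the two models have
    # historically responded to comparable data.
    prior_ratio = T_grad_prior / t_grad_prior
    # The parameter-count ratio compensates for the difference in model size.
    size_ratio = T_n / t_n
    return t_grad_new * prior_ratio * size_ratio
```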
In the BERT large model and mini model normalization module, the normalization of the mini model gradient information and the BERT large model gradient information is realized. The BERT large model adaptive update module then uses the normalized gradient information from two aspects, generalization ability and gradient fitting, to help the BERT large models perform adaptive update.
In order to monitor the gap in learning direction between the BERT large models and the mini models, the consistency of their collaborative learning directions must be maintained. Through a distributed perception method, the local data is first used for a preliminary measurement of the deviation value of the large models, and a secondary measurement is then made in combination with the online data at an edge server. On this basis, a generalization assessment method is proposed to assist the adaptive learning function of the federated adaptive learning system, expressed as follows:
wherein f(x,y) is called the generalization evaluation function and g(x,y) is called the distributed perceived similarity function; x and y are the evaluation results of the large models and the mini models respectively, and both are one-dimensional vectors; μx and μy are the average values of the two respectively; σx and σy are the variances of the two respectively; σx,y is the covariance of the two; δ1 and δ2 are two minimal constants used to prevent the denominator from being 0; α is a scaling factor with a value range of [10,20], which ensures that the range of f(x,y) lies within (0,1); and λ is a normalization constant which limits the range of the domain of definition.
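The definitions above, with means, variances, a covariance and two stabilizing constants δ1 and δ2, mirror the structure of the structural similarity (SSIM) index. A reconstruction of g(x,y) along those lines, offered as an assumption since the exact expression is not reproduced in the text, would be:

```latex
g(x,y) = \frac{(2\mu_x \mu_y + \delta_1)\,(2\sigma_{x,y} + \delta_2)}
              {(\mu_x^2 + \mu_y^2 + \delta_1)\,(\sigma_x^2 + \sigma_y^2 + \delta_2)}
```

f(x,y) presumably applies the scaling factor α and the normalization constant λ to map g(x,y) into (0,1), but its exact form is not recoverable here.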
Through the generalization assessment method, the generalization ability is expressed as follows:
wherein fi(x,y) is the generalization ability value in different tasks; n is the number of tasks; and C is a constant with a value in [0,1] which constrains the differences in the assessment of the generalization ability among different tasks.
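Given these definitions, a natural reading, again stated as an assumption, is a C-weighted mean of the task-wise generalization values:

```latex
F = C \cdot \frac{1}{n} \sum_{i=1}^{n} f_i(x,y)
```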
Because the BERT large model cannot obtain the latest data, the gradient of the BERT large model is fitted to that of the mini model as closely as possible, starting from the gradient information, so as to indirectly learn the features of the latest data. The gradient fitting is expressed as follows:
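The fitting expression is not reproduced. A standard way to formalize fitting the large model gradient to the scaled mini model gradient, stated here only as an illustrative assumption with T_grad^BERT as a hypothetical symbol for the large model's own gradient value, is a squared-error objective over the gradient value Tgrad′ produced by the gradient scaling method:

```latex
\min \; \bigl\| \, T_{\text{grad}}^{\text{BERT}} - T'_{\text{grad}} \, \bigr\|_2^2
```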
Through the mini model adaptive update module, the BERT large model and mini model normalization module and the BERT large model adaptive update module, the federated adaptive learning system is applied to the BERT large model. The system privacy protection module is then combined into the federated adaptive learning system, placing the whole system within a privacy protection mechanism. The system privacy protection module realizes the privacy protection of the federated adaptive learning system, comprising a noise adding mechanism and an approximate weight matrix average value mechanism.
With respect to the characteristic that the interactive learning process of the BERT large models and mini models is private, the noise adding mechanism is proposed: noise is added after subsampling, so that the contribution of a single client is hidden in the aggregation throughout the entire distributed learning process. The specific implementation is as follows:
In random subsampling, the total number of clients is denoted as K. In each round of communication, a random subset Zt of size T is extracted, where the subscript t represents the number of the current round. An administrator then distributes the central model of the current round, denoted as Wt, to each client in the subset; each client optimizes the central model on its own data, so that each independent client in Zt holds its own client model Wk; and a Gaussian noise operation is applied to each client:
At the end of each round of communication t, the update Δ of each client model is transmitted back to the administrator and aggregated to form the central model of the next round.
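As a concrete illustration of the subsample-then-perturb round described above, the following Python sketch assumes per-client norm clipping before noising, a detail the text does not spell out; all function and parameter names are hypothetical:

```python
import random
import numpy as np

def private_round(w_t, client_updates, T, clip_norm=1.0, noise_std=0.1):
    """One communication round: subsample T clients, clip and noise updates.

    w_t            -- central model weights of round t (np.ndarray)
    client_updates -- dict mapping client id -> local update (Wk - Wt)
    T              -- size of the random subset Zt
    clip_norm      -- assumed bound on each update's L2 norm
    noise_std      -- standard deviation of the added Gaussian noise
    """
    # Random subsampling: draw the subset Zt of size T from all K clients.
    z_t = random.sample(list(client_updates), T)

    noisy_sum = np.zeros_like(w_t)
    for k in z_t:
        delta = client_updates[k]
        # Clip each client's update so that a single client's contribution
        # to the aggregate is bounded (an assumption, not stated in the text).
        delta = delta / max(1.0, np.linalg.norm(delta) / clip_norm)
        # Gaussian noise hides the individual contribution in the aggregate.
        noisy_sum += delta + np.random.normal(0.0, noise_std, size=delta.shape)

    # Average the noisy updates and apply them to the central model.
    return w_t + noisy_sum / T
```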
With respect to the characteristic that the interactive learning process of the BERT large models and mini models is complicated, a method is proposed for approximating the average value of the weight matrix by distorting the sum of all updates with a Gaussian mechanism. The method uses the Gaussian mechanism to distort the sum of all the updates and enforces a certain sensitivity bound by using scaled versions of the updates instead of the real updates. The specific implementation is as follows:
wherein Δ represents the model updates of the clients.
wherein Laplace represents the Laplace noise calculation and ε represents the privacy budget.
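A minimal Python sketch of the approximate weight matrix average value mechanism under the stated assumptions; both the Gaussian distortion of the summed updates and the Laplace variant with privacy budget ε are shown, with all names hypothetical:

```python
import numpy as np

def approximate_average(updates, sensitivity=1.0, noise_std=0.1, epsilon=None):
    """Approximate the average of client weight updates under noise.

    updates     -- list of np.ndarray model updates (scaled versions)
    sensitivity -- assumed bound on each scaled update's contribution
    noise_std   -- Gaussian noise scale (used when epsilon is None)
    epsilon     -- if given, Laplace noise with this privacy budget is used
    """
    total = np.sum(updates, axis=0)
    if epsilon is not None:
        # Laplace mechanism: noise scale = sensitivity / privacy budget.
        noise = np.random.laplace(0.0, sensitivity / epsilon, size=total.shape)
    else:
        # Gaussian mechanism: distort the sum of all updates.
        noise = np.random.normal(0.0, noise_std * sensitivity, size=total.shape)
    return (total + noise) / len(updates)
```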
The present invention has the beneficial effects that: the present invention takes model accuracy, a learning forgetting rate and error gradient as optimization objectives, and forms a multiobjective optimization incremental learning method. The corresponding relationship between the gradient values of the large models and the mini models is established through the gradient scaling method by using a federated learning privacy protection principle, and the gradient information generated by the mini models is transmitted back to the large models for learning to maintain collaborative learning of the large models and the mini models. The distributed perception method is introduced to monitor the learning direction gap of the large models and the mini models, and a multiobjective evolutionary algorithm is formed by combining the gradient fitting optimization objective, the generalization ability optimization objective and the model accuracy optimization objective so that the large models can be updated adaptively and the generalization ability of the large models is greatly improved.
The embodiments of the present invention are further described below in combination with the drawings of the description and the specific technical solution (taking BERT as an example; the object to which the learning system is applied is not limited to BERT).
Gradient fitting optimization objective:
A series of tasks are created for the mini models, comprising classification and regression, and the loss function is changed to obtain dozens of tasks. The generalization ability of the mini models under different tasks can be assessed independently, so as to obtain the corresponding generalization ability value. According to the proposed adaptive collaborative control function, the generalization ability optimization objective can be obtained by taking the above dozens of index values as collaborative variables.
Generalization ability optimization objective:
wherein fi(x,y) is the generalization ability value in different tasks; n is the number of tasks; and C is a constant with a value in [0,1] which constrains the differences in the assessment of the generalization ability among different tasks, so as to obtain more accurate optimization results.
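As a sketch of how the dozens of task-wise index values might be combined into this objective, under the same assumed C-weighted-mean reading as above (all names and values hypothetical):

```python
import numpy as np

def generalization_objective(task_scores, C=0.5):
    """Combine task-wise generalization values f_i(x, y) into one objective.

    task_scores -- list of generalization ability values, one per task
    C           -- constant in [0, 1] constraining cross-task differences
    """
    # Assumed reading: a C-weighted mean over the n tasks.
    return C * float(np.mean(task_scores))

# Usage: index values from classification and regression tasks
# (illustrative numbers only, not experimental results).
scores = [0.81, 0.77, 0.85, 0.69]
objective = generalization_objective(scores, C=0.9)
```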
Higher classification accuracy indicates better performance of the large models in the classification tasks and helps to assess the classification ability of the large models for different types of samples. As shown in Table 1, benefiting from the gradient fitting optimization objective and the generalization ability optimization objective, the classification accuracy of BERT under the federated large model adaptive learning system on the two data sets is significantly improved compared with the original BERT model.
Considering that the data sets may suffer from class imbalance, it is also necessary to measure the recall of the models. As shown in Table 2, the classification recall of BERT under the federated large model adaptive learning system on the two data sets is likewise significantly improved compared with the original model. According to the classification accuracy and recall in Tables 1 and 2, the federated large model adaptive learning system is effective on two different data sets.