This application relates to the field of artificial intelligence technologies, and in particular, to a model training method, a related system, and a storage medium.
With development of artificial intelligence, a concept of “federated learning” is proposed, so that both federated parties can perform model training to obtain model parameters without providing their own data, and data privacy leakage can be avoided.
Horizontal federated learning, also known as feature-aligned federated learning, performs joint machine learning on client data that has the same data features but belongs to different users. It applies when data features of the clients overlap heavily (that is, data features are aligned) but users overlap little. Horizontal federated learning applies to two scenarios: a standard scenario and a non-intersecting scenario. In the standard scenario, labeled data participating in model training is stored on a client, that is, standard supervised training is performed on the client. In the non-intersecting scenario, labeled data participating in model training is stored on a server, and a large amount of unlabeled data is stored on the client. In the non-intersecting scenario, many data labeling tasks need to be processed by personnel with relevant professional knowledge. For example, for a mobile phone application for yoga posture correction, it is difficult for an ordinary user to determine whether his or her own yoga posture is completely correct. Therefore, even if a user is willing to label all picture data for a service provider, the service provider can only hire a professional yoga practitioner to label the related data.
In current horizontal federated learning, for the non-intersecting scenario, it is usually assumed that a large amount of labeled data exists on the client, and it can be ensured that a training mode of horizontal federated learning is used for model training. However, in an actual case, a small amount of or even no labeled data usually exists on the client, and it is actually difficult to require the client to label the data. Therefore, it is difficult to obtain a high-quality model by using the existing training mode of horizontal federated learning.
This application discloses a model training method, a related system, and a storage medium, to improve a feature extraction capability of a model on unlabeled data.
According to a first aspect, an embodiment of this application provides a model training system. The model training system includes a server and a client, the server maintains labeled data, and the client maintains unlabeled data. The client is configured to train a first model based on the unlabeled data, to obtain a parameter of the first model; the client is further configured to send a parameter of a first subnet in the first model to the server, where the first model further includes a second subnet; the server is configured to train a second model based on the parameter of the first subnet reported by the client and the labeled data, to update a parameter of the second model, where the second model includes the first subnet and a third subnet, and the third subnet corresponds to the second subnet; the server is further configured to send an updated parameter of the first subnet and an updated parameter of the third subnet to the client; and the client is further configured to obtain a target model based on the parameter of the first subnet and the parameter of the third subnet that are from the server, where the target model includes the first subnet and the third subnet.
According to this solution, the client performs training based on the unlabeled data, and then the server performs training based on the parameter of the first subnet reported by the client and the labeled data, and sends the updated parameter of the first subnet and the updated parameter of the third subnet to the client. Further, the client obtains the target model based on the parameter of the first subnet and the parameter of the third subnet. This method ensures security of private data on the client, improves a feature extraction capability of the model on the unlabeled data, and reduces labor costs. In this solution, horizontal federated learning can be performed even when the labeled data exists only on the server and no labeled data exists on the client, to adapt to real scenarios that lack labeled data.
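As an illustration only, one round of the exchange described above can be sketched as follows. The dictionary layout, the function names, and the two placeholder update rules are assumptions made for this sketch; they stand in for real unsupervised and supervised training and are not taken from this application.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def client_unsupervised_step(model: Params) -> Params:
    # Placeholder for unsupervised training on unlabeled data (hypothetical update rule).
    return {name: [w + 0.1 for w in ws] for name, ws in model.items()}


def server_supervised_step(model: Params) -> Params:
    # Placeholder for supervised training on labeled data (hypothetical update rule).
    return {name: [w * 0.5 for w in ws] for name, ws in model.items()}


def run_round(first_model: Params, second_model: Params) -> Tuple[Params, Params, Params]:
    # Client: train the first model, then report only the first subnet.
    first_model = client_unsupervised_step(first_model)
    upload = {"first_subnet": list(first_model["first_subnet"])}

    # Server: load the reported first subnet, then train the second model.
    second_model = {**second_model, "first_subnet": upload["first_subnet"]}
    second_model = server_supervised_step(second_model)
    download = {
        "first_subnet": second_model["first_subnet"],
        "third_subnet": second_model["third_subnet"],
    }

    # Client: the target model consists of the first subnet and the third subnet.
    target_model = dict(download)
    return target_model, upload, download


first_model = {"first_subnet": [0.0, 0.0], "second_subnet": [1.0]}
second_model = {"first_subnet": [9.0, 9.0], "third_subnet": [2.0]}
target_model, upload, download = run_round(first_model, second_model)
print(sorted(upload), sorted(target_model))
```

In this sketch no data sample crosses the client boundary; only subnet parameters are exchanged, and the second subnet never appears in the target model.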
That the client is configured to train a first model based on the unlabeled data, to obtain a parameter of the first model may be understood as that the client trains the first model based on the unlabeled data, to update the parameter of the first model.
In an embodiment, the first subnet may be configured to perform feature extraction on data input into the subnet.
Compared with the conventional technology in which the client sends all parameters obtained through training to the server, this solution can reduce communication overheads in a training process to some extent because a smaller amount of data is transmitted.
In an embodiment, when sending the parameter of the first model to the server, the client is configured to send only the parameter of the first subnet in the first model to the server.
In an embodiment, the client is further configured to send a parameter other than the parameter of the first subnet in the first model to the server.
In an embodiment, a quantity of clients is K, K is an integer greater than 1, and the server is further configured to perform aggregation processing on parameters of K first subnets from the K clients, to obtain a processed parameter of the first subnet; and when training the second model of the server based on the parameter of the first subnet reported by the client and the labeled data, to update the parameter of the second model, the server is configured to train the second model of the server based on the processed parameter of the first subnet and the labeled data, to update the parameter of the second model.
In this method, the server performs training based on parameters of first subnets of a plurality of clients, to effectively improve the feature extraction capability of the model on the unlabeled data.
In an embodiment, the third subnet of the second model is configured to output a calculation result of the second model, and the second subnet of the first model is configured to output a calculation result of the first model. The third subnet of the second model has a different structure from that of the second subnet of the first model.
In an embodiment, for example, the third subnet is a Classifier subnet, and the second subnet is an MLM subnet.
In an embodiment, a parameter of the second subnet of the first model remains unchanged before and after training.
This method can reduce training overheads.
In an embodiment, the second model further includes a fourth subnet, and a parameter of the fourth subnet of the second model remains unchanged before and after training.
This method can reduce training overheads.
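The embodiments above, in which a parameter of a subnet remains unchanged before and after training, can be pictured with a parameter update that simply skips frozen subnets. The following is a minimal sketch under assumed names (`apply_update`, `frozen`) and an assumed plain gradient-descent update; it is not presented as the update rule of this application.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]


def apply_update(params: Params, grads: Params, lr: float, frozen: Set[str]) -> Params:
    updated: Params = {}
    for name, weights in params.items():
        if name in frozen:
            # Frozen subnet: parameters are copied unchanged, so no gradient is needed.
            updated[name] = list(weights)
        else:
            # Trainable subnet: plain gradient-descent step (assumed for illustration).
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated


params = {"first_subnet": [1.0, 2.0], "second_subnet": [3.0]}
# No gradient is computed for the frozen second subnet.
grads = {"first_subnet": [0.5, 0.5]}
updated = apply_update(params, grads, lr=0.1, frozen={"second_subnet"})
print(updated)
```

Because the frozen subnet needs neither a gradient nor an update, both the backward computation and the parameter write for that subnet can be skipped, which is one way the training overheads may be reduced.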
According to a second aspect, an embodiment of this application provides a model training method, applied to a server. The server maintains labeled data, and the method includes: training a second model based on a parameter of a first subnet reported by a client and the labeled data, to update a parameter of the second model, where the second model includes the first subnet and a third subnet; and sending an updated parameter of the first subnet and an updated parameter of the third subnet to the client.
According to an embodiment of the application, the server performs training based on the parameter of the first subnet reported by the client and the labeled data, and then sends the updated parameter of the first subnet and the updated parameter of the third subnet to the client. The parameter of the first subnet reported by the client is obtained by the client by performing training based on the unlabeled data. This method ensures security of private data on the client, improves a feature extraction capability of the model on the unlabeled data, and reduces labor costs. In this solution, horizontal federated learning can be performed even when the labeled data exists only on the server and no labeled data exists on the client, to adapt to real scenarios that lack labeled data.
In an embodiment, a quantity of clients is K, K is an integer greater than 1, and the method further includes: performing aggregation processing on parameters of K first subnets from the K clients, to obtain a processed parameter of the first subnet; and the training a second model based on a parameter of a first subnet reported by a client and the labeled data, to update a parameter of the second model includes: training the second model based on the processed parameter of the first subnet and the labeled data, to update the parameter of the second model.
In this method, the server performs training based on parameters of first subnets of a plurality of clients, to effectively improve the feature extraction capability of the model on the unlabeled data.
In an embodiment, the server further maintains unlabeled data, and the training a second model based on a parameter of a first subnet reported by a client and the labeled data, to update a parameter of the second model includes: training a third model based on the parameter of the first subnet reported by the client and the unlabeled data, to update a parameter of the third model; and training the second model based on the parameter of the third model and the labeled data, to update the parameter of the second model.
This method can implement horizontal federated learning in a scenario in which the server maintains the labeled data and the unlabeled data, and further improves a feature extraction capability of the model and reduces labor costs.
According to a third aspect, an embodiment of this application provides a model training method, applied to a client. The client maintains unlabeled data, and the method includes: training a first model based on the unlabeled data, to obtain a parameter of the first model; sending a parameter of a first subnet in the first model to a server, where the first model further includes a second subnet; and obtaining a target model based on the parameter of the first subnet and a parameter of a third subnet that are from the server, where the target model includes the first subnet and the third subnet, and the third subnet corresponds to the second subnet.
According to an embodiment of the application, the client performs training based on the unlabeled data, reports the parameter of the first subnet to the server, and obtains the target model based on the parameter of the first subnet and the parameter of the third subnet that are from the server. The parameter of the first subnet and the parameter of the third subnet that are from the server are obtained by the server through training based on the parameter of the first subnet reported by the client and labeled data. This method ensures security of private data on the client, improves a feature extraction capability of the model on the unlabeled data, and reduces labor costs. In this solution, horizontal federated learning can be performed even when the labeled data exists only on the server and no labeled data exists on the client, to adapt to real scenarios that lack labeled data.
In an embodiment, the client sends only the parameter of the first subnet in the first model to the server, and does not send a parameter other than the parameter of the first subnet in the first model to the server.
Compared with the conventional technology in which the client sends all parameters obtained through training to the server, this solution can reduce communication overheads in a training process to some extent because a smaller amount of data is transmitted.
In an embodiment, the method further includes: sending a parameter other than the parameter of the first subnet in the first model to the server.
In an embodiment, a loss value used for performing unsupervised training is obtained based on the unlabeled data of the client and first data, the first data is obtained by inputting second data into the first model for processing, and the second data is obtained by performing masking on the unlabeled data.
According to an embodiment of the application, the client performs masking on the unlabeled data when performing unsupervised training, and calculates the loss value based on the unlabeled data of the client and the data obtained through masking. This method can improve a feature extraction capability of the model on the unlabeled data.
According to a fourth aspect, an embodiment of this application provides a model training method, applied to a client. The client maintains unlabeled data and labeled data, and the method includes: training a first model based on the unlabeled data, to obtain a parameter of the first model; training a fourth model based on the parameter of the first model and the labeled data, to obtain a parameter of the fourth model; sending a parameter of a first subnet and a parameter of a second subnet in the fourth model to a server; and updating the fourth model based on the parameter of the first subnet and the parameter of the second subnet that are from the server.
This method can implement horizontal federated learning in a scenario in which the client maintains the labeled data and the unlabeled data, and further improves a feature extraction capability of the model and reduces labor costs.
In an embodiment, the client sends only the parameter of the first subnet and the parameter of the second subnet in the fourth model to the server, and does not send a parameter other than the parameter of the first subnet and the parameter of the second subnet in the fourth model to the server.
In another embodiment, the method further includes: sending a parameter other than the parameter of the first subnet and the parameter of the second subnet in the fourth model to the server.
According to a fifth aspect, an embodiment of this application provides a model training apparatus. The apparatus includes: a training module, configured to train a second model based on a parameter of a first subnet reported by a client and labeled data, to update a parameter of the second model, where the second model includes the first subnet and a third subnet; and a sending module, configured to send an updated parameter of the first subnet and an updated parameter of the third subnet to the client.
In an embodiment, a quantity of clients is K, and K is an integer greater than 1. The apparatus further includes a processing module, configured to: perform aggregation processing on parameters of K first subnets from the K clients, to obtain a processed parameter of the first subnet. The training module is further configured to: train the second model based on the processed parameter of the first subnet and the labeled data, to update the parameter of the second model.
In an embodiment, the training module is further configured to: train a third model based on the parameter of the first subnet reported by the client and unlabeled data, to update a parameter of the third model; and train the second model based on the parameter of the third model and the labeled data, to update the parameter of the second model.
According to a sixth aspect, an embodiment of this application provides a model training apparatus. The apparatus includes: a training module, configured to train a first model based on unlabeled data, to obtain a parameter of the first model; a sending module, configured to send a parameter of a first subnet in the first model to a server, where the first model further includes a second subnet; and an updating module, configured to obtain a target model based on the parameter of the first subnet and a parameter of a third subnet that are from the server, where the target model includes the first subnet and the third subnet, and the third subnet corresponds to the second subnet.
In an embodiment, the sending module is further configured to send a parameter other than the parameter of the first subnet in the first model to the server.
In another embodiment, the sending module is configured to send only the parameter of the first subnet in the first model to the server, and does not send a parameter other than the parameter of the first subnet in the first model to the server.
According to a seventh aspect, an embodiment of this application provides a model training apparatus. The apparatus includes: a training module, configured to train a first model based on unlabeled data, to obtain a parameter of the first model, and train a fourth model based on the parameter of the first model and labeled data, to obtain a parameter of the fourth model; a sending module, configured to send a parameter of a first subnet and a parameter of a second subnet in the fourth model to a server; and an updating module, configured to update the fourth model based on the parameter of the first subnet and the parameter of the second subnet that are from the server.
In an embodiment, the sending module is further configured to send a parameter other than the parameter of the first subnet and the parameter of the second subnet in the fourth model to the server.
In an embodiment, the sending module is configured to send only the parameter of the first subnet and the parameter of the second subnet in the fourth model to the server, and does not send a parameter other than the parameter of the first subnet and the parameter of the second subnet in the fourth model to the server.
According to an eighth aspect, an embodiment of this application provides a model training apparatus, including a processor and a memory. The memory is configured to store program code, and the processor is configured to invoke the program code to perform the method.
According to a ninth aspect, this application provides a computer storage medium, including computer instructions. When the computer instructions are run on an electronic device, the electronic device is enabled to perform the method according to any one of the possible implementations of the second aspect and/or any one of the possible implementations of the third aspect and/or any one of the possible implementations of the fourth aspect.
According to a tenth aspect, an embodiment of this application provides a computer program product. When the computer program product runs on a computer, the computer is enabled to perform the method according to any one of the possible implementations of the second aspect and/or any one of the possible implementations of the third aspect and/or any one of the possible implementations of the fourth aspect.
It may be understood that the model training system according to the first aspect, the model training apparatus according to the fifth aspect, the model training apparatus according to the sixth aspect, the model training apparatus according to the seventh aspect, the model training apparatus according to the eighth aspect, the computer storage medium according to the ninth aspect, and the computer program product according to the tenth aspect are all configured to perform the method according to any one of the possible implementations of the second aspect and/or any one of the possible implementations of the third aspect and/or any one of the possible implementations of the fourth aspect. Therefore, for beneficial effects that can be achieved by them, refer to beneficial effects in the corresponding method. Details are not described herein again.
The following describes accompanying drawings used in embodiments of this application.
The following describes embodiments of this application with reference to the accompanying drawings in embodiments of this application. Terms used in implementations of embodiments of this application are merely used to explain embodiments of this application, and are not intended to limit this application.
The model decomposition unit of the client is configured to decompose a model of the client into a plurality of subnets. The training unit of the client trains a decomposed model based on unlabeled data, and then the communication unit of the client sends a parameter of a first subnet obtained through training to the FL-server module.
The aggregation unit of the FL-server module performs aggregation processing on received parameters of first subnets sent by a plurality of clients, and then sends processed parameters of the first subnet to the FL-worker module. The model decomposition unit of the FL-worker module is configured to decompose a model of the server into a plurality of subnets. The training unit of the FL-worker module trains a decomposed model based on the processed parameter of the first subnet and the labeled data, to obtain an updated parameter of the first subnet and an updated parameter of the third subnet, and then sends the updated parameter of the first subnet and the updated parameter of the third subnet to the FL-server module. The FL-server module delivers the updated parameter of the first subnet and the updated parameter of the third subnet to each client, and the client obtains a target model based on the parameter of the first subnet and the parameter of the third subnet that are from the server. Further, the inference unit of the client may perform inference based on the target model.
After model parameters are initialized, unsupervised training of the client is started first. Then, unsupervised training of the client and supervised training of the server are performed alternately. The alternating procedure stops after a preset condition is met. The preset condition may be that a preset quantity of iterations is reached, or that a loss value is less than a preset value. This is not specifically limited in this solution.
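The alternation and its stop condition can be sketched as a simple loop. The function name `alternate_training`, the callback signatures, and the convention that the server step returns its loss value are assumptions made for this illustration.

```python
from typing import Callable, List


def alternate_training(
    client_step: Callable[[], None],
    server_step: Callable[[], float],
    max_rounds: int,
    loss_threshold: float,
) -> List[str]:
    schedule: List[str] = []
    for _ in range(max_rounds):  # stop condition 1: preset quantity of iterations
        client_step()  # unsupervised training on the client comes first
        schedule.append("client")
        loss = server_step()  # supervised training on the server follows
        schedule.append("server")
        if loss < loss_threshold:  # stop condition 2: loss below a preset value
            break
    return schedule


# Hypothetical loss values returned by successive server training steps.
losses = iter([0.9, 0.5, 0.05, 0.01])
schedule = alternate_training(
    client_step=lambda: None,
    server_step=lambda: next(losses),
    max_rounds=10,
    loss_threshold=0.1,
)
print(schedule)
```

Here the loop ends after the third round because the third server loss falls below the threshold; with a larger threshold or fewer allowed rounds, the iteration limit would end it instead.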
In an embodiment,
The first model may be any model, for example, a neural network model, a support vector machine model, or a decision tree model. The first model may correspond to a model 2 in
The training the first model may be performing unsupervised training.
In an embodiment, a loss value used for performing unsupervised training is obtained based on the unlabeled data of the client and first data, the first data is obtained by inputting second data into the first model for processing, and the second data is obtained by performing masking on the unlabeled data.
In an embodiment, unsupervised training may include the following operations.
First, a masking operation is performed on the unlabeled data.
The masking operation is a replacement operation performed on some values in an original data feature, and the replacement value may be a specific value or a learnable parameter.
For example, the unlabeled data is short message data [specially, selected, into, port, high, science, technology, facial, mask], the masking operation is performed on “into” in the short message data, and data obtained by performing the masking operation is [specially, selected, MASK, port, high, science, technology, facial, mask], and the like.
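The masking operation in this example can be reproduced with a few lines of code. The helper name `mask_tokens` and the choice to pass the masked positions explicitly are assumptions of this sketch; how positions are selected is not specified here.

```python
from typing import List, Sequence

MASK = "MASK"  # fixed replacement value used in the example above


def mask_tokens(tokens: Sequence[str], positions: Sequence[int]) -> List[str]:
    # Replace the tokens at the chosen positions; all other tokens are kept as-is.
    chosen = set(positions)
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


message = [
    "specially", "selected", "into", "port", "high",
    "science", "technology", "facial", "mask",
]
masked = mask_tokens(message, positions=[2])
print(masked)
```

The original list is left untouched, so it remains available as the reference against which the model output is later compared.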
Then, the masked data is input into a model used for unsupervised training, to obtain an output result of the model.
A similarity function (that is, a loss function) is used to compare a similarity between the output result of the model and the foregoing unlabeled data.
A comparison result is input to an optimizer to update the model parameter.
The foregoing operations are repeated until a stop condition of unsupervised training is met, at which point training stops.
The foregoing is merely an example of unsupervised training, and unsupervised training may alternatively be unsupervised training in another form. This is not specifically limited in this solution.
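The operations above (mask, forward pass, loss against the original data, optimizer update, repeat until a stop condition) can be sketched end to end on toy numeric data. Everything specific in this sketch is an assumption chosen for brevity: a single linear layer as the model, 0.0 as the replacement value, mean squared error as the similarity function, and plain stochastic gradient descent as the optimizer.

```python
from typing import List, Set, Tuple

Vector = List[float]
Matrix = List[Vector]


def mask_features(x: Vector, positions: Set[int], mask_value: float = 0.0) -> Vector:
    # Masking: replace the selected feature values with a fixed value.
    return [mask_value if i in positions else v for i, v in enumerate(x)]


def forward(weights: Matrix, x: Vector) -> Vector:
    # Toy model: one linear layer, output = weights @ x.
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mse(output: Vector, target: Vector) -> float:
    # Loss comparing the model output with the original (unmasked) data.
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)


def sgd_step(weights: Matrix, masked: Vector, target: Vector, lr: float) -> None:
    # Optimizer: one gradient-descent step on the MSE loss, in place.
    output = forward(weights, masked)
    n = len(target)
    for i in range(n):
        grad_out = 2.0 * (output[i] - target[i]) / n
        for j in range(len(masked)):
            weights[i][j] -= lr * grad_out * masked[j]


def mean_loss(weights: Matrix, data: List[Vector], positions: Set[int]) -> float:
    return sum(mse(forward(weights, mask_features(x, positions)), x) for x in data) / len(data)


def train_unsupervised(
    data: List[Vector],
    positions: Set[int],
    lr: float = 0.01,
    max_epochs: int = 50,
    loss_threshold: float = 1e-6,
) -> Tuple[Matrix, List[float]]:
    n = len(data[0])
    weights = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    history = [mean_loss(weights, data, positions)]
    for _ in range(max_epochs):  # stop condition: preset quantity of iterations
        for x in data:
            sgd_step(weights, mask_features(x, positions), x, lr)
        history.append(mean_loss(weights, data, positions))
        if history[-1] < loss_threshold:  # stop condition: loss below a preset value
            break
    return weights, history


unlabeled = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
weights, history = train_unsupervised(unlabeled, positions={1})
print(history[0], history[-1])
```

No label is used anywhere: the training signal comes only from reconstructing the masked feature from the remaining ones, which is why the toy data is chosen so that the features are correlated.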
The training the first model to obtain the parameter of the first model may be understood as training the first model to update the parameter of the first model.
During model initialization, the client may perform a decomposition operation on the first model, to separately decompose parameters of subnets of the first model, to perform model training.
The first model includes a plurality of subnets, and the plurality of subnets include the first subnet and the second subnet.
The first subnet may be a subnet configured to perform feature extraction. The second subnet may be a subnet configured to output a calculation result of the first model.
For example, the first model may include an embedding (Embedding) subnet, a lite bidirectional encoder representations from transformers (ALBERT) subnet, a masked language model (MLM) subnet, an adaptive moment estimation (Adam) optimizer subnet, and the like. Correspondingly, the first subnet is the ALBERT subnet, and the second subnet is the MLM subnet.
The subnet may be understood as a submodel. For example, the subnet may be an Embedding submodel, an ALBERT submodel, or an MLM submodel.
The first model is merely an example, and may alternatively be a model including another subnet. This is not specifically limited in this solution.
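One possible way to picture the decomposition operation is to group a flat parameter table by subnet. The naming convention used below (a subnet name, a dot, then the parameter name) and the function name `decompose` are assumptions of this sketch, not a format defined in this application.

```python
from typing import Dict, List

FlatParams = Dict[str, List[float]]
Subnets = Dict[str, Dict[str, List[float]]]


def decompose(flat_params: FlatParams) -> Subnets:
    # Group parameters by the subnet prefix that precedes the first dot.
    subnets: Subnets = {}
    for name, values in flat_params.items():
        subnet, _, param = name.partition(".")
        subnets.setdefault(subnet, {})[param] = values
    return subnets


# Hypothetical parameter names for a first model with three subnets.
first_model_params = {
    "embedding.weight": [0.1, 0.2],
    "albert.layer0.weight": [0.3],
    "albert.layer0.bias": [0.0],
    "mlm.weight": [0.4],
}
subnets = decompose(first_model_params)
first_subnet = subnets["albert"]   # feature-extraction subnet
second_subnet = subnets["mlm"]     # output subnet of the first model
print(sorted(subnets))
```

Once parameters are grouped this way, each subnet can be trained, frozen, or transmitted independently of the others.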
In an embodiment, the client sends a parameter other than the parameter of the first subnet in the first model to the server.
That is, the client sends the parameter of the first subnet in the first model to the server, and also sends a parameter other than the parameter of the first subnet. For example, a parameter of the second subnet is further sent, or a parameter of another subnet is further sent, or parameters of all other subnets may be further sent to the server. This is not specifically limited in this solution.
In an embodiment, the client sends only the parameter of the first subnet in the first model to the server.
That is, the client does not send a parameter other than the parameter of the first subnet to the server. Compared with the conventional technology in which the client transmits all parameters obtained through training to the server, this solution can reduce communication overheads in a training process because a smaller amount of data is transmitted.
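The difference between the two reporting embodiments can be made concrete by counting the values placed in the upload. The subnet names, the sizes, and the helpers `build_upload` and `count_values` are hypothetical and chosen only to illustrate the comparison.

```python
from typing import Dict, List

Subnets = Dict[str, Dict[str, List[float]]]


def build_upload(subnets: Subnets, first_subnet_name: str, only_first: bool = True) -> Subnets:
    if only_first:
        # Report only the parameter of the first subnet.
        return {first_subnet_name: subnets[first_subnet_name]}
    # Alternative embodiment: also report parameters of the other subnets.
    return dict(subnets)


def count_values(subnets: Subnets) -> int:
    # Number of scalar values that would be transmitted.
    return sum(len(values) for params in subnets.values() for values in params.values())


# Hypothetical first model; sizes are arbitrary.
client_subnets = {
    "embedding": {"weight": [0.0] * 8},
    "albert": {"weight": [0.0] * 4},
    "mlm": {"weight": [0.0] * 8},
}
partial = build_upload(client_subnets, "albert", only_first=True)
full = build_upload(client_subnets, "albert", only_first=False)
print(count_values(partial), count_values(full))
```

With these made-up sizes, the partial upload carries 4 values instead of 20; the actual saving depends on how large the first subnet is relative to the whole model.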
The second model may be any model, for example, a neural network model, a support vector machine model, or a decision tree model. The second model may correspond to a model 1 in
That the server trains a second model based on the parameter of the first subnet reported by the client and the labeled data, to update a parameter of the second model may be that the server replaces the parameter of the first subnet in the second model based on the parameter of the first subnet reported by the client, to update the parameter of the second model; and then trains the updated second model based on the labeled data, to update the parameter of the second model again.
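The two stages described above (replace the first subnet with the reported parameter, then train on labeled data) can be sketched with a deliberately tiny second model. The model shape is an assumption: the first subnet is a single scale factor `a`, the third subnet is a logistic classifier with parameters `c` and `b`, and training is plain stochastic gradient descent on the log loss.

```python
import math
from typing import Dict, List, Tuple

Model = Dict[str, Dict[str, float]]
Labeled = List[Tuple[float, int]]


def predict(model: Model, x: float) -> float:
    feature = model["first_subnet"]["a"] * x  # first subnet: feature extraction
    logit = model["third_subnet"]["c"] * feature + model["third_subnet"]["b"]
    return 1.0 / (1.0 + math.exp(-logit))  # third subnet: classifier output


def mean_log_loss(model: Model, labeled: Labeled) -> float:
    total = 0.0
    for x, y in labeled:
        p = predict(model, x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labeled)


def load_first_subnet(model: Model, reported: Dict[str, float]) -> Model:
    # Stage 1: replace the first subnet with the parameter reported by the client.
    return {**model, "first_subnet": dict(reported)}


def train_supervised(model: Model, labeled: Labeled, lr: float = 0.1, epochs: int = 100) -> Model:
    # Stage 2: supervised training updates both the first and the third subnet.
    a = model["first_subnet"]["a"]
    c = model["third_subnet"]["c"]
    b = model["third_subnet"]["b"]
    for _ in range(epochs):
        for x, y in labeled:
            p = 1.0 / (1.0 + math.exp(-(c * a * x + b)))
            err = p - y  # derivative of the log loss with respect to the logit
            grad_a, grad_c, grad_b = err * c * x, err * a * x, err
            a, c, b = a - lr * grad_a, c - lr * grad_c, b - lr * grad_b
    return {"first_subnet": {"a": a}, "third_subnet": {"c": c, "b": b}}


labeled_data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
second_model = {"first_subnet": {"a": 9.0}, "third_subnet": {"c": 0.1, "b": 0.0}}
loaded = load_first_subnet(second_model, reported={"a": 0.5})
initial_loss = mean_log_loss(loaded, labeled_data)
trained = train_supervised(loaded, labeled_data)
final_loss = mean_log_loss(trained, labeled_data)
print(initial_loss, final_loss)
```

After training, both `trained["first_subnet"]` and `trained["third_subnet"]` differ from their loaded values, which corresponds to the updated parameters that the server later sends to the client.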
The training the second model may be performing supervised training on the second model.
In an embodiment, for training of the server, refer to the following operations:
If the optimizer has a parameter, the parameter of the optimizer may alternatively be used for updating.
The foregoing operations are repeated until a stop condition of supervised training is met, at which point training stops and an updated parameter of the second model of the server is obtained.
The stop condition may be that a preset quantity of iterations is met, or a loss value meets a preset requirement. This is not specifically limited in this solution.
In an embodiment, when a quantity of clients is K, and K is an integer greater than 1, before operation 303, the method further includes the following operation:
The server performs aggregation processing on parameters of K first subnets from the K clients, to obtain a processed parameter of the first subnet.
Aggregation processing may be performing, based on a preset weight, weighted summation on parameters of the first subnets sent by the clients, to obtain the processed parameter of the first subnet.
The foregoing manner is merely an example. Certainly, processing may be performed in another form. This is not specifically limited in this solution.
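The weighted summation can be sketched as follows. Normalizing by the sum of the weights, so that the result is a weighted average, is an assumption of this sketch, as are the function name and the flat-list representation of each client's first-subnet parameters.

```python
from typing import List, Sequence


def aggregate_first_subnets(
    client_params: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    if not client_params or len(client_params) != len(weights):
        raise ValueError("expected one preset weight per client")
    size = len(client_params[0])
    if any(len(p) != size for p in client_params):
        raise ValueError("all clients must report first subnets of the same shape")
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("preset weights must sum to a positive value")
    # Element-wise weighted sum over the K clients, normalized by the total weight.
    return [
        sum(w * p[i] for w, p in zip(weights, client_params)) / total
        for i in range(size)
    ]


# K = 2 clients, each reporting a first subnet with two parameters.
reported = [[1.0, 2.0], [3.0, 4.0]]
preset_weights = [1.0, 3.0]
processed = aggregate_first_subnets(reported, preset_weights)
print(processed)  # [2.5, 3.5]
```

A common choice in federated averaging is to weight each client by the amount of data it trained on; whether that choice is used here is not stated, so the weights above are arbitrary.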
Correspondingly, that the server trains a second model of the server based on the parameter of the first subnet reported by the client and the labeled data, to update a parameter of the second model includes the following operation:
The server trains the second model of the server based on the processed parameter of the first subnet and the labeled data, to update the parameter of the second model.
In an embodiment, the server replaces the parameter of the first subnet in the second model of the server based on the processed parameter of the first subnet, to update the parameter of the second model.
The server trains the updated second model based on the labeled data, to update the parameter of the second model again.
During model initialization, the server may perform a decomposition operation on the second model, to separately decompose parameters of subnets of the second model.
The second model includes a plurality of subnets, and the plurality of subnets include the first subnet and the third subnet.
The first subnet may be a subnet configured to perform feature extraction. The third subnet may be a subnet configured to output a calculation result of the second model.
For example, the second model may include an embedding (Embedding) subnet, a lite bidirectional encoder representations from transformers (ALBERT) subnet, a classifier (Classifier) subnet, an adaptive moment estimation (Adam) optimizer subnet, and the like. Correspondingly, the third subnet is the Classifier subnet.
The subnet may be understood as a submodel. For example, the subnet may be an Embedding submodel, an ALBERT submodel, or a Classifier submodel.
The second model is merely an example, and may alternatively be a model including another subnet. This is not specifically limited in this solution.
That the third subnet of the second model corresponds to the second subnet of the first model may be understood as that functions of the two are the same. For example, the third subnet of the second model is configured to output the calculation result of the second model, and the second subnet of the first model is configured to output the calculation result of the first model.
The third subnet of the second model has a different structure from that of the second subnet of the first model.
In an embodiment, for example, the third subnet is a Classifier subnet, and the second subnet is an MLM subnet.
In an embodiment, before operation 304, the method may further include the following operations.
The foregoing preset condition may be a quantity of times that operations 301, 302, 303, 304-1, and 304-3 are repeated. The client and the server may agree on this quantity in advance, and training is stopped when the quantity of repetitions is reached.
The foregoing preset condition may alternatively be that the loss value obtained by the server through calculation based on the loss function is less than a preset value, or the like. This is not specifically limited in this solution.
In an embodiment, if the preset condition is not met, the server sends only the updated parameter of the first subnet to the client, and the client updates the first model based on the parameter of the first subnet from the server, and repeatedly performs operations 301, 302, 303, 304-1, and 304-3 until the preset condition is met.
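The branching on the preset condition can be sketched from both sides of the exchange. The helper names `build_download` and `apply_download`, and the rule that the client recognizes the final round by the presence of the third subnet in the payload, are assumptions made for this illustration.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def build_download(second_model: Params, condition_met: bool) -> Params:
    # Server side: the first subnet is always sent; the third subnet is added
    # only once the preset condition is met.
    payload = {"first_subnet": list(second_model["first_subnet"])}
    if condition_met:
        payload["third_subnet"] = list(second_model["third_subnet"])
    return payload


def apply_download(first_model: Params, payload: Params) -> Params:
    # Client side: build the target model on the final round, otherwise only
    # refresh the first subnet and keep training the first model.
    if "third_subnet" in payload:
        return {
            "first_subnet": payload["first_subnet"],
            "third_subnet": payload["third_subnet"],
        }
    return {**first_model, "first_subnet": payload["first_subnet"]}


second_model = {"first_subnet": [0.5], "third_subnet": [0.7]}
first_model = {"first_subnet": [0.1], "second_subnet": [0.2]}
intermediate = apply_download(first_model, build_download(second_model, condition_met=False))
target_model = apply_download(first_model, build_download(second_model, condition_met=True))
print(intermediate, target_model)
```

In intermediate rounds the client keeps its second subnet so that unsupervised training can continue; only the final payload yields a model that contains the third subnet.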
The target model may be used for inference.
According to this solution, the client performs training based on the unlabeled data, and then the server performs training based on the parameter of the first subnet reported by the client and the labeled data, and sends the updated parameter of the first subnet and the updated parameter of the third subnet to the client. Further, the client obtains the target model based on the updated parameter of the first subnet and the updated parameter of the third subnet. This method ensures security of private data on the client, improves a feature extraction capability of the model on the unlabeled data, and reduces labor costs. In this solution, horizontal federated learning can be performed even when the labeled data exists only on the server and no labeled data exists on the client, to adapt to real scenarios that lack labeled data.
During model initialization, the server may perform a decomposition operation on the second model, to separately decompose parameters of subnets of the second model.
The second model includes a plurality of subnets, and the plurality of subnets include the first subnet and the third subnet.
The first subnet may be a subnet configured to perform feature extraction. The third subnet may be a subnet configured to output a calculation result of the second model.
For example, the second model may include an Embedding subnet, an ALBERT (A Lite BERT, a lightweight variant of Bidirectional Encoder Representations from Transformers) subnet, a Classifier subnet, an adaptive moment estimation (Adam) optimizer subnet, and the like. Correspondingly, the third subnet is a Classifier subnet.
The second model is merely an example, and may alternatively be a model including another subnet. This is not specifically limited in this solution.
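The decomposition operation can be pictured as grouping a flat set of model parameters by subnet. The following sketch assumes, purely for illustration, that each parameter name is prefixed with its subnet name and a dot; the names `decompose`, `embedding.table`, `albert.layer0`, and `classifier.weight` are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def decompose(model_params: Params) -> Dict[str, Params]:
    """Group flat parameters by the subnet name that precedes the first dot."""
    subnets: Dict[str, Params] = {}
    for full_name, values in model_params.items():
        subnet, _, local_name = full_name.partition(".")
        subnets.setdefault(subnet, {})[local_name] = values
    return subnets


# Toy second model with three subnets (values are placeholders).
second_model: Params = {
    "embedding.table": [0.1, 0.2],
    "albert.layer0": [0.3, 0.4, 0.5],
    "classifier.weight": [0.6],
}
parts = decompose(second_model)
first_subnet = parts["albert"]      # feature extraction subnet
third_subnet = parts["classifier"]  # subnet that outputs the calculation result
```

Once the parameters are separated in this way, each subnet can be sent, replaced, or kept fixed independently of the others.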
In an embodiment, a quantity of clients is K, K is an integer greater than 1, and the method further includes:
performing aggregation processing on parameters of K first subnets from the K clients, to obtain a processed parameter of the first subnet.
Aggregation processing may be performing, based on a preset weight, weighted summation on parameters of the first subnets sent by the clients, to obtain the processed parameter of the first subnet.
The foregoing manner is merely an example. Certainly, processing may be performed in another form. This is not specifically limited in this solution.
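As a minimal sketch of the weighted summation described above, the following function combines the first-subnet parameters reported by K clients. The data layout (a dictionary of parameter name to list of values) and the name `aggregate` are assumptions made for this example; how the preset weights are chosen is not specified here.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def aggregate(first_subnets: Sequence[Params], weights: Sequence[float]) -> Params:
    """Weighted sum of the first-subnet parameters reported by K clients."""
    if not first_subnets or len(first_subnets) != len(weights):
        raise ValueError("need at least one client and one weight per client")
    result: Params = {}
    for name, values in first_subnets[0].items():
        summed = [0.0] * len(values)
        for client_params, weight in zip(first_subnets, weights):
            for i, value in enumerate(client_params[name]):
                summed[i] += weight * value
        result[name] = summed
    return result


# Two clients (K = 2) with preset weights 0.25 and 0.75.
processed = aggregate(
    [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}],
    [0.25, 0.75],
)
```

The result plays the role of the processed parameter of the first subnet that the server then uses for training.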
Correspondingly, that the server trains a second model of the server based on the parameter of the first subnet reported by the client and the labeled data, to update a parameter of the second model includes the following operation:
The server trains the second model of the server based on the processed parameter of the first subnet and the labeled data, to update the parameter of the second model.
In an embodiment, the server replaces the parameter of the first subnet in the second model of the server based on the processed parameter of the first subnet, to update the parameter of the second model.
The server trains the updated second model based on the labeled data, to update the parameter of the second model again.
In an embodiment, the server may perform supervised training on the second model.
An output of the second model is obtained by inputting the labeled data into the second model used for supervised training. A similarity function (that is, a loss function) is used to compare a similarity between the output of the model and the labeled data. A comparison result is input to an optimizer to obtain the parameter used for model updating. If the optimizer has a parameter, the parameter of the optimizer may alternatively be used for updating.
The foregoing operations are repeated until a stop condition of supervised training is met, at which point an updated parameter of the second model of the server is obtained.
The stop condition of supervised training may be that a preset quantity of iterations is met, or a loss value meets a preset requirement. This is not specifically limited in this solution.
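The supervised training loop above (forward pass, loss function, optimizer update, stop condition) can be illustrated with a toy model. In the sketch below, a one-feature logistic regression stands in for the second model, cross-entropy is used as the loss function, and plain gradient descent stands in for the optimizer; all of these choices, and every name in the code, are assumptions for illustration rather than the training procedure of this application.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def supervised_train(w: List[float], b: float,
                     samples: Sequence[Sequence[float]], labels: Sequence[int],
                     lr: float = 0.5, max_iters: int = 200,
                     loss_target: float = 0.05) -> Tuple[List[float], float, float]:
    """Train until the iteration budget is used up or the loss meets the target."""
    eps = 1e-12  # guards against log(0)
    n = len(samples)
    loss = float("inf")
    for _ in range(max_iters):
        # Forward pass: output of the model for the labeled inputs.
        preds = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in samples]
        # Loss function: cross-entropy between the outputs and the labels.
        loss = -sum(
            y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
            for y, p in zip(labels, preds)
        ) / n
        if loss < loss_target:  # stop condition based on the loss value
            break
        # Optimizer step: plain gradient descent on the weights and the bias.
        grad_w = [sum((p - y) * x[j] for p, y, x in zip(preds, labels, samples)) / n
                  for j in range(len(w))]
        grad_b = sum(p - y for p, y in zip(preds, labels)) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b = b - lr * grad_b
    return w, b, loss


trained_w, trained_b, final_loss = supervised_train(
    [0.0], 0.0, [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1]
)
```

Both stop conditions mentioned above appear in the loop: `max_iters` bounds the quantity of iterations, and `loss_target` is the preset requirement on the loss value.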
In an embodiment, the server further maintains unlabeled data. That is, the server may perform semi-supervised training.
Correspondingly, the training a second model based on a parameter of a first subnet reported by a client and the labeled data, to update a parameter of the second model includes:
A third model is trained based on the parameter of the first subnet reported by the client and the unlabeled data, to update a parameter of the third model; and the second model is trained based on the parameter of the third model and the labeled data, to update the parameter of the second model. When the stop condition of semi-supervised training of the server is met, operation 402 is performed.
If the foregoing stop condition of semi-supervised training is not met, the parameter of the third model is updated based on the parameter of the second model, and the updated third model is then trained based on the unlabeled data, to update the parameter of the third model again. The foregoing operations are repeatedly performed until the stop condition of semi-supervised training is met.
The stop condition of semi-supervised training may be that a preset quantity of iterations is met, or a loss value meets a preset requirement. This is not specifically limited in this solution.
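The alternation between the third model (trained on unlabeled data) and the second model (trained on labeled data) can be sketched as a control loop. The training steps themselves are passed in as callables because their details are not fixed here; `train_third`, `train_second`, and `stop` are hypothetical placeholders, and representing parameters as a flat dictionary is an assumption.

```python
from typing import Callable, Dict

Params = Dict[str, float]


def semi_supervised_rounds(reported: Params,
                           train_third: Callable[[Params], Params],
                           train_second: Callable[[Params], Params],
                           stop: Callable[[int, Params], bool]) -> Params:
    """Alternate unsupervised and supervised training until `stop` returns True."""
    # The third model starts from the first-subnet parameter reported by the client.
    third = dict(reported)
    rounds = 0
    while True:
        third = train_third(third)           # unsupervised step on unlabeled data
        second = train_second(dict(third))   # supervised step on labeled data
        rounds += 1
        if stop(rounds, second):
            return second
        # Stop condition not met: feed the second model's parameters back.
        third = dict(second)


# Stub training steps so that the control flow can be followed by hand.
result = semi_supervised_rounds(
    {"w": 0.0},
    train_third=lambda p: {"w": p["w"] + 1.0},
    train_second=lambda p: {"w": p["w"] + 10.0},
    stop=lambda rounds, p: rounds >= 2,
)
```

With the stub steps shown, two rounds run before the stop condition holds, and the returned value is the parameter of the second model after the last supervised step.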
In an embodiment, the second model further includes a fourth subnet, and a parameter of the fourth subnet remains unchanged before and after training.
For example, the fourth subnet may be an Embedding subnet.
That is, the parameter of the fourth subnet of the second model is delivered during initialization, and the parameter remains unchanged in a training process.
This method reduces training overheads.
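One way to keep the parameter of a subnet unchanged during training is to skip that subnet in the update step. The sketch below assumes parameter names prefixed with the subnet name and a plain gradient-descent update; the function `sgd_step` and the `frozen` argument are illustrative names only.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]


def sgd_step(params: Params, grads: Params, lr: float, frozen: Set[str]) -> Params:
    """Apply one gradient step, leaving parameters of frozen subnets untouched."""
    updated: Params = {}
    for name, values in params.items():
        subnet = name.split(".", 1)[0]
        if subnet in frozen:
            # Frozen subnet (for example, Embedding): copy the values unchanged.
            updated[name] = list(values)
        else:
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
    return updated


after_step = sgd_step(
    {"embedding.w": [1.0], "albert.w": [1.0]},
    {"embedding.w": [0.5], "albert.w": [0.5]},
    lr=0.5,
    frozen={"embedding"},
)
```

Because no gradient is applied to the frozen subnet, its gradient does not need to be computed or stored in a real implementation, which is where the saving in training overheads would come from.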
In this process, if only the unlabeled data exists on the client, the client may perform unsupervised training; or if the labeled data also exists on the client, the client may perform semi-supervised training. This is not specifically limited in this solution.
Operation 402 is performed, so that the client obtains the target model and performs inference.
In an embodiment, before operation 402, the method further includes the following operations.
The foregoing preset condition may be a quantity of repetitions of operations 401, 402-1, and 402-3. The client and the server may agree on this quantity in advance, and training is stopped when the quantity of repetitions is reached.
The foregoing preset condition may alternatively be that the loss value obtained by the server through calculation based on the loss function is less than a preset value, or the like. This is not specifically limited in this solution.
In an embodiment, if the preset condition is not met, the server sends only the updated parameter of the first subnet to the client, and repeatedly performs operations 401, 402-1, and 402-3 until the preset condition is met.
In supervised training of the server, the server first performs data preprocessing on a short message text. Data preprocessing herein may be a word segmentation operation performed by a tokenizer. The server inputs the output of the tokenizer into a second model. The server then inputs the output of the second model and the labeled data into a cross-entropy function to calculate a similarity (that is, a loss value). Then, the similarity is input into an optimizer, to update a parameter of the second model.
If a stop condition is not met, the server sends an updated parameter of the ALBERT subnet and an updated parameter of the Classifier subnet to a client. Then, training is performed again based on the parameter of the ALBERT subnet sent by the client. These operations are repeated until the stop condition is met, at which point the server sends the updated parameter of the ALBERT subnet and the updated parameter of the Classifier subnet to the client, so that the client performs inference.
Currently, the server does not have a training engine. Therefore, a model training simulation platform needs to be set up for model training. However, this approach cannot be deployed in a real scenario, for example, a scenario in which a mobile phone interacts with the server.
Based on this, an embodiment of this application further provides a server. The server includes a federated learning server FL-server module and a federated learning worker FL-worker module. Refer to
The federated learning worker FL-worker module is added to the server by using the foregoing method, so that a training task can be performed on the server. In this way, the server can directly perform model training, and model training efficiency is improved.
According to an embodiment of the application, the server performs training based on the parameter of the first subnet reported by the client and the labeled data, and then sends the updated parameter of the first subnet and the updated parameter of the third subnet to the client. The parameter of the first subnet reported by the client is obtained by the client by performing training based on unlabeled data. This method ensures security of private data on the client, improves a feature extraction capability of the model on the unlabeled data, and reduces labor costs. In this solution, horizontal federated learning can be performed even when the labeled data exists only on the server and no labeled data exists on the client, which suits real scenarios that lack labeled data.
The first model may be any model, for example, a neural network model, a support vector machine model, or a decision tree model.
The training the first model may be performing unsupervised training.
In an embodiment, a loss value used for performing unsupervised training is obtained based on the unlabeled data of the client and first data, the first data is obtained by inputting second data into the first model for processing, and the second data is obtained by performing masking on the unlabeled data.
In an embodiment, unsupervised training may include the following operations.
First, a masking operation is performed on the unlabeled data. The masking operation is a replacement operation performed on some values in an original data feature, and the replaced value may be a value or a learnable parameter.
For example, the unlabeled data is short message data [specially, selected, into, port, high, science, technology, facial, mask], the masking operation is performed on “into” in the short message data, and data obtained by performing the masking operation is [specially, selected, MASK, port, high, science, technology, facial, mask].
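The masking operation in this example can be reproduced with a few lines of code. The function name `mask_tokens` and the choice to pass masked positions explicitly are assumptions for illustration; in practice the positions might be chosen at random, and the replacement could be a learnable parameter rather than a fixed token.

```python
from typing import List, Sequence

MASK = "MASK"  # placeholder value that replaces the selected tokens


def mask_tokens(tokens: Sequence[str], positions: Sequence[int],
                mask_value: str = MASK) -> List[str]:
    """Return a copy of `tokens` with the given positions replaced by the mask."""
    masked = list(tokens)
    for pos in positions:
        masked[pos] = mask_value
    return masked


message = ["specially", "selected", "into", "port", "high",
           "science", "technology", "facial", "mask"]
masked_message = mask_tokens(message, positions=[2])
```

The original list is left unmodified, because it is still needed afterwards as the reference against which the model output is compared.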
Then, the masked data is input into a model used for unsupervised training, to obtain an output of the model.
A similarity function (that is, a loss function) is used to compare a similarity between the output of the model and the foregoing unlabeled data.
A comparison result is input to an optimizer to update the model parameter.
The foregoing operations are repeated until a stop condition of unsupervised training is met, at which point training is stopped.
The foregoing is merely an example of unsupervised training, and unsupervised training may alternatively be unsupervised training in another form. This is not specifically limited in this solution.
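As one possible form of the similarity function in the operations above, the loss can be computed as the cross-entropy between the model output at the masked positions and the original tokens. This is a sketch under assumptions: the model output is represented as a per-position probability table, and the name `masked_cross_entropy` is hypothetical.

```python
import math
from typing import Dict, Sequence


def masked_cross_entropy(predicted: Dict[int, Dict[str, float]],
                         original: Sequence[str],
                         masked_positions: Sequence[int]) -> float:
    """Average negative log-probability of the original token at each masked position."""
    total = 0.0
    for pos in masked_positions:
        prob = predicted[pos].get(original[pos], 0.0)
        total += -math.log(max(prob, 1e-12))  # clamp to avoid log(0)
    return total / len(masked_positions)


original_tokens = ["specially", "selected", "into", "port"]
# Hypothetical model output for the masked position 2.
uncertain = {2: {"into": 0.5, "from": 0.5}}
confident = {2: {"into": 0.9, "from": 0.1}}
loss_uncertain = masked_cross_entropy(uncertain, original_tokens, [2])
loss_confident = masked_cross_entropy(confident, original_tokens, [2])
```

A model that assigns higher probability to the original token receives a lower loss, which is the signal the optimizer uses to update the model parameter.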
The foregoing obtaining the parameter of the first model may be understood as training the first model to update the parameter of the first model.
During model initialization, the client may perform a decomposition operation on the first model, to separately decompose parameters of subnets of the first model.
The first model includes a plurality of subnets, and the plurality of subnets include the first subnet and the second subnet.
The first subnet may be a subnet configured to perform feature extraction. The second subnet may be a subnet configured to output a calculation result of the first model.
For example, the first model may include an Embedding subnet, an ALBERT (A Lite BERT, a lightweight variant of Bidirectional Encoder Representations from Transformers) subnet, a masked language model (MLM) subnet, an adaptive moment estimation (Adam) optimizer subnet, and the like. Correspondingly, the first subnet is an ALBERT subnet, and the second subnet is an MLM subnet.
The subnet may be understood as a submodel. For example, the subnet may be an Embedding submodel, an ALBERT submodel, or an MLM submodel.
The first model is merely an example, and may alternatively be a model including another subnet. This is not specifically limited in this solution.
In an embodiment, the client sends a parameter other than the parameter of the first subnet in the first model to the server.
That is, the client sends the parameter of the first subnet in the first model to the server, and also sends a parameter other than the parameter of the first subnet. For example, a parameter of the second subnet is further sent, or a parameter of another subnet is further sent, or parameters of all other subnets may be further sent to the server. This is not specifically limited in this solution.
In an embodiment, the client sends only the parameter of the first subnet in the first model to the server.
That is, the client does not send a parameter other than the parameter of the first subnet to the server. Compared with the conventional technology in which all parameters obtained through training are transmitted to the server, in this solution, communication overheads in a training process can be reduced because a small amount of data is transmitted.
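The reduction in transmitted data can be seen by comparing the full parameter set with the part that belongs to the first subnet. The sketch below assumes parameter names prefixed with the subnet name; `select_upload`, `count_values`, and the toy parameter sizes are illustrative only.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def select_upload(params: Params, subnet: str) -> Params:
    """Keep only the parameters that belong to the named subnet."""
    prefix = subnet + "."
    return {name: values for name, values in params.items()
            if name.startswith(prefix)}


def count_values(params: Params) -> int:
    """Total quantity of scalar values, a rough proxy for the payload size."""
    return sum(len(values) for values in params.values())


first_model: Params = {
    "embedding.table": [0.1, 0.2, 0.3, 0.4],
    "albert.layer0": [0.5, 0.6, 0.7],
    "mlm.head": [0.8, 0.9],
}
upload = select_upload(first_model, "albert")
```

In this toy case the upload carries 3 of the 9 values in the first model; the actual saving depends on how the parameters are distributed across the subnets.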
The target model may be used for inference.
In an embodiment, before operation 503, the method may further include the following operations.
The foregoing preset condition may be a quantity of repetitions of operations 501, 502, 503-1, and 503-3. The client and the server may agree on this quantity in advance, and training is stopped when the quantity of repetitions is reached.
The foregoing preset condition may alternatively be that the loss value obtained by the client through calculation based on the loss function is less than a preset value, or the like. This is not specifically limited in this solution.
As shown in
In this process, if only labeled data exists on the server, the server performs supervised training; or if unlabeled data also exists on the server, the server performs semi-supervised training. This is not specifically limited in this solution.
In an embodiment, the first model further includes a fifth subnet, and a parameter of the fifth subnet remains unchanged before and after training. For example, the fifth subnet may be an Embedding subnet.
That is, the parameter of the fifth subnet of the first model is delivered during initialization, and the parameter remains unchanged in a subsequent training process.
In this method, the parameter of the fifth subnet remains unchanged in the training process, and training overheads are reduced.
In an embodiment, a parameter of a second subnet of the first model remains unchanged before and after training. For example, the second subnet may be an MLM subnet.
That is, the parameter of the second subnet of the first model is delivered during initialization, and the parameter remains unchanged in a subsequent training process. This method reduces training overheads.
According to an embodiment of the application, the client performs training based on the unlabeled data, reports the parameter of the first subnet to the server, and obtains a target model based on the parameter of the first subnet and the parameter of the third subnet that are from the server. The parameter of the first subnet and the parameter of the third subnet that are from the server are obtained by the server through training based on the parameter of the first subnet reported by the client and the labeled data. This method ensures security of private data on the client, improves a feature extraction capability of the model on the unlabeled data, and reduces labor costs. In this solution, horizontal federated learning can be performed even when the labeled data exists only on the server and no labeled data exists on the client, which suits real scenarios that lack labeled data.
The foregoing embodiment is described by using an example in which the client performs unsupervised training. When the client maintains the labeled data and the unlabeled data, the client may perform semi-supervised training. The following describes semi-supervised training performed by the client. As shown in
In an embodiment, unsupervised training is performed on the first model, a loss value used for performing unsupervised training is obtained based on the unlabeled data of the client and first data, the first data is obtained by inputting second data into the first model for processing, and the second data is obtained by performing masking on the unlabeled data.
In an embodiment, unsupervised training may include the following operations.
First, a masking operation is performed on the unlabeled data.
The masking operation is a replacement operation performed on some values in an original data feature, and the replaced value may be a value or a learnable parameter.
Then, the masked data is input into a model used for unsupervised training, to obtain an output of the model.
A similarity function (that is, a loss function) is used to compare a similarity between the output of the model and the foregoing unlabeled data.
A comparison result is input to an optimizer to update the model parameter.
The foregoing operations are repeated until a stop condition of unsupervised training is met, at which point training is stopped.
For example, the client updates the parameter of the fourth model based on the parameter of the first model. Then, the fourth model performs supervised training based on the labeled data.
In an embodiment, after operation 602, the method further includes the following operation.
The foregoing stop condition of semi-supervised training on the client side may be a quantity of times of repeatedly performing operations 601, 602, and 6022, or the like. This is not specifically limited in this solution.
The fourth model includes the first subnet and the second subnet.
In an embodiment, a parameter other than the parameter of the first subnet and the parameter of the second subnet in the fourth model is sent to the server.
In an embodiment, only the parameter of the first subnet and the parameter of the second subnet in the fourth model are sent to the server.
Compared with the conventional technology in which the client sends all parameters obtained through training to the server, in this solution, communication overheads in a training process can be reduced to some extent because a small amount of data is transmitted.
In an embodiment, before operation 603, the method further includes the following operations.
According to an embodiment of the application, the client performs unsupervised training on the first model based on the unlabeled data, performs supervised training on the fourth model based on the labeled data, and then sends the parameters of the first subnet and the second subnet of the fourth model to the server. Then, the client performs an update based on the parameters of the first subnet and the second subnet from the server. This method ensures security of private data on the client, improves a feature extraction capability of the model on the unlabeled data, and reduces labor costs. In this solution, horizontal federated learning can be performed even when the labeled data exists only on the server and no labeled data exists on the client, which suits real scenarios that lack labeled data.
Communication connections between the memory 701, the processor 702, and the communication interface 703 are implemented through the bus 704.
The memory 701 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
The memory 701 may store a program. When the program stored in the memory 701 is executed by the processor 702, the processor 702 and the communication interface 703 are configured to perform operations of the model training method in embodiments of this application.
The processor 702 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute a related program, to implement functions that need to be performed by units in the model training apparatus in embodiments of this application, or perform the model training method in the method embodiments of this application.
The processor 702 may alternatively be an integrated circuit chip and has a signal processing capability. In an embodiment, operations of the model training method in this application may be completed by using an integrated logic circuit of hardware in the processor 702 or an instruction in a form of software. The processor 702 may alternatively be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logical device, a discrete gate or a transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, operations, and logical block diagrams that are disclosed in embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. Operations of the methods disclosed with reference to embodiments of this application may be directly executed and accomplished by using a hardware decoding processor, or may be executed and accomplished by using a combination of hardware and software modules in the decoding processor. The software module may be located in a mature storage medium in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 701. The processor 702 reads information in the memory 701, and completes, in combination with hardware of the processor 702, functions that need to be performed by units included in the model training apparatus in embodiments of this application, or performs the model training method in the method embodiments of this application.
The communication interface 703 is a transceiver apparatus, for example, but not limited to a transceiver, to implement communication between the apparatus 700 and another device or a communication network. For example, data may be obtained through the communication interface 703.
The bus 704 may include a path for transmitting information between the components (for example, the memory 701, the processor 702, and the communication interface 703) in the apparatus 700.
It should be noted that, although only the memory, the processor, and the communication interface are shown in the apparatus 700 shown in
Consistent with the foregoing embodiment, in another aspect, an embodiment of this application further provides a model training system. As shown in
In an embodiment, when sending the parameter of the first model to the server 801, the client 802 is configured to send only the parameter of the first subnet in the first model to the server 801, and does not send a parameter other than the parameter of the first subnet in the first model to the server 801.
In an embodiment, the client 802 is further configured to send a parameter other than the parameter of the first subnet in the first model to the server 801.
In an embodiment, a quantity of clients 802 is K, K is an integer greater than 1, and the server 801 is further configured to perform aggregation processing on parameters of K first subnets from the K clients 802, to obtain a processed parameter of the first subnet; and when training the second model of the server based on the parameter of the first subnet reported by the client and the labeled data, to update the parameter of the second model, the server 801 is further configured to train the second model of the server 801 based on the processed parameter of the first subnet and the labeled data, to update the parameter of the second model.
In an embodiment, the third subnet of the second model is configured to output a calculation result of the second model, and the second subnet of the first model is configured to output a calculation result of the first model. The third subnet of the second model has a different structure from that of the second subnet of the first model.
As shown in
The training module 8031 is configured to train a second model based on a parameter of a first subnet reported by a client and labeled data, to update a parameter of the second model. The second model includes the first subnet and a third subnet.
The sending module 8032 is configured to send an updated parameter of the first subnet and an updated parameter of the third subnet to the client.
In an embodiment, a quantity of clients is K, and K is an integer greater than 1. The apparatus further includes a processing module, configured to: perform aggregation processing on parameters of K first subnets from the K clients, to obtain a processed parameter of the first subnet. The training module is further configured to: train the second model based on the processed parameter of the first subnet and the labeled data, to update the parameter of the second model.
In an embodiment, the training module 8031 is further configured to: train a third model based on the parameter of the first subnet reported by the client and unlabeled data, to update a parameter of the third model; and train the second model based on the parameter of the third model and the labeled data, to update the parameter of the second model.
As shown in
In an embodiment, the sending module 8042 is further configured to send a parameter other than the parameter of the first subnet in the first model to the server.
In another embodiment, the sending module 8042 is further configured to send only the parameter of the first subnet in the first model to the server, and does not send a parameter other than the parameter of the first subnet in the first model to the server.
As shown in
In an embodiment, the sending module 8052 is further configured to send a parameter other than the parameter of the first subnet and the parameter of the second subnet in the fourth model to the server.
In an embodiment, the sending module 8052 is further configured to send only the parameter of the first subnet and the parameter of the second subnet in the fourth model to the server, and does not send a parameter other than the parameter of the first subnet and the parameter of the second subnet in the fourth model to the server.
An embodiment of this application further provides a chip system. The chip system is applied to an electronic device. The chip system includes one or more interface circuits and one or more processors. The interface circuit and the processor are interconnected through a line. The interface circuit is configured to receive a signal from a memory of the electronic device, and send the signal to the processor. The signal includes computer instructions stored in the memory. When the processor executes the computer instructions, the electronic device performs the model training method.
An embodiment of this application further provides a model training apparatus, including a processor and a memory. The memory is configured to store program code, and the processor is configured to invoke the program code to perform the model training method.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores instructions. When the instructions are run on a computer or a processor, the computer or the processor is enabled to perform one or more operations in any one of the foregoing methods.
An embodiment of this application further provides a computer program product including instructions. When the computer program product runs on a computer or a processor, the computer or the processor is enabled to perform one or more operations in any one of the foregoing methods.
A person of ordinary skill in the art may clearly understand that, for the purpose of convenient and brief description, for detailed working processes of the foregoing system, apparatus, and unit, refer to corresponding processes in the foregoing method embodiments. Details are not described herein again.
It should be understood that unless otherwise specified, “/” in descriptions of this application indicates an “or” relationship between associated objects. For example, A/B may indicate A or B. A and B may be singular or plural. In addition, in the descriptions of this application, “a plurality of” means two or more unless otherwise specified. “At least one of the following items (pieces)” or a similar expression thereof refers to any combination of these items, including any combination of singular items (pieces) or plural items (pieces). For example, at least one item (piece) of a, b, or c may indicate: a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c may be singular or plural. In addition, to clearly describe the technical solutions in embodiments of this application, terms such as “first” and “second” are used in embodiments of this application to distinguish between same items or similar items that provide basically same functions or purposes. A person of ordinary skill in the art may understand that the terms such as “first” and “second” do not limit a quantity or an execution sequence, and the terms such as “first” and “second” do not indicate a definite difference. In addition, in embodiments of this application, the word such as “example” or “for example” is used to represent giving an example, an illustration, or a description. Any embodiment or design scheme described as “example” or “for example” in embodiments of this application should not be explained as being more preferred or having more advantages than another embodiment or design scheme. Rather, use of the word such as “example” or “for example” is intended to present a related concept in a manner that is easy to understand.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, division into the units is merely logical function division and may be another division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. The displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, in other words, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected based on actual requirements to achieve the objectives of the solutions of embodiments.
All or some of the foregoing embodiments may be implemented by using software, hardware, firmware, or any combination thereof. When software is used to implement the embodiments, all or some of the embodiments may be implemented in a form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to embodiments of this application are all or partially generated. The computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable apparatuses. The computer instructions may be stored in a computer-readable storage medium, or transmitted by using the computer-readable storage medium. The computer instructions may be transmitted from a web site, computer, server, or data center to another web site, computer, server, or data center in a wired (for example, a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a read-only memory (ROM), a random access memory (RAM), or a magnetic medium, for example, a floppy disk, a hard disk, a magnetic tape, a magnetic disk, or an optical medium, for example, a digital versatile disc (DVD), or a semiconductor medium, for example, a solid state disk (SSD).
The foregoing descriptions are merely implementations of embodiments of this application, but are not intended to limit the protection scope of embodiments of this application. Any variation or replacement within the technical scope disclosed in embodiments of this application shall fall within the protection scope of embodiments of this application. Therefore, the protection scope of embodiments of this application shall be subject to the protection scope of the claims.
Number | Date | Country | Kind |
---|---|---|---|
202110662048.9 | Jun 2021 | CN | national |
This application is a continuation of International Application No. PCT/CN2022/095802, filed on May 28, 2022, which claims priority to Chinese Patent Application No. 202110662048.9, filed on Jun. 15, 2021. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/CN2022/095802 | May 2022 | US |
Child | 18540144 | | US |