This application relates to the field of communication technologies, and in particular, to a communication method and a related apparatus.
Distributed learning is a method for implementing collaborative learning across a plurality of node devices. Specifically, a plurality of node devices obtain local models through training based on local data, and a central node device fuses the plurality of local models to obtain a global model. In this way, collaborative learning is implemented on the premise that privacy of user data of the node devices is protected.
The plurality of node devices may respectively train their local models, to obtain related parameters of the local models, for example, weight parameters or weight gradients of the local models. Then, the plurality of node devices send the related parameters of the local models to the central node device. The central node device fuses the related parameters of the local models sent by the plurality of node devices, to obtain a related parameter of the global model, and delivers the related parameter to the node devices. The node devices may update the local models of the node devices based on the related parameter of the global model.
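For illustration only, the following Python sketch outlines this kind of procedure; it is not the method of this application, and the function names, model shapes, and learning rate are assumptions.

```python
import numpy as np

def local_training_step(weights, data, labels):
    """Hypothetical one-step local update: returns the weight gradient
    computed on local data (a toy linear-regression gradient)."""
    predictions = data @ weights
    return data.T @ (predictions - labels) / len(labels)

def fuse(local_gradients):
    """The central node device fuses the related parameters (here, weight
    gradients) reported by the node devices by simple averaging."""
    return np.mean(local_gradients, axis=0)

# Toy example with three node devices that share the same model shape.
rng = np.random.default_rng(0)
global_weights = np.zeros(4)
local_gradients = []
for _ in range(3):
    data = rng.normal(size=(8, 4))
    labels = rng.normal(size=8)
    local_gradients.append(local_training_step(global_weights, data, labels))

global_gradient = fuse(local_gradients)       # related parameter of the global model
global_weights -= 0.01 * global_gradient      # each node device applies the same update
```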
It can be learned from the foregoing technical solution that the node devices respectively send the related parameters of the local models to the central node device, resulting in a large amount of data reported by the node devices and high communication overheads. Therefore, how the node devices can report the related parameters of the local models with low communication overheads is an urgent problem to be resolved.
A first aspect of this application provides a communication method. The communication method may be performed by a first apparatus. The first apparatus may be a communication device, or may be a component (for example, a chip (or a system)) in the communication device. The communication method includes:
A first apparatus receives at least one quantization threshold from a second apparatus. Then, the first apparatus performs quantization on related information of a first model of the first apparatus based on the at least one quantization threshold. The first apparatus sends first information to the second apparatus, where the first information indicates quantized related information of the first model. This reduces communication overheads for reporting the related information of the first model by the first apparatus, and saves communication resources.
Based on the first aspect, in a possible implementation, the related information of the first model includes an output parameter or an update parameter of the first model, and the update parameter includes a weight gradient or a weight parameter of the first model. In this implementation, two possible parameters included in the related information of the first model are shown, so that the second apparatus fuses training results reported by all apparatuses, to obtain related information of a global model. In this application, models on all apparatuses may be understood as a same model. To distinguish between models on different apparatuses, on a first apparatus side, the model may be referred to as the first model, and on a second apparatus side, the model may be referred to as the global model.
Based on the first aspect, in a possible implementation, before the first apparatus receives the at least one quantization threshold from the second apparatus, the method further includes: The first apparatus sends second information to the second apparatus, where the second information indicates information obtained by processing the related information of the first model, or the second information indicates information obtained by processing related information obtained by performing an Mth round of training on the first model by the first apparatus, and the related information of the first model is related information obtained by performing a Qth round of training on the first model by the first apparatus, where M is an integer greater than or equal to 1 and less than Q, and Q is an integer greater than 1.
In this implementation, the first apparatus may send the second information to the second apparatus, so that the second apparatus determines the at least one quantization threshold. This helps the second apparatus determine an appropriate quantization threshold, and helps the first apparatus perform proper quantization on the related information of the first model, thereby reducing overheads for reporting the related information of the first model by the first apparatus while ensuring precision of the related information of the first model reported by the first apparatus.
Based on the first aspect, in a possible implementation, the related information of the first model includes the output parameter of the first model, and the information obtained by processing the related information of the first model includes an average value of absolute values of values of output parameters of the first model, or the related information of the first model includes the update parameter of the first model, and the information obtained by processing the related information of the first model includes an average value of absolute values of values of update parameters of the first model. In this implementation, two possible implementations of the related information of the first model are shown. The first apparatus may report the average value of the absolute values of the values of the output parameters of the first model or the average value of the absolute values of the values of the update parameters of the first model to the second apparatus. This helps the second apparatus determine the appropriate quantization threshold.
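For illustration only, the following sketch shows how the second information might be computed as the average of the absolute values of the update parameters, and how the second apparatus might derive two symmetric quantization thresholds from it. The scaling factor and the threshold rule are assumptions; this application does not fix a specific formula.

```python
import numpy as np

def second_information(update_parameters):
    """Second information: the average of the absolute values of the values
    of the update parameters of the first model."""
    return float(np.mean(np.abs(update_parameters)))

def derive_thresholds(mean_abs, scale=0.5):
    """Hypothetical rule at the second apparatus: place the first and second
    quantization thresholds symmetrically around zero, scaled by the reported
    average. The scaling factor is an assumption, not part of this application."""
    return scale * mean_abs, -scale * mean_abs

update_parameters = np.array([0.8, -0.1, 0.05, -1.2, 0.3])
mean_abs = second_information(update_parameters)              # reported as second information
first_threshold, second_threshold = derive_thresholds(mean_abs)
```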
Based on the first aspect, in a possible implementation, the method further includes: The first apparatus receives third information from the second apparatus, where the third information indicates global information of the first model. In this implementation, the first apparatus may update or train the first model based on the global information of the first model.
Based on the first aspect, in a possible implementation, the global information of the first model includes a global output parameter of the first model, or the global information of the first model includes a global update parameter and/or a global learning rate of the first model.
In this implementation, two implementations of the global information of the first model are shown. For example, the global information of the first model includes the global output parameter of the first model, so that the first apparatus trains the first model based on the global output parameter. This helps improve training performance of the first model and improve accuracy of the first model. For example, the global information of the first model includes the global update parameter and/or the global learning rate of the first model, so that the first apparatus updates the first model based on the global update parameter and/or the global learning rate. This helps improve accuracy of the first model.
Based on the first aspect, in a possible implementation, the related information of the first model includes N parameters of the first model, and N is an integer greater than or equal to 1. That the first apparatus performs quantization on related information of a first model of the first apparatus based on the at least one quantization threshold includes: The first apparatus performs quantization on the N parameters based on the at least one quantization threshold, to obtain N quantized parameters. The first information includes the N quantized parameters. That the first apparatus sends first information to the second apparatus includes: The first apparatus modulates the N quantized parameters, to obtain N first signals, and the first apparatus sends the N first signals to the second apparatus.
In this implementation, the first information includes the N quantized parameters. The first apparatus may perform quantization on the N parameters of the first model, modulate the N quantized parameters, and then send the N first signals obtained through modulation, thereby sending the first information.
Based on the first aspect, in a possible implementation, the at least one quantization threshold includes a first quantization threshold and a second quantization threshold. That the first apparatus performs quantization on the N parameters based on the at least one quantization threshold, to obtain N quantized parameters includes: If an ith parameter in the N parameters is greater than the first quantization threshold, the first apparatus quantizes the ith parameter to a first value, where i is an integer greater than or equal to 1 and less than or equal to N, or if an ith parameter in the N parameters is less than or equal to the first quantization threshold and is greater than or equal to the second quantization threshold, the first apparatus quantizes the ith parameter to a second value, or if an ith parameter in the N parameters is less than the second quantization threshold, the first apparatus quantizes the ith parameter to a third value. In this implementation, a specific quantization process in which the first apparatus quantizes the ith parameter is shown, to facilitate implementation of this solution. Further, the at least one quantization threshold includes a plurality of quantization thresholds, so that precision of quantizing the parameter of the first model by the first apparatus is finer. This helps improve accuracy of updating the first model by the first apparatus and improve training performance of the first model.
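A minimal sketch of this three-level quantization follows, assuming the first, second, and third values are +1, 0, and -1; the actual values are not fixed by this implementation.

```python
import numpy as np

FIRST_VALUE, SECOND_VALUE, THIRD_VALUE = 1, 0, -1   # assumed quantization levels

def quantize(parameters, first_threshold, second_threshold):
    """Quantize each of the N parameters against the two thresholds:
    greater than the first threshold        -> first value
    between the two thresholds (inclusive)  -> second value
    less than the second threshold          -> third value"""
    quantized = np.full(parameters.shape, SECOND_VALUE)
    quantized[parameters > first_threshold] = FIRST_VALUE
    quantized[parameters < second_threshold] = THIRD_VALUE
    return quantized

parameters = np.array([0.9, 0.1, -0.7, 0.0])
print(quantize(parameters, first_threshold=0.5, second_threshold=-0.5))   # [ 1  0 -1  0]
```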
Based on the first aspect, in a possible implementation, that the first apparatus modulates the N quantized parameters, to obtain N first signals includes: The first apparatus modulates an ith quantized parameter, to obtain an ith first signal, where the ith first signal corresponds to two sequences. When the ith quantized parameter is the first value, a transmit power at which the first apparatus sends a 1st sequence in the two sequences is less than a transmit power at which the first apparatus sends a 2nd sequence in the two sequences, when the ith quantized parameter is the second value, a transmit power at which the first apparatus sends a 1st sequence in the two sequences is equal to a transmit power at which the first apparatus sends a 2nd sequence in the two sequences, or when the ith quantized parameter is the third value, a transmit power at which the first apparatus sends a 1st sequence in the two sequences is greater than a transmit power at which the first apparatus sends a 2nd sequence in the two sequences.
In this implementation, the first apparatus modulates each of the N parameters of the first model to two sequences. The first apparatus controls a transmit power used to send each of the two sequences, so that the second apparatus determines a value of the parameter. The first apparatus does not need to perform channel estimation and equalization, so that corresponding pilot overheads are not required.
Based on the first aspect, in a possible implementation, when the ith quantized parameter is the first value, the 1st sequence in the two sequences is an all-0 sequence, and the 2nd sequence is a non-all-0 sequence, when the ith quantized parameter is the second value, the two sequences are both all-0 sequences, or when the ith quantized parameter is the third value, the 1st sequence in the two sequences is a non-all-0 sequence, and the 2nd sequence is an all-0 sequence. In this implementation, the first apparatus may use the all-0 sequence and/or the non-all-0 sequence to carry the ith quantized parameter. At a same total transmit power, this helps the second apparatus identify the value of the ith quantized parameter, thereby improving power utilization efficiency.
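A minimal sketch of this modulation follows, assuming a sequence length of 4, an all-ones non-all-0 sequence, and the +1/0/-1 quantization levels from the earlier sketch; all of these are assumptions for illustration.

```python
import numpy as np

SEQUENCE_LENGTH = 4                               # assumed sequence length
NON_ALL_0_SEQUENCE = np.ones(SEQUENCE_LENGTH)     # assumed non-all-0 sequence
ALL_0_SEQUENCE = np.zeros(SEQUENCE_LENGTH)

def modulate(quantized_parameter):
    """Map one quantized parameter to the two sequences of the ith first signal.
    first value (1)   -> (all-0, non-all-0): 1st sequence sent at lower power
    second value (0)  -> (all-0, all-0):     equal (zero) transmit power
    third value (-1)  -> (non-all-0, all-0): 1st sequence sent at higher power"""
    if quantized_parameter == 1:
        return ALL_0_SEQUENCE, NON_ALL_0_SEQUENCE
    if quantized_parameter == 0:
        return ALL_0_SEQUENCE, ALL_0_SEQUENCE
    return NON_ALL_0_SEQUENCE, ALL_0_SEQUENCE

first_sequence, second_sequence = modulate(1)     # the ith first signal for the first value
```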
Based on the first aspect, in a possible implementation, that the first apparatus sends first information to the second apparatus includes:
The first apparatus sends the first information to the second apparatus L times, where L is an integer greater than or equal to 1. In this implementation, when the quantity L of sending times is greater than 1, the first apparatus repeatedly sends the first information. This helps the second apparatus select a decision result with a largest quantity of occurrences as a best decision result after separately making decisions, thereby reducing a probability of a decision error, and improving model training performance.
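For illustration only, a minimal sketch of the majority decision that the second apparatus may make over the L repeated transmissions, using the assumed +1/0/-1 levels:

```python
from collections import Counter

def majority_decision(per_transmission_decisions):
    """Select the decision result with the largest quantity of occurrences
    across the L repeated transmissions of the first information."""
    return Counter(per_transmission_decisions).most_common(1)[0][0]

print(majority_decision([1, 0, 1]))   # L = 3 repetitions -> decided value 1
```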
Based on the first aspect, in a possible implementation, the method further includes: The first apparatus receives first indication information from the second apparatus, where the first indication information indicates the quantity L of sending times that the first apparatus sends the first information to the second apparatus. In this implementation, the first apparatus may receive the quantity of sending times indicated by the second apparatus, and send the first information based on the quantity of sending times. This helps the second apparatus determine the quantity of sending times based on an actual requirement, thereby properly utilizing communication resources.
Based on the first aspect, in a possible implementation, the related information of the first model includes N parameters of the first model that are obtained through quantization error compensation, the N parameters obtained through quantization error compensation are obtained by performing error compensation for the N parameters by the first apparatus based on quantization errors respectively corresponding to the N parameters obtained by performing the Qth round of training on the first model by the first apparatus, and a quantization error corresponding to the ith parameter in the N parameters is determined based on an ith parameter obtained by performing a (Q−1)th round of training on the first model and performing quantization error compensation by the first apparatus, where i is an integer greater than or equal to 1 and less than or equal to N, N is an integer greater than or equal to 1, and Q is an integer greater than 1.
In this implementation, the first apparatus may first perform quantization error compensation for the N parameters of the first model, and then perform, based on the at least one quantization threshold, quantization on the N parameters obtained through quantization error compensation. This helps improve the accuracy of updating the first model by the first apparatus and improve the training performance of the first model.
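A minimal sketch of this quantization error compensation follows, reusing the assumed ternary quantizer from above; the step size that maps a quantization level back to a parameter value is an assumption.

```python
import numpy as np

def quantize(parameters, first_threshold, second_threshold):
    """Assumed ternary quantizer (levels +1/0/-1), as in the earlier sketch."""
    quantized = np.zeros_like(parameters)
    quantized[parameters > first_threshold] = 1
    quantized[parameters < second_threshold] = -1
    return quantized

def compensate_and_quantize(parameters, residual, first_threshold, second_threshold, step=1.0):
    """Add the quantization error carried over from the previous round, quantize
    the compensated parameters, and keep the new quantization error for the next
    round. The step size is an assumption for illustration."""
    compensated = parameters + residual
    quantized = quantize(compensated, first_threshold, second_threshold)
    new_residual = compensated - step * quantized
    return quantized, new_residual

residual = np.zeros(4)                            # no error before the first round
for round_index in range(3):                      # successive training rounds
    parameters = np.random.default_rng(round_index).normal(size=4)
    quantized, residual = compensate_and_quantize(parameters, residual, 0.5, -0.5)
```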
Based on the first aspect, in a possible implementation, the related information of the first model includes N parameters of the first model that are obtained through sparsification, the N parameters of the first model that are obtained through sparsification are the N parameters selected by the first apparatus from K parameters of the first model based on a common sparse mask, and the K parameters of the first model are parameters obtained by performing one round of training on the first model by the first apparatus, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
In this implementation, the first apparatus may first select the N parameters from the K parameters of the first model based on the common sparse mask, and then perform quantization on the N parameters based on the at least one quantization threshold. This helps reduce overheads generated when the first apparatus reports the parameter of the first model.
Based on the first aspect, in a possible implementation, the common sparse mask is a bit sequence, the bit sequence includes K bits, and the K bits one-to-one correspond to the K parameters. When a value of one bit in the K bits is 0, it indicates the first apparatus not to select a parameter corresponding to the bit, or when a value of one bit in the K bits is 1, it indicates the first apparatus to select a parameter corresponding to the bit. In this implementation, a specific form of the common sparse mask is provided. The first apparatus selects parameters based on values of bits in the bit sequence, so that an operation is simple and convenient. This reduces overheads for reporting the parameter of the first model by the first apparatus, and reduces occupation of communication resources.
Based on the first aspect, in a possible implementation, the common sparse mask is determined by the first apparatus based on a sparsity ratio and a pseudo-random number, and the sparsity ratio is indicated by the second apparatus to the first apparatus. In this implementation, a manner of generating the common sparse mask is provided, to facilitate implementation of the solution. Therefore, the first apparatus reports some parameters of the first model based on the common sparse mask, thereby reducing the overheads generated when the first apparatus reports the parameter of the first model.
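For illustration only, the following sketch generates and applies a common sparse mask, assuming the pseudo-random number is used as a generator seed shared by the apparatuses and that the sparsity ratio gives the fraction of mask bits set to 1; both are assumptions.

```python
import numpy as np

def common_sparse_mask(num_parameters, sparsity_ratio, seed):
    """Generate a K-bit common sparse mask from a sparsity ratio and a
    pseudo-random number (here used as a generator seed; the exact
    generator is an assumption)."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(num_parameters, dtype=np.uint8)
    num_selected = int(round(sparsity_ratio * num_parameters))
    mask[rng.choice(num_parameters, size=num_selected, replace=False)] = 1
    return mask

def apply_mask(parameters, mask):
    """Keep only the parameters whose mask bit is 1 (the N selected parameters)."""
    return parameters[mask == 1]

k_parameters = np.arange(10, dtype=float)                      # K = 10 parameters
mask = common_sparse_mask(len(k_parameters), sparsity_ratio=0.3, seed=42)
n_parameters = apply_mask(k_parameters, mask)                  # N = 3 selected parameters
```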
Based on the first aspect, in a possible implementation, the method further includes: The first apparatus receives second indication information from the second apparatus, where the second indication information indicates the common sparse mask. In this implementation, the first apparatus selects the N parameters from the K parameters of the first model based on the common sparse mask. This helps reduce the overheads generated when the first apparatus reports the parameter of the first model.
Based on the first aspect, in a possible implementation, the method further includes: The first apparatus sends third indication information to the second apparatus, where the third indication information indicates indexes of the N parameters whose absolute values of corresponding values are largest and that are in the K parameters.
In this implementation, the first apparatus may indicate, to the second apparatus, the indexes of the N parameters whose absolute values of corresponding values are largest and that are in the K parameters of the first apparatus. This helps the second apparatus determine the appropriate common sparse mask. The third indication information indicates the indexes of the N parameters whose absolute values of the corresponding values are largest and that are in the K parameters. This helps the first apparatus preferentially feed back a parameter with a large change subsequently, thereby improving model training accuracy and improving model training performance.
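A minimal sketch of selecting the indexes indicated by the third indication information:

```python
import numpy as np

def top_n_indexes(parameters, n):
    """Indexes of the N parameters whose absolute values are largest
    among the K parameters of the first model."""
    return np.argsort(np.abs(parameters))[::-1][:n]

k_parameters = np.array([0.1, -2.0, 0.4, 1.5, -0.2])
print(top_n_indexes(k_parameters, n=2))   # indexes 1 and 3 have the largest absolute values
```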
Based on the first aspect, in a possible implementation, the first model is a neural network model, and the related information of the first model includes related parameters of neurons at P layers of the neural network model, where P is an integer greater than or equal to 1. In this implementation, the first apparatus may report a parameter of one or more layers of the neural network model. In other words, the first apparatus reports the parameter of the neural network model in a unit of a layer of the neural network model. This helps the first apparatus accurately report the parameter of each layer, thereby improving model training accuracy.
A second aspect of this application provides a communication method. The communication method may be performed by a second apparatus. The second apparatus may be a communication device, or may be a component (for example, a chip (or a system)) in the communication device. The communication method includes:
A second apparatus sends at least one quantization threshold to a first apparatus, where the at least one quantization threshold is used to perform quantization on related information of a first model of the first apparatus. The second apparatus receives first information from the first apparatus, where the first information indicates quantized related information of the first model. It can be learned from the foregoing technical solution that this helps reduce communication overheads for reporting the related information of the first model by the first apparatus, and saves communication resources.
Based on the second aspect, in a possible implementation, the related information of the first model includes an output parameter or an update parameter of the first model, and the update parameter includes a weight gradient or a weight parameter of the first model. In this implementation, two possible parameters included in the related information of the first model are shown, so that the second apparatus fuses training results reported by all apparatuses, to obtain related information of a global model. In this application, models on all apparatuses may be understood as a same model. To distinguish between models on different apparatuses, on a first apparatus side, the model may be referred to as the first model, and on a second apparatus side, the model may be referred to as the global model.
Based on the second aspect, in a possible implementation, the method further includes:
The second apparatus receives second information from the first apparatus, where the second information indicates information obtained by processing the related information of the first model, or the second information indicates information obtained by performing an Mth round of training on the first model and performing processing by the first apparatus, and the related information of the first model is related information obtained by performing a Qth round of training on the first model by the first apparatus, where M is an integer greater than or equal to 1 and less than Q, and Q is an integer greater than 1. The second apparatus determines the at least one quantization threshold based on the second information.
In this implementation, the second apparatus receives the second information from the first apparatus, so that the second apparatus determines the at least one quantization threshold based on the second information. This helps the second apparatus determine an appropriate quantization threshold, and helps the first apparatus perform proper quantization on the related information of the first model, thereby reducing overheads for reporting the related information of the first model by the first apparatus while ensuring precision of the related information of the first model reported by the first apparatus.
Based on the second aspect, in a possible implementation, the related information of the first model includes the output parameter of the first model, and the information obtained by processing the related information of the first model includes an average value of absolute values of values of output parameters of the first model, or the related information of the first model includes the update parameter of the first model, and the information obtained by processing the related information of the first model includes an average value of absolute values of values of update parameters of the first model. In this implementation, two possible implementations of the related information of the first model are shown. The second apparatus may receive, from the first apparatus, the average value of the absolute values of the values of the output parameters of the first model or the average value of the absolute values of the values of the update parameters of the first model. This helps the second apparatus determine the appropriate quantization threshold.
Based on the second aspect, in a possible implementation, the method further includes: The second apparatus receives third information from a third apparatus, where the third information indicates information obtained by processing related information of a second model of the third apparatus, or the third information indicates information obtained by performing an Sth round of training on the second model and performing processing by the third apparatus, and the related information of the second model is related information obtained by performing an Rth round of training on the second model by the third apparatus, where S is an integer greater than or equal to 1 and less than R, and R is an integer greater than 1. That the second apparatus determines the at least one quantization threshold based on the second information includes: The second apparatus determines the at least one quantization threshold based on the second information and the third information.
In this implementation, the second apparatus may further receive the third information from the third apparatus, and determine the at least one quantization threshold based on the second information and the third information. This helps the second apparatus determine the appropriate quantization threshold, thereby reducing overheads for reporting the related information of the first model by the first apparatus while ensuring precision of the related information of the first model reported by the first apparatus.
Based on the second aspect, in a possible implementation, the method further includes: The second apparatus determines global information of the first model based on the first information, and the second apparatus sends fourth information to the first apparatus, where the fourth information indicates the global information of the first model. In this implementation, the second apparatus may determine the global information of the first model based on the first information, and send the global information of the first model to the first apparatus. In this way, the first apparatus updates or trains the first model.
Based on the second aspect, in a possible implementation, the global information of the first model includes a global output parameter of the first model, or the global information of the first model includes a global update parameter and/or a global learning rate of the first model.
In this implementation, two implementations of the global information of the first model are shown. For example, the global information of the first model includes the global output parameter of the first model, so that the first apparatus trains the first model based on the global output parameter. This helps improve training performance of the first model and improve accuracy of the first model. For example, the global information of the first model includes the global update parameter and/or the global learning rate of the first model, so that the first apparatus updates the first model based on the global update parameter and/or the global learning rate. This helps improve accuracy of the first model.
Based on the second aspect, in a possible implementation, the method further includes: The second apparatus receives fifth information from the third apparatus, where the fifth information indicates the related information of the second model of the third apparatus. That the second apparatus determines global information of the first model based on the first information includes: The second apparatus determines the global information of the first model based on the first information and the fifth information. In this implementation, the second apparatus may further receive the fifth information from the third apparatus, and determine the global information of the first model based on the first information and the fifth information. This helps improve accuracy of determining the global information of the first model by the second apparatus and improve accuracy of updating the model.
Based on the second aspect, in a possible implementation, the related information of the first model includes N parameters of the first model, and N is an integer greater than or equal to 1. The related information of the second model includes N parameters of the second model, and the first information includes N quantized parameters of the first model. That the second apparatus receives first information from the first apparatus includes: The second apparatus receives N first signals from the first apparatus, where the N first signals carry the N quantized parameters of the first model, and the N first signals one-to-one correspond to the N quantized parameters of the first model. The fifth information includes N quantized parameters of the second model. That the second apparatus receives fifth information from the third apparatus includes: The second apparatus receives N second signals from the third apparatus, where the N second signals carry the N quantized parameters of the second model, and the N second signals one-to-one correspond to the N quantized parameters of the second model. That the second apparatus determines the global information of the first model based on the first information and the fifth information includes: The second apparatus determines the global information of the first model based on the N first signals and the N second signals.
Based on the second aspect, in a possible implementation, an ith first signal in the N first signals corresponds to a first sequence and a second sequence, an ith second signal in the N second signals corresponds to a third sequence and a fourth sequence, a time-frequency resource used by the first apparatus to send the first sequence is the same as a time-frequency resource used by the third apparatus to send the third sequence, a time-frequency resource used by the first apparatus to send the second sequence is the same as a time-frequency resource used by the third apparatus to send the fourth sequence, and the global information of the first model includes N global parameters of the first model, where i is an integer greater than or equal to 1 and less than or equal to N. That the second apparatus determines the global information of the first model based on the N first signals and the N second signals includes: The second apparatus determines a first signal energy sum of the first sequence and the third sequence that are received by the second apparatus, the second apparatus determines a second signal energy sum of the second sequence and the fourth sequence that are received by the second apparatus, and the second apparatus determines an ith global parameter in the N global parameters based on the first signal energy sum and the second signal energy sum. It can be learned that the second apparatus may determine the ith global parameter based on the signal energy of the two sequences that correspond to the ith first signal and that are received by the second apparatus and the signal energy of the two sequences that correspond to the ith second signal and that are received by the second apparatus. In this way, the second apparatus is supported in implementing incoherent reception in multi-user over-the-air signal superimposition transmission and implementing robustness to a fading channel.
Based on the second aspect, in a possible implementation, that the second apparatus determines an ith global parameter in the N global parameters based on the first signal energy sum and the second signal energy sum includes: If a sum of the first signal energy sum and a decision threshold is less than the second signal energy sum, the second apparatus determines a value of the ith global parameter as a first value, or if a sum of the first signal energy sum and the decision threshold is greater than or equal to the second signal energy sum, and a sum of the second signal energy sum and the decision threshold is greater than or equal to the first signal energy sum, the second apparatus determines a value of the ith global parameter as a second value, or if a sum of the second signal energy sum and the decision threshold is less than the first signal energy sum, the second apparatus determines a value of the ith global parameter as a third value.
In this implementation, the process in which the second apparatus determines the ith global parameter is shown. It can be learned from the foregoing description that the three possible conditions of the first signal energy sum and the second signal energy sum correspond to the three decision results of the ith global parameter. Therefore, the ith global parameter is accurately determined. This helps improve accuracy of updating the first model by the first apparatus and improve training performance of the first model.
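For illustration only, a minimal sketch of this energy-based decision, using the +1/0/-1 levels and the sequence mapping assumed in the earlier sketches:

```python
import numpy as np

FIRST_VALUE, SECOND_VALUE, THIRD_VALUE = 1, 0, -1      # assumed decision levels

def received_energy(received_samples):
    """Energy of the superimposed signal received on one sequence's time-frequency resource."""
    return float(np.sum(np.abs(received_samples) ** 2))

def decide_global_parameter(first_energy_sum, second_energy_sum, decision_threshold):
    """Three-way decision on the ith global parameter from the two energy sums."""
    if first_energy_sum + decision_threshold < second_energy_sum:
        return FIRST_VALUE
    if second_energy_sum + decision_threshold < first_energy_sum:
        return THIRD_VALUE
    return SECOND_VALUE

# Toy example: both apparatuses report the first value, so the superimposed
# energy on the 2nd-sequence resource dominates and the decision is the first value.
rx_first_resource = np.zeros(4)                        # 1st sequences are all-0
rx_second_resource = np.ones(4) + np.ones(4)           # two superimposed non-all-0 2nd sequences
print(decide_global_parameter(received_energy(rx_first_resource),
                              received_energy(rx_second_resource),
                              decision_threshold=0.5))  # -> 1 (first value)
```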
Based on the second aspect, in a possible implementation, the method further includes: The second apparatus sends first indication information to the first apparatus, where the first indication information indicates a quantity L of sending times that the first apparatus sends the first information to the second apparatus, and L is an integer greater than or equal to 1. In this implementation, the second apparatus indicates, to the first apparatus, the quantity of sending times of sending the first information, so that the first apparatus sends the first information based on the quantity of sending times. This helps the second apparatus determine the quantity of sending times based on an actual requirement, thereby properly utilizing communication resources.
Based on the second aspect, in a possible implementation, the method further includes: The second apparatus sends second indication information to the first apparatus, where the second indication information indicates a common sparse mask, and the common sparse mask indicates the first apparatus to report some parameters obtained by training the first model by the first apparatus. In this implementation, the second apparatus sends the second indication information to the first apparatus, where the second indication information indicates the common sparse mask. Therefore, the first apparatus selects the N parameters from the K parameters of the first model based on the common sparse mask. This helps reduce overheads generated when the first apparatus reports the parameter of the first model.
Based on the second aspect, in a possible implementation, the method further includes: The second apparatus receives third indication information from the first apparatus, where the third indication information indicates indexes of the N parameters whose absolute values of corresponding values are largest and that are in K parameters obtained by performing one round of training on the first model by the first apparatus. The second apparatus receives fourth indication information from the third apparatus, where the fourth indication information indicates indexes of the N parameters whose absolute values of corresponding values are largest and that are in K parameters of the second model of the third apparatus, and the K parameters of the second model are K parameters obtained by performing one round of training on the second model by the third apparatus. The second apparatus determines the common sparse mask based on the third indication information and the fourth indication information. In this implementation, each apparatus indicates the index of the parameter whose absolute value of the corresponding value is the largest in the K parameters of the apparatus. This helps the second apparatus determine the appropriate common sparse mask based on the third indication information and the fourth indication information. In this way, the first apparatus can preferentially feed back a parameter with a large change based on the common sparse mask, thereby improving model training accuracy and model training performance.
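One possible rule for determining the common sparse mask from the reported indexes is sketched below: select the N indexes reported most often across the apparatuses. This fusion rule is an assumption for illustration; the application does not specify how the mask is derived.

```python
import numpy as np
from collections import Counter

def determine_common_sparse_mask(reported_index_lists, num_parameters, n):
    """Hypothetical fusion rule at the second apparatus: count how often each index
    is reported among the largest-absolute-value parameters across the apparatuses
    and set the mask bit to 1 for the N most frequently reported indexes."""
    counts = Counter(index for indexes in reported_index_lists for index in indexes)
    mask = np.zeros(num_parameters, dtype=np.uint8)
    for index, _ in counts.most_common(n):
        mask[index] = 1
    return mask

# Indexes from the first apparatus (third indication information) and the
# third apparatus (fourth indication information), with K = 8 and N = 3.
mask = determine_common_sparse_mask([[1, 4, 6], [4, 6, 7]], num_parameters=8, n=3)
```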
A third aspect of this application provides a communication method. The communication method may be performed by a first apparatus. The first apparatus may be a communication device, or may be a component (for example, a chip (or a system)) in the communication device. The communication method includes:
A first apparatus sends first indication information to a second apparatus, where the first indication information indicates indexes of N parameters whose absolute values of corresponding values are largest and that are in K parameters of a first model of the first apparatus, and the K parameters of the first model are K parameters obtained by performing one round of training on the first model by the first apparatus, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1. Then, the first apparatus receives second indication information from the second apparatus. The second indication information indicates a common sparse mask, the common sparse mask is determined by the second apparatus based on the first indication information, and the common sparse mask indicates the first apparatus to report some parameters obtained by training the first model by the first apparatus.
In the foregoing technical solution, the first apparatus may report the first indication information to the second apparatus, to indicate the indexes of the N parameters whose absolute values of the corresponding values are largest and that are in the K parameters of the first model. Therefore, the second apparatus determines the appropriate common sparse mask based on the first indication information. The first apparatus receives the second indication information from the second apparatus. The second indication information indicates the common sparse mask. Therefore, the first apparatus can preferentially feed back a parameter with a large change based on the common sparse mask. This helps reduce overheads generated when the first apparatus reports the parameter of the first model, improves model training accuracy, and improves model training performance.
A fourth aspect of this application provides a communication method. The communication method may be performed by a second apparatus. The second apparatus may be a communication device, or may be a component (for example, a chip (or a system)) in the communication device. The communication method includes:
A second apparatus receives first indication information from a first apparatus, where the first indication information indicates indexes of N parameters whose absolute values of corresponding values are largest and that are in K parameters of a first model of the first apparatus, and the K parameters of the first model are K parameters obtained by performing one round of training on the first model by the first apparatus, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1. The second apparatus determines a common sparse mask based on the first indication information, where the common sparse mask indicates the first apparatus to report some parameters obtained by training the first model by the first apparatus. Then, the second apparatus sends second indication information to the first apparatus, where the second indication information indicates the common sparse mask.
In the foregoing technical solution, the second apparatus receives the first indication information from the first apparatus, where the first indication information indicates the indexes of the N parameters whose absolute values of the corresponding values are largest and that are in the K parameters of the first model. Therefore, the second apparatus can determine the appropriate common sparse mask based on the first indication information. The first apparatus can preferentially feed back a parameter with a large change based on the common sparse mask. This reduces overheads generated when the first apparatus reports the parameter of the first model, improves model training accuracy, and improves model training performance.
Based on the fourth aspect, in a possible implementation, the method further includes: The second apparatus receives third indication information from a third apparatus, where the third indication information indicates indexes of N parameters whose absolute values of corresponding values are largest and that are in K parameters of a second model of the third apparatus, and the K parameters of the second model are K parameters obtained by performing one round of training on the second model by the third apparatus. That the second apparatus determines a common sparse mask based on the first indication information includes: The second apparatus determines the common sparse mask based on the first indication information and the third indication information.
In this implementation, the second apparatus may further determine the common sparse mask based on the third indication information reported by the third apparatus, so that the second apparatus determines the appropriate common sparse mask for the first apparatus. The first apparatus can preferentially feed back the parameter with the large change based on the common sparse mask, thereby improving the model training accuracy and the model training performance.
A fifth aspect of this application provides a first apparatus, including: a transceiver module, configured to receive at least one quantization threshold from a second apparatus, and a processing module, configured to perform quantization on related information of a first model of the first apparatus based on the at least one quantization threshold. The transceiver module is further configured to send first information to the second apparatus, where the first information indicates quantized related information of the first model.
Based on the fifth aspect, in a possible implementation, the related information of the first model includes an output parameter or an update parameter of the first model, and the update parameter includes a weight gradient or a weight parameter of the first model.
Based on the fifth aspect, in a possible implementation, the transceiver module is further configured to send second information to the second apparatus, where the second information indicates information obtained by processing the related information of the first model, or the second information indicates information obtained by processing related information obtained by performing an Mth round of training on the first model by the first apparatus, and the related information of the first model is related information obtained by performing a Qth round of training on the first model by the first apparatus, where M is an integer greater than or equal to 1 and less than Q, and Q is an integer greater than 1.
Based on the fifth aspect, in a possible implementation, the related information of the first model includes the output parameter of the first model, and the information obtained by processing the related information of the first model includes an average value of absolute values of values of output parameters of the first model, or the related information of the first model includes the update parameter of the first model, and the information obtained by processing the related information of the first model includes an average value of absolute values of values of update parameters of the first model.
Based on the fifth aspect, in a possible implementation, the transceiver module is further configured to receive third information from the second apparatus, where the third information indicates global information of the first model.
Based on the fifth aspect, in a possible implementation, the global information of the first model includes a global output parameter of the first model, or the global information of the first model includes a global update parameter and/or a global learning rate of the first model.
Based on the fifth aspect, in a possible implementation, the related information of the first model includes N parameters of the first model, and N is an integer greater than or equal to 1. The processing module is specifically configured to perform quantization on the N parameters based on the at least one quantization threshold, to obtain N quantized parameters, where the first information includes the N quantized parameters. The transceiver module is specifically configured to modulate the N quantized parameters, to obtain N first signals, and send the N first signals to the second apparatus.
Based on the fifth aspect, in a possible implementation, the at least one quantization threshold includes a first quantization threshold and a second quantization threshold. The processing module is specifically configured to: if an ith parameter in the N parameters is greater than the first quantization threshold, quantize the ith parameter to a first value, where i is an integer greater than or equal to 1 and less than or equal to N, or if an ith parameter in the N parameters is less than or equal to the first quantization threshold and is greater than or equal to the second quantization threshold, quantize the ith parameter to a second value, or if an ith parameter in the N parameters is less than the second quantization threshold, quantize the ith parameter to a third value.
Based on the fifth aspect, in a possible implementation, the transceiver module is specifically configured to modulate an ith quantized parameter, to obtain an ith first signal, where the ith first signal corresponds to two sequences. When the ith quantized parameter is the first value, a transmit power at which the first apparatus sends a 1st sequence in the two sequences is less than a transmit power at which the first apparatus sends a 2nd sequence in the two sequences, when the ith quantized parameter is the second value, a transmit power at which the first apparatus sends a 1st sequence in the two sequences is equal to a transmit power at which the first apparatus sends a 2nd sequence in the two sequences, or when the ith quantized parameter is the third value, a transmit power at which the first apparatus sends a 1st sequence in the two sequences is greater than a transmit power at which the first apparatus sends a 2nd sequence in the two sequences.
Based on the fifth aspect, in a possible implementation, when the ith quantized parameter is the first value, the 1st sequence in the two sequences is an all-0 sequence, and the 2nd sequence is a non-all-0 sequence, when the ith quantized parameter is the second value, the two sequences are both all-0 sequences, or when the ith quantized parameter is the third value, the 1st sequence in the two sequences is a non-all-0 sequence, and the 2nd sequence is an all-0 sequence.
Based on the fifth aspect, in a possible implementation, the transceiver module is specifically configured to send the first information to the second apparatus L times, where L is an integer greater than or equal to 1.
Based on the fifth aspect, in a possible implementation, the transceiver module is further configured to receive first indication information from the second apparatus, where the first indication information indicates the quantity L of sending times that the first apparatus sends the first information to the second apparatus.
Based on the fifth aspect, in a possible implementation, the related information of the first model includes N parameters of the first model that are obtained through quantization error compensation, the N parameters obtained through quantization error compensation are obtained by performing error compensation for the N parameters by the first apparatus based on quantization errors respectively corresponding to the N parameters obtained by performing the Qth round of training on the first model by the first apparatus, and Q is an integer greater than 1. A quantization error corresponding to the ith parameter in the N parameters is determined based on an ith parameter obtained by performing a (Q−1)th round of training on the first model and performing quantization error compensation by the first apparatus.
Based on the fifth aspect, in a possible implementation, the related information of the first model includes N parameters of the first model that are obtained through sparsification, the N parameters of the first model that are obtained through sparsification are the N parameters selected by the first apparatus from K parameters of the first model based on a common sparse mask, and the K parameters of the first model are parameters obtained by performing the Qth round of training on the first model by the first apparatus, where K is an integer greater than or equal to N, and K is an integer greater than or equal to 1.
Based on the fifth aspect, in a possible implementation, the common sparse mask is a bit sequence, the bit sequence includes K bits, and the K bits one-to-one correspond to the K parameters. When a value of one bit in the K bits is 0, it indicates the first apparatus not to select a parameter corresponding to the bit, or when a value of one bit in the K bits is 1, it indicates the first apparatus to select a parameter corresponding to the bit.
Based on the fifth aspect, in a possible implementation, the common sparse mask is determined by the first apparatus based on a sparsity ratio and a pseudo-random number, and the sparsity ratio is indicated by the second apparatus to the first apparatus.
Based on the fifth aspect, in a possible implementation, the transceiver module is further configured to receive second indication information from the second apparatus, where the second indication information indicates the common sparse mask.
Based on the fifth aspect, in a possible implementation, the transceiver module is further configured to send third indication information to the second apparatus, where the third indication information indicates indexes of the N parameters whose absolute values of corresponding values are largest and that are in the K parameters.
Based on the fifth aspect, in a possible implementation, the first model is a neural network model, and the related information of the first model includes related parameters of neurons at P layers of the neural network model, where P is an integer greater than or equal to 1.
A sixth aspect of this application provides a second apparatus, including: a transceiver module, configured to send at least one quantization threshold to a first apparatus, where the at least one quantization threshold is used to perform quantization on related information of a first model of the first apparatus. The transceiver module is further configured to receive first information from the first apparatus, where the first information indicates quantized related information of the first model.
Based on the sixth aspect, in a possible implementation, the related information of the first model includes an output parameter or an update parameter of the first model, and the update parameter includes a weight gradient or a weight parameter of the first model.
Based on the sixth aspect, in a possible implementation, the transceiver module is further configured to receive second information from the first apparatus, where the second information indicates information obtained by processing the related information of the first model, or the second information indicates information obtained by performing an Mth round of training on the first model and performing processing by the first apparatus, and the related information of the first model is related information obtained by performing a Qth round of training on the first model by the first apparatus, where M is an integer greater than or equal to 1 and less than Q, and Q is an integer greater than 1. The second apparatus further includes a processing module. The processing module is configured to determine the at least one quantization threshold based on the second information.
Based on the sixth aspect, in a possible implementation, the related information of the first model includes the output parameter of the first model, and the information obtained by processing the related information of the first model includes an average value of absolute values of values of output parameters of the first model, or the related information of the first model includes the update parameter of the first model, and the information obtained by processing the related information of the first model includes an average value of absolute values of values of update parameters of the first model.
Based on the sixth aspect, in a possible implementation, the transceiver module is further configured to receive third information from a third apparatus, where the third information indicates information obtained by processing related information of a second model of the third apparatus, or the third information indicates information obtained by performing an Sth round of training on the second model and performing processing by the third apparatus, and the related information of the second model is related information obtained by performing an Rth round of training on the second model by the third apparatus, where S is an integer greater than or equal to 1 and less than R, and R is an integer greater than 1. The processing module is configured to determine the at least one quantization threshold based on the second information and the third information.
Based on the sixth aspect, in a possible implementation, the processing module is further configured to determine global information of the first model based on the first information, and the transceiver module is further configured to send fourth information to the first apparatus, where the fourth information indicates the global information of the first model.
Based on the sixth aspect, in a possible implementation, the global information of the first model includes a global output parameter of the first model, or the global information of the first model includes a global update parameter and/or a global learning rate of the first model.
Based on the sixth aspect, in a possible implementation, the transceiver module is further configured to receive fifth information from the third apparatus, where the fifth information indicates the related information of the second model of the third apparatus, and the processing module is specifically configured to determine the global information of the first model based on the first information and the fifth information.
Based on the sixth aspect, in a possible implementation, the related information of the first model includes N parameters of the first model, and N is an integer greater than or equal to 1. The related information of the second model includes N parameters of the second model, and the first information includes N quantized parameters of the first model. The transceiver module is specifically configured to receive N first signals from the first apparatus, where the N first signals carry the N quantized parameters of the first model, and the N first signals one-to-one correspond to the N quantized parameters of the first model. The fifth information includes N quantized parameters of the second model. The transceiver module is specifically configured to receive N second signals from the third apparatus, where the N second signals carry the N quantized parameters of the second model, and the N second signals one-to-one correspond to the N quantized parameters of the second model. The processing module is specifically configured to determine the global information of the first model based on the N first signals and the N second signals.
Based on the sixth aspect, in a possible implementation, an ith first signal in the N first signals corresponds to a first sequence and a second sequence, an ith second signal in the N second signals corresponds to a third sequence and a fourth sequence, a time-frequency resource used by the first apparatus to send the first sequence is the same as a time-frequency resource used by the third apparatus to send the third sequence, a time-frequency resource used by the first apparatus to send the second sequence is the same as a time-frequency resource used by the third apparatus to send the fourth sequence, and the global information of the first model includes N global parameters of the first model, where i is an integer greater than or equal to 1 and less than or equal to N. The processing module is specifically configured to determine a first signal energy sum of the first sequence and the third sequence that are received by the second apparatus, determine a second signal energy sum of the second sequence and the fourth sequence that are received by the second apparatus, and determine an ith global parameter in the N global parameters based on the first signal energy sum and the second signal energy sum.
Based on the sixth aspect, in a possible implementation, the processing module is specifically configured to: if a sum of the first signal energy sum and a decision threshold is less than the second signal energy sum, determine a value of the ith global parameter as a first value, or if a sum of the first signal energy sum and the decision threshold is greater than or equal to the second signal energy sum, and a sum of the second signal energy sum and the decision threshold is greater than or equal to the first signal energy sum, determine a value of the ith global parameter as a second value, or if a sum of the second signal energy sum and the decision threshold is less than the first signal energy sum, determine a value of the ith global parameter as a third value.
Based on the sixth aspect, in a possible implementation, the transceiver module is further configured to send first indication information to the first apparatus, where the first indication information indicates a quantity L of sending times that the first apparatus sends the first information to the second apparatus, where L is an integer greater than or equal to 1.
Based on the sixth aspect, in a possible implementation, the transceiver module is further configured to send second indication information to the first apparatus, where the second indication information indicates a common sparse mask, and the common sparse mask indicates the first apparatus to report some parameters obtained by training the first model by the first apparatus.
Based on the sixth aspect, in a possible implementation, the transceiver module is further configured to receive third indication information from the first apparatus, where the third indication information indicates indexes of the N parameters with the largest absolute values among K parameters obtained by performing one round of training on the first model by the first apparatus, and receive fourth indication information from the third apparatus, where the fourth indication information indicates indexes of the N parameters with the largest absolute values among K parameters of the second model of the third apparatus, and the K parameters of the second model are K parameters obtained by performing one round of training on the second model by the third apparatus. The second apparatus further includes the processing module. The processing module is further configured to determine the common sparse mask based on the third indication information and the fourth indication information.
A seventh aspect of this application provides a first apparatus, including:
An eighth aspect of this application provides a second apparatus, including:
a transceiver module, configured to receive first indication information from a first apparatus, where the first indication information indicates indexes of N parameters with the largest absolute values among K parameters of a first model of the first apparatus, and the K parameters of the first model are K parameters obtained by performing one round of training on the first model by the first apparatus, where K and N are integers greater than or equal to 1, and K is greater than or equal to N; and a processing module, configured to determine a common sparse mask based on the first indication information, where the common sparse mask indicates the first apparatus to report some parameters obtained by training the first model by the first apparatus.
The transceiver module is further configured to send second indication information to the first apparatus, where the second indication information indicates the common sparse mask.
Based on the eighth aspect, in a possible implementation, the transceiver module is further configured to receive third indication information from a third apparatus, where the third indication information indicates indexes of N parameters with the largest absolute values among K parameters of a second model of the third apparatus, and the K parameters of the second model are K parameters obtained by performing one round of training on the second model by the third apparatus.
The processing module is specifically configured to determine the common sparse mask based on the first indication information and the third indication information.
For the fifth aspect or the seventh aspect, the first apparatus may be a communication device, the transceiver module may be a transceiver or an input/output interface, and the processing module may be a processor.
In another implementation, the first apparatus is a chip, a chip system, or a circuit disposed in the communication device. When the first apparatus is the chip, the chip system, or the circuit disposed in the communication device, the transceiver module may be an input/output interface, an interface circuit, an output circuit, an input circuit, a pin, a related circuit, or the like on the chip, the chip system, or the circuit. The processing module may be a processor, a processing circuit, a logic circuit, or the like.
For the sixth aspect or the eighth aspect, the second apparatus may be a communication device, the transceiver module may be a transceiver or an input/output interface, and the processing module may be a processor.
In another implementation, the second apparatus is a chip, a chip system, or a circuit disposed in a communication device. When the second apparatus is the chip, the chip system, or the circuit disposed in the communication device, the transceiver module may be an input/output interface, an interface circuit, an output circuit, an input circuit, a pin, a related circuit, or the like on the chip, the chip system, or the circuit. The processing module may be a processor, a processing circuit, a logic circuit, or the like.
A ninth aspect of this application provides a first apparatus. The first apparatus includes a processor and a memory. The memory stores a computer program or computer instructions, and the processor is configured to invoke and run the computer program or the computer instructions stored in the memory, so that the processor implements any one of the implementations of the first aspect or the third aspect.
Optionally, the first apparatus further includes a transceiver, and the processor is configured to control the transceiver to receive or send a signal.
A tenth aspect of this application provides a second apparatus. The second apparatus includes a processor and a memory. The memory stores a computer program or computer instructions, and the processor is configured to invoke and run the computer program or the computer instructions stored in the memory, so that the processor implements any one of the implementations of the second aspect or the fourth aspect.
Optionally, the second apparatus further includes a transceiver, and the processor is configured to control the transceiver to receive or send a signal.
An eleventh aspect of this application provides a first apparatus, including a processor and an interface circuit. The processor is configured to communicate with another apparatus through the interface circuit, and perform the method in either the first aspect or the third aspect. There are one or more processors.
A twelfth aspect of this application provides a second apparatus, including a processor and an interface circuit. The processor is configured to communicate with another apparatus through the interface circuit, and perform the method in either the second aspect or the fourth aspect. There are one or more processors.
A thirteenth aspect of this application provides a first apparatus, including a processor, configured to connect to a memory, and invoke a program stored in the memory, to perform the method in either the first aspect or the third aspect. The memory may be located inside the first apparatus, or may be located outside the first apparatus. There are one or more processors.
A fourteenth aspect of this application provides a second apparatus, including a processor, configured to connect to a memory, and invoke a program stored in the memory, to perform the method in either the second aspect or the fourth aspect. The memory may be located inside the second apparatus, or may be located outside the second apparatus. There are one or more processors.
In an implementation, the first apparatus in the fifth aspect, the seventh aspect, the ninth aspect, the eleventh aspect, and the thirteenth aspect may be a chip (or a system).
In an implementation, the second apparatus in the sixth aspect, the eighth aspect, the tenth aspect, the twelfth aspect, and the fourteenth aspect may be a chip (or a system).
A fifteenth aspect of this application provides a computer program product including instructions. When the computer program product runs on a computer, the computer is enabled to perform any implementation of any one of the first aspect to the fourth aspect.
A sixteenth aspect of this application provides a computer-readable storage medium, including computer instructions. When the instructions are run on a computer, the computer is enabled to perform any implementation of any one of the first aspect to the fourth aspect.
A seventeenth aspect of this application provides a chip apparatus, including a processor, configured to invoke a computer program or computer instructions in a memory, so that the processor performs any implementation of any one of the first aspect to the fourth aspect.
Optionally, the processor is coupled to the memory through an interface.
An eighteenth aspect of this application provides a communication system. The communication system includes the first apparatus in the fifth aspect and the second apparatus in the sixth aspect, or the communication system includes the first apparatus in the seventh aspect and the second apparatus in the eighth aspect.
According to the foregoing technical solutions, it can be learned that embodiments of this application have the following advantages.
In the foregoing technical solution, the first apparatus receives the at least one quantization threshold from the second apparatus. Then, the first apparatus performs quantization on the related information of the first model of the first apparatus based on the at least one quantization threshold. The first apparatus sends the first information to the second apparatus, where the first information indicates the quantized related information of the first model. This reduces communication overheads for reporting the related information of the first model by the first apparatus, and saves communication resources.
Embodiments of this application provide a communication method and a related apparatus, to reduce communication overheads for reporting related information of a first model by a first apparatus, and save communication resources.
The following clearly describes technical solutions in embodiments of this application with reference to accompanying drawings in embodiments of this application. It is clear that the described embodiments are merely a part rather than all of embodiments of this application. All other embodiments obtained by a person skilled in the art based on embodiments of this application without creative efforts shall fall within the protection scope of this application.
Reference to “an embodiment”, “some embodiments”, or the like described in this application means that one or more embodiments of this application include a specific feature, structure, or characteristic described with reference to embodiments. Therefore, statements such as “in one embodiment”, “in some embodiments”, “in some other embodiments”, and “in other embodiments” that appear at different places in this specification do not necessarily mean referring to a same embodiment. Instead, the statements mean “one or more but not all of embodiments”, unless otherwise specifically emphasized in another manner. The terms “include”, “contain”, “have”, and variants thereof all mean “include but are not limited to”, unless otherwise specifically emphasized in another manner.
In descriptions of this application, unless otherwise specified, “/” means “or”. For example, A/B may indicate A or B. A term “and/or” in this specification describes only an association relationship between associated objects and indicates that there may be three relationships. For example, A and/or B may indicate the following three cases: Only A exists, both A and B exist, and only B exists. In addition, “at least one” means one or more, and “a plurality of” means two or more. “At least one of the following items (pieces)” or a similar expression thereof means any combination of these items, including a singular item (piece) or any combination of plural items (pieces). For example, at least one item (piece) of a, b, or c may indicate a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c may be singular or plural.
It may be understood that, in this application, an “indication” may include a direct indication, an indirect indication, an explicit indication, and an implicit indication. When a piece of indication information is described as indicating A, it may be understood that the indication information carries A, directly indicates A, or indirectly indicates A.
In this application, information indicated by the indication information is referred to as to-be-indicated information. In a specific implementation process, there are many manners of indicating the to-be-indicated information. For example, the manners include but are not limited to a manner in which the to-be-indicated information, for example, the to-be-indicated information itself or an index of the to-be-indicated information, may be directly indicated, or a manner in which the to-be-indicated information may be indirectly indicated by indicating other information. There is an association relationship between the other information and the to-be-indicated information. Alternatively, only a part of the to-be-indicated information may be indicated, and the other part of the to-be-indicated information is known or pre-agreed on. For example, specific information may alternatively be indicated in an arrangement sequence of a plurality of pieces of information that is pre-agreed on (for example, specified in a protocol), to reduce indication overheads to some extent.
The to-be-indicated information may be sent as a whole, or may be divided into a plurality of pieces of sub-information for separate sending. In addition, sending periodicities and/or sending occasions of these pieces of sub-information may be the same or may be different. A specific sending method is not limited in this application. The sending periodicities and/or the sending occasions of these pieces of sub-information may be predefined, for example, predefined according to a protocol, or may be configured by sending configuration information to a receive end device by a transmit end device.
The technical solutions of this application may be applied to a cellular communication system related to the 3rd generation partnership project (3GPP), for example, a 4th generation (4G) communication system, a 5th generation (5G) communication system, or a communication system after the 5th generation communication system, for example, a 6th generation (6G) communication system. For example, the 4th generation communication system may include a long term evolution (LTE) communication system. The 5th generation communication system may include a new radio (NR) communication system. The technical solutions of this application may also be applied to a wireless fidelity (Wi-Fi) system, a communication system that supports convergence of a plurality of wireless technologies, a device-to-device (D2D) system, a vehicle-to-everything (V2X) communication system, and the like.
The communication system to which the technical solutions of this application are applicable includes a first apparatus and a second apparatus. Optionally, the communication system further includes a third apparatus.
The following describes some possible forms of the first apparatus and the second apparatus. This application is still applicable to another form, and the following implementations do not constitute a limitation on this application.
1. A first apparatus is a first terminal device or a chip in the first terminal device, and a second apparatus is a network device or a chip in the network device. In this implementation, the first apparatus and the second apparatus may perform the communication method provided in this application.
Optionally, a third apparatus is a second terminal device or a chip in the second terminal device. The third apparatus may perform the communication method provided in this application.
It should be noted that the first terminal device and the second terminal device are used as an example for description. In actual application, the network device may perform the communication method provided in this application with more terminal devices.
2. A first apparatus is a first network device or a chip in the first network device, and a second apparatus is a terminal device or a chip in the terminal device. In this implementation, the first apparatus and the second apparatus may perform the communication method provided in this application.
Optionally, a third apparatus is a second network device or a chip in the second network device. The third apparatus may perform the communication method provided in this application.
It should be noted that the first network device and the second network device are used as an example for description. In actual application, the terminal device may perform the communication method provided in this application with more network devices.
3. A first apparatus is a first terminal device or a chip in the first terminal device, and a second apparatus is a second terminal device or a chip in the second terminal device. In this implementation, the first apparatus and the second apparatus may perform the communication method provided in this application.
Optionally, a third apparatus is a third terminal device or a chip in the third terminal device. The third apparatus may perform the communication method provided in this application.
It should be noted that the first terminal device, the second terminal device, and the third terminal device are used as an example for description. In actual application, the first terminal device may perform the communication method provided in this application with more terminal devices.
The following describes a terminal device and a network device in this application.
The terminal device is a device having a wireless transceiver function, and further has a computing capability. The terminal device may perform machine learning training based on local data, and send, to the network device, related information of a model obtained by the terminal device through training.
The terminal device may be user equipment (UE), an access terminal, a subscriber unit, a subscriber station, a mobile station (MS), a remote station, a remote terminal, a mobile device, a user terminal, a wireless communication device, a user agent, or a user apparatus. Alternatively, the terminal device may be a satellite phone, a cellular phone, a smartphone, a wireless data card, a wireless modem, or a machine-type communication device, or may be a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device having a wireless communication function, a computing device, another processing device connected to a wireless modem, a vehicle-mounted device, a communication device carried on a high-altitude aircraft, a wearable device, an uncrewed aerial vehicle, a robot, a terminal in D2D, a terminal in V2X, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in telemedicine (remote medical), a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, a terminal device in a future communication network, or the like. This is not limited in this application.
The network device has a wireless transceiver function, and further has a computing capability. The network device is configured to communicate with the terminal device. In other words, the network device may be a device that connects the terminal device to a wireless network. For example, the network device may be a network node having a computing capability. For example, the network device may be an artificial intelligence (AI) node, a computing node, or an access network node having an AI capability on a network side (for example, an access network or a core network). The network device may fuse models trained by a plurality of terminal devices, and then send an obtained model to the terminal devices. In this way, collaborative learning between the plurality of terminal devices is implemented.
The network device may be a node in a radio access network. The network device may be referred to as a base station, or may be referred to as a radio access network (RAN) node or a RAN device. The network device may be an evolved NodeB (eNB or eNodeB) in LTE, a next generation NodeB (gNB) in a 5G network, a base station in a future evolved public land mobile network (PLMN), a broadband network gateway (BNG), an aggregation switch or a non-3rd generation partnership project (3GPP) access device, or the like. Optionally, the network device in embodiments of this application may include base stations in various forms, for example, a macro base station, a micro base station (also referred to as a small cell), a relay station, an access point, a device that implements a base station function in a communication system evolved after 5G, an access point (AP) in a Wi-Fi system, a transmitting and receiving point (TRP), a transmitting point (TP), a mobile switching center, and a device that undertakes a base station function in D2D communication, V2X device communication, or machine-to-machine (M2M) communication. The network device may further include a central unit (CU) and a distributed unit (DU) in a cloud access network (C-RAN) system, and a network device in a non-terrestrial network (NTN) communication system. That is, the network device may be deployed on a high-altitude platform or a satellite. This is not limited in this application.
The following describes a possible communication system to which this application is applicable.
In a possible implementation, the terminal device 101, the terminal device 102, and the network device 103 may perform the communication method provided in this application, to reduce overheads for reporting related information of a model of the terminal device by the terminal device, and reduce communication overheads.
Distributed learning is a learning method for implementing collaborative learning. Specifically, a plurality of node devices obtain local models through training based on local data, and a central node device fuses the plurality of local models, to obtain a global model. In this way, collaborative learning is implemented on a premise that privacy of user data of the node devices is protected.
The plurality of node devices may respectively train the local models of the node devices, to obtain related parameters of the local models, for example, a weight parameter or a weight gradient of the local model. Then, the plurality of node devices send the related parameters of the local models to the central node device. The central node device fuses the related parameters of the local models sent by the plurality of node devices, to obtain a related parameter of the global model, and delivers the related parameter to each node device. Each node device may update a local model of the node device based on the related parameter of the global model. It can be learned from the foregoing technical solution that all the node devices respectively send the related parameters of the local models to the central node device, resulting in a large amount of data reported by the node device and large communication overheads. Therefore, how the node devices report the related parameters of the local models with low communication overheads is an urgent problem to be resolved.
The following describes mathematical symbols in this application.
mean(x): indicates that an average value of all elements in a vector x is obtained.
abs(y): indicates that an absolute value of each element in a vector y is obtained.
mean(x1,y1): indicates that an average value of an element x1 and an element y1 is obtained.
The following describes the technical solutions of this application with reference to specific embodiments.
201: A second apparatus sends at least one quantization threshold to a first apparatus. Correspondingly, the first apparatus receives the at least one quantization threshold from the second apparatus.
The at least one quantization threshold is used by the first apparatus to perform quantization on related information of a first model. Optionally, the first model may be a model configured by the second apparatus for the first apparatus. Optionally, the first model may be a neural network model.
Optionally, the related information of the first model is obtained by performing one round of training on the first model by the first apparatus.
Optionally, the related information of the first model includes an output parameter or an update parameter of the first model. The output parameter of the first model may be understood as output data of the first model. For ease of description, the output parameter of the first model is collectively referred to as an output parameter below. The update parameter of the first model includes a weight parameter or a weight gradient of the first model. For example, the first model is a neural network model, and the related information of the first model includes an output parameter of the neural network model. Alternatively, the related information of the first model includes a weight parameter or a weight gradient of the neural network model.
In a possible implementation, the first apparatus is a first terminal device, the second apparatus is a network device, and the at least one quantization threshold may be carried in downlink control information, a radio resource control (RRC) message, or a media access control control element (MAC CE).
In another possible implementation, the first apparatus is a network device, the second apparatus is a terminal device, and the at least one quantization threshold may be carried in uplink control information.
The following describes a possible implementation in which the second apparatus determines the at least one quantization threshold. Optionally, this embodiment further includes step 201a and step 201b.
201a: The first apparatus sends second information to the second apparatus. Correspondingly, the second apparatus receives the second information from the first apparatus.
The following describes two possible implementations of the second information.
Implementation 1: The second information indicates information obtained by processing the related information of the first model.
Optionally, the second information includes the information obtained by processing the related information of the first model, or the second information indicates the information obtained by processing the related information of the first model.
For example, the related information of the first model includes the output parameter of the first model. The information obtained by processing the related information of the first model includes an average value or a weighted value of absolute values of output parameters of the first model. For example, the output parameter of the first model includes an output parameter A, an output parameter B, and an output parameter C that are of the first model. The first apparatus averages absolute values respectively corresponding to the output parameter A, the output parameter B, and the output parameter C, to obtain an average value of the absolute values of the output parameters. The second information includes the average value or the weighted value of the absolute values of the output parameters of the first model. Alternatively, the second information indicates the average value or the weighted value of the absolute values of the output parameters of the first model.
For example, the second information is indication information, and a correspondence between a value of the indication information and the average value or the weighted value of the absolute values of the output parameters of the first model may be shown in Table 1 or Table 2.
For example, the related information of the first model includes the update parameter of the first model. The information obtained by processing the related information of the first model includes an average value or a weighted value of absolute values of update parameters of the first model. For example, the update parameter of the first model includes a weight gradient ΔwQ1, a weight gradient ΔwQ2, and a weight gradient ΔwQ3 that are obtained by performing a Qth round of training on the first model by the first apparatus. The first apparatus averages absolute values respectively corresponding to the weight gradient ΔwQ1, the weight gradient ΔwQ2, and the weight gradient ΔwQ3, to obtain an average value of the absolute values of the weight gradients of the first model. The second information includes the average value or the weighted value of the absolute values of the update parameters of the first model. Alternatively, the second information indicates the average value or the weighted value of the absolute values of the update parameters of the first model. For example, the second information is indication information, and a correspondence between a value of the indication information and the average value or the weighted value of the absolute values of the update parameters of the first model may be shown in Table 2.
Implementation 2: The second information indicates information obtained by processing related information obtained by performing an Mth round of training on the first model by the first apparatus. The related information of the first model is related information obtained by performing a Qth round of training on the first model by the first apparatus. M is an integer greater than or equal to 1 and less than Q, and Q is an integer greater than 1.
In the implementation 2, the second information includes the information obtained by processing the related information obtained by performing the Mth round of training on the first model by the first apparatus, or the second information indicates the information obtained by processing the related information obtained by performing the Mth round of training on the first model by the first apparatus. For the information obtained by processing the related information obtained by performing the Mth round of training on the first model by the first apparatus, refer to related descriptions of the information obtained by processing the related information of the first model.
The implementation 2 is similar to the implementation 1. For details, refer to related descriptions of the implementation 1.
In a possible implementation, the first apparatus is a terminal device, the second apparatus is a network device, and the second information may be carried in downlink control information, an RRC message, or a MAC CE. In another possible implementation, the first apparatus is a network device, the second apparatus is a terminal device, and the second information may be carried in uplink control information.
201b: The second apparatus determines the at least one quantization threshold based on the second information.
For example, the at least one quantization threshold includes one quantization threshold. The second information includes the average value of the absolute values of the weight gradients of the first model. The quantization threshold is γ1 = mean(abs(ΔwQ))*a, where a is a control factor used to control a range of quantization, and a value range of a is [0, +∞). abs(ΔwQ) represents the absolute values of the weight gradients obtained by performing the Qth round of training on the first model by the first apparatus.
For example, the at least one quantization threshold includes two quantization thresholds: a first quantization threshold and a second quantization threshold. The first quantization threshold is γ1 = mean(abs(ΔwQ))*a, and the second quantization threshold is −γ1 = −mean(abs(ΔwQ))*a. abs(ΔwQ) represents the absolute values of the weight gradients obtained by performing the Qth round of training on the first model by the first apparatus.
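For illustration only, a possible Python sketch of this threshold computation at the second apparatus is given below; the function and variable names are examples, and the reported average corresponds to mean(abs(ΔwQ)) carried in the second information.

# Sketch: compute the first and second quantization thresholds from the
# reported average of absolute weight gradients and the control factor a.
def quantization_thresholds(reported_mean_abs_grad, a):
    gamma_1 = reported_mean_abs_grad * a   # first quantization threshold
    return gamma_1, -gamma_1               # second quantization threshold is -gamma_1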
Optionally, this embodiment further includes step 201c.
201c: A third apparatus sends third information to the second apparatus. Correspondingly, the second apparatus receives the third information from the third apparatus.
The third information indicates information obtained by processing the related information of the second model of the third apparatus. Alternatively, the third information indicates information obtained by processing related information obtained by performing an Sth round of training on the second model by the third apparatus. The related information of the second model is related information obtained by performing an Rth round of training on the second model by the third apparatus. S is an integer greater than or equal to 1 and less than R, and R is an integer greater than 1. The third information is similar to the second information. For details, refer to the foregoing related descriptions of the second information.
It should be noted that the second model may be a model configured by the second apparatus for the third apparatus. The first model and the second model may be a same model. For example, both the first model and the second model are global models configured by the second apparatus. The first model and the second model in this specification are intended to distinguish between the models on the first apparatus and the third apparatus, and may be actually a same model.
Based on step 201c above, optionally, step 201b above specifically includes the following step.
The second apparatus determines the at least one quantization threshold based on the second information and the third information.
For example, the second information includes the average value of the absolute values of the weight gradients of the first model. The third information includes an average value of absolute values of weight gradients of the second model. The second apparatus determines the at least one quantization threshold based on the average value of the absolute values of the weight gradients of the first model and the average value of the absolute values of the weight gradients of the second model. For example, the at least one quantization threshold includes two quantization thresholds: a first quantization threshold and a second quantization threshold. The first quantization threshold is γ1 = mean(mean(abs(ΔwQ)), mean(abs(ΔwR)))*a, and the second quantization threshold is −γ1 = −mean(mean(abs(ΔwQ)), mean(abs(ΔwR)))*a. N weight gradients obtained by performing the Qth round of training on the first model by the first apparatus are represented by a vector ΔwQ. N weight gradients obtained by performing the Rth round of training on the second model by the third apparatus are represented by a vector ΔwR.
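For illustration only, the two-report case could be sketched as follows in Python; the names are examples, and mean(x1, y1) is taken as the arithmetic average of the two reported values, as defined above.

# Sketch: fuse the averages reported by the first and third apparatuses,
# then scale by the control factor a to obtain the two thresholds.
def fused_quantization_thresholds(mean_abs_grad_first, mean_abs_grad_second, a):
    gamma_1 = (mean_abs_grad_first + mean_abs_grad_second) / 2 * a
    return gamma_1, -gamma_1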
It should be noted that step 201a to step 201c above are merely an example in which the second apparatus determines the at least one quantization threshold based on the second information of the first apparatus and the third information of the third apparatus, to perform the technical solutions of this application. In actual application, the second apparatus may receive related information, indicated by a plurality of apparatuses, of a model, and determine the at least one quantization threshold based on the related information of the model. This is not specifically limited in this application.
It should be noted that there is no fixed execution sequence between step 201c and step 201a. Step 201a may be performed before step 201c, or step 201c may be performed before step 201a, or step 201a and step 201c may be simultaneously performed based on a situation. This is not specifically limited in this application.
202: The first apparatus performs quantization on the related information of the first model of the first apparatus based on the at least one quantization threshold.
It can be learned from the foregoing description that the related information of the first model includes the output parameter or the update parameter of the first model. Herein, the technical solutions of this application are described by using an example in which the related information of the first model includes N parameters of the first model. N is an integer greater than or equal to 1. Therefore, step 202 above specifically includes: The first apparatus performs quantization on the N parameters of the first model based on the at least one quantization threshold, to obtain N quantized parameters.
In a possible implementation, the at least one quantization threshold includes one quantization threshold γ1. The related information of the first model includes the N parameters of the first model. Step 202 above specifically includes: If an ith parameter in the N parameters is greater than the quantization threshold γ1, the first apparatus quantizes the ith parameter to a first value, where i is an integer greater than or equal to 1 and less than or equal to N, or if an ith parameter in the N parameters is less than or equal to the quantization threshold γ1, the first apparatus quantizes the ith parameter to a third value. Alternatively, step 202 above specifically includes: If an ith parameter in the N parameters is greater than or equal to the quantization threshold γ1, the first apparatus quantizes the ith parameter to a first value, where i is an integer greater than or equal to 1 and less than or equal to N, or if an ith parameter in the N parameters is less than the quantization threshold γ1, the first apparatus quantizes the ith parameter to a third value.
For example, the first value is +1, and the third value is −1. The N parameters of the first model are N weight gradients of the first model. An ith weight gradient in the N weight gradients is represented as ΔwQi. When the weight gradient ΔwQi is greater than the quantization threshold γ1, the weight gradient ΔwQi is quantized to +1, or when the weight gradient ΔwQi is less than or equal to the quantization threshold γ1, the weight gradient ΔwQi is quantized to −1. An ith quantized weight gradient si may be represented according to the following formula 1:
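si = +1, if ΔwQi > γ1; or si = −1, if ΔwQi ≤ γ1 (formula 1)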
The foregoing shows the process in which the first apparatus quantizes the ith parameter in the N parameters of the first model, and is also applicable to a process of quantizing another parameter in the N parameters. Details are not described herein again.
It should be noted that, optionally, if the ith parameter in the N parameters is greater than or equal to the quantization threshold γ1, the first apparatus quantizes the ith parameter to the first value, where i is an integer greater than or equal to 1 and less than or equal to N, or if the ith parameter in the N parameters is less than or equal to the quantization threshold γ1, the first apparatus quantizes the ith parameter to the third value. In other words, if the ith parameter is equal to the quantization threshold γ1, the first apparatus may quantize the ith parameter to the first value or the third value. In this case, the first apparatus may randomly quantize the ith parameter to the first value or the third value in a random quantization manner.
In another possible implementation, the at least one quantization threshold includes two quantization thresholds: a first quantization threshold γ1 and a second quantization threshold −γ1. The related information of the first model includes the N parameters of the first model. Step 202 above specifically includes: If an ith parameter in the N parameters is greater than the first quantization threshold γ1, the first apparatus quantizes the ith parameter to a first value, where i is an integer greater than or equal to 1 and less than or equal to N, if an ith parameter in the N parameters is less than or equal to the first quantization threshold γ1 and is greater than or equal to the second quantization threshold −γ1, the first apparatus quantizes the ith parameter to a second value, or if an ith parameter in the N parameters is less than the second quantization threshold −γ1, the first apparatus quantizes the ith parameter to a third value. Alternatively, step 202 above specifically includes: If an ith parameter in the N parameters is greater than or equal to the first quantization threshold γ1, the first apparatus quantizes the ith parameter to a first value, where i is an integer greater than or equal to 1 and less than or equal to N, if an ith parameter in the N parameters is less than the first quantization threshold γ1 and is greater than the second quantization threshold −γ1, the first apparatus quantizes the ith parameter to a second value, or if the ith parameter in the N parameters is less than or equal to the second quantization threshold −γ1, the first apparatus quantizes the ith parameter to a third value.
For example, the first value is +1, the second value is 0, and the third value is −1. The N parameters of the first model are N weight gradients of the first model. An ith weight gradient in the N weight gradients is represented as ΔwQi. When the weight gradient ΔwQi is greater than the first quantization threshold γ1, the weight gradient ΔwQi is quantized to +1. When the weight gradient ΔwQi is less than the second quantization threshold −γ1, the weight gradient ΔwQi is quantized to −1. When the weight gradient ΔwQi is less than or equal to the first quantization threshold γ1 and is greater than or equal to the second quantization threshold −γ1, the weight gradient ΔwQi is quantized to 0. Therefore, an ith quantized weight gradient si may be represented according to the following formula 2:
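si = +1, if ΔwQi > γ1; si = 0, if −γ1 ≤ ΔwQi ≤ γ1; or si = −1, if ΔwQi < −γ1 (formula 2)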
The foregoing shows the process in which the first apparatus quantizes the ith parameter in the N parameters of the first model, and is also applicable to a process of quantizing another parameter in the N parameters. Details are not described herein again. In the foregoing implementation, the first apparatus may quantize the parameter of the first model based on a plurality of quantization thresholds. This helps improve quantization precision and improve model convergence speed and performance. Further, it can be learned from the foregoing formula 2 that a value of si may be 0, and it indicates that the first apparatus may not update the ith parameter when a value of the ith parameter falls within a range between the second quantization threshold and the first quantization threshold. For example, if the ith parameter is caused by training noise, the first apparatus does not update the ith parameter. This helps improve accuracy of a first model obtained by the second apparatus through training.
It should be noted that, optionally, if the ith parameter in the N parameters is greater than or equal to the first quantization threshold γ1, the first apparatus quantizes the ith parameter to the first value, where i is the integer greater than or equal to 1 and less than or equal to N, if the ith parameter in the N parameters is less than or equal to the first quantization threshold γ1 and is greater than or equal to the second quantization threshold −γ1, the first apparatus quantizes the ith parameter to the second value, or if the ith parameter in the N parameters is less than the second quantization threshold −γ1, the first apparatus quantizes the ith parameter to the third value. In other words, for the ith parameter, if the ith parameter is equal to the first quantization threshold γ1, the first apparatus may quantize the ith parameter to the first value or the second value. In this case, the first apparatus may randomly quantize the ith parameter to the first value or the second value in a random quantization manner.
It should be noted that, optionally, if the ith parameter in the N parameters is greater than the first quantization threshold γ1, the first apparatus quantizes the ith parameter to the first value, where i is the integer greater than or equal to 1 and less than or equal to N, if the ith parameter in the N parameters is less than or equal to the first quantization threshold γ1 and is greater than or equal to the second quantization threshold −γ1, the first apparatus quantizes the ith parameter to the second value, or if the ith parameter in the N parameters is less than or equal to the second quantization threshold −γ1, the first apparatus quantizes the ith parameter to the third value. In other words, for the ith parameter, if the ith parameter is equal to the second quantization threshold −γ1, the first apparatus may quantize the ith parameter to the second value or the third value. In this case, the first apparatus may randomly quantize the ith parameter to the second value or the third value in a random quantization manner.
The foregoing shows an example in which the at least one quantization threshold includes one quantization threshold and two quantization thresholds. In actual application, the at least one quantization threshold may include three quantization thresholds, four quantization thresholds, or more quantization thresholds. This is not specifically limited in this application, and examples are not provided one by one herein.
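For illustration only, a Python sketch of the two-threshold quantization described above is given below; the values +1, 0, and −1 follow the example first, second, and third values, and the handling of a parameter exactly equal to a threshold follows one of the optional conventions above. The names are examples.

# Sketch: quantize each parameter with the two thresholds gamma_1 and -gamma_1.
def quantize(parameters, gamma_1):
    quantized = []
    for w in parameters:
        if w > gamma_1:
            quantized.append(+1)    # first value
        elif w < -gamma_1:
            quantized.append(-1)    # third value
        else:
            quantized.append(0)     # second value
    return quantized

# Example: quantize([0.3, -0.05, -0.4], gamma_1=0.1) returns [1, 0, -1].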
Optionally, in step 202 above, the related information of the first model includes N parameters of the first model that are obtained through quantization error compensation. For the N parameters obtained through quantization error compensation, refer to related descriptions in step 202a below.
Optionally, this embodiment further includes step 202a.
202a: The first apparatus performs error compensation for the N parameters based on quantization errors respectively corresponding to the N parameters of the first model, to obtain the N parameters obtained through quantization error compensation.
The N parameters of the first model are obtained by performing the Qth round of training on the first model by the first apparatus. A quantization error corresponding to the ith parameter in the N parameters is determined based on an ith parameter obtained by performing a (Q−1)th round of training on the first model and performing quantization error compensation by the first apparatus.
For example, the ith parameter in the N parameters of the first model is the ith weight gradient ΔwQi, and an ith weight gradient obtained through quantization error compensation may be represented as ΔwQi′ = ΔwQi + eQ-1i, where eQ-1i = ΔwQ-1i′ − η*Q(ΔwQ-1i′). ΔwQ-1i′ represents the ith weight gradient obtained through the (Q−1)th round of training and quantization error compensation, η indicates a global learning rate, and Q(ΔwQ-1i′) represents quantization on ΔwQ-1i′.
It should be noted that the first apparatus may determine a quantization error eQi = ΔwQi′ − η*Q(ΔwQi′), and use eQi to perform quantization error compensation for the ith parameter in the N parameters obtained through a (Q+1)th round of training.
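For illustration only, step 202a could be sketched in Python as follows, reusing the quantize function sketched above; the names are examples, eta stands for the global learning rate η, and prev_error holds the residuals eQ-1 carried over from the previous round.

# Sketch: compensate this round's gradients with the previous residuals,
# quantize, and compute the residuals to carry into the next round.
def compensate_quantize_update(grad, prev_error, eta, gamma_1):
    compensated = [g + e for g, e in zip(grad, prev_error)]            # delta_w'
    quantized = quantize(compensated, gamma_1)                         # Q(delta_w')
    new_error = [c - eta * q for c, q in zip(compensated, quantized)]  # next residual
    return quantized, new_error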
Based on step 202a above, the related information of the first model includes the N parameters obtained through quantization error compensation. Optionally, step 202 above specifically includes: The first apparatus performs, based on the at least one quantization threshold, quantization on the N parameters obtained through quantization error compensation. For a specific quantization process, refer to related descriptions of step 202 above.
It can be learned that, in step 202a above, the first apparatus performs quantization error compensation for the N parameters based on the quantization errors respectively corresponding to the N parameters of the first model. This helps improve accuracy of updating the first model by the second apparatus and improve model training performance.
Optionally, in step 202 above, the related information of the first model includes N parameters of the first model that are obtained through sparsification. For the N parameters of the first model that are obtained through sparsification, refer to related descriptions in step 202b below.
Optionally, this embodiment further includes step 202b.
202b: The first apparatus selects N parameters from K parameters of the first model based on a common sparse mask, to obtain the N parameters of the first model that are obtained through sparsification.
In a possible implementation, the K parameters of the first model are obtained by performing one round of training on the first model by the first apparatus.
In another possible implementation, the K parameters of the first model are obtained by performing one round of training on the first model and performing quantization error compensation by the first apparatus. A process in which the first apparatus performs quantization error compensation for the K parameters is similar to that in step 202a above. For details, refer to related descriptions of step 202a above.
Optionally, the common sparse mask is a bit sequence, and the bit sequence includes K bits. The K bits one-to-one correspond to the K parameters. When a value of one bit in the K bits is 0, it indicates the first apparatus not to select a parameter corresponding to the bit. When a value of one bit in the K bits is 1, it indicates the first apparatus to select a parameter corresponding to the bit. Alternatively, when a value of one bit in the K bits is 0, it indicates the first apparatus to select a parameter corresponding to the bit. When a value of one bit in the K bits is 1, it indicates the first apparatus not to select a parameter corresponding to the bit. For example, the K parameters include 10 weight gradients of the first model. The bit sequence is 1000111100, and the bit sequence one-to-one corresponds to the 10 weight gradients from a most significant bit to a least significant bit. For example, a 1st bit in the bit sequence corresponds to a 1st weight gradient in the 10 weight gradients, a 2nd bit in the bit sequence corresponds to a 2nd weight gradient in the 10 weight gradients, and the rest can be deduced by analogy. A 10th bit in the bit sequence corresponds to a 10th weight gradient in the 10 weight gradients. In this case, it can be learned that the related information of the first model includes the 1st weight gradient, a 5th weight gradient, a 6th weight gradient, a 7th weight gradient, and an 8th weight gradient in the 10 weight gradients.
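For illustration only, applying such a bit-sequence mask could be sketched as follows in Python; the names are examples, and the convention that a bit value of 1 means "select" follows the first alternative above.

# Sketch: keep only the parameters whose mask bit is 1, together with their
# indexes (0-based here, while the text above counts from the 1st parameter).
def apply_sparse_mask(parameters, mask_bits):
    return [(i, p) for i, (p, b) in enumerate(zip(parameters, mask_bits)) if b == 1]

# Example: with mask_bits = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0] and 10 weight
# gradients, the 1st, 5th, 6th, 7th, and 8th gradients are selected.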
The following describes two possible implementations in which the first apparatus obtains the common sparse mask.
Implementation 1: The common sparse mask is determined by the first apparatus based on a sparsity ratio and a pseudo-random number. The sparsity ratio is indicated by the second apparatus to the first apparatus.
It should be noted that a plurality of apparatuses need to use a same common sparse mask, so that all of the plurality of apparatuses send, to the second apparatus, parameters with a same index of models configured on the apparatuses. In addition, the plurality of apparatuses may send the parameters with the same index on a same time-frequency resource. This helps reduce communication resources required by the plurality of apparatuses to report model parameters and improve communication resource utilization. In this way, the second apparatus is supported in receiving, on a same time-frequency resource, the parameters with the same index that are sent by the plurality of apparatuses. In other words, the second apparatus is supported in implementing model fusion through superimposition of over-the-air signals.
It should be noted that the second apparatus may indicate different sparsity ratios to the first apparatus in different training phases. For example, in a start phase of training, the sparsity ratio may be small. In this way, it is convenient for the second apparatus to obtain more related information of the model, to implement fast model convergence. In a convergence phase of training, the sparsity ratio may be large.
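For illustration only, implementation 1 could be sketched in Python as follows. The seeding convention and the reading of the sparsity ratio as the fraction of parameters that are dropped are assumptions for illustration; the point is that apparatuses using the same pseudo-random procedure with the same inputs derive the same mask.

import random

# Sketch: derive a common sparse mask from a shared seed and a sparsity ratio.
def common_sparse_mask(num_params, sparsity_ratio, seed):
    rng = random.Random(seed)
    num_kept = max(1, round(num_params * (1 - sparsity_ratio)))
    kept = set(rng.sample(range(num_params), num_kept))
    return [1 if i in kept else 0 for i in range(num_params)]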
Implementation 2: The following describes the implementation 2 based on step 201e.
Optionally, this embodiment further includes step 201e.
201e: The second apparatus sends second indication information to the first apparatus. Correspondingly, the first apparatus receives the second indication information from the second apparatus. The second indication information indicates the common sparse mask.
Based on step 202b above, optionally, step 202 above specifically includes: The first apparatus performs, based on the at least one quantization threshold, quantization on the N parameters of the first model that are obtained through sparsification. For a specific quantization process, refer to related descriptions of step 202 above.
It can be learned that, in step 202b above, the first apparatus selects some parameters of the first model based on the common sparse mask. This helps reduce overheads for reporting the parameters of the first model by the first apparatus.
There is no fixed execution sequence between step 201e above and step 201a, step 201b, step 201c, and step 201. Step 201e may be performed before step 201a, step 201b, step 201c, and step 201. Alternatively, step 201a, step 201b, step 201c, and step 201 may be performed before step 201e. Alternatively, step 201e, step 201a, step 201b, step 201c, and step 201 may be simultaneously performed based on a situation.
203: The first apparatus sends first information to the second apparatus. The first information indicates quantized related information of the first model. Correspondingly, the second apparatus receives the first information from the first apparatus.
In a possible implementation, the first information includes quantized related information of the first model. For example, the related information of the first model includes the N parameters of the first model, and the first information includes N quantized parameters of the first model.
In another possible implementation, the first information is indication information, and the indication information indicates the quantized related information of the first model.
Optionally, the related information of the first model includes the N quantized parameters of the first model. The following describes a possible implementation of step 203 above. Optionally, step 203 above specifically includes step 2003a and step 2003b.
2003a: The first apparatus modulates the N quantized parameters of the first model, to obtain N first signals. The N first signals one-to-one correspond to the N quantized parameters.
2003b: The first apparatus sends the N first signals to the second apparatus. Correspondingly, the second apparatus receives the N first signals from the first apparatus.
The following describes step 2003a and step 2003b above with reference to the quantization example shown in the foregoing formula 2.
The first apparatus modulates an ith parameter in the N quantized parameters of the first model, to obtain an ith first signal. The ith first signal corresponds to two sequences, and each of the two sequences includes at least one symbol. The following describes two possible implementations in which the first apparatus sends the two sequences, so that the second apparatus determines a value of the ith quantized parameter.
Implementation 1: When the ith quantized parameter is the first value, a transmit power at which the first apparatus sends a 1st sequence in the two sequences is less than a transmit power at which the first apparatus sends a 2nd sequence in the two sequences. When the ith quantized parameter is the second value, a transmit power at which the first apparatus sends a 1st sequence in the two sequences is equal to a transmit power at which the first apparatus sends a 2nd sequence in the two sequences. When the ith quantized parameter is the third value, a transmit power at which the first apparatus sends a 1st sequence in the two sequences is greater than a transmit power at which the first apparatus sends a 2nd sequence in the two sequences.
Optionally, when the ith quantized parameter is the first value, the 1st sequence in the two sequences is an all-0 sequence, and the 2nd sequence is a non-all-0 sequence. When the ith quantized parameter is the second value, the two sequences are both all-0 sequences. When the ith quantized parameter is the third value, the 1st sequence in the two sequences is a non-all-0 sequence, and the 2nd sequence is an all-0 sequence. For example, the first value is +1, the second value is 0, and the third value is −1. The ith first signal carries the ith parameter si, and the ith parameter corresponds to two sequences. For various values of the ith parameter, the two corresponding sequences (namely, a sequence 1 and a sequence 2) are respectively shown in Table 3.
Both c0 and c1 are sequences of a specific length. For example, both a length of c0 and a length of c1 are 1. In other words, each of the sequences includes one symbol. Optionally, both c0 and c1 may be Zadoff-Chu sequences, and the Zadoff-Chu sequence may be briefly referred to as a ZC sequence.
Implementation 2: When the ith quantized parameter is the first value, a transmit power at which the first apparatus sends a 1st sequence in the two sequences is greater than a transmit power at which the first apparatus sends a 2nd sequence in the two sequences. When the ith quantized parameter is the second value, a transmit power at which the first apparatus sends a 1st sequence in the two sequences is equal to a transmit power at which the first apparatus sends a 2nd sequence in the two sequences. When the ith quantized parameter is the third value, a transmit power at which the first apparatus sends a 1st sequence in the two sequences is less than a transmit power at which the first apparatus sends a 2nd sequence in the two sequences.
Optionally, when the ith quantized parameter is the first value, the 1st sequence in the two sequences is a non-all-0 sequence, and the 2nd sequence is an all-0 sequence. When the ith quantized parameter is the second value, the two sequences are both all-0 sequences. When the ith quantized parameter is the third value, the 1st sequence in the two sequences is an all-0 sequence, and the 2nd sequence is a non-all-0 sequence. For example, the first value is +1, the second value is 0, and the third value is −1. The ith first signal carries the ith parameter si, and the ith parameter corresponds to two sequences. For various values of the ith parameter, the two corresponding sequences (namely, a sequence 1 and a sequence 2) are respectively shown in Table 4.
For c0 and c1, refer to the foregoing related descriptions. Details are not described herein again.
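The following non-limiting sketch (not part of the original description) illustrates the mapping between a quantized parameter and the two transmitted sequences in implementation 1 and implementation 2 above. The function name, the use of length-1 placeholder sequences for c0 and c1, and the value set {+1, 0, −1} are assumptions taken from the foregoing examples.

```python
import numpy as np

def map_to_sequences(s, c0, c1, implementation=1):
    """Map a quantized parameter s in {+1, 0, -1} to the pair
    (sequence 1, sequence 2) that carries it, per the examples above.
    c0 and c1 are assumed to have the same length."""
    zero = np.zeros_like(c0)
    if implementation == 1:
        # Implementation 1: +1 -> (all-0, non-all-0), 0 -> (all-0, all-0), -1 -> (non-all-0, all-0)
        table = {+1: (zero, c1), 0: (zero, zero), -1: (c0, zero)}
    else:
        # Implementation 2: +1 -> (non-all-0, all-0), 0 -> (all-0, all-0), -1 -> (all-0, non-all-0)
        table = {+1: (c0, zero), 0: (zero, zero), -1: (zero, c1)}
    return table[int(s)]

# Example: length-1 placeholder sequences (one symbol each), as in the example above.
c0 = np.array([1.0 + 0.0j])
c1 = np.array([1.0 + 0.0j])
seq1, seq2 = map_to_sequences(+1, c0, c1, implementation=1)  # -> (all-0, non-all-0)
```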
It should be noted that the foregoing shows a possible example of the first value, the second value, and the third value. In actual application, the first value, the second value, and the third value may alternatively be other values. This is not specifically limited in this application. For example, the first value is 0.7, the second value is 0, and the third value is −0.7.
As shown in
It can be learned that the first apparatus receives the at least one quantization threshold from the second apparatus. Then, the first apparatus performs quantization on the related information of the first model of the first apparatus based on the at least one quantization threshold. The first apparatus sends the first information to the second apparatus, where the first information indicates the quantized related information of the first model. This reduces communication overheads for reporting the related information of the first model by the first apparatus, and saves communication resources.
Optionally, the embodiment shown in
204: The second apparatus determines global information of the first model based on the first information.
The global information of the first model includes a global output parameter of the first model. Alternatively, the global information of the first model includes a global update parameter and/or a global learning rate of the first model. The global output parameter of the first model may be understood as global output data of the first model. The global update parameter of the first model includes a global weight parameter or a global weight gradient of the first model.
Optionally, the global information of the first model includes N global parameters of the first model, and the global parameter is an output parameter or an update parameter. For a process of determining the N global parameters, refer to related descriptions below.
Optionally, the first information includes the N quantized parameters of the first model, and the second apparatus may determine the global learning rate η based on the N quantized parameters of the first model.
For example, the N quantized parameters of the first model include N weight gradients obtained by performing the Qth round of training on the first model and performing quantization by the first apparatus. Specifically, the N weight gradients of the first model are represented by a vector ΔwQ. In other words, the vector ΔwQ includes the N weight gradients obtained by performing the Qth round of training on the first model by the first apparatus. The second apparatus may determine the global learning rate η=mean(abs(Δwq)). The vector Δwq includes the quantized non-0 weight gradients in the vector ΔwQ.
It should be noted that, optionally, the first apparatus may alternatively send sixth information to the second apparatus. The sixth information indicates an average value of absolute values of values of quantized non-0 parameters in the N parameters of the first model. The second apparatus determines the global learning rate based on the sixth information.
For example, the N parameters of the first model are the N weight gradients obtained by performing the Qth round of training on the first model by the first apparatus. Specifically, the N weight gradients of the first model are represented by the vector ΔwQ. In this case, the second apparatus may determine the global learning rate η=mean(abs(Δwq)), where mean(abs(Δwq)) is indicated by the first apparatus to the second apparatus based on the sixth information, and abs(Δwq) indicates absolute values of values of the quantized non-0 weight gradients in the vector ΔwQ.
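For illustration only, a minimal sketch of how the global learning rate in the foregoing example may be computed from the quantized weight gradients; the function and variable names are hypothetical, and the all-zero fallback is an assumption.

```python
import numpy as np

def global_learning_rate(delta_w_quantized):
    """eta = mean(abs(.)) over the quantized non-0 weight gradients."""
    delta_w_quantized = np.asarray(delta_w_quantized, dtype=float)
    nonzero = delta_w_quantized[delta_w_quantized != 0]
    if nonzero.size == 0:
        return 0.0  # assumption: no non-0 gradients reported in this round
    return float(np.mean(np.abs(nonzero)))

# Example: quantized gradients reported by the first apparatus in the Qth round.
eta = global_learning_rate([0.7, 0.0, -0.7, 0.7, 0.0])  # -> 0.7
```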
It should be noted that, optionally, the global learning rate η is variable. For example, the global learning rate η is a preset value that varies with a quantity of training rounds.
It should be noted that, in step 204 above, the second apparatus determines the global learning rate based on the first information. In actual application, the second apparatus may determine the global learning rate based on the second information. Optionally, the second apparatus determines the global learning rate based on the second information and the third information. This is not specifically limited in this application.
In a possible implementation, the first model is the neural network model. The related information of the first model includes related parameters of neurons at all layers of the neural network model. In this implementation, the N global parameters of the first model that are included in the global information of the first model in step 204 above are global parameters of the neurons at all the layers.
In this implementation, the at least one quantization threshold and the global learning rate are uniformly set for the neurons at all the layers of the neural network model.
In another possible implementation, the first model is the neural network model. The related information of the first model includes related parameters of neurons at P layers of the neural network model, where P is an integer greater than or equal to 1.
In this implementation, the N global parameters of the first model that are included in the global information of the first model in step 204 above are global parameters of the neurons at the P layers.
In this implementation, the at least one quantization threshold and the global learning rate are uniformly set for the neurons at the P layers of the neural network model. For a neuron at a layer other than the P layers of the neural network model, a corresponding quantization threshold and global learning rate should be additionally determined.
Optionally, the embodiment shown in
203a: The third apparatus sends fifth information to the second apparatus. The fifth information indicates quantized related information of the second model. Correspondingly, the second apparatus receives the fifth information from the third apparatus.
Specifically, the quantized related information of the second model is obtained by performing quantization on the related information of the second model based on the at least one quantization threshold by the third apparatus. For a specific quantization process, refer to related descriptions of step 202 above.
Optionally, the related information of the second model includes N parameters of the second model. For the second model, refer to the foregoing related descriptions. Optionally, step 203a above specifically includes step 1 and step 2.
Step 1: The third apparatus modulates the N parameters of the second model, to obtain N second signals. The N second signals carry the N parameters of the second model, and the N second signals one-to-one correspond to the N parameters of the second model.
Step 2: The third apparatus sends the N second signals to the second apparatus. Correspondingly, the second apparatus receives the N second signals from the third apparatus.
Step 1 and step 2 are similar to step 2003a and step 2003b above. For details, refer to related descriptions of step 2003a and step 2003b above. Details are not described herein again.
Optionally, the ith first signal in the N first signals corresponds to a first sequence and a second sequence. The first sequence is a 1st sequence in the two sequences corresponding to the ith first signal, and the second sequence is a 2nd sequence in the two sequences corresponding to the ith first signal. An ith second signal in the N second signals corresponds to a third sequence and a fourth sequence. The third sequence is a 1st sequence in the two sequences corresponding to the ith second signal, and the fourth sequence is a 2nd sequence in the two sequences corresponding to the ith second signal. i is an integer greater than or equal to 1 and less than or equal to N. A time-frequency resource used by the first apparatus to send the first sequence is the same as a time-frequency resource used by the third apparatus to send the third sequence. A time-frequency resource used by the first apparatus to send the second sequence is the same as a time-frequency resource used by the third apparatus to send the fourth sequence. In this way, the second apparatus is supported in implementing incoherent reception in multi-user over-the-air signal superimposition transmission.
It should be noted that there is no fixed execution sequence between step 203 and step 203a. Step 203 may be performed before step 203a, or step 203a may be performed before step 203, or step 203 and step 203a may be simultaneously performed based on a situation. This is not specifically limited in this application.
Based on step 203 and step 203a above, optionally, step 204 above specifically includes: The second apparatus determines the global information of the first model based on the first information and the fifth information.
Specifically, the second apparatus determines the global information of the first model based on the N first signals and the N second signals. The following describes a possible implementation of step 204 above by using an example in which the ith first signal in the N first signals corresponds to the first sequence and the second sequence, and the ith second signal in the N second signals corresponds to the third sequence and the fourth sequence. The time-frequency resource used by the first apparatus to send the first sequence is the same as the time-frequency resource used by the third apparatus to send the third sequence. The time-frequency resource used by the first apparatus to send the second sequence is the same as the time-frequency resource used by the third apparatus to send the fourth sequence.
Optionally, step 204 above specifically includes step 204a to step 204c.
204a: The second apparatus determines a first signal energy sum of the first sequence and the third sequence that are received by the second apparatus.
For example, the first signal energy sum may be represented as ∥ỹ_{2i-1}∥².
204b: The second apparatus determines a second signal energy sum of the second sequence and the fourth sequence that are received by the second apparatus.
For example, the second signal energy sum may be represented as ∥ỹ_{2i}∥².
204c: The second apparatus determines an ith global parameter in the N global parameters based on the first signal energy sum and the second signal energy sum.
Based on the implementation 1 in step 2003b above, optionally, step 204c above specifically includes the following step:
If a sum of the first signal energy sum and a decision threshold is less than the second signal energy sum, the second apparatus determines a value of the ith global parameter as the first value; or if a sum of the first signal energy sum and the decision threshold is greater than or equal to the second signal energy sum, and a sum of the second signal energy sum and the decision threshold is greater than or equal to the first signal energy sum, the second apparatus determines a value of the ith global parameter as the second value; or if a sum of the second signal energy sum and the decision threshold is less than the first signal energy sum, the second apparatus determines a value of the ith global parameter as the third value.
For example, the first value is +1, the second value is 0, and the third value is −1. The global information of the first model includes N global weight gradients of the first model, and an ith global weight gradient ai in the N global weight gradients may be represented as a formula 3:
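The drawing carrying formula 3 is not reproduced here. The following is a reconstruction consistent with the decision rule in step 204c above and the symbols described below; the exact form of the original formula may differ.

```latex
a_i =
\begin{cases}
+1, & \lVert \tilde{y}_{2i-1} \rVert^2 + \gamma_2 < \lVert \tilde{y}_{2i} \rVert^2 \\
0,  & \bigl| \lVert \tilde{y}_{2i} \rVert^2 - \lVert \tilde{y}_{2i-1} \rVert^2 \bigr| \le \gamma_2 \\
-1, & \lVert \tilde{y}_{2i} \rVert^2 + \gamma_2 < \lVert \tilde{y}_{2i-1} \rVert^2
\end{cases}
\qquad \text{(formula 3, reconstructed)}
```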
γ2 indicates the decision threshold, the first signal energy sum may be represented as ∥ỹ_{2i-1}∥², and the second signal energy sum may be represented as ∥ỹ_{2i}∥².
Based on the implementation 2 in step 2003b above, optionally, step 204c above specifically includes the following step.
If the first signal energy sum is greater than a sum of the second signal energy sum and the decision threshold, the second apparatus determines a value of the ith global parameter as the first value; or if the first signal energy sum is less than or equal to a sum of the second signal energy sum and the decision threshold, and the second signal energy sum is less than or equal to a sum of the first signal energy sum and the decision threshold, the second apparatus determines a value of the ith global parameter as the second value; or if the second signal energy sum is greater than a sum of the first signal energy sum and the decision threshold, the second apparatus determines a value of the ith global parameter as the third value.
For example, the first value is +1, the second value is 0, and the third value is −1. The global information of the first model includes N global weight gradients of the first model, and an ith global weight gradient ai in the N global weight gradients may be represented as a formula 4:
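Likewise, the drawing carrying formula 4 is not reproduced here; the following reconstruction follows the decision rule of implementation 2 above, and the exact form of the original formula may differ.

```latex
a_i =
\begin{cases}
+1, & \lVert \tilde{y}_{2i-1} \rVert^2 > \lVert \tilde{y}_{2i} \rVert^2 + \gamma_2 \\
0,  & \bigl| \lVert \tilde{y}_{2i-1} \rVert^2 - \lVert \tilde{y}_{2i} \rVert^2 \bigr| \le \gamma_2 \\
-1, & \lVert \tilde{y}_{2i} \rVert^2 > \lVert \tilde{y}_{2i-1} \rVert^2 + \gamma_2
\end{cases}
\qquad \text{(formula 4, reconstructed)}
```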
γ2 indicates the decision threshold, the first signal energy sum may be represented as ∥ỹ_{2i-1}∥², and the second signal energy sum may be represented as ∥ỹ_{2i}∥².
The process of step 204a to step 204c above shows a process in which the second apparatus determines the ith global parameter. The second apparatus may determine another global parameter in the N global parameters by using a similar process. Details are not described herein again.
It should be noted that the second apparatus may determine the decision threshold based on the N first signals and/or the N second signals. For example, the first apparatus sends the ith first signal to the second apparatus, and the third apparatus sends the ith second signal to the second apparatus. The ith first signal and the ith second signal occupy same time-frequency resources, and the second apparatus receives a superimposed signal on each of the time-frequency resources. This also applies to the other first signals and second signals, and the examples are not described one by one herein. For example, the decision threshold γ2 = mean(abs(∥ỹ_{2i}∥² − ∥ỹ_{2i-1}∥²)) * b, where 0 < i ≤ N and i is an integer. ∥ỹ_{2i-1}∥² represents the first signal energy sum, and ∥ỹ_{2i}∥² represents the second signal energy sum. For the first signal energy sum and the second signal energy sum, refer to the foregoing related descriptions. b indicates a control factor, is used to control the decision threshold, and affects a quantity of non-0 elements in the global parameters and updating of the first model.
It can be learned that the second apparatus may determine the ith global parameter based on the signal energy of the two sequences that correspond to the ith first signal and that are received by the second apparatus and the signal energy of the two sequences that correspond to the ith second signal and that are received by the second apparatus. In this way, the second apparatus is supported in implementing incoherent reception in multi-user over-the-air signal superimposition transmission and implementing robustness to a fading channel.
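The following non-limiting Python sketch illustrates the energy-based decision of step 204a to step 204c together with the example decision threshold above. It assumes the 2N superimposed received sequences are available as arrays ordered so that the (2i−1)th and (2i)th entries carry the ith parameter; all names are hypothetical.

```python
import numpy as np

def decide_global_parameters(y, b=1.0):
    """y: list of 2N received (superimposed) sequences; y[2i-2] and y[2i-1]
    carry the i-th parameter. Returns N global parameters in {+1, 0, -1}
    following the implementation-1 decision rule (formula 3 above)."""
    energies = np.array([np.sum(np.abs(np.asarray(seq)) ** 2) for seq in y])
    e1 = energies[0::2]   # first signal energy sums,  ||y_{2i-1}||^2
    e2 = energies[1::2]   # second signal energy sums, ||y_{2i}||^2
    gamma2 = b * np.mean(np.abs(e2 - e1))   # example decision threshold
    a = np.zeros(len(e1))                   # second value (0) by default
    a[e1 + gamma2 < e2] = +1.0              # first value
    a[e2 + gamma2 < e1] = -1.0              # third value
    return a, gamma2
```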
Optionally, the second apparatus may determine the global learning rate based on the first information and the fifth information.
For example, the N quantized parameters of the first model include the N weight gradients obtained by performing the Qth round of training on the first model and performing quantization by the first apparatus. Specifically, the N weight gradients of the first model are represented by the vector ΔwQ. In other words, the vector ΔwQ includes the N weight gradients obtained by performing the Qth round of training on the first model by the first apparatus. The N quantized parameters of the second model include N weight gradients obtained by performing the Rth round of training on the second model and performing quantization by the third apparatus. Specifically, the N weight gradients of the second model are represented by a vector ΔwR. In other words, the vector ΔwR includes the N weight gradients obtained by performing the Rth round of training on the second model by the third apparatus. Therefore, the second apparatus may determine the global learning rate η=mean(mean(abs(Δwq)), mean(abs(Δwr))). The vector Δwq includes the quantized non-0 weight gradients in the vector ΔwQ. The vector Δwr includes the quantized non-0 weight gradients in the vector ΔwR.
It should be noted that, optionally, the first apparatus may send the sixth information to the second apparatus. The sixth information indicates an average value of absolute values of values of quantized non-0 parameters in the N parameters of the first model. The third apparatus sends seventh information to the second apparatus. The seventh information indicates an average value of absolute values of values of quantized non-0 parameters in the N parameters of the second model. The second apparatus determines the global learning rate based on the sixth information and the seventh information.
For example, the N quantized parameters of the first model include the N weight gradients obtained by performing the Qth round of training on the first model and performing quantization by the first apparatus. Specifically, the N weight gradients of the first model are represented by the vector ΔwQ. In other words, the vector ΔwQ includes the N weight gradients obtained by performing the Qth round of training on the first model by the first apparatus. The N quantized parameters of the second model include the N weight gradients obtained by performing the Rth round of training on the second model and performing quantization by the third apparatus. Specifically, the N weight gradients of the second model are represented by the vector ΔwR. In other words, the vector ΔwR includes the N weight gradients obtained by performing the Rth round of training on the second model by the third apparatus. The first apparatus indicates, to the second apparatus based on the sixth information, an average value mean(abs(Δwq)) of absolute values of values of quantized non-0 weight gradients in the vector ΔwQ. The third apparatus indicates, to the second apparatus based on the seventh information, an average value mean(abs(Δwr)) of absolute values of values of quantized non-0 weight gradients in the vector ΔwR. Therefore, the second apparatus may determine the global learning rate η=mean(mean(abs(Δwq)), mean(abs(Δwr))).
205: The second apparatus sends fourth information to the first apparatus. The fourth information indicates the global information of the first model determined by the second apparatus. Correspondingly, the first apparatus receives the fourth information from the second apparatus.
The fourth information includes the global information of the first model determined by the second apparatus. Alternatively, the fourth information indicates the global information of the first model determined by the second apparatus. For example, the second apparatus encodes or modulates the global information of the first model, to obtain the fourth information, and indicates the global information of the first model to the first apparatus based on the fourth information. For the global information of the first model, refer to the foregoing related descriptions.
For example, the fourth information includes the N global weight gradients of the first model that are determined by the second apparatus. The N global weight gradients are represented by a vector A. Therefore, the first apparatus may update the weight parameter of the first model to w_Q = w_{Q-1} + η*A. w_{Q-1} indicates a global weight parameter of the first model obtained by performing a (Q−1)th round of updating on the first model by the first apparatus. w_Q indicates a global weight parameter of the first model obtained by performing a Qth round of updating on the first model by the first apparatus. η indicates the global learning rate.
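A minimal sketch of this update, with hypothetical names, is as follows.

```python
import numpy as np

def update_weights(w_prev, global_gradients, eta):
    """w_Q = w_{Q-1} + eta * A, where A holds the N global weight gradients."""
    return np.asarray(w_prev) + eta * np.asarray(global_gradients)

# Example: Q-th round update with a global learning rate of 0.7.
w_Q = update_weights(w_prev=np.zeros(4), global_gradients=np.array([1, 0, -1, 1]), eta=0.7)
```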
For example, the fourth information includes N global output parameters of the first model that are determined by the second apparatus. The first apparatus may perform the (Q+1)th round of training on the first model, to obtain the N actual output parameters of the first model. The first apparatus trains the first model based on the N actual output parameters and the N global output parameters, to obtain the weight parameter of the first model.
Optionally, the embodiment shown in
201d: The second apparatus sends first indication information to the first apparatus. The first indication information indicates a quantity L of sending times that the first apparatus sends the first information to the second apparatus. Correspondingly, the first apparatus receives the first indication information from the second apparatus. L is an integer greater than or equal to 1.
Based on step 201d above, optionally, step 203 above specifically includes: The first apparatus sends the first information to the second apparatus L times. Correspondingly, the second apparatus receives the first information from the first apparatus L times.
In this implementation, the second apparatus may indicate the first apparatus to repeatedly send the first information to the second apparatus a plurality of times. It can be learned from the related descriptions of step 204 above that an energy-based gradient decision of the second apparatus may be incorrect due to channel noise and randomness of incoherent superimposition of signals. Therefore, the first apparatus repeatedly sends the first information. This helps the second apparatus make a separate decision on each reception and select the decision result with the largest quantity of occurrences as the final decision result, thereby reducing a probability of a decision error and improving model training performance.
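A non-limiting sketch of this repetition-based decision is shown below: a separate decision is made on each of the L receptions, and the value with the largest quantity of occurrences is kept for each parameter. The names and the tie-breaking behavior are assumptions.

```python
import numpy as np
from collections import Counter

def majority_decision(decisions):
    """decisions: L x N array of per-reception decisions in {+1, 0, -1}.
    Returns, per parameter, the value with the largest number of occurrences."""
    decisions = np.asarray(decisions)
    return np.array([Counter(col).most_common(1)[0][0] for col in decisions.T])

# Example: L = 3 repetitions, N = 4 parameters.
best = majority_decision([[+1, 0, -1, 0],
                          [+1, 0,  0, 0],
                          [ 0, 0, -1, 0]])   # -> [+1, 0, -1, 0]
```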
For example, as shown in
It should be noted that, optionally, the quantity L of sending times may be set based on at least one factor of a model training phase, a quantity of users participating in model training, and a signal-to-noise ratio of a channel. For example, in a later phase of training, when the quantity of users participating in model training is small and the signal-to-noise ratio is low, the quantity of sending times may be large.
It should be noted that the foregoing describes the technical solutions of this application by using an example in which the second apparatus determines the global learning rate based on the first information. The second apparatus may determine the global learning rate based on the first information and/or the third information. This is not specifically limited in this application.
It should be noted that the embodiment shown in
This application further provides another embodiment. The embodiment is similar to the embodiment shown in
2004a: A second apparatus sends first information to a fourth apparatus. Correspondingly, the fourth apparatus receives the first information from the second apparatus.
For the first information, refer to related descriptions of step 203 in the embodiment shown in
2004b: The fourth apparatus determines global information of a first model based on the first information.
Step 2004b is similar to step 204 in the embodiment shown in
Optionally, this embodiment further includes step 2004d, and step 2004d may be performed before step 2004b.
2004d: The second apparatus sends fifth information to the fourth apparatus. Correspondingly, the fourth apparatus receives the fifth information from the second apparatus.
For the fifth information, refer to related descriptions of step 203a in the embodiment shown in
It should be noted that there is no fixed execution sequence between step 2004a and step 2004d. Step 2004a may be performed before step 2004d, or step 2004d may be performed before step 2004a, or step 2004a and step 2004d may be simultaneously performed based on a situation.
2004c: The fourth apparatus sends fourth information to the second apparatus, where the fourth information indicates the determined global information of the first model. Correspondingly, the second apparatus receives the fourth information from the fourth apparatus.
For the fourth information, refer to related descriptions of step 205 in the embodiment shown in
It should be noted that the first apparatus may be a first terminal device. The second apparatus may be a network device. The third apparatus may be a second terminal device. The fourth apparatus may be a server. The foregoing embodiment describes a process in which the server obtains related information of a model of a terminal device managed by the network device, and determines the global information of the first model with reference to the related information of the model. In actual application, the server may obtain related information of models of terminal devices respectively managed by a plurality of network devices, and determine the global information of the first model with reference to the related information of the models. This is not specifically limited in this application.
The following describes, with reference to
401: A first apparatus sends third indication information to a second apparatus. The third indication information indicates indexes of N parameters whose absolute values of corresponding values are largest and that are in K parameters obtained by performing one round of training on a first model by the first apparatus. Correspondingly, the second apparatus receives the third indication information from the first apparatus.
The K parameters of the first model are obtained by performing one round of training on the first model by the first apparatus. The first apparatus determines the N parameters whose absolute values of the corresponding values are largest and that are in the K parameters. Then, the first apparatus sends the third indication information to the second apparatus.
Optionally, the third indication information is a bit sequence, the bit sequence includes K bits, and the K bits one-to-one correspond to the K parameters of the first model. When a value of one bit in the bit sequence is 0, it indicates that the first apparatus does not indicate a parameter corresponding to the bit. When a value of one bit in the bit sequence is 1, it indicates that the first apparatus indicates a parameter corresponding to the bit. For a related example of the bit sequence, refer to related descriptions in
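For illustration only, the following sketch shows how such a K-bit sequence could be built by marking the N parameters with the largest absolute values; the function and variable names are hypothetical.

```python
import numpy as np

def top_n_bit_sequence(params, n):
    """Return a K-bit sequence in which bit k is 1 if the k-th parameter
    is among the N parameters with the largest absolute values."""
    params = np.asarray(params)
    top_idx = np.argsort(np.abs(params))[-n:]   # indexes of the N largest |values|
    bits = np.zeros(params.size, dtype=int)
    bits[top_idx] = 1
    return bits

# Example: K = 5 parameters, report the N = 2 largest in absolute value.
bits = top_n_bit_sequence([0.1, -0.9, 0.05, 0.4, -0.3], n=2)  # -> [0, 1, 0, 1, 0]
```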
402: A third apparatus sends fourth indication information to the second apparatus. The fourth indication information indicates indexes of N parameters whose absolute values of corresponding values are largest and that are in K parameters of a second model of the third apparatus. Correspondingly, the second apparatus receives the fourth indication information from the third apparatus.
The K parameters of the second model are obtained by performing one round of training on the second model by the third apparatus. The third apparatus determines the N parameters whose absolute values of corresponding values are largest and that are in the K parameters of the second model. Then, the third apparatus sends the fourth indication information to the second apparatus.
Optionally, a form of the fourth indication information is similar to that of the third indication information. For details, refer to related descriptions of step 401 above. Details are not described herein again.
403: The second apparatus determines a common sparse mask based on the third indication information and the fourth indication information.
For the common sparse mask, refer to related descriptions in the embodiment shown in
It should be noted that
For example, as shown in
In another possible implementation, the related information of the first model includes the output parameter of the first model, and the information obtained by processing the related information of the first model includes an average value of absolute values of values of output parameters of the first model, or the related information of the first model includes the update parameter of the first model, and the information obtained by processing the related information of the first model includes an average value of absolute values of values of update parameters of the first model.
In another possible implementation, the transceiver module 601 is further configured to receive third information from the second apparatus, where the third information indicates global information of the first model.
In another possible implementation, the global information of the first model includes a global output parameter of the first model, or the global information of the first model includes a global update parameter and/or a global learning rate of the first model.
In another possible implementation, the related information of the first model includes N parameters of the first model, where N is an integer greater than or equal to 1. The processing module 602 is specifically configured to perform quantization on the N parameters based on the at least one quantization threshold, to obtain N quantized parameters.
The transceiver module 601 is specifically configured to modulate the N quantized parameters, to obtain N first signals, and send the N first signals to the second apparatus.
In another possible implementation, the at least one quantization threshold includes a first quantization threshold and a second quantization threshold. The processing module 602 is specifically configured to: if an ith parameter in the N parameters is greater than the first quantization threshold, quantize the ith parameter to a first value, where i is an integer greater than or equal to 1 and less than or equal to N; or if an ith parameter in the N parameters is less than or equal to the first quantization threshold and is greater than or equal to the second quantization threshold, quantize the ith parameter to a second value; or if an ith parameter in the N parameters is less than the second quantization threshold, quantize the ith parameter to a third value.
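A minimal sketch of this three-level quantization is shown below; the example threshold values and the output value set {+1, 0, −1} are assumptions taken from the examples in this application.

```python
import numpy as np

def quantize(params, t1, t2, values=(+1.0, 0.0, -1.0)):
    """Quantize each parameter to the first/second/third value using the
    first quantization threshold t1 and the second quantization threshold t2."""
    params = np.asarray(params, dtype=float)
    out = np.full(params.shape, values[1])        # second value by default
    out[params > t1] = values[0]                  # greater than t1 -> first value
    out[params < t2] = values[2]                  # less than t2    -> third value
    return out

# Example: thresholds t1 = 0.3 and t2 = -0.3.
q = quantize([0.8, 0.2, -0.5, -0.05], t1=0.3, t2=-0.3)  # -> [ 1.,  0., -1.,  0.]
```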
In another possible implementation, the transceiver module 601 is specifically configured to modulate an ith quantized parameter, to obtain an ith first signal, where the ith first signal corresponds to two sequences.
When the ith quantized parameter is the first value, a transmit power at which the first apparatus 600 sends a 1st sequence in the two sequences is less than a transmit power at which the first apparatus 600 sends a 2nd sequence in the two sequences; when the ith quantized parameter is the second value, a transmit power at which the first apparatus 600 sends a 1st sequence in the two sequences is equal to a transmit power at which the first apparatus 600 sends a 2nd sequence in the two sequences; or when the ith quantized parameter is the third value, a transmit power at which the first apparatus 600 sends a 1st sequence in the two sequences is greater than a transmit power at which the first apparatus 600 sends a 2nd sequence in the two sequences.
In another possible implementation, when the ith quantized parameter is the first value, the 1st sequence in the two sequences is a non-all-0 sequence, and the 2nd sequence is an all-0 sequence; when the ith quantized parameter is the second value, the two sequences are both all-0 sequences; or when the ith quantized parameter is the third value, the 1st sequence in the two sequences is an all-0 sequence, and the 2nd sequence is a non-all-0 sequence.
In another possible implementation, the transceiver module 601 is specifically configured to send the first information to the second apparatus L times, where L is an integer greater than or equal to 1.
In another possible implementation, the transceiver module 601 is further configured to receive first indication information from the second apparatus, where the first indication information indicates the quantity L of sending times that the first apparatus 600 sends the first information to the second apparatus.
In another possible implementation, the related information of the first model includes N parameters of the first model that are obtained through quantization error compensation. The N parameters obtained through quantization error compensation are obtained by the first apparatus 600 by performing error compensation on N parameters obtained by performing the Qth round of training on the first model, based on quantization errors respectively corresponding to the N parameters. A quantization error corresponding to an ith parameter in the N parameters is determined based on an ith parameter obtained by performing a (Q−1)th round of training on the first model by the first apparatus 600 and an ith parameter obtained through quantization error compensation in the (Q−1)th round of training, where i is an integer greater than or equal to 1 and less than or equal to N, N is an integer greater than or equal to 1, and Q is an integer greater than 1.
In another possible implementation, the related information of the first model includes N parameters of the first model that are obtained through sparsification, the N parameters of the first model that are obtained through sparsification are N parameters selected by the first apparatus 600 from K parameters of the first model based on a common sparse mask, and the K parameters of the first model are parameters obtained by performing one round of training on the first model by the first apparatus 600, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
In another possible implementation, the common sparse mask is a bit sequence, the bit sequence includes K bits, and the K bits one-to-one correspond to the K parameters. When a value of one bit in the K bits is 0, it indicates the first apparatus 600 not to select a parameter corresponding to the bit, or when a value of one bit in the K bits is 1, it indicates the first apparatus 600 to select a parameter corresponding to the bit.
In another possible implementation, the common sparse mask is determined by the first apparatus 600 based on a sparsity ratio and a pseudo-random number, and the sparsity ratio is indicated by the second apparatus to the first apparatus 600.
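For illustration only, the following sketch shows how a common sparse mask could be derived from a sparsity ratio and a pseudo-random number so that identically seeded apparatuses obtain the same mask. The generator, the seeding, and the interpretation of the sparsity ratio as the kept fraction are assumptions.

```python
import numpy as np

def common_sparse_mask(k, sparsity_ratio, seed):
    """Select round(K * sparsity_ratio) parameter positions using a
    pseudo-random permutation seeded identically on all apparatuses."""
    rng = np.random.default_rng(seed)
    n_selected = int(round(k * sparsity_ratio))
    mask = np.zeros(k, dtype=int)
    mask[rng.permutation(k)[:n_selected]] = 1
    return mask

# Example: K = 10 parameters, keep 30% of them; the same seed yields the same mask everywhere.
mask = common_sparse_mask(k=10, sparsity_ratio=0.3, seed=2022)
```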
In another possible implementation, the transceiver module 601 is further configured to receive second indication information from the second apparatus, where the second indication information indicates the common sparse mask.
In another possible implementation, the transceiver module 601 is further configured to send third indication information to the second apparatus, where the third indication information indicates indexes of the N parameters whose absolute values of corresponding values are largest and that are in the K parameters.
In another possible implementation, the first model is a neural network model, and the related information of the first model includes related parameters of neurons at P layers of the neural network model, where P is an integer greater than or equal to 1.
The first apparatus 700 includes a transceiver module 701. Optionally, the first apparatus 700 further includes a processing module 702.
The transceiver module 701 is configured to: send first indication information to a second apparatus, where the first indication information indicates indexes of N parameters whose absolute values of corresponding values are largest and that are in K parameters of a first model of the first apparatus 700, and the K parameters of the first model are K parameters obtained by performing one round of training on the first model by the first apparatus 700, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1; and receive second indication information from the second apparatus, where the second indication information indicates a common sparse mask, the common sparse mask is determined by the second apparatus based on the first indication information, and the common sparse mask indicates the first apparatus 700 to report some parameters obtained by training the first model by the first apparatus 700.
The following describes a second apparatus provided in embodiments of this application.
The second apparatus 800 includes a transceiver module 801. Optionally, the second apparatus 800 further includes a processing module 802.
The transceiver module 801 is configured to: send at least one quantization threshold to a first apparatus, where the at least one quantization threshold is used to perform quantization on related information of a first model of the first apparatus; and receive first information from the first apparatus, where the first information indicates quantized related information of the first model.
In a possible implementation, the related information of the first model includes an output parameter or an update parameter of the first model, and the update parameter includes a weight gradient or a weight parameter of the first model.
In another possible implementation, the transceiver module 801 is further configured to: receive second information from the first apparatus, where the second information indicates information obtained by processing the related information of the first model, or the second information indicates information obtained by performing an Mth round of training on the first model and performing processing by the first apparatus, and the related information of the first model is related information obtained by performing a Qth round of training on the first model by the first apparatus, where M is an integer greater than or equal to 1 and less than Q, and Q is an integer greater than 1.
The processing module 802 is configured to determine the at least one quantization threshold based on the second information.
In another possible implementation, the related information of the first model includes the output parameter of the first model, and the information obtained by processing the related information of the first model includes an average value of absolute values of values of output parameters of the first model, or the related information of the first model includes the update parameter of the first model, and the information obtained by processing the related information of the first model includes an average value of absolute values of values of update parameters of the first model.
In another possible implementation, the transceiver module 801 is further configured to receive third information from a third apparatus, where the third information indicates information obtained by processing related information of a second model of the third apparatus, or the third information indicates information obtained by performing an Sth round of training on the second model and performing processing by the third apparatus, and the related information of the second model is related information obtained by performing an Rth round of training on the second model by the third apparatus, where S is an integer greater than or equal to 1 and less than R, and R is an integer greater than 1.
The processing module 802 is configured to determine the at least one quantization threshold based on the second information and the third information.
In another possible implementation, the processing module 802 is further configured to determine global information of the first model based on the first information.
The transceiver module 801 is further configured to: send fourth information to the first apparatus, where the fourth information indicates the global information of the first model.
In another possible implementation, the global information of the first model includes a global output parameter of the first model, or the global information of the first model includes a global update parameter and/or a global learning rate of the first model.
In another possible implementation, the transceiver module 801 is further configured to receive fifth information from the third apparatus, where the fifth information indicates the related information of the second model of the third apparatus.
The processing module 802 is specifically configured to determine the global information of the first model based on the first information and the fifth information.
In another possible implementation, the related information of the first model includes N parameters of the first model, where N is an integer greater than or equal to 1. The related information of the second model includes N parameters of the second model.
The transceiver module 801 is specifically configured to receive N first signals from the first apparatus, where the N first signals carry the N parameters of the first model, and the N first signals one-to-one correspond to the N parameters of the first model.
The transceiver module 801 is specifically configured to receive N second signals from the third apparatus, where the N second signals carry N parameters of the second model, and the N second signals one-to-one correspond to the N parameters of the second model.
The processing module 802 is specifically configured to determine the global information of the first model based on the N first signals and the N second signals.
In another possible implementation, an ith first signal in the N first signals corresponds to a first sequence and a second sequence, an ith second signal in the N second signals corresponds to a third sequence and a fourth sequence, a time-frequency resource used by the first apparatus to send the first sequence is the same as a time-frequency resource used by the third apparatus to send the third sequence, a time-frequency resource used by the first apparatus to send the second sequence is the same as a time-frequency resource used by the third apparatus to send the fourth sequence, and the global information of the first model includes N global parameters of the first model, where i is an integer greater than or equal to 1 and less than or equal to N. The processing module 802 is specifically configured to determine a first signal energy sum of the first sequence and the third sequence that are received by the second apparatus 800, determine a second signal energy sum of the second sequence and the fourth sequence that are received by the second apparatus 800, and determine an ith global parameter in the N global parameters based on the first signal energy sum and the second signal energy sum.
In another possible implementation, the processing module 802 is specifically configured to: if the first signal energy sum is less than a sum of the second signal energy sum and a decision threshold, determine a value of the ith global parameter as a first value; or if the first signal energy sum is greater than or equal to a sum of the second signal energy sum and the decision threshold, and the second signal energy sum is less than or equal to a sum of the first signal energy sum and the decision threshold, determine a value of the ith global parameter as a second value; or if the second signal energy sum is greater than a sum of the first signal energy sum and the decision threshold, determine a value of the ith global parameter as a third value.
In another possible implementation, the transceiver module 801 is further configured to send first indication information to the first apparatus, where the first indication information indicates a quantity L of sending times that the first apparatus sends the first information to the second apparatus 800, where L is an integer greater than or equal to 1.
In another possible implementation, the transceiver module 801 is further configured to send second indication information to the first apparatus, where the second indication information indicates a common sparse mask, and the common sparse mask indicates the first apparatus to report some parameters obtained by training the first model by the first apparatus.
In another possible implementation, the transceiver module 801 is further configured to receive third indication information from the first apparatus, where the third indication information indicates indexes of N parameters whose absolute values of corresponding values are largest and that are in K parameters obtained by performing one round of training on the first model by the first apparatus, and receive fourth indication information from the third apparatus, where the fourth indication information indicates indexes of N parameters whose absolute values of corresponding values are largest and that are in K parameters of the second model of the third apparatus, and the K parameters of the second model are K parameters obtained by performing one round of training on the second model by the third apparatus.
The processing module 802 is further configured to determine the common sparse mask based on the third indication information and the fourth indication information.
The second apparatus 900 includes a transceiver module 901 and a processing module 902.
The transceiver module 901 is configured to receive first indication information from a first apparatus, where the first indication information indicates indexes of N parameters whose absolute values of corresponding values are largest and that are in K parameters of a first model of the first apparatus, and the K parameters of the first model are K parameters obtained by performing one round of training on the first model by the first apparatus, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
The processing module 902 is configured to determine a common sparse mask based on the first indication information, where the common sparse mask indicates the first apparatus to report some parameters obtained by training the first model by the first apparatus.
The transceiver module 901 is further configured to send second indication information to the first apparatus, where the second indication information indicates the common sparse mask.
In a possible implementation, the transceiver module 901 is further configured to receive third indication information from a third apparatus, where the third indication information indicates indexes of N parameters whose absolute values of corresponding values are largest and that are in K parameters of a second model of the third apparatus, and the K parameters of the second model are K parameters obtained by performing one round of training on the second model by the third apparatus.
The processing module 902 is specifically configured to determine the common sparse mask based on the first indication information and the third indication information.
An embodiment of this application further provides a terminal device.
As shown in the figure, the terminal device 1000 includes a processor 1010 and a transceiver 1020. Optionally, the terminal device 1000 further includes a memory 1030. The processor 1010, the transceiver 1020, and the memory 1030 may communicate with each other through an internal connection path, to transfer a control signal and/or a data signal. The memory 1030 is configured to store a computer program. The processor 1010 is configured to invoke the computer program from the memory 1030 and run the computer program, to control the transceiver 1020 to send or receive a signal. Optionally, the terminal device 1000 may further include an antenna 1040, configured to send, via a radio signal, uplink data or uplink control signaling output by the transceiver 1020.
The processor 1010 and the memory 1030 may be integrated into one processing apparatus. The processor 1010 is configured to execute program code stored in the memory 1030, to implement the foregoing functions. During specific implementation, the memory 1030 may alternatively be integrated into the processor 1010, or may be independent of the processor 1010. For example, the processor 1010 may correspond to the processing module 602 in
The transceiver 1020 may correspond to the transceiver module 601 in
It should be understood that the terminal device 1000 shown in
The processor 1010 may be configured to perform an action implemented inside the first apparatus or the second apparatus described in the foregoing apparatus embodiments, and the transceiver 1020 may be configured to perform a receiving or sending action of the first apparatus or the second apparatus described in the foregoing apparatus embodiments. For details, refer to the descriptions in the foregoing apparatus embodiments. Details are not described herein again.
Optionally, the terminal device 1000 may further include a power supply 1050, configured to supply power to various devices or circuits in the terminal device.
In addition, to improve functions of the terminal device, the terminal device 1000 may further include one or more of an input unit 1060, a display unit 1070, an audio circuit 1080, a camera 1090, a sensor 1100, and the like, and the audio circuit may further include a speaker 1082, a microphone 1084, and the like.
This application further provides a network device.
For example, in a 5G communication system, the network device 1100 may include a CU, a DU, and an AAU. Compared with a network device in an LTE communication system that includes one or more radio frequency units (for example, remote radio units (RRUs)) and one or more baseband units (BBUs), in the network device 1100, a non-real-time part of the original BBU is split off and redefined as the CU, which is responsible for processing a non-real-time protocol and service; some physical layer processing functions of the BBU are combined with the original RRU and a passive antenna into the AAU; and remaining functions of the BBU are redefined as the DU, which is responsible for processing a physical layer protocol and a real-time service. In short, the CU and the DU are distinguished based on real-time performance of processed content, and the AAU is a combination of the RRU and the antenna.
The CU, the DU, and the AAU may be deployed separately or together. Therefore, there may be a plurality of network deployment forms. A possible deployment form, as shown in
The AAU 11100 may implement a transceiver function, is referred to as a transceiver unit 11100, and corresponds to the transceiver module 601 in
The CU and DU 11200 may implement an internal processing function, are referred to as a processing unit 11200, and correspond to the processing module 602 in
In addition, the network device is not limited to the form shown in
In an example, the processing unit 11200 may include one or more boards. A plurality of boards may jointly support a radio access network (such as an LTE network) of a single access standard, or may separately support radio access networks (such as an LTE network, a 5G network, a future network, or another network) of different access standards. The CU and DU 11200 further include a memory 11201 and a processor 11202. The memory 11201 is configured to store necessary instructions and data. The processor 11202 is configured to control the network device to perform a necessary action, for example, configured to control the network device to perform an operation procedure related to the first apparatus or the second apparatus in the foregoing method embodiments. The memory 11201 and the processor 11202 may serve one or more boards. In other words, a memory and a processor may be disposed on each board. Alternatively, a plurality of boards may share a same memory and a same processor. In addition, a necessary circuit may further be disposed on each board.
It should be understood that the network device 1100 shown in
The CU and DU 11200 may be configured to perform an action implemented inside the first apparatus or the second apparatus described in the foregoing method embodiments, and the AAU 11100 may be configured to perform a receiving or sending action of the first apparatus or the second apparatus described in the foregoing method embodiments. For details, refer to the descriptions in the foregoing method embodiments. Details are not described herein again.
This application further provides a computer program product. The computer program product includes computer program code. When the computer program code is run on a computer, the computer is enabled to perform the method in either of the embodiments shown in
This application further provides a computer-readable medium. The computer-readable medium stores program code. When the program code is run on a computer, the computer is enabled to perform the method in either of the embodiments shown in
This application further provides a communication system. The communication system includes a first apparatus and a second apparatus. The first apparatus is configured to perform some or all steps performed by the first apparatus in the embodiments shown in
Optionally, the communication system further includes a third apparatus. The third apparatus is configured to perform some or all steps performed by the third apparatus in the embodiments shown in
An embodiment of this application further provides a chip apparatus, including a processor. The processor is configured to invoke a computer program or computer instructions stored in a memory, so that the processor performs the methods in the embodiments shown in
In a possible implementation, an input of the chip apparatus corresponds to a receiving operation in the embodiments shown in
Optionally, the processor is coupled to the memory through an interface.
Optionally, the chip apparatus further includes the memory, and the memory stores the computer program or the computer instructions.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control program execution of the methods in the embodiments shown in
It may be clearly understood by a person skilled in the art that, for convenience and brevity of description, for explanations and beneficial effect of related content in any one of the communication apparatuses provided above, refer to the corresponding method embodiments provided above. Details are not described herein again.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in another manner. For example, the described apparatus embodiments are merely examples. For example, division into the units is merely logical function division. There may be another division manner during actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in an electrical form, a mechanical form, or another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located at one location, or may be distributed on a plurality of network units. Some or all of the units may be selected based on actual requirements to achieve the objectives of the solutions of embodiments.
In addition, functional units in embodiments of this application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in a form of hardware, or may be implemented in a form of a software functional unit.
When the integrated unit is implemented in the form of the software functional unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the conventional technology, or all or some of the technical solutions may be implemented in a form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods described in embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
In conclusion, the foregoing embodiments are merely intended for describing the technical solutions of this application other than limiting this application. Although this application is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments or equivalent replacements may still be made to some technical features thereof, without departing from the scope of the technical solutions of embodiments of this application.
This application is a continuation of International Application No. PCT/CN2022/119814, filed on Sep. 20, 2022, the disclosure of which is hereby incorporated by reference in its entirety.
Parent application: PCT/CN2022/119814, Sep. 2022, WO. Child application: 19080209, US.