WIRELESS FEDERATED LEARNING FRAMEWORK AND RESOURCE OPTIMIZATION METHOD

Information

  • Patent Application
  • Publication Number: 20240297700
  • Date Filed: November 06, 2023
  • Date Published: September 05, 2024
Abstract
A wireless federated learning (FL) framework and a resource optimization method are provided to resolve a problem that FL is not suitable for many hardware-constrained Internet of Things (IoT) devices with a small amount of computing resources. In the framework, users with sufficient computing resources upload locally trained model parameters to a base station, and users with limited computing resources only need to send training data to the base station. The base station performs data training and model aggregation to obtain a global model. In this way, the users with limited computing resources and the users with sufficient computing resources cooperatively train the global model. To improve a data transmission rate and reduce an aggregation error of FL, a non-convex optimization problem is constructed to jointly design user transmit power and a reception strategy of the base station, and the problem is solved through a successive convex approximation (SCA) method.
Description
CROSS REFERENCE TO THE RELATED APPLICATIONS

This application is based upon and claims priority to Chinese Patent Application No. 202310182495.3, filed on Mar. 1, 2023, the entire contents of which are incorporated herein by reference.


TECHNICAL FIELD

The present disclosure relates to the field of wireless resource management and computer technologies, and in particular, to a wireless federated learning (FL) framework and a resource optimization method.


BACKGROUND

A large quantity of Internet of Things (IoT) devices such as low-end sensors, high-definition cameras, and advanced robots are deployed on a network edge such that the next-generation network can support various emerging applications, such as smart factories, autonomous driving, interactive games, and the metaverse. Success of these applications is based on full use of huge amounts of data.


Traditional centralized learning (CL) needs to acquire data samples from edge users. However, a large amount of data is distributed on geographically isolated terminals, which brings huge communication overheads to CL. In addition, considerable transmission delays affect real-time performance of applications. Furthermore, directly sending raw data poses a risk of privacy leakage for individual users.


To address the main challenge of CL, federated learning (FL) allows users to locally update models, thereby avoiding the transmission of raw data. In comparison with CL, a core idea of FL is to replace the data upload with model sharing. This not only plays an important role in protecting user privacy, but also greatly reduces the communication burden.


Success of FL relies on abundant computing resources on IoT terminals. However, this is impractical for many hardware-constrained IoT devices, such as low-end sensors with limited computing resources. Therefore, it is extremely challenging to directly implement existing (centralized or federated) machine learning frameworks in heterogeneous IoT devices with different computing capabilities.


To address these challenges, it is necessary to explore a new machine learning framework to ease the computational requirements for these resource-constrained IoT devices. In addition, to improve the data transmission rate and reduce aggregation errors of FL, it is important to propose a new learning and optimization method to make full use of distributed data and computing resources of an FL server and all clients.


SUMMARY

To overcome defects of the prior art, the present disclosure provides a wireless FL framework and a resource optimization method.


To achieve the foregoing objective, the present disclosure provides the following technical solutions:


The present disclosure provides a wireless FL framework, including:

    • N CL users with limited computing resources, where the CL users send training data to a base station for CL to participate in FL;
    • K FL users with sufficient computing resources, where the FL users obtain local models through local training data, and upload local model parameters as an aggregation model to the base station; and
    • the base station serving as an FL server and configured to compute a global model, where the base station performs CL on the training data accumulated by the CL users to obtain a CL model, and performs weighted summation on the CL model and the received aggregation model based on a data amount to obtain the global model.


Further, it is assumed that there are T FL cycles in total, which are represented as a set T={1, 2, . . . , T}. In the tth FL cycle, a local model update formula of the kth FL user is as follows:

$$w_k^{(t+1)} = \tilde{w}^{(t)} - \eta\, g_k^{(t)}, \quad g_k^{(t)} = \nabla F_k\big(\tilde{w}^{(t)};\, D_k^{(t)}\big), \quad \forall k$$

A centralized update formula of the base station is as follows:

$$\bar{w}^{(t+1)} = \tilde{w}^{(t)} - \eta\, \bar{g}^{(t)}, \quad \bar{g}^{(t)} = \nabla F\big(\tilde{w}^{(t)};\, D^{(t)}\big)$$

A global model aggregation formula is as follows:

$$\tilde{w}^{(t+1)} = \frac{\big|D^{(t)}\big|\, \bar{w}^{(t+1)} + \sum_{k=N+1}^{N+K} \big|D_k^{(t)}\big|\, w_k^{(t+1)}}{\big|D^{(t)}\big| + \sum_{k=N+1}^{N+K} \big|D_k^{(t)}\big|}$$
where η is a learning rate of a stochastic gradient descent method, $D^{(t)} = \bigcup_{\tau=1}^{t} \bigcup_{n=1}^{N} D_n^{(\tau)}$ is a training data set accumulated at the base station in t cycles, $F_k(\cdot)$ and $g_k^{(t)}$ are respectively a local loss function and a gradient of the kth FL user, $F(\cdot)$ and $\bar{g}^{(t)}$ are respectively a loss function and a gradient of the CL at the base station, and $\tilde{w}^{(t+1)}$ is the global model aggregated at the base station.


The present disclosure further provides a resource optimization method for the foregoing wireless FL framework, including the following steps:

    • S1: initializing, by the base station, a training task and the global model, and sending the global model to all users;
    • S2: after receiving the global model, computing, by the FL user, a local loss function and a gradient based on the local training data, and updating the local model;
    • S3: using, by the FL users and the CL users, a same frequency band to respectively upload the local models and send the training data simultaneously; and detecting, by the base station, a received signal through imperfect successive interference cancellation (SIC), and separating the training data and the aggregation model;
    • S4: obtaining data transmission rates of all CL users and a mean square error (MSE) of the aggregation model of all FL users, and detecting signals of all CL users in an order of 1, 2, . . . , N through imperfect SIC;
    • S5: constructing a non-convex optimization problem based on the data transmission rates and the MSE of the aggregation model; and
    • S6: transforming and solving the non-convex optimization problem in S5 through a successive convex approximation (SCA) method, and outputting user transmit power and a reception strategy.


Further, the local loss function in S2 is an MSE loss function or a cross-entropy loss function.


Further, S3 specifically includes:

    • before the user uploads the local model or sends the training data, normalizing a local training data set {Dn} of the CL users into a communication symbol set {sn} and the local models {wk} of the FL users into a computation symbol set {sk}, where a superimposed signal received by the base station is as follows:

$$y = \underbrace{\sum_{n=1}^{N} h_n \sqrt{p_n}\, s_n}_{\text{CL users}} + \underbrace{\sum_{k=N+1}^{N+K} h_k \sqrt{p_k}\, s_k}_{\text{FL users}} + \underbrace{n_0}_{\text{Noise}}$$
    • where h_n (h_k) is a channel coefficient from the nth (kth) user to the base station, p_n (p_k) is transmit power of the nth (kth) user and falls within an interval [0, P_max], and n_0 ~ CN(0, σ²) is additive noise in a channel;

    • adjusting the user transmit power such that SIC constraints of different users satisfy:

$$\big| h_1 \sqrt{p_1} \big| \ge \big| h_2 \sqrt{p_2} \big| \ge \cdots \ge \big| h_N \sqrt{p_N} \big| \ge \big| h_k \sqrt{p_k} \big|, \quad \forall k$$

    • after receiving the superimposed signal y, the signals {sn} of all CL users are detected by the base station in the order of 1, 2, . . . , N, where after the signals of all CL users are detected, a residual signal is as follows:

$$\hat{y} = \varpi \sum_{n=1}^{N} h_n \sqrt{p_n}\, s_n + \sum_{k=N+1}^{N+K} h_k \sqrt{p_k}\, s_k + n_0$$

    • where ϖ is an interference coefficient of the residual signal, ϖ=0 corresponds to perfect SIC, and ϖ=1 corresponds to no SIC; and

    • obtaining, by the base station, an estimated value ŝ=aŷ/K of the aggregation model from the residual signal ŷ by using a receiving factor a, and decoding the signals {sn} and ŝ into the training data set of all CL users and the aggregation model through denormalization post-processing, to obtain an average FL model.





Further, in S4, a formula of the data transmission rate of the nth CL user is as follows:

$$R_n = \log_2\!\left( 1 + \frac{|h_n|^2\, p_n}{\varpi \sum_{i=1}^{n-1} |h_i|^2\, p_i + \sum_{j=n+1}^{N+K} |h_j|^2\, p_j + \sigma^2} \right), \quad \forall n$$

    • where ϖ is the interference coefficient of the residual signal, ϖ=0 corresponds to perfect SIC, ϖ=1 corresponds to no SIC, and a sum of the data transmission rates of all CL users is $R_{\mathrm{sum}} = \sum_{n=1}^{N} R_n$.





Further, in S4, a formula of the MSE of the aggregation model is as follows:

$$\mathrm{MSE} = \frac{1}{K^2} \left( \vartheta_{CL} + \sum_{k=N+1}^{N+K} \big| a h_k \sqrt{p_k} - 1 \big|^2 + |a|^2 \sigma^2 \right)$$


    • where

$$\vartheta_{CL} = \varpi \sum_{n=1}^{N} \big| a h_n \sqrt{p_n} \big|^2$$

    • is interference of the signals of the CL users.





Further, in S5, the non-convex optimization problem is constructed as follows:

$$\begin{aligned}
\max_{p,\, a} \quad & R_{\mathrm{sum}} - \lambda\, \mathrm{MSE} \\
\text{s.t.} \quad & R_n \ge R_{\min}, \quad \forall n \\
& p_n,\, p_k \in [0, P_{\max}], \quad \forall n, k \\
& \mathrm{MSE} \le \epsilon_0 \\
& \big| h_1 \sqrt{p_1} \big| \ge \big| h_2 \sqrt{p_2} \big| \ge \cdots \ge \big| h_N \sqrt{p_N} \big| \ge \big| h_k \sqrt{p_k} \big|, \quad \forall k
\end{aligned}$$

    • where p=[p_1, p_2, . . . , p_N, p_{N+1}, . . . , p_{N+K}] is the transmit power, R_min is a minimum data transmission rate required by the CL users, P_max is maximum transmit power of all users, ε_0 is a maximum aggregation model error that the FL users can tolerate, and λ is a constant used to strike a balance between the sum rate R_sum and the MSE.





Further, a convex optimization problem obtained through transformation in S6 is as follows:

$$\begin{aligned}
\max_{p,\, a,\, \gamma,\, b} \quad & \sum_{n=1}^{N} \log_2(1 + \gamma_n) - \lambda\, \mathrm{MSE} \\
\text{s.t.} \quad & h_1^2 p_1 \ge h_2^2 p_2 \ge \cdots \ge h_N^2 p_N \ge h_k^2 p_k, \quad \forall k \in K \\
& \log_2(1 + \gamma_n) \ge R_{\min}, \quad \forall n \in N \\
& \mathrm{MSE} \le \epsilon_0 \\
& p_n,\, p_k \in [0, P_{\max}], \quad \forall n, k \\
& h_n^2 p_n \ge \varpi \sum_{i=1}^{n-1} h_i^2 \left( \frac{\tau_{in}}{2} p_i^2 + \frac{1}{2 \tau_{in}} \gamma_n^2 \right) + \sum_{j=n+1}^{N+K} h_j^2 \left( \frac{\tau_{jn}}{2} p_j^2 + \frac{1}{2 \tau_{jn}} \gamma_n^2 \right) + \sigma^2 \gamma_n, \quad \forall n \in N \\
& \big( a^{(\ell)} \big)^2 + 2 a^{(\ell)} \big( a - a^{(\ell)} \big) \ge b_i^2 / p_i, \quad \forall i \in N \cup K
\end{aligned}$$


    • where γ=[γ_1, γ_2, . . . , γ_N] and b=[b_1, b_2, . . . , b_N, b_{N+1}, b_{N+2}, . . . , b_{N+K}] are introduced auxiliary vectors, γ_n = 2^{R_n} − 1, ∀n, b_i = a√(p_i), ∀i∈N∪K, a^{(ℓ)} is the receiving factor obtained in the ℓth iteration, and τ_in and τ_jn are convex upper bound (CUB) coefficients.





Further, solving the convex optimization problem in S6 includes:

    • S6.1: initializing p^{(0)}, a^{(0)}, γ^{(0)}, b^{(0)}, ϖ, a maximum quantity L of iterations, and a threshold ε, and setting an iteration index ℓ=0;
    • S6.2: computing an objective function value U^{(ℓ)} = R_sum^{(ℓ)} − λMSE^{(ℓ)};
    • S6.3: given p^{(ℓ)}, a^{(ℓ)}, γ^{(ℓ)}, and b^{(ℓ)}, updating τ_in and τ_jn by using τ_in^{(ℓ)} = γ_n^{(ℓ)}/p_i^{(ℓ)} and τ_jn^{(ℓ)} = γ_n^{(ℓ)}/p_j^{(ℓ)};
    • S6.4: given τ_in^{(ℓ)} and τ_jn^{(ℓ)}, solving the convex optimization problem through a mathematical toolkit CVX to obtain p^{(ℓ+1)}, a^{(ℓ+1)}, γ^{(ℓ+1)}, and b^{(ℓ+1)};
    • S6.5: computing an objective function value U^{(ℓ+1)} = R_sum^{(ℓ+1)} − λMSE^{(ℓ+1)};
    • S6.6: updating ℓ = ℓ+1 and computing ΔU = |U^{(ℓ)} − U^{(ℓ−1)}|; and
    • S6.7: repeating S6.3 to S6.6 until ΔU ≤ ε or ℓ ≥ L, and outputting user transmit power p* and a receiving factor a*.


Compared with the prior art, the present disclosure has the following beneficial effects:

    • (1) The wireless FL framework provided in the present disclosure can allow the users with limited computing resources and the users with sufficient computing resources to participate in FL together, and make full use of data of heterogeneous users to improve performance of a machine learning model.
    • (2) In the present disclosure, all users use the same frequency band to upload the models or send the data simultaneously such that spectrum resources can be effectively saved and communication delay can be reduced. The base station detects the signals through imperfect SIC. This is more in line with an actual situation.
    • (3) In the resource optimization method provided in the present disclosure, the non-convex optimization problem is constructed based on the user data transmission rates and the MSE of the aggregation model to jointly design the user transmit power and the reception strategy of the base station. The problem is solved through the SCA method such that the sum rate can be effectively maximized and the MSE can be effectively minimized.





BRIEF DESCRIPTION OF THE DRAWINGS

To describe the technical solutions in embodiments of the present application or in the prior art more clearly, the following briefly describes the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present disclosure, and persons of ordinary skill in the art may still derive other accompanying drawings from these accompanying drawings.



FIG. 1 is a structural diagram of a wireless FL framework and a transceiver according to an embodiment of the present disclosure; and



FIG. 2 is an algorithm flowchart of a resource optimization method for a wireless FL framework according to an embodiment of the present disclosure.





DETAILED DESCRIPTION OF THE EMBODIMENTS

Success of FL relies on abundant computing resources of local users. However, this is impractical for many hardware-constrained IoT devices, such as low-end sensors with small amounts of computing resources. To resolve the foregoing problem, the present disclosure proposes a wireless FL framework such that users with limited computing resources and users with sufficient computing resources cooperatively train a global model. In the framework, the users with sufficient computing resources upload locally trained model parameters to a base station, and the users with limited computing resources only need to send training data to the base station. The base station performs data training and model aggregation to obtain the global model. In addition, to improve a data transmission rate and reduce an aggregation error of FL, the present disclosure constructs a non-convex optimization problem to jointly design user transmit power and a reception strategy of the base station, and solves the problem through an SCA method.


To better understand the technical solutions, the foregoing describes in detail a method in the present disclosure with reference to the accompanying drawings.


Referring to FIG. 1, this embodiment includes the following steps:

    • S1: A wireless FL framework is proposed such that users with limited computing resources and users with sufficient computing resources cooperatively train a global model. In the framework, the users with sufficient computing resources upload trained model parameters to a base station, and the users with limited computing resources only need to send training data to the base station.


As shown in FIG. 1, in the framework, it is considered that there are N users with limited computing resources and K users with sufficient computing resources, and the base station serves as an FL server and is configured to compute the global model. The users with limited computing resources cannot meet a requirement of local data training due to limited computing power and can only send the training data to the base station for CL to participate in FL, which are also referred to as CL users and are represented as a set N ≜ {1, 2, . . . , N}. The users with sufficient computing resources can obtain local models from local training data and upload local model parameters to the base station without sending training data, which are also referred to as FL users and are represented as a set K ≜ {N+1, N+2, . . . , N+K}.

    • S2: CL is performed by the base station on accumulated data to obtain a CL model, and weighted summation is performed on the CL model and the received aggregation model based on a data amount to obtain the global model. It is assumed that there are T FL cycles in total, which are represented as a set T={1, 2, . . . , T}. In the tth cycle, the training data of the nth CL user is denoted as Dn(t), the local model of the kth FL user is denoted as wk(t), and a stochastic gradient descent method is used for training. In this case, local model update by the user, centralized update by the base station, and global model aggregation are expressed as follows:
    • Local model update: $w_k^{(t+1)} = \tilde{w}^{(t)} - \eta\, g_k^{(t)}$, $g_k^{(t)} = \nabla F_k\big(\tilde{w}^{(t)};\, D_k^{(t)}\big)$, ∀k.
    • Centralized update by the base station: $\bar{w}^{(t+1)} = \tilde{w}^{(t)} - \eta\, \bar{g}^{(t)}$, $\bar{g}^{(t)} = \nabla F\big(\tilde{w}^{(t)};\, D^{(t)}\big)$.
    • Global model aggregation:

$$\tilde{w}^{(t+1)} = \frac{\big|D^{(t)}\big|\, \bar{w}^{(t+1)} + \sum_{k=N+1}^{N+K} \big|D_k^{(t)}\big|\, w_k^{(t+1)}}{\big|D^{(t)}\big| + \sum_{k=N+1}^{N+K} \big|D_k^{(t)}\big|}.$$

    • η is a learning rate of the stochastic gradient descent method. $D^{(t)} = \bigcup_{\tau=1}^{t} \bigcup_{n=1}^{N} D_n^{(\tau)}$ is a training data set accumulated at the base station in t cycles. $F_k(\cdot)$ and $g_k^{(t)}$ are respectively a local loss function and a gradient of the kth FL user. $F(\cdot)$ and $\bar{g}^{(t)}$ are respectively a loss function and a gradient of the CL at the base station. $\tilde{w}^{(t+1)}$ is the global model aggregated at the base station.
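For illustration, the data-amount-weighted aggregation above can be sketched in a few lines of Python; the model vectors, data-set sizes, and the helper function `aggregate` below are hypothetical toy values, serving only to show how the CL model and the FL users' local models are combined.

```python
# Sketch of the global aggregation step (hypothetical toy numbers).
# w_bar: centralized (CL) model trained at the base station on |D(t)| samples;
# local_models: (w_k, |D_k(t)|) pairs for the K FL users.

def aggregate(w_bar, n_bs, local_models):
    """Weighted average of the CL model and the FL users' local models."""
    total = n_bs + sum(n_k for _, n_k in local_models)
    dim = len(w_bar)
    # Start from the CL model weighted by the accumulated data amount ...
    w_global = [n_bs * w_bar[d] for d in range(dim)]
    # ... then add each local model weighted by its own data amount.
    for w_k, n_k in local_models:
        for d in range(dim):
            w_global[d] += n_k * w_k[d]
    return [v / total for v in w_global]

# Two-dimensional toy models: CL model trained on 100 samples,
# two FL users with 50 samples each.
w_tilde = aggregate([1.0, 2.0], 100, [([3.0, 4.0], 50), ([5.0, 6.0], 50)])
```

With these numbers the global model is the per-coordinate weighted mean, (100·1 + 50·3 + 50·5)/200 = 2.5 in the first coordinate and 3.5 in the second.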

    • S3: A training task and the global model are initialized by the base station, and the global model is sent to all users.

    • S4: The local models are updated by the users with sufficient computing resources based on the local training data. After receiving the global model, the user with sufficient computing resources, namely, the FL user, computes a local loss function and a gradient based on the local training data. The loss function may be an MSE loss function or a cross-entropy loss function, depending on the specific learning task. Then, the FL user updates the local model through the local model update formula in S2.

    • S5: A same frequency band is used by the users with sufficient computing resources and the users with limited resources to respectively upload the local models and send the training data simultaneously. A received signal is detected by the base station through imperfect SIC, and the training data and the aggregation model are separated.





Considering a contradiction between limited bandwidth and a large quantity of users in actual IoT, to save communication bandwidth and reduce delay, all users use the same timeslot and frequency band to upload the local models or send the training data in the present disclosure. Before the user uploads the local model or sends the training data, a local training data set {Dn} of the CL users is normalized into a communication symbol set {sn} and the local models {wk} of the FL users are normalized into a computation symbol set {sk}. A superimposed signal received by the base station is as follows:

$$y = \underbrace{\sum_{n=1}^{N} h_n \sqrt{p_n}\, s_n}_{\text{CL users}} + \underbrace{\sum_{k=N+1}^{N+K} h_k \sqrt{p_k}\, s_k}_{\text{FL users}} + \underbrace{n_0}_{\text{Noise}}$$

    • h_n (h_k) is a channel coefficient from the nth (kth) user to the base station, p_n (p_k) is transmit power of the nth (kth) user and falls within an interval [0, P_max], and n_0 ~ CN(0, σ²) is additive noise in a channel. After receiving the superimposed signal, the base station detects the signal through imperfect SIC due to a decoding error, and separates the training data and the aggregation model. User transmit power is adjusted such that SIC constraints of different users satisfy:

$$\big| h_1 \sqrt{p_1} \big| \ge \big| h_2 \sqrt{p_2} \big| \ge \cdots \ge \big| h_N \sqrt{p_N} \big| \ge \big| h_k \sqrt{p_k} \big|, \quad \forall k$$

After receiving the superimposed signal y, the base station detects the signals {sn} of all CL users in an order of 1, 2, . . . , N. After the signals of all CL users are detected, a residual signal is as follows:

$$\hat{y} = \varpi \sum_{n=1}^{N} h_n \sqrt{p_n}\, s_n + \sum_{k=N+1}^{N+K} h_k \sqrt{p_k}\, s_k + n_0$$

    • ϖ is an interference coefficient of the residual signal, ϖ=0 corresponds to perfect SIC, and ϖ=1 corresponds to no SIC. To obtain an average FL model, the base station obtains an estimated value ŝ=aŷ/K of the aggregation model from the residual signal ŷ by using a receiving factor a, and decodes the signals {sn} and ŝ through post-processing such as denormalization into the training data set of all CL users and the aggregation model.
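The superimposed signal, the imperfect-SIC residual, and the estimate ŝ = aŷ/K can be illustrated with the following Python sketch; all channel coefficients, powers, symbols, and the single noise realization are hypothetical toy values (real-valued for simplicity).

```python
import math

# Toy sketch of the uplink signal model and imperfect SIC (hypothetical numbers).
# CL users n = 1..N send data symbols; FL users k = N+1..N+K send model symbols.
N, K = 2, 2
h = [0.9, 0.8, 0.5, 0.4]    # channel coefficients h_1 .. h_{N+K}
p = [1.0, 0.9, 0.6, 0.5]    # transmit powers within [0, P_max]
s = [1.0, -1.0, 0.7, 0.3]   # normalized symbols of all users
n0 = 0.01                   # one additive-noise realization
varpi = 0.1                 # residual-interference coefficient of imperfect SIC

# Superimposed signal received by the base station.
y = sum(h[i] * math.sqrt(p[i]) * s[i] for i in range(N + K)) + n0

# After the N CL signals are detected and (imperfectly) cancelled, a fraction
# varpi of their contribution remains in the residual signal.
y_hat = (varpi * sum(h[n] * math.sqrt(p[n]) * s[n] for n in range(N))
         + sum(h[k] * math.sqrt(p[k]) * s[k] for k in range(N, N + K)) + n0)

# The base station scales the residual by a receiving factor a to estimate
# the average of the K FL symbols.
a = 2.0
s_hat = a * y_hat / K
```

Setting `varpi = 0` reproduces the perfect-SIC case, in which only the FL users' superposition and the noise remain in `y_hat`.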

    • S6: Data transmission rates of all users with limited computing resources and an MSE of the aggregation model of all users with sufficient computing resources are obtained. The signals of all CL users are detected in the order of 1, 2, . . . , N through imperfect SIC based on the expression of the superimposed signal received by the base station in S5. The data transmission rate of the nth CL user is as follows:

$$R_n = \log_2\!\left( 1 + \frac{|h_n|^2\, p_n}{\varpi \sum_{i=1}^{n-1} |h_i|^2\, p_i + \sum_{j=n+1}^{N+K} |h_j|^2\, p_j + \sigma^2} \right), \quad \forall n$$

    • ϖ is an interference coefficient of the residual signal, ϖ=0 corresponds to perfect SIC, and ϖ=1 corresponds to no SIC. A sum (sum rate) of the data transmission rates of all CL users is $R_{\mathrm{sum}} = \sum_{n=1}^{N} R_n$.
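The rate expression can be evaluated numerically as follows; the helper function `rate` and the channel values are hypothetical, and the sketch only shows how already-detected users contribute residual interference (scaled by ϖ) while not-yet-detected users interfere fully.

```python
import math

# Sketch of the per-user data rate under imperfect SIC (hypothetical values).
def rate(n, h, p, N, K, varpi, sigma2):
    """R_n for the n-th CL user (n is 1-based, as in the text)."""
    i0 = n - 1                                            # 0-based index of user n
    signal = abs(h[i0]) ** 2 * p[i0]
    # Users 1..n-1 are already detected: only a varpi fraction remains.
    residual = varpi * sum(abs(h[i]) ** 2 * p[i] for i in range(i0))
    # Users n+1..N+K are not yet detected and interfere fully.
    not_yet = sum(abs(h[j]) ** 2 * p[j] for j in range(i0 + 1, N + K))
    return math.log2(1.0 + signal / (residual + not_yet + sigma2))

h = [1.0, 0.8, 0.5, 0.4]
p = [1.0, 1.0, 1.0, 1.0]
R = [rate(n, h, p, 2, 2, 0.1, 0.1) for n in (1, 2)]
R_sum = sum(R)
```

With ϖ = 0 (perfect SIC) the second user's rate strictly increases, since the first user's cancelled signal no longer leaks into its denominator.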





It is assumed that an aggregation model obtained by the base station is ideally

$$s = \frac{1}{K} \sum_{k=N+1}^{N+K} s_k.$$

The actual aggregation model obtained in S5 is ŝ=aŷ/K. An aggregation error is measured by the MSE. The MSE is expressed as follows:

$$\mathrm{MSE} = \frac{1}{K^2} \left( \vartheta_{CL} + \sum_{k=N+1}^{N+K} \big| a h_k \sqrt{p_k} - 1 \big|^2 + |a|^2 \sigma^2 \right)$$

    • where

$$\vartheta_{CL} = \varpi \sum_{n=1}^{N} \big| a h_n \sqrt{p_n} \big|^2$$

    • is the interference of the signals of the CL users.
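A minimal numerical sketch of the MSE expression, assuming real-valued channels and hypothetical parameters:

```python
import math

# Sketch of the aggregation-error MSE (hypothetical values, real channels).
def aggregation_mse(a, h, p, N, K, varpi, sigma2):
    # Residual CL interference, scaled by varpi.
    cl_interf = varpi * sum(abs(a * h[n] * math.sqrt(p[n])) ** 2 for n in range(N))
    # Misalignment of each FL user's effective gain a*h_k*sqrt(p_k) from 1.
    misalign = sum(abs(a * h[k] * math.sqrt(p[k]) - 1.0) ** 2 for k in range(N, N + K))
    return (cl_interf + misalign + abs(a) ** 2 * sigma2) / K ** 2

h = [1.0, 0.8, 0.5, 0.4]
p = [1.0, 1.0, 1.0, 1.0]
mse = aggregation_mse(2.0, h, p, 2, 2, 0.1, 0.1)
```

Here a = 2 makes the third user's effective gain exactly 1 (2·0.5·1 = 1), so that user contributes no misalignment error; the remaining error comes from the fourth user, the residual CL interference, and the amplified noise.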

    • S7: A non-convex optimization problem is constructed based on the data transmission rates and the MSE of the aggregation model. The communication-centric CL user hopes that the data transmission rate can be maximized, and the computing-centric FL user hopes that the MSE of the aggregation model can be minimized. To meet the requirements of both types of users, the present disclosure jointly designs user transmit power and a reception strategy of the base station. The optimization problem is constructed as follows:

$$\begin{aligned}
\max_{p,\, a} \quad & R_{\mathrm{sum}} - \lambda\, \mathrm{MSE} \\
\text{s.t.} \quad & R_n \ge R_{\min}, \quad \forall n \\
& p_n,\, p_k \in [0, P_{\max}], \quad \forall n, k \\
& \mathrm{MSE} \le \epsilon_0 \\
& \big| h_1 \sqrt{p_1} \big| \ge \big| h_2 \sqrt{p_2} \big| \ge \cdots \ge \big| h_N \sqrt{p_N} \big| \ge \big| h_k \sqrt{p_k} \big|, \quad \forall k
\end{aligned}$$

    • p=[p_1, p_2, . . . , p_N, p_{N+1}, . . . , p_{N+K}] is the transmit power. R_min is a minimum data transmission rate required by the CL user. P_max is maximum transmit power of all users. ε_0 is a maximum aggregation model error that the FL users can tolerate. λ is a constant used to strike a balance between the sum rate and the MSE. Because the optimization variables p and a are coupled to each other in the objective function and the constraints, the foregoing optimization problem is a non-convex optimization problem, and it is difficult to obtain an optimal solution.

    • S8: The non-convex optimization problem in S7 is transformed and solved through an SCA method. In the present disclosure, to solve the non-convex optimization problem constructed in S7, auxiliary variables γ_n = 2^{R_n} − 1, ∀n, and b_i = a√(p_i), ∀i∈N∪K are introduced. The data transmission rate R_n can be rewritten as R_n = log_2(1+γ_n). The MSE can be rewritten as follows:

$$\mathrm{MSE} = \frac{1}{K^2} \left( \varpi \sum_{n=1}^{N} h_n^2\, b_n^2 + \sum_{k=N+1}^{N+K} \big( h_k b_k - 1 \big)^2 + a^2 \sigma^2 \right)$$

Therefore, the non-convex optimization problem in S7 is transformed into the following optimization problem:

$$\begin{aligned}
\max_{p,\, a,\, \gamma,\, b} \quad & \sum_{n=1}^{N} \log_2(1 + \gamma_n) - \lambda\, \mathrm{MSE} \\
\text{s.t.} \quad & h_1^2 p_1 \ge h_2^2 p_2 \ge \cdots \ge h_N^2 p_N \ge h_k^2 p_k, \quad \forall k \in K \\
& \log_2(1 + \gamma_n) \ge R_{\min}, \quad \forall n \in N \\
& h_n^2 p_n \ge \big( \psi_n + \sigma^2 \big)\, \gamma_n, \quad \forall n \in N \\
& a \sqrt{p_i} \ge b_i, \quad \forall i \in N \cup K \\
& \mathrm{MSE} \le \epsilon_0 \\
& p_n,\, p_k \in [0, P_{\max}], \quad \forall n, k
\end{aligned}$$

    • γ=[γ_1, γ_2, . . . , γ_N] and b=[b_1, b_2, . . . , b_N, b_{N+1}, b_{N+2}, . . . , b_{N+K}] are introduced auxiliary vectors and are used as optimization variables.

    • where

$$\psi_n = \varpi \sum_{i=1}^{n-1} h_i^2\, p_i + \sum_{j=n+1}^{N+K} h_j^2\, p_j$$

    • is the interference of the signal of the nth CL user. In this case, R_n is a concave function with respect to {γ_n}, and the MSE is a convex function with respect to a and {b_i}. Therefore, the objective function of the foregoing optimization problem is a concave function with respect to a, {b_i}, and {γ_n}. It can be learned that all constraints of the foregoing optimization problem are convex except h_n^2 p_n ≥ (ψ_n + σ²)γ_n, ∀n∈N, and a√(p_i) ≥ b_i, ∀i∈N∪K, which are non-convex. The following transforms the two non-convex constraints into convex constraints through CUB and SCA methods.





To solve the non-convex constraint $h_n^2 p_n \ge (\psi_n + \sigma^2)\gamma_n$, ∀n∈N, it is equivalently transformed into the following expression:

$$h_n^2 p_n \ge \varpi \sum_{i=1}^{n-1} h_i^2\, p_i \gamma_n + \sum_{j=n+1}^{N+K} h_j^2\, p_j \gamma_n + \sigma^2 \gamma_n$$

It can be learned that the product terms p_iγ_n and p_jγ_n in the foregoing expression are non-convex. To solve them, the present disclosure lets f(p_i, γ_n) = p_iγ_n. A CUB of f(p_i, γ_n) is

$$g(p_i, \gamma_n, \tau_{in}) = \frac{\tau_{in}}{2}\, p_i^2 + \frac{1}{2 \tau_{in}}\, \gamma_n^2.$$

τ_in is a CUB coefficient. It can be easily proved that f(p_i, γ_n) ≤ g(p_i, γ_n, τ_in), with equality when τ_in = γ_n/p_i. Similarly, let f(p_j, γ_n) = p_jγ_n. The CUB of f(p_j, γ_n) is

$$g(p_j, \gamma_n, \tau_{jn}) = \frac{\tau_{jn}}{2}\, p_j^2 + \frac{1}{2 \tau_{jn}}\, \gamma_n^2.$$

After the product terms p_iγ_n and p_jγ_n are replaced by their CUBs, the non-convex constraint $h_n^2 p_n \ge (\psi_n + \sigma^2)\gamma_n$, ∀n∈N, is transformed into the following convex constraint:

$$h_n^2 p_n \ge \varpi \sum_{i=1}^{n-1} h_i^2 \left( \frac{\tau_{in}}{2} p_i^2 + \frac{1}{2 \tau_{in}} \gamma_n^2 \right) + \sum_{j=n+1}^{N+K} h_j^2 \left( \frac{\tau_{jn}}{2} p_j^2 + \frac{1}{2 \tau_{jn}} \gamma_n^2 \right) + \sigma^2 \gamma_n$$


In the foregoing convex constraint, the CUB coefficient τ_in is updated to τ_in^{(ℓ)} = γ_n^{(ℓ)}/p_i^{(ℓ)}, where γ_n^{(ℓ)} and p_i^{(ℓ)} are solutions of the optimization problem in an ℓth iteration. Similarly, the CUB coefficient τ_jn is updated to τ_jn^{(ℓ)} = γ_n^{(ℓ)}/p_j^{(ℓ)}.
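The CUB property can be checked numerically: for any positive τ the bound dominates the product p_iγ_n, and it is tight exactly at τ = γ_n/p_i, which is how the SCA iteration refreshes τ from the previous solution (the values below are hypothetical).

```python
# Numeric check of the convex upper bound (CUB) used above:
# g(p, gamma, tau) = (tau/2) p^2 + gamma^2 / (2 tau) >= p * gamma for tau > 0,
# with equality exactly when tau = gamma / p (AM-GM inequality).

def cub(p, gamma, tau):
    return 0.5 * tau * p ** 2 + gamma ** 2 / (2.0 * tau)

p_i, gamma_n = 0.8, 1.5
f = p_i * gamma_n

# For several positive tau values the bound holds ...
vals = [cub(p_i, gamma_n, tau) for tau in (0.5, 1.0, 2.0, 5.0)]
# ... and it is tight at tau = gamma_n / p_i.
tight = cub(p_i, gamma_n, gamma_n / p_i)
```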


To solve the non-convex constraint a√(p_i) ≥ b_i, ∀i∈N∪K, when p_i ≠ 0, it is rewritten as follows:

$$a \ge \frac{b_i}{\sqrt{p_i}}, \ \forall i \quad \Longleftrightarrow \quad a^2 \ge \frac{b_i^2}{p_i}, \ \forall i$$

Non-convexity of the foregoing expression originates from the term a² on the left side of the inequality. The present disclosure uses a first-order Taylor expansion of a² at a point a^{(ℓ)} for replacement such that the non-convex constraint a√(p_i) ≥ b_i, ∀i∈N∪K is transformed into the following convex constraint:

$$\big( a^{(\ell)} \big)^2 + 2 a^{(\ell)} \big( a - a^{(\ell)} \big) \ge \frac{b_i^2}{p_i}, \quad \forall i$$

Based on the foregoing approximation, the non-convex optimization problem in S7 can be transformed into the following convex optimization problem:

$$\begin{aligned}
\max_{p,\, a,\, \gamma,\, b} \quad & \sum_{n=1}^{N} \log_2(1 + \gamma_n) - \lambda\, \mathrm{MSE} \\
\text{s.t.} \quad & h_1^2 p_1 \ge h_2^2 p_2 \ge \cdots \ge h_N^2 p_N \ge h_k^2 p_k, \quad \forall k \in K \\
& \log_2(1 + \gamma_n) \ge R_{\min}, \quad \forall n \in N \\
& \mathrm{MSE} \le \epsilon_0 \\
& p_n,\, p_k \in [0, P_{\max}], \quad \forall n, k \\
& h_n^2 p_n \ge \varpi \sum_{i=1}^{n-1} h_i^2 \left( \frac{\tau_{in}}{2} p_i^2 + \frac{1}{2 \tau_{in}} \gamma_n^2 \right) + \sum_{j=n+1}^{N+K} h_j^2 \left( \frac{\tau_{jn}}{2} p_j^2 + \frac{1}{2 \tau_{jn}} \gamma_n^2 \right) + \sigma^2 \gamma_n, \quad \forall n \in N \\
& \big( a^{(\ell)} \big)^2 + 2 a^{(\ell)} \big( a - a^{(\ell)} \big) \ge b_i^2 / p_i, \quad \forall i \in N \cup K
\end{aligned}$$

The foregoing convex optimization problem can be solved through a mathematical toolkit CVX to obtain an optimal solution.


As shown in FIG. 2, solving the non-convex optimization problem in S7 includes the following steps:

    • S8.1: $\mathbf{p}^{(0)}$, $a^{(0)}$, $\boldsymbol{\gamma}^{(0)}$, $\mathbf{b}^{(0)}$, $\varpi$, a maximum quantity $L$ of iterations, and a threshold $\varepsilon$ are initialized, and an iteration index $\ell=0$ is set.
    • S8.2: An objective function value $U^{(0)} = \sum_{n=1}^{N}\log_2(1+\gamma_n^{(0)}) - \lambda\,\mathrm{MSE}^{(0)}$ is computed.
    • S8.3: Given $\mathbf{p}^{(\ell)}$, $a^{(\ell)}$, $\boldsymbol{\gamma}^{(\ell)}$, and $\mathbf{b}^{(\ell)}$, $\tau_{in}$ and $\tau_{jn}$ are updated by using $\tau_{in}^{(\ell+1)} = \gamma_n^{(\ell)}/p_i^{(\ell)}$ and $\tau_{jn}^{(\ell+1)} = \gamma_n^{(\ell)}/p_j^{(\ell)}$.
    • S8.4: Given $\tau_{in}^{(\ell+1)}$ and $\tau_{jn}^{(\ell+1)}$, the convex optimization problem is solved through the mathematical toolkit CVX to obtain $\mathbf{p}^{(\ell+1)}$, $a^{(\ell+1)}$, $\boldsymbol{\gamma}^{(\ell+1)}$, and $\mathbf{b}^{(\ell+1)}$.
    • S8.5: An objective function value $U^{(\ell+1)} = \sum_{n=1}^{N}\log_2(1+\gamma_n^{(\ell+1)}) - \lambda\,\mathrm{MSE}^{(\ell+1)}$ is computed.
    • S8.6: $\ell = \ell + 1$ is updated and $\Delta U = |U^{(\ell)} - U^{(\ell-1)}|$ is computed.
    • S8.7: S8.3 to S8.6 are repeated until $\Delta U \le \varepsilon$ or $\ell \ge L$, and user transmit power $\mathbf{p}^{\star}$ and a receiving factor $a^{\star}$ are output.
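The iterative procedure S8.1 to S8.7 can be sketched on a stand-in problem. The Python toy below is hypothetical throughout: the channel gains and noise power are made up, a grid search replaces the CVX solve, and the sum rate of a two-user interference channel replaces the patent's exact objective. What it does share with the procedure above is the pattern: linearize the non-convex part at the current iterate, maximize the resulting concave surrogate, and stop when the objective improvement falls below $\varepsilon$ or the iteration cap $L$ is reached.

```python
import math

# Toy SCA loop mirroring S8.1-S8.7 on a two-user interference channel
# (a stand-in problem, not the patent's objective): maximize
#   U(p) = sum_n log2(1 + h_n p_n / (sum_{m != n} h_m p_m + sigma2)).
# Each rate is a difference of concave logs; SCA linearizes the
# subtracted log at the current iterate, and the concave surrogate is
# maximized here by a grid search (a real system would call CVX).

h = [1.0, 0.6]            # hypothetical channel power gains
sigma2 = 0.1              # hypothetical noise power
P_MAX, L_MAX, EPS = 1.0, 50, 1e-6

def rate_sum(p):
    """True (non-convex) sum rate."""
    total = sigma2 + sum(hi * pi for hi, pi in zip(h, p))
    return sum(math.log2(total) - math.log2(total - h[n] * p[n])
               for n in range(2))

def surrogate(p, p_old):
    """Concave minorant: subtracted logs linearized at p_old."""
    total = sigma2 + sum(hi * pi for hi, pi in zip(h, p))
    val = 0.0
    for n in range(2):
        i_old = sigma2 + sum(h[m] * p_old[m] for m in range(2) if m != n)
        i_new = sigma2 + sum(h[m] * p[m] for m in range(2) if m != n)
        val += math.log2(total)
        val -= math.log2(i_old) + (i_new - i_old) / (i_old * math.log(2))
    return val

grid = [i * P_MAX / 40 for i in range(41)]
p = [0.5, 0.5]                       # S8.1: initialize
U_prev = rate_sum(p)                 # S8.2: initial objective value
for it in range(L_MAX):              # S8.3-S8.6: iterate
    p = list(max(((p1, p2) for p1 in grid for p2 in grid),
                 key=lambda q: surrogate(q, p)))
    U = rate_sum(p)
    if abs(U - U_prev) <= EPS:       # S8.7: stop on small improvement
        break
    U_prev = U

assert U >= rate_sum([0.5, 0.5]) - 1e-9   # never worse than the start
assert all(0.0 <= pi <= P_MAX for pi in p)
```

Because the surrogate is a minorant of the true objective and tight at the current iterate, each pass cannot decrease the true objective, which is the same monotonicity that underpins the convergence of the algorithm above.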


Herein, the user transmit power has an upper bound and the MSE is non-negative, so the objective function of the optimization problem is bounded from above. Since each iteration does not decrease the objective value, convergence of the foregoing iterative algorithm can be ensured.

    • S9: The same frequency band and the transmit power solved in S8 are used by the users to upload the local models or send the training data simultaneously. User data is detected by the base station through imperfect SIC, and decoding is performed by using the reception strategy solved in S8 to obtain the aggregation model.
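The imperfect-SIC detection in S9 can be sketched as follows. This is a simplified real-valued Python model with hypothetical gains and powers, not the disclosure's receiver: $\varpi$ here scales the residual power of already-decoded users, with $\varpi = 0$ recovering perfect SIC.

```python
# Minimal sketch of imperfect SIC: users are decoded in descending
# received power |h|^2 p; users already decoded leave only a
# varpi-scaled power residual, later users interfere at full power.
# Gains and powers below are hypothetical.

def sic_residual_power(h, p, varpi):
    """Interference power seen by each user under imperfect SIC."""
    order = sorted(range(len(h)), key=lambda i: h[i] ** 2 * p[i],
                   reverse=True)
    out = {}
    for pos, n in enumerate(order):
        decoded = order[:pos]         # already decoded -> residual only
        pending = order[pos + 1:]     # not yet decoded -> full power
        out[n] = (varpi * sum(h[i] ** 2 * p[i] for i in decoded)
                  + sum(h[j] ** 2 * p[j] for j in pending))
    return out

h = [1.0, 0.8, 0.5]
p = [1.0, 1.0, 1.0]
perfect = sic_residual_power(h, p, varpi=0.0)
imperfect = sic_residual_power(h, p, varpi=0.1)

# The last-decoded (weakest) user sees no interference under perfect
# SIC, but a residual floor remains when SIC is imperfect.
assert perfect[2] == 0.0
assert imperfect[2] > 0.0
assert all(imperfect[n] >= perfect[n] for n in perfect)
```

This residual floor is exactly why the SINR constraints in the optimization problem retain the $\varpi$-weighted sum over users decoded earlier.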


    • S10: CL is performed by the base station on the accumulated data to obtain the CL model, and weighted summation is performed on the CL model and the received aggregation model based on the data amount to obtain the global model.
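The weighted summation in S10 can be sketched in a few lines. The parameter vectors and sample counts below are made up for illustration; the weights are each model's share of the total training data, as the disclosure describes.

```python
# Hedged sketch of the S10 aggregation: the global model is a
# data-amount-weighted average of the CL model (trained at the base
# station on the CL users' data) and the aggregated FL model.
# Parameter vectors and sample counts are hypothetical.

def weighted_global_model(w_cl, d_cl, w_fl, d_fl):
    """Weight each model by its share of the total training data."""
    total = d_cl + d_fl
    return [(d_cl * a + d_fl * b) / total for a, b in zip(w_cl, w_fl)]

w_cl = [0.2, -1.0, 0.5]   # CL model parameters (hypothetical)
w_fl = [0.4, 0.0, 0.1]    # aggregated FL model parameters (hypothetical)
w = weighted_global_model(w_cl, 300, w_fl, 100)   # 300 vs 100 samples

expected = [0.25, -0.75, 0.4]
assert all(abs(x - y) < 1e-9 for x, y in zip(w, expected))
```

With 300 CL samples against 100 FL samples, the CL model contributes three quarters of each global parameter.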

    • S11: The global model is broadcast by the base station to all users, and S3 to S10 are repeated until a total cycle quantity requirement is met.


In summary, the wireless FL framework provided in the present disclosure allows the users with limited computing resources and the users with sufficient computing resources to participate in FL together, making full use of the data of heterogeneous users to improve the performance of the machine learning model. In the present disclosure, all users use the same frequency band to upload the models or send the data simultaneously, which effectively saves spectrum resources and reduces communication delay. The base station detects the signals through imperfect SIC, which better reflects practical receivers. In addition, the non-convex optimization problem is constructed based on the user data transmission rates and the MSE of the aggregation model to jointly design the user transmit power and the reception strategy of the base station, and is solved through the SCA method such that the sum rate is effectively maximized and the MSE is effectively minimized.


The foregoing embodiments are used only to describe the technical solutions of the present disclosure, and are not intended to limit same. Although the present disclosure is described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions to some technical features therein. These modifications or substitutions do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure.

Claims
  • 1. A wireless federated learning (FL) framework, comprising: N centralized learning (CL) users with limited computing resources, wherein the CL users send training data to a base station for CL to participate in FL; K FL users with sufficient computing resources, wherein the FL users obtain local models through local training data, and upload local model parameters as an aggregation model to the base station; and the base station serving as an FL server and configured to compute a global model, wherein the base station performs CL on the training data accumulated by the CL users to obtain a CL model, and performs weighted summation on the CL model and the aggregation model based on a data amount to obtain the global model.
  • 2. The wireless FL framework according to claim 1, wherein it is assumed that there are T FL cycles in total, which are represented as a set T={1, 2, . . . , T}; and in the tth FL cycle, a local model update formula of a kth FL user is as follows:
  • 3. A resource optimization method for the wireless FL framework according to claim 1, comprising the following steps: S1: initializing, by the base station, a training task and the global model, and sending the global model to all users; S2: after receiving the global model, computing, by the FL user, a local loss function and a gradient based on the local training data, and updating the local model; S3: using, by the FL users and the CL users, a same frequency band to respectively upload the local models and send the training data simultaneously; and detecting, by the base station, a received signal through imperfect successive interference cancellation (SIC), and separating the training data and the aggregation model; S4: obtaining data transmission rates of all CL users and a mean square error (MSE) of the aggregation model of all FL users, and detecting signals of all CL users in an order of 1, 2, . . . , N through imperfect SIC; S5: constructing a non-convex optimization problem based on the data transmission rates and the MSE of the aggregation model; and S6: transforming and solving the non-convex optimization problem in S5 through a successive convex approximation (SCA) method, and outputting user transmit power and a reception strategy.
  • 4. The resource optimization method for the wireless FL framework according to claim 3, wherein the local loss function in S2 is an MSE loss function or a cross-entropy loss function.
  • 5. The resource optimization method for the wireless FL framework according to claim 3, wherein S3 comprises: before the user uploads the local model or sends the training data, normalizing a local training data set {Dn} of the CL users into a communication symbol set {sn} and the local models {wk} of the FL users into a computation symbol set {sk}, where a superimposed signal received by the base station is as follows:
  • 6. The resource optimization method for the wireless FL framework according to claim 5, wherein in S4, a formula of the data transmission rate of the nth CL user is as follows:
  • 7. The resource optimization method for the wireless FL framework according to claim 6, wherein in S4, a formula of the MSE of the aggregation model is as follows:
  • 8. The resource optimization method for the wireless FL framework according to claim 6, wherein in S5, the non-convex optimization problem is constructed as follows:
  • 9. The resource optimization method for the wireless FL framework according to claim 3, wherein a convex optimization problem obtained through transformation in S6 is as follows:
  • 10. The resource optimization method for the wireless FL framework according to claim 9, wherein solving the convex optimization problem in S6 comprises: S6.1: initializing p^(0), a^(0), γ^(0), b^(0), ϖ, a maximum quantity L of iterations, and a threshold ε, and setting an iteration index ℓ=0; S6.2: computing an objective function value U^(0)=Σ_{n=1}^{N} log_2(1+γ_n^(0))−λMSE^(0); S6.3: given p^(ℓ), a^(ℓ), γ^(ℓ), and b^(ℓ), updating τ_in and τ_jn by using τ_in^(ℓ+1)=γ_n^(ℓ)/p_i^(ℓ) and τ_jn^(ℓ+1)=γ_n^(ℓ)/p_j^(ℓ); S6.4: given τ_in^(ℓ+1) and τ_jn^(ℓ+1), solving the convex optimization problem through a mathematical toolkit CVX to obtain p^(ℓ+1), a^(ℓ+1), γ^(ℓ+1), and b^(ℓ+1); S6.5: computing an objective function value U^(ℓ+1)=Σ_{n=1}^{N} log_2(1+γ_n^(ℓ+1))−λMSE^(ℓ+1); S6.6: updating ℓ=ℓ+1 and computing ΔU=|U^(ℓ)−U^(ℓ−1)|; and S6.7: repeating S6.3 to S6.6 until ΔU≤ε or ℓ≥L, and outputting user transmit power p* and a receiving factor a*.
  • 11. A resource optimization method for the wireless FL framework according to claim 2, comprising the following steps: S1: initializing, by the base station, a training task and the global model, and sending the global model to all users; S2: after receiving the global model, computing, by the FL user, a local loss function and a gradient based on the local training data, and updating the local model; S3: using, by the FL users and the CL users, a same frequency band to respectively upload the local models and send the training data simultaneously; and detecting, by the base station, a received signal through imperfect successive interference cancellation (SIC), and separating the training data and the aggregation model; S4: obtaining data transmission rates of all CL users and a mean square error (MSE) of the aggregation model of all FL users, and detecting signals of all CL users in an order of 1, 2, . . . , N through imperfect SIC; S5: constructing a non-convex optimization problem based on the data transmission rates and the MSE of the aggregation model; and S6: transforming and solving the non-convex optimization problem in S5 through a successive convex approximation (SCA) method, and outputting user transmit power and a reception strategy.
  • 12. The resource optimization method for the wireless FL framework according to claim 11, wherein the local loss function in S2 is an MSE loss function or a cross-entropy loss function.
  • 13. The resource optimization method for the wireless FL framework according to claim 11, wherein S3 comprises: before the user uploads the local model or sends the training data, normalizing a local training data set {Dn} of the CL users into a communication symbol set {sn} and the local models {wk} of the FL users into a computation symbol set {sk}, where a superimposed signal received by the base station is as follows:
  • 14. The resource optimization method for the wireless FL framework according to claim 13, wherein in S4, a formula of the data transmission rate of the nth CL user is as follows:
  • 15. The resource optimization method for the wireless FL framework according to claim 14, wherein in S4, a formula of the MSE of the aggregation model is as follows:
  • 16. The resource optimization method for the wireless FL framework according to claim 14, wherein in S5, the non-convex optimization problem is constructed as follows:
  • 17. The resource optimization method for the wireless FL framework according to claim 11, wherein a convex optimization problem obtained through transformation in S6 is as follows:
  • 18. The resource optimization method for the wireless FL framework according to claim 17, wherein solving the convex optimization problem in S6 comprises: S6.1: initializing p^(0), a^(0), γ^(0), b^(0), ϖ, a maximum quantity L of iterations, and a threshold ε, and setting an iteration index ℓ=0; S6.2: computing an objective function value U^(0)=Σ_{n=1}^{N} log_2(1+γ_n^(0))−λMSE^(0); S6.3: given p^(ℓ), a^(ℓ), γ^(ℓ), and b^(ℓ), updating τ_in and τ_jn by using τ_in^(ℓ+1)=γ_n^(ℓ)/p_i^(ℓ) and τ_jn^(ℓ+1)=γ_n^(ℓ)/p_j^(ℓ); S6.4: given τ_in^(ℓ+1) and τ_jn^(ℓ+1), solving the convex optimization problem through a mathematical toolkit CVX to obtain p^(ℓ+1), a^(ℓ+1), γ^(ℓ+1), and b^(ℓ+1); S6.5: computing an objective function value U^(ℓ+1)=Σ_{n=1}^{N} log_2(1+γ_n^(ℓ+1))−λMSE^(ℓ+1); S6.6: updating ℓ=ℓ+1 and computing ΔU=|U^(ℓ)−U^(ℓ−1)|; and S6.7: repeating S6.3 to S6.6 until ΔU≤ε or ℓ≥L, and outputting user transmit power p* and a receiving factor a*.
Priority Claims (1)
Number Date Country Kind
202310182495.3 Mar 2023 CN national