The present application claims the benefit of priority of U.S. Provisional Patent Application No. 63/341,045, titled Multi-Cell Non-Coherent Over-The-Air Computation for Federated Edge Learning, filed May 12, 2022, and which is fully incorporated herein by reference for all purposes.
Over-the-air computation (OAC) refers to the computation of mathematical functions by exploiting the superposition property of wireless multiple-access channels [1]. It was initially considered in wireless sensor networks to reduce the latency due to a large number of nodes [2]-[4]. Recently, OAC has also been shown to be a promising solution to the latency issue of federated edge learning (FEEL) [5] or distributed training problems in a wireless network [6]. Nevertheless, apart from a few works [7], FEEL with OAC has primarily been investigated in a single cell in the uplink (UL), although practical wireless networks often consist of multiple cells. In this disclosure, we address this issue and propose a framework for FEEL based on a non-coherent OAC scheme in both UL and downlink (DL) in a multi-cell environment.
One of the major challenges in OAC is the detrimental impact of wireless channels on the coherent symbol superposition. To address this issue, a majority of the state-of-the-art solutions rely on pre-equalization techniques. For instance, broadband analog aggregation (BAA) over orthogonal frequency division multiplexing (OFDM) with truncated-channel inversion (TCI) is investigated to obtain unbiased estimates of the weights or gradients[8-9]. One-bit broadband digital aggregation (OBDA), inspired by distributed training by majority vote (MV) with sign stochastic gradient descent (signSGD)[11], is proposed to facilitate the implementation of FEEL for a practical wireless system, and it also uses TCI[10]. Alternatively, the conjugate of the channel can be utilized instead of TCI[12]. In other work, the channel state information (CSI) for each edge device (ED) is assumed to be available at the edge server (ES), and the impact of the channel on OAC is mitigated through beamforming techniques[13-14].
State-of-the-art OAC techniques are often suitable for a single cell where the OAC occurs in the UL, due to the pre-equalization. In addition, pre-equalization techniques require sample-level precise time synchronization, which is another shortcoming when multiple aggregation nodes exist in a wireless network. Prior art investigates FEEL in a single-cell scenario with non-coherent computation through frequency-shift keying (FSK)-based MV (FSK-MV) and pulse-position modulation (PPM)-based MV (PPM-MV)[15-16]. The main strategy in these studies is to dedicate two resources, where one of the two resources is activated based on the sign of the gradient. The MV at the ES is detected through an energy detector. Since the information is not encoded in the amplitude or the phase in this strategy, the need for CSI at the EDs and the ES is eliminated, and the precise time-synchronization requirement is relaxed. Because of these unique features, we consider non-coherent OAC in a multi-cell environment.
In this disclosure, we propose an OAC framework where OAC occurs in both UL and DL in a multi-cell environment with FSK-based MV. As opposed to a single-cell solution, multiple ESs first detect the MVs through the UL OAC. Afterward, each ED determines the sign of the gradient by aggregating the ESs' signals in the DL with another OAC. We show the convergence of the non-convex loss function problem for FEEL with the proposed scheme and evaluate the proposed framework numerically. We show the efficacy of the proposed framework by comparing it with a single-cell scenario for both homogeneous and heterogeneous data distributions.
The disclosure deals with a system and method for a framework where OAC occurs in both UL and DL, sequentially, in a multi-cell environment to address the latency and the scalability issues of FEEL. To eliminate the CSI at the EDs and ESs and relax the time-synchronization requirement for the OAC, we use a non-coherent computation scheme, i.e., FSK-based majority vote (MV) (FSK-MV). With the proposed framework, multiple ESs function as the aggregation nodes in the UL and each ES determines the MVs independently. After the ESs broadcast the detected MVs, the EDs determine the sign of the gradient through another OAC in the DL. Hence, intercell interference is exploited for the OAC. In this disclosure, we prove the convergence of the non-convex optimization problem for the FEEL with the proposed OAC framework. We also numerically evaluate the efficacy of the proposed method by comparing the test accuracy in both multi-cell and single-cell scenarios for both homogeneous and heterogeneous data distributions.
Regarding notations herein: E[⋅] is the expectation operation; I[⋅] is the indicator function; and the function sign(⋅) results in 1, −1, or ±1 at random for a positive, a negative, or a zero-valued argument, respectively.
It is to be understood that the presently disclosed subject matter equally relates to apparatus and system subject matter as well as associated and/or corresponding methodologies. One exemplary such method relates to a non-coherent over-the-air computation methodology occurring in both uplink (UL) and downlink (DL), sequentially, in a multi-cell environment for federated edge learning (FEEL) without using channel state information (CSI) at a plurality of edge devices (EDs) or at edge servers (ESs). Such methodology preferably comprises providing a distributed machine-learning model to be trained with the update vectors received at a plurality of edge servers (ESs) as transmitted from a plurality of edge devices (EDs); and conducting methodology operations preferably comprising transmitting local update vectors as weighted votes with respective of the plurality of edge servers (ESs) functioning as aggregation nodes in the UL via a wireless multi-cell environment, independently detecting orthogonal signaling based majority vote (MV) data at each ES in the UL, broadcasting the detected MVs from the ESs, and inputting the MVs into the machine-learning model to be updated, wherein the EDs determine the sign of the gradient through over-the-air computation using orthogonal signaling based majority vote (MV) in the DL.
In some embodiments of the foregoing methodology, such methodology may further include providing one or more processors; and providing one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the one or more processors to perform the methodology operations.
Other example aspects of the present disclosure are directed to systems, apparatus, tangible, non-transitory computer-readable media, user interfaces, memory devices, and electronic devices for non-coherent over-the-air computation for federated edge learning. To implement methodology and technology herewith, one or more processors may be provided, programmed to perform the steps and functions as called for by the presently disclosed subject matter, as will be understood by those of ordinary skill in the art.
Another exemplary embodiment of presently disclosed subject matter relates to a non-coherent over-the-air computation system for both uplink (UL) and downlink (DL) channels in a multi-cell environment, for federated edge learning (FEEL) without using channel state information (CSI) at a plurality of edge devices (EDs) or at edge servers (ESs). Such system preferably comprises a machine-learning model training to process data received at a plurality of edge servers (ESs) as transmitted from a plurality of edge devices (EDs); one or more processors; and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the one or more processors to perform operations. Such operations preferably comprise transmitting local update vectors as weighted votes with respective of the plurality of edge servers (ESs) functioning as aggregation nodes in the UL channel via a wireless multi-cell environment, independently detecting orthogonal signaling based majority vote (MV) data at each ES in the UL channel, broadcasting the detected MVs from the ESs, and inputting the MVs into the machine-learning model to be updated, wherein the EDs determine the sign of the gradient through over-the-air computation using orthogonal signaling based majority vote (MV) in the DL channel.
Additional objects and advantages of the presently disclosed subject matter are set forth in, or will be apparent to, those of ordinary skill in the art from the detailed description herein. Also, it should be further appreciated that modifications and variations to the specifically illustrated, referred and discussed features, elements, and steps hereof may be practiced in various embodiments, uses, and practices of the presently disclosed subject matter without departing from the spirit and scope of the subject matter. Variations may include, but are not limited to, substitution of equivalent means, features, or steps for those illustrated, referenced, or discussed, and the functional, operational, or positional reversal of various parts, features, steps, or the like.
Still further, it is to be understood that different embodiments, as well as different presently preferred embodiments, of the presently disclosed subject matter may include various combinations or configurations of presently disclosed features, steps, or elements, or their equivalents (including combinations of features, parts, or steps or configurations thereof not expressly shown in the figures or stated in the detailed description of such figures). Additional embodiments of the presently disclosed subject matter, not necessarily expressed in the summarized section, may include and incorporate various combinations of aspects of features, components, or steps referenced in the summarized objects above, and/or other features, components, or steps as otherwise discussed in this application. Those of ordinary skill in the art will better appreciate the features and aspects of such embodiments, and others, upon review of the remainder of the specification, and will appreciate that the presently disclosed subject matter applies equally to corresponding methodologies as associated with practice of any of the present exemplary devices, and vice versa.
These and other features, aspects and advantages of various embodiments will become better understood with reference to the following description and appended claims. The accompanying figures, which are incorporated in and constitute a part of this specification, illustrate embodiments of the present disclosure and, together with the description, serve to explain the related principles.
A full and enabling disclosure of the present subject matter, including the best mode thereof to one of ordinary skill in the art, is set forth more particularly in the remainder of the specification, including reference to the accompanying figures in which:
Repeat use of reference characters in the present specification and figures is intended to represent the same or analogous features, elements, or steps of the presently disclosed subject matter.
Reference will now be made in detail to various embodiments of the disclosed subject matter, one or more examples of which are set forth below. Each embodiment is provided by way of explanation of the subject matter, not limitation thereof. In fact, it will be apparent to those skilled in the art that various modifications and variations may be made in the present disclosure without departing from the scope or spirit of the subject matter. For instance, features illustrated or described as part of one embodiment, may be used in another embodiment to yield a still further embodiment.
In general, the present disclosure is directed to an over-the-air computation (OAC) framework where OAC occurs in both uplink (UL) and downlink (DL) in a multi-cell environment with a non-coherent computation scheme based on orthogonal signaling, e.g., frequency-shift keying (FSK)-based majority vote (MV) (FSK-MV). FSK is an example of orthogonal signaling and is used in the sequel. Other examples of orthogonal signaling are pulse-position modulation (PPM), chirp-shift keying, and on-off keying (OOK).
Consider a multi-cell wireless network with K EDs and S ESs. We assume that the frequency synchronization in the network is handled through a control mechanism. We consider time synchronization errors among the EDs (and the ESs), where the maximum difference between the times of arrival of the signals at the desired receiver's location is $T_{\mathrm{sync}}$ seconds, and $T_{\mathrm{sync}}$ is equal to the reciprocal of the signal bandwidth. We assume that the signal-to-noise ratio (SNR) at an ES is $1/\sigma_{\mathrm{ES}}^{2}$ when an ED is located at the reference distance $r_{\mathrm{UL}}$. We then set the received signal power of the kth ED at the sth ES as $P_{\mathrm{ED}}^{k,s} = r_{k,s}^{-\alpha}/r_{\mathrm{UL}}^{-\alpha}$, where $r_{k,s}$ is the link distance between the kth ED and the sth ES, and $\alpha$ is the path loss exponent. Similarly, we define the DL SNR at an ED as $1/\sigma_{\mathrm{ED}}^{2}$ when the distance between an ED and an ES is equal to the reference distance $r_{\mathrm{DL}}$. We then set the received signal power of the sth ES at the kth ED as $P_{\mathrm{ES}}^{k,s} = r_{k,s}^{-\alpha}/r_{\mathrm{DL}}^{-\alpha}$.
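The normalized path-loss model above can be sketched numerically as follows. This is a minimal illustration only; the function and parameter names are hypothetical and do not appear in the disclosure. It evaluates the ratio of the link-distance term to the reference-distance term for a given path loss exponent.

```python
def received_power(link_distance_m: float, reference_distance_m: float,
                   path_loss_exponent: float) -> float:
    """Normalized received power r**(-alpha) / r_ref**(-alpha).

    Assumption: distances are in the same unit and strictly positive.
    The result is 1.0 at the reference distance, so the SNR there is
    1 / noise_variance as stated in the text.
    """
    if link_distance_m <= 0.0 or reference_distance_m <= 0.0:
        raise ValueError("distances must be positive")
    return (link_distance_m / reference_distance_m) ** (-path_loss_exponent)
```

For instance, with a path loss exponent of 4, a transmitter at twice the reference distance is received at 1/16 of the reference power.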
A. Signal Model in Uplink and Downlink
In this disclosure, the EDs in the UL and the ESs in the DL access the wireless channel on the same time-frequency resources simultaneously with N OFDM symbols consisting of M active subcarriers. We assume that the cyclic prefix (CP) duration is larger than $T_{\mathrm{sync}}$ and the maximum-excess delay of the channel. Considering independent frequency-selective channels between the EDs and the ESs, the superposed symbol on the mth subcarrier of the nth OFDM symbol at the sth ES for the tth communication round of FEEL can be written as
$$r_{\mathrm{ES}}^{t,s,m,n} = \sum_{k=1}^{K} \sqrt{P_{\mathrm{ED}}^{k,s}}\, h_{\mathrm{UL}}^{t,s,k,m,n}\, t_{\mathrm{ED}}^{t,k,m,n} + \omega_{\mathrm{ES}}^{t,s,m,n}, \tag{1}$$
where $h_{\mathrm{UL}}^{t,s,k,m,n} \in \mathbb{C}$ is the channel coefficient between the sth ES and the kth ED, $t_{\mathrm{ED}}^{t,k,m,n} \in \mathbb{C}$ is the transmitted symbol from the kth ED, and $\omega_{\mathrm{ES}}^{t,s,m,n}$ is the symmetric additive white Gaussian noise (AWGN) with zero mean and variance $\sigma_{\mathrm{ES}}^{2}$ on the mth subcarrier for $m \in \{0, 1, \ldots, M-1\}$ and $n \in \{0, 1, \ldots, N-1\}$.
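Similarly, the received symbol on the mth subcarrier of the nth OFDM symbol at the kth ED for the tth communication round in the DL can be shown as
$$r_{\mathrm{ED}}^{t,k,m,n} = \sum_{s=1}^{S} \sqrt{P_{\mathrm{ES}}^{k,s}}\, h_{\mathrm{DL}}^{t,s,k,m,n}\, t_{\mathrm{ES}}^{t,s,m,n} + \omega_{\mathrm{ED}}^{t,k,m,n}, \tag{2}$$
where $h_{\mathrm{DL}}^{t,s,k,m,n} \in \mathbb{C}$ is the channel coefficient between the sth ES and the kth ED, $t_{\mathrm{ES}}^{t,s,m,n} \in \mathbb{C}$ is the transmitted symbol from the sth ES, and $\omega_{\mathrm{ED}}^{t,k,m,n}$ is the symmetric AWGN with zero mean and variance $\sigma_{\mathrm{ED}}^{2}$ on the mth subcarrier.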
B. Problem Statement and Learning Model
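Let $w_k^{(t)} \in \mathbb{R}^{Q}$ denote the model parameters at the kth ED for the tth communication round. The local data set containing labeled data samples at the kth ED is denoted as $\{(x_\ell, y_\ell)\} \in \mathcal{D}_k$, where $x_\ell$ and $y_\ell$ are the $\ell$th data sample and its associated label, respectively. In this disclosure, unlike a classical FEEL problem, to capture the model test accuracy for each ED under heterogeneous data distribution, we define a personalized global loss function at the kth ED for a given $w_k^{(t)}$ as
$$F_k(w_k^{(t)}) = \frac{1}{|\mathcal{G}_k|} \sum_{(x_\ell, y_\ell) \in \mathcal{G}_k} f(w_k^{(t)}, x_\ell, y_\ell), \tag{3}$$
where $\mathcal{G}_k = \{(x_\ell, y_\ell) \in \mathcal{G} \mid y_\ell \in \mathcal{L}_k\}$ for $\mathcal{G} = \mathcal{D}_1 \cup \mathcal{D}_2 \cup \cdots \cup \mathcal{D}_K$, and $\mathcal{L}_k$ is the set of distinct labels in the dataset of the kth ED. $f(w_k^{(t)}, x_\ell, y_\ell)$ is the sample loss function that measures the labelling error for $(x_\ell, y_\ell)$ for the parameters $w_k^{(t)}$ at the kth ED.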
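The personalized loss described above can be illustrated with a short sketch. It assumes that the personalized set keeps every sample of the pooled data whose label also occurs in the local data of the ED, and that the loss is the average of the sample losses over that set; all names below are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, label)


def personalized_set(pooled_data: Sequence[Sample],
                     local_data: Sequence[Sample]) -> List[Sample]:
    """Keep pooled samples whose label appears in the ED's local data."""
    local_labels = {label for _, label in local_data}
    return [(x, y) for x, y in pooled_data if y in local_labels]


def personalized_loss(params: Sequence[float],
                      data: Sequence[Sample],
                      sample_loss: Callable[[Sequence[float], Sequence[float], int], float]) -> float:
    """Average sample loss over the personalized set (assumed normalization)."""
    if not data:
        raise ValueError("personalized set is empty")
    return sum(sample_loss(params, x, y) for x, y in data) / len(data)
```

An ED that only holds digits 1 and 2 is therefore evaluated on all pooled samples of digits 1 and 2, not on the other digits.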
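The personalized federated learning (FL) problem can then be defined as
$$w_k^{*} = \arg\min_{w_k} F_k(w_k), \quad \forall k. \tag{4}$$
To solve (4), a full-batch gradient descent with the learning rate η is given by $w_k^{(t+1)} = w_k^{(t)} - \eta\, g_k^{(t)}$, and
$$g_k^{(t)} = \nabla F_k(w_k^{(t)}), \tag{5}$$
where the ith element of $g_k^{(t)}$ is $g_{k,i}^{(t)}$, which is the gradient of $F_k(w_k^{(t)})$ with respect to $w_{k,i}^{(t)}$.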
In this disclosure, our main goal is to solve (4) in a wireless network consisting of multiple cells, where the data sharing among EDs is not allowed to promote data privacy. To this end, we consider FEEL and reduce the communication latency by adopting an OAC scheme, i.e., FSK-MV[15], which was originally proposed in the UL for a single cell (i.e., S=1). With this scheme, the kth ED first calculates the local stochastic gradient as
$$\tilde{g}_k^{(t)} = \frac{1}{n_b} \sum_{(x_\ell, y_\ell) \in \tilde{\mathcal{D}}_k} \nabla f(w_k^{(t)}, x_\ell, y_\ell), \tag{6}$$
where $\tilde{g}_k^{(t)}$ is the local gradient whose ith element is $\tilde{g}_{k,i}^{(t)}$, and $\tilde{\mathcal{D}}_k \subset \mathcal{D}_k$ is the selected data batch from the local data set with the batch size $n_b = |\tilde{\mathcal{D}}_k|$.
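Each ED then obtains the transmit symbols in the UL as follows: Consider a mapping from $i \in \{1, \ldots, Q\}$ to the distinct pairs $(m^+, n^+)$ and $(m^-, n^-)$ for $m^+, m^- \in \{0, 1, \ldots, M-1\}$ and $n^+, n^- \in \{0, 1, \ldots, N-1\}$. Based on the value of $\mathrm{sign}(\tilde{g}_{k,i}^{(t)})$, the kth ED calculates the symbols $t_{\mathrm{ED}}^{t,k,m^+,n^+}$ and $t_{\mathrm{ED}}^{t,k,m^-,n^-}$, ∀i, as
$$t_{\mathrm{ED}}^{t,k,m^+,n^+} = \sqrt{E_s}\, s_{\mathrm{ED}}^{t,k,i}\, \mathbb{I}\big[\mathrm{sign}(\tilde{g}_{k,i}^{(t)}) = 1\big], \tag{7}$$
and
$$t_{\mathrm{ED}}^{t,k,m^-,n^-} = \sqrt{E_s}\, s_{\mathrm{ED}}^{t,k,i}\, \mathbb{I}\big[\mathrm{sign}(\tilde{g}_{k,i}^{(t)}) = -1\big], \tag{8}$$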
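respectively, where $s_{\mathrm{ED}}^{t,k,i}$ is a random quadrature phase-shift keying (QPSK) symbol and $E_s = 2$ is the symbol energy. Note that a long-term power constraint, used for OBDA [10, Eq. 9 and Eq. 10], is not needed for FSK-MV as the OFDM symbol energy does not change as a function of CSI with FSK-MV. The ES receives the superposed symbols for a given i, respectively, as follows:
$$r_{\mathrm{ES}}^{t,s,m^+,n^+} = \sqrt{E_s} \sum_{k:\, \mathrm{sign}(\tilde{g}_{k,i}^{(t)}) = 1} \sqrt{P_{\mathrm{ED}}^{k,s}}\, h_{\mathrm{UL}}^{t,s,k,m^+,n^+}\, s_{\mathrm{ED}}^{t,k,i} + \omega_{\mathrm{ES}}^{t,s,m^+,n^+},$$
$$r_{\mathrm{ES}}^{t,s,m^-,n^-} = \sqrt{E_s} \sum_{k:\, \mathrm{sign}(\tilde{g}_{k,i}^{(t)}) = -1} \sqrt{P_{\mathrm{ED}}^{k,s}}\, h_{\mathrm{UL}}^{t,s,k,m^-,n^-}\, s_{\mathrm{ED}}^{t,k,i} + \omega_{\mathrm{ES}}^{t,s,m^-,n^-}.$$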
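The superposed symbols at the ES are then compared with an energy detector for the ith gradient to detect the MV as
$$v_{\mathrm{ES}}^{t,s,i} = \mathrm{sign}\big(\Delta_{\mathrm{ES}}^{t,s,i}\big), \quad \forall i \in \{1, \ldots, Q\}, \tag{9}$$
where $\Delta_{\mathrm{ES}}^{t,s,i} \triangleq |r_{\mathrm{ES}}^{t,s,m^+,n^+}|^{2} - |r_{\mathrm{ES}}^{t,s,m^-,n^-}|^{2}$.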
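The UL steps above (resource activation by gradient sign, superposition over fading channels, and energy detection) can be sketched for a single gradient coordinate as follows. This is a minimal simulation under stated assumptions, not the disclosed implementation: it assumes unit received power for every ED, independent Rayleigh fading with unit average power, and unit-magnitude QPSK symbols scaled by the square root of the symbol energy. All function names are hypothetical.

```python
import cmath
import math
import random
from typing import Sequence


def complex_gaussian(rng: random.Random, variance: float) -> complex:
    """Zero-mean circularly symmetric complex Gaussian sample."""
    std = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def random_qpsk(rng: random.Random) -> complex:
    """Unit-magnitude QPSK symbol with a uniformly random phase index."""
    return cmath.exp(1j * (math.pi / 4.0 + rng.randrange(4) * math.pi / 2.0))


def sign(value: float, rng: random.Random) -> int:
    """sign() as defined in the notation: +1, -1, or a random tie-break at zero."""
    if value > 0.0:
        return 1
    if value < 0.0:
        return -1
    return rng.choice((1, -1))


def uplink_majority_vote(local_gradients: Sequence[float], noise_variance: float,
                         rng: random.Random, symbol_energy: float = 2.0) -> int:
    """Detect the MV for one gradient coordinate at one ES (assumed model)."""
    # Both resources always carry receiver noise.
    r_plus = complex_gaussian(rng, noise_variance)
    r_minus = complex_gaussian(rng, noise_variance)
    for gradient in local_gradients:
        symbol = math.sqrt(symbol_energy) * random_qpsk(rng)
        channel = complex_gaussian(rng, 1.0)  # Rayleigh fading, no CSI used
        if sign(gradient, rng) == 1:
            r_plus += channel * symbol   # vote on resource (m+, n+)
        else:
            r_minus += channel * symbol  # vote on resource (m-, n-)
    # Non-coherent energy comparison between the two resources.
    delta = abs(r_plus) ** 2 - abs(r_minus) ** 2
    return sign(delta, rng)
```

Because only the energies on the two resources are compared, neither the transmitters nor the receiver in this sketch ever uses the channel coefficients.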
Finally, the ES transmits the MVs, i.e., $v_{\mathrm{ES}}^{t,s} = [v_{\mathrm{ES}}^{t,s,1}, \ldots, v_{\mathrm{ES}}^{t,s,Q}]^{\mathrm{T}}$, to the EDs, and the model parameters at the kth ED are updated as
$$w_k^{(t+1)} = w_k^{(t)} - \eta\, v_{\mathrm{ED}}^{t,k}. \tag{10}$$
This procedure is repeated for T communication rounds.
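The update in Eq. (10) is a sign-based step: each parameter moves by the learning rate in the direction opposite to the detected vote. A minimal sketch follows, with hypothetical names and plain Python lists standing in for the parameter and vote vectors.

```python
from typing import List, Sequence


def apply_majority_vote_update(params: Sequence[float], votes: Sequence[int],
                               learning_rate: float) -> List[float]:
    """One round of w <- w - eta * v, where each vote is +1 or -1."""
    if len(params) != len(votes):
        raise ValueError("params and votes must have the same length")
    return [w - learning_rate * v for w, v in zip(params, votes)]
```

Since the step size does not depend on the gradient magnitude, only one bit of information per coordinate has to be recovered at the ED in each round.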
One of the major advantages of FSK-MV over other state-of-the-art OAC schemes (e.g., OBDA) is that EDs and ESs do not need to utilize the CSI. Also, it does not require precise time-synchronization among the transmitters since the computation with FSK-MV is achieved through a non-coherent detection in the frequency domain.
Similarly, in the DL, an ED (e.g., a cell-edge ED) can receive signals from multiple ESs. Hence, the inter-cell interference in the DL can also be used for the MV calculation at the EDs as depicted in
A. Uplink OAC with FSK-MV
In the UL, the expressions given for the transmitted symbols from the EDs and the superposed symbols at the ES with FSK-MV for a single cell, discussed in Section II-B, also hold in a multi-cell environment for S>1. After the sth ES calculates the vector $v_{\mathrm{ES}}^{t,s}$, ∀s, the DL OAC starts.
B. Downlink OAC with FSK-MV
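For the DL OAC, based on the detected MV $v_{\mathrm{ES}}^{t,s,i}$, the sth ES calculates the symbols $t_{\mathrm{ES}}^{t,s,m^+,n^+}$ and $t_{\mathrm{ES}}^{t,s,m^-,n^-}$, ∀i, as
$$t_{\mathrm{ES}}^{t,s,m^+,n^+} = \sqrt{E_s}\, s_{\mathrm{ES}}^{t,s,i}\, \mathbb{I}\big[v_{\mathrm{ES}}^{t,s,i} = 1\big], \tag{11}$$
and
$$t_{\mathrm{ES}}^{t,s,m^-,n^-} = \sqrt{E_s}\, s_{\mathrm{ES}}^{t,s,i}\, \mathbb{I}\big[v_{\mathrm{ES}}^{t,s,i} = -1\big], \tag{12}$$
respectively, where $s_{\mathrm{ES}}^{t,s,i}$ is a random QPSK symbol.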
All ESs calculate the corresponding OFDM symbols and transmit them simultaneously for DL OAC.
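The energy detector at the kth ED then detects the MV for the ith gradient as
$$v_{\mathrm{ED}}^{t,k,i} = \mathrm{sign}\big(\Delta_{\mathrm{ED}}^{t,k,i}\big), \quad \forall i \in \{1, \ldots, Q\}, \tag{13}$$
where $\Delta_{\mathrm{ED}}^{t,k,i} \triangleq |r_{\mathrm{ED}}^{t,k,m^+,n^+}|^{2} - |r_{\mathrm{ED}}^{t,k,m^-,n^-}|^{2}$.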
Subsequently, the kth ED calculates the MV vector, i.e., $v_{\mathrm{ED}}^{t,k} = [v_{\mathrm{ED}}^{t,k,1}, \ldots, v_{\mathrm{ED}}^{t,k,Q}]^{\mathrm{T}}$, and updates its parameters as in Eq. (10). Hence, the parameters at the EDs are updated based on the received signals from multiple ESs.
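The two-stage procedure (UL detection at every ES, followed by DL detection at every ED) can be sketched for one gradient coordinate as below. This is an illustrative simulation under simplifying assumptions that are not taken from the disclosure: connected links have unit received power, unconnected links contribute nothing, and fading is independent Rayleigh with unit average power. The connectivity lists and all function names are hypothetical.

```python
import cmath
import math
import random
from typing import List, Sequence


def _complex_gaussian(rng: random.Random, variance: float) -> complex:
    std = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def _energy_on_resource(num_active: int, noise_variance: float,
                        rng: random.Random, symbol_energy: float) -> float:
    """Energy of the superposed faded QPSK symbols plus noise on one resource."""
    total = _complex_gaussian(rng, noise_variance)
    for _ in range(num_active):
        phase = math.pi / 4.0 + rng.randrange(4) * math.pi / 2.0
        symbol = math.sqrt(symbol_energy) * cmath.exp(1j * phase)
        total += _complex_gaussian(rng, 1.0) * symbol
    return abs(total) ** 2


def detect_vote(num_plus: int, num_minus: int, noise_variance: float,
                rng: random.Random, symbol_energy: float = 2.0) -> int:
    """Energy detector: compare the '+' resource against the '-' resource."""
    delta = (_energy_on_resource(num_plus, noise_variance, rng, symbol_energy)
             - _energy_on_resource(num_minus, noise_variance, rng, symbol_energy))
    if delta == 0.0:
        return rng.choice((1, -1))
    return 1 if delta > 0.0 else -1


def multi_cell_round(gradient_signs: Sequence[int],
                     eds_of_es: Sequence[Sequence[int]],
                     ess_of_ed: Sequence[Sequence[int]],
                     noise_var_es: float, noise_var_ed: float,
                     rng: random.Random) -> List[int]:
    """Return the vote detected at every ED for one gradient coordinate."""
    # Stage 1 (UL): each ES detects the MV of the EDs connected to it.
    es_votes = []
    for connected_eds in eds_of_es:
        plus = sum(1 for k in connected_eds if gradient_signs[k] == 1)
        es_votes.append(detect_vote(plus, len(connected_eds) - plus, noise_var_es, rng))
    # Stage 2 (DL): each ED detects the MV of the ESs connected to it.
    ed_votes = []
    for connected_ess in ess_of_ed:
        plus = sum(1 for s in connected_ess if es_votes[s] == 1)
        ed_votes.append(detect_vote(plus, len(connected_ess) - plus, noise_var_ed, rng))
    return ed_votes
```

In this sketch a cell-edge ED that hears two ESs aggregates both of their votes, which is how the inter-cell signal contributes to the decision instead of acting as interference.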
C. Convergence Analysis
For the convergence analysis, we consider several standard assumptions made in the literature[10], [11]:
Assumption 1 (Bounded loss function). $F_k(w_k) \ge F^{*}$, $\forall w_k$.
Assumption 2 (Smoothness). Let $g_k$ be the gradient of the personalized global loss function $F_k(w_k)$ evaluated at $w_k$. For all $w_k$ and $w'_k$, the expression given by
$$\big|F_k(w'_k) - \big(F_k(w_k) + g_k^{\mathrm{T}}(w'_k - w_k)\big)\big| \le \frac{1}{2} \sum_{i=1}^{Q} L_i \big(w'_{k,i} - w_{k,i}\big)^{2}$$
holds for a non-negative constant vector $L = [L_1, \ldots, L_Q]^{\mathrm{T}}$.
Assumption 3 (Variance bound). Assume that the estimated gradient is an unbiased estimate of the true gradient, $\mathbb{E}[\tilde{g}_k] = g_k$, ∀k, and the variance of each of its components is bounded as
$\mathbb{E}\big[(\tilde{g}_{k,i} - g_{k,i})^{2}\big] \le \sigma_i^{2}/n_b$, ∀k,i, where $\sigma = [\sigma_1, \ldots, \sigma_Q]^{\mathrm{T}}$ is a non-negative constant vector.
Assumption 4 (Unimodal, symmetric gradient noise). For any given $w_k$, each element of the vector $\tilde{g}_k$, ∀k, has a unimodal distribution that is also symmetric around its mean.
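We also assume that the number of EDs that are connected to an ES and the number of ESs that are connected to an ED are fixed and denoted as $K_c \le K$ and $S_c \le S$, respectively (i.e., fixed-connectivity assumption). This assumption is motivated by the large-scale fading in wireless channels, e.g., an ES can receive strong signals from the EDs located in its adjacent cells, but the signals from far cells are likely to be attenuated due to the large link distance. Based on this assumption, let $\mathcal{K}_s$ be the set of all EDs that are connected to the sth ES and $\mathcal{S}_k$ be the set of all ESs that are connected to the kth ED, where $|\mathcal{K}_s| = K_c$, ∀s, and $|\mathcal{S}_k| = S_c$, ∀k. We set the received power $P_{\mathrm{ED}}^{k,s} = 1$ for $k \in \mathcal{K}_s$, ∀s, otherwise 0, and $P_{\mathrm{ES}}^{k,s} = 1$ for $s \in \mathcal{S}_k$, ∀k, otherwise 0. This assumption does not hold for an irregular deployment. Nevertheless, it leads us to provide insight into multi-cell OAC with a tractable analysis since it results in $|r_{\mathrm{ES}}^{t,s,m^+,n^+}|^{2}$ and $|r_{\mathrm{ES}}^{t,s,m^-,n^-}|^{2}$ being exponential random variables with the means $\mu_{\mathrm{ES},i}^{+} = E_s K_s^{+} + \sigma_{\mathrm{ES}}^{2}$ and $\mu_{\mathrm{ES},i}^{-} = E_s K_s^{-} + \sigma_{\mathrm{ES}}^{2}$, respectively, where $K_s^{+}$ and $K_s^{-}$ are the cardinalities of the sets $\{k \in \mathcal{K}_s \mid \mathrm{sign}(\tilde{g}_{k,i}^{(t)}) = 1\}$ and $\{k \in \mathcal{K}_s \mid \mathrm{sign}(\tilde{g}_{k,i}^{(t)}) = -1\}$, respectively. Similarly, $|r_{\mathrm{ED}}^{t,k,m^+,n^+}|^{2}$ and $|r_{\mathrm{ED}}^{t,k,m^-,n^-}|^{2}$ are exponential random variables with the means $\mu_{\mathrm{ED},i}^{+} = E_s S_k^{+} + \sigma_{\mathrm{ED}}^{2}$ and $\mu_{\mathrm{ED},i}^{-} = E_s S_k^{-} + \sigma_{\mathrm{ED}}^{2}$, respectively, where $S_k^{+}$ and $S_k^{-}$ are the cardinalities of the sets $\{s \in \mathcal{S}_k \mid v_{\mathrm{ES}}^{t,s,i} = 1\}$ and $\{s \in \mathcal{S}_k \mid v_{\mathrm{ES}}^{t,s,i} = -1\}$, respectively. The distributions of $\Delta_{\mathrm{ES}}^{t,s,i}$ and $\Delta_{\mathrm{ED}}^{t,k,i}$ can then be calculated as $\Delta_{\mathrm{ES}}^{t,s,i} \sim f(x, \mu_{\mathrm{ES},i}^{+}, \mu_{\mathrm{ES},i}^{-})$ and $\Delta_{\mathrm{ED}}^{t,k,i} \sim f(y, \mu_{\mathrm{ED},i}^{+}, \mu_{\mathrm{ED},i}^{-})$, respectively, where $f(x, \mu_1, \mu_2)$ is $e^{-x/\mu_1}/(\mu_1 + \mu_2)$ for $x > 0$, and otherwise it is $e^{x/\mu_2}/(\mu_1 + \mu_2)$[15].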
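The density of the difference of two independent exponential random variables, and the probability that the difference is positive, can be checked numerically. The sketch below assumes the density is $e^{-x/\mu_1}/(\mu_1+\mu_2)$ for positive arguments and $e^{x/\mu_2}/(\mu_1+\mu_2)$ otherwise, which integrates to $\mu_1/(\mu_1+\mu_2)$ over the positive half-line. The function names are hypothetical.

```python
import math
import random


def diff_exp_pdf(x: float, mu1: float, mu2: float) -> float:
    """Density of X1 - X2 for independent exponentials with means mu1, mu2."""
    if x > 0.0:
        return math.exp(-x / mu1) / (mu1 + mu2)
    return math.exp(x / mu2) / (mu1 + mu2)


def prob_positive(mu1: float, mu2: float) -> float:
    """Closed form of P(X1 - X2 > 0), the integral of the density over x > 0."""
    return mu1 / (mu1 + mu2)


def monte_carlo_prob_positive(mu1: float, mu2: float, trials: int,
                              rng: random.Random) -> float:
    """Empirical estimate of P(X1 - X2 > 0) for comparison."""
    hits = 0
    for _ in range(trials):
        # expovariate takes the rate, i.e. the reciprocal of the mean.
        delta = rng.expovariate(1.0 / mu1) - rng.expovariate(1.0 / mu2)
        hits += delta > 0.0
    return hits / trials
```

With the means set to the "+" and "−" resource energies, `prob_positive` gives the probability that the energy detector decides +1.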
Theorem 1. For $\eta = 1/T$ and $n_b = T/\gamma$, the convergence rate of multi-cell OAC with FSK-MV in a Rayleigh fading channel is:
where γ is a positive integer, A and B are defined as
respectively.
Proof: The proof relies on the strategy used in prior art[11]. By using Assumption 2 and using Eq. (9), it can be shown that:
where $\sum_{k=1}^{K} \sum_{i=1}^{Q} |g_{k,i}^{(t)}|\, \mathbb{P}\big(v_{\mathrm{ED}}^{t,k,i} \neq \hat{g}_{k,i}^{(t)}\big)$ is the stochasticity-induced error.
Let $\hat{g}_{k,i}^{(t)} \triangleq \mathrm{sign}(g_{k,i}^{(t)})$ denote the correct decision and assume that $\hat{g}_{k,i}^{(t)} = 1$. Also, let Y and Z be binomial random variables for the number of ESs and the number of EDs with the correct decision, i.e., $Y \sim \mathcal{B}(S_c, p_{y,i})$ and $Z \sim \mathcal{B}(K_c, p_{z,i})$, where $p_{y,i}$ and $p_{z,i}$ denote the success probabilities. The probability $P_{k,i}^{\mathrm{err}} \triangleq \mathbb{P}\big(v_{\mathrm{ED}}^{t,k,i} \neq \hat{g}_{k,i}^{(t)}\big)$ and the success probability $p_{y,i}$ can then be written as
respectively.
Based on the distributions of $\Delta_{\mathrm{ES}}^{t,s,i}$ and $\Delta_{\mathrm{ED}}^{t,k,i}$, we calculate the conditional probabilities in Eq. (15) and Eq. (16) as
respectively.
By using the definitions of $\mu_{\mathrm{ES},i}^{+}$ and $\mu_{\mathrm{ES},i}^{-}$ and substituting Eq. (18) into Eq. (16), we obtain
By substituting Eq. (17) into Eq. (15) and using Eq. (19), we obtain $P_{k,i}^{\mathrm{err}}$ as
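The chain of substitutions described above can be illustrated numerically. The sketch below is a derivation under the fixed-connectivity assumption and is not necessarily the exact form of Eq. (15) to Eq. (20): it assumes that, given j correct transmitters out of n, the detector is correct with probability $\mu^{+}/(\mu^{+}+\mu^{-})$ with $\mu^{+} = E_s j + \sigma^{2}$ and $\mu^{-} = E_s (n-j) + \sigma^{2}$, and it averages this over a binomial number of correct transmitters, first in the UL and then in the DL. All names are hypothetical.

```python
from math import comb


def prob_plus_wins(num_plus: int, num_total: int, noise_variance: float,
                   symbol_energy: float = 2.0) -> float:
    """P(delta > 0) when num_plus of num_total transmitters vote '+'."""
    mu_plus = symbol_energy * num_plus + noise_variance
    mu_minus = symbol_energy * (num_total - num_plus) + noise_variance
    return mu_plus / (mu_plus + mu_minus)


def binomial_average(num_total: int, success_prob: float, noise_variance: float,
                     symbol_energy: float = 2.0) -> float:
    """Average detection probability over a binomial number of correct voters."""
    return sum(
        comb(num_total, j) * success_prob ** j * (1.0 - success_prob) ** (num_total - j)
        * prob_plus_wins(j, num_total, noise_variance, symbol_energy)
        for j in range(num_total + 1)
    )


def ed_error_probability(p_z: float, k_c: int, s_c: int,
                         noise_var_es: float, noise_var_ed: float,
                         symbol_energy: float = 2.0) -> float:
    """Error probability at an ED after UL detection followed by DL detection."""
    p_y = binomial_average(k_c, p_z, noise_var_es, symbol_energy)   # ES success
    return 1.0 - binomial_average(s_c, p_y, noise_var_ed, symbol_energy)
```

Because the conditional probability is linear in the number of correct voters, the binomial average collapses to $(E_s n p + \sigma^{2})/(E_s n + 2\sigma^{2})$, which makes the dependence on the connectivity and the noise variance explicit.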
Accordingly, an upper bound for the stochasticity-induced error can be obtained as follows:
where A and B are defined in Theorem 1.
By considering Assumption 1, an upper bound can then be obtained as follows:
Finally, by rearranging the terms of the above equation and considering $\eta = 1/T$ and $n_b = T/\gamma$, Eq. (14) can be reached.
To numerically evaluate multi-cell OAC, we consider the learning task of handwritten-digit recognition over a hexagonal tessellation with 77 cells, i.e., S=77 ESs, where K=120 EDs are located at the cell edge and the distance between two adjacent ESs is 50 meters (see
For the local data at the EDs, we use the MNIST database that contains labeled handwritten-digit images of size 28×28 for digits 0 to 9. We consider both homogeneous and heterogeneous data distributions in the cell. To prepare the data, we first choose |G|∈{5000, 30000} training images from the database, where each digit has an identical number of images. For the scenario with the homogeneous data distribution, each local dataset has approximately an equal number of distinct images for each digit. For the scenario with the heterogeneous data distribution, we assume that the distribution of the images depends on the locations of the EDs. To this end, we divide the area into 5 identical parallel areas, where the EDs located in the αth area have the data samples with the labels {α−1, α, 1+α, 2+α, 3+α, 4+α} for α∈{1, . . . , 5} (see
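The location-dependent label assignment for the heterogeneous scenario can be sketched as follows. The label set per area follows the rule stated above; the mapping from a position to an area index assumes that the five parallel areas are equal-width strips along one coordinate, which is an illustrative assumption, and all names are hypothetical.

```python
from typing import Set

NUM_AREAS = 5


def area_labels(area_index: int) -> Set[int]:
    """Labels held by EDs in area a: {a-1, a, a+1, a+2, a+3, a+4}."""
    if not 1 <= area_index <= NUM_AREAS:
        raise ValueError("area_index must be in 1..5")
    return set(range(area_index - 1, area_index + 5))


def area_of_position(x: float, region_width: float) -> int:
    """Area index (1..5) of a position x in [0, region_width].

    Assumption: the region is split into equal-width strips along x.
    """
    if not 0.0 <= x <= region_width:
        raise ValueError("position outside the region")
    strip_width = region_width / NUM_AREAS
    return min(int(x / strip_width) + 1, NUM_AREAS)
```

Under this rule each area holds six of the ten digits, and adjacent areas share five of them, so the label distributions shift gradually across the deployment.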
In this disclosure, we present a multi-cell OAC framework where the aggregations occur in both UL and DL across multiple cells through a non-coherent OAC scheme, i.e., FSK-MV. We also prove the convergence of FEEL under a fixed-connectivity assumption. Finally, we evaluate the test accuracy of the multi-cell OAC by comparing it with that of a single-cell scenario for homogeneous and heterogeneous data distributions. Our numerical results show that the proposed approach is a promising solution to achieve a high test accuracy at the EDs by exploiting the interference among multiple cells. In this disclosure, our analysis is based on regular tessellation. For an irregular deployment, the interference distributions in UL and DL need to be considered for the convergence analysis, which will be investigated in future work.
While certain embodiments of the disclosed subject matter have been described using specific terms, such description is for illustrative purposes only, and it is to be understood that changes and variations may be made without departing from the spirit or scope of the subject matter.
Number | Date | Country
---|---|---
63341045 | May 2022 | US