Over-the-air computation (OAC) leverages the signal superposition property of wireless multiple-access channels to compute a nomographic function [1]. It has recently gained major attention to reduce the per-round communication latency that linearly increases with the number of edge devices (EDs) for federated edge learning (FEEL), i.e., an implementation of federated learning (FL) in a wireless network [2], [3]. Despite its merit, an OAC scheme may require the EDs to start their transmissions synchronously with high accuracy [4], which can impose stringent requirements for the underlying mechanisms. In a practical network, time synchronization can be maintained via an external timing reference such as the Global Positioning System (GPS) (see [5] and the references therein), a triggering mechanism as in IEEE 802.11 [6], or well designed synchronization procedures over random-access and control channels as in cellular networks [7].
However, while using a GPS-based solution can be costly and not suitable for indoor applications, the implementations of a trigger-based synchronization or some synchronization protocols may not be self-sufficient. This is because an entire baseband besides the synchronization blocks may need to be implemented as a hard-coded block to satisfy the timing constraints. On the other hand, when a software-defined radio (SDR) is used as an I/O peripheral connected to a companion computer (CC) for flexible baseband processing, the transmission/reception instants are subject to a large jitter due to the underlying protocols (e.g., USB, TCP/IP) for the communication between the CC and the SDR. Hence, it is not trivial to use SDRs to test an OAC scheme in practice.
In the state-of-the-art, proof-of-concept OAC demonstrations are particularly in the area of wireless sensor networks. For example, in [8], a statistical OAC is implemented with twenty-one RFID tags to compute the percentages of the activated classes that encode various temperature ranges. A trigger signal is used to achieve time synchronization across the RFIDs. In [9], Goldenbaum and Stańczak's scheme is implemented with three SDRs emulating eleven sensor nodes and a fusion center. The arithmetic and geometric mean of the sensor readings are computed over a 5 MHz signal. The time synchronization across the sensor nodes is maintained based on a trigger signal and the disclosed method is implemented in a field-programmable gate array (FPGA). A calibration procedure is also discussed to ensure amplitude alignment at the fusion center. In [11], the summation is evaluated with a testbed that involves three SDRs as transmitters and an SDR as a receiver. The scheme used in this setup is based on channel inversion. However, the details related to the synchronization are not provided.
Over-the-air computation (OAC) reduces the communication latency that linearly increases with the number of devices in a wireless network for machine learning applications. Despite its merit, an OAC scheme may require the radios to start their transmissions synchronously with high accuracy, which can impose stringent requirements for the underlying mechanisms. On the other hand, when SDRs are used as radios for this application, the synchronization is hard to maintain.
In a practical network, time synchronization can be maintained via an external timing reference such as the Global Positioning System (GPS), a triggering mechanism as in IEEE 802.11, or well-designed synchronization procedures over random-access and control channels as in cellular networks. However, while using a GPS-based solution can be costly and unsuitable for indoor applications, the implementations of a trigger-based synchronization or some synchronization protocols may not be self-sufficient because an entire baseband besides the synchronization blocks may need to be implemented as a hard-coded block to satisfy the timing constraints. On the other hand, when an SDR is used as an I/O peripheral connected to a CC for flexible baseband processing, the transmission/reception instants are subject to a large jitter due to the underlying protocols (e.g., USB, TCP/IP) for the communication between the CC and the SDR. Hence, it is not trivial to use SDRs to test an OAC scheme in practice. In addition, procedures for OAC are still needed, and it remains unclear how OAC would operate in a practical network.
Generally speaking, there is no widely known OAC scheme that has been demonstrated in practice for FEEL. The presently disclosed subject matter addresses these challenges and this gap, and introduces a synchronization method suitable for SDRs.
Aspects and advantages of the presently disclosed subject matter will be set forth in part in the following description, or may be apparent from the description, or may be learned through practice of the presently disclosed subject matter.
Broadly speaking, the presently disclosed subject matter relates to methods and procedures for synchronization and over-the-air computation.
Another presently disclosed broader object is to provide general-purpose synchronization methodology.
Yet another present goal is to provide methodologies and procedures which enable an SDR-based network to realize over-the-air computation (OAC) for machine learning applications in a reliable way.
More particularly, it is a present object to provide general-purpose synchronization methodology which allows a set of software-defined radios (SDRs) to transmit or receive any in-phase/quadrature (IQ) data with precise timings while maintaining the baseband processing in the corresponding companion computers (CCs).
Further, for at least some embodiments, presently disclosed methodology relies on the detection of a synchronization waveform in both receive and transmit directions. For some such embodiments, the direct memory access (DMA) blocks are controlled jointly with the processing system.
Still further, by implementing presently disclosed synchronization methodology for some embodiments on a set of low-cost SDRs, the performance of frequency-shift keying (FSK)-based majority vote (MV) (FSK-MV) is demonstrated. Stated another way, the present disclosure relates to an over-the-air computation (OAC) scheme for federated edge learning (FEEL), and corresponding procedures. Demonstration shows that test accuracy can reach more than 95% for homogeneous and heterogeneous data distributions without using channel state information at the edge devices (EDs).
Stated yet another way, the presently disclosed general-purpose synchronization methodology for some embodiments allows a set of software-defined radios (SDRs) to transmit or receive any in-phase/quadrature (IQ) data with precise timings while maintaining the baseband processing in the corresponding companion computers (CCs). The disclosed method for some such embodiments relies on the detection of a synchronization waveform in both receive and transmit directions and on controlling the direct memory access (DMA) blocks jointly with the processing system. By implementing this synchronization method on a set of low-cost SDRs, we demonstrate the performance of frequency-shift keying (FSK)-based majority vote (MV) (FSK-MV), i.e., an over-the-air computation (OAC) scheme for federated edge learning (FEEL) and introduce the corresponding procedures.
Still further, the presently disclosed synchronization method for some embodiments enables low-cost SDRs to be time-synchronous without using GPS or additional circuitry. The presently disclosed procedures in some such embodiments enable OAC in practice by describing the alignment, calibration, and computation signals.
One presently disclosed exemplary methodology preferably relates to an over-the-air computation (OAC) methodology for federated edge learning (FEEL) without using channel state information (CSI) at a plurality of edge devices (EDs) or at an edge server (ES). Such methodology preferably may comprise a distributed machine-learning model to be trained with the update vectors received at an edge server (ES) as transmitted from a plurality of edge devices (EDs); one or more processors; and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the one or more processors to perform operations. Such operations preferably may comprise transmitting local update vectors as votes from each respective of the plurality of edge devices (EDs) via a wireless multiple access channel, receiving the superposed local updates at the ES, determining the majority vote (MV) for each element of the update vector at the ES with an energy detector, and inputting the MVs into the machine-learning model to be updated. Further, the plurality of EDs and the ES preferably each respectively comprise a software-defined radio (SDR) using a general purpose synchronization method between the ES and each respective ED which relies on the detection of a synchronization waveform in both receive and transmit directions.
It is to be understood from the complete disclosure herewith that the presently disclosed subject matter equally relates to both methodology and corresponding and/or related apparatus.
One presently disclosed exemplary embodiment relates to an over-the-air computation (OAC) system for federated edge learning (FEEL) without using channel state information (CSI) at a plurality of edge devices (EDs) or at an edge server (ES). Such system preferably comprises a machine-learning model training to process data received at an edge server (ES) as transmitted from a plurality of edge devices (EDs); one or more processors; and one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the one or more processors to perform operations. Such operations preferably comprise transmitting local update vectors as votes from each respective of the plurality of edge devices (EDs) via a wireless multiple access channel, receiving the superposed local updates at the ES, determining the majority vote (MV) for each element of the update vector at the ES with an energy detector, and inputting the MVs into the machine-learning model to be updated. The plurality of EDs and the ES preferably each respectively comprise a software-defined radio (SDR) using a general purpose synchronization method between the ES and each respective ED which relies on the detection of a synchronization waveform in both receive and transmit directions.
The market impact of the presently disclosed subject matter is potentially large in size as it is related to both commercial wireless and AI technologies. It could, for example, be useful for artificial intelligence technologies over wireless or sensor networks, 5G and beyond, 6G wireless standardization, and IEEE 802.11 Wi-Fi. Recently, IEEE 802.11 has formed a Topic Interest Group, where distributed learning over a wireless network has been mentioned:
(https://mentor.ieee.org/802.11/documents?is_dcn=DCN%2C%20Title%2C%20Author %20or%20Affiliation&is_group=aiml)
Other example aspects of the present disclosure are directed to systems, apparatus, tangible, non-transitory computer-readable media, user interfaces, memory devices, and electronic smart devices or the like. To implement methodology and technology herewith, one or more processors may be provided, programmed to perform the steps and functions as called for by the presently disclosed subject matter, as will be understood by those of ordinary skill in the art.
Additional objects and advantages of the presently disclosed subject matter are set forth in, or will be apparent to, those of ordinary skill in the art from the detailed description herein. Also, it should be further appreciated that modifications and variations to the specifically illustrated, referred and discussed features, elements, and steps hereof may be practiced in various embodiments, uses, and practices of the presently disclosed subject matter without departing from the spirit and scope of the subject matter. Variations may include, but are not limited to, substitution of equivalent means, features, or steps for those illustrated, referenced, or discussed, and the functional, operational, or positional reversal of various parts, features, steps, or the like.
Still further, it is to be understood that different embodiments, as well as different presently preferred embodiments, of the presently disclosed subject matter may include various combinations or configurations of presently disclosed features, steps, or elements, or their equivalents (including combinations of features, parts, or steps or configurations thereof not expressly shown in the Figures or stated in the detailed description of such Figures). Additional embodiments of the presently disclosed subject matter, not necessarily expressed in the summarized section, may include and incorporate various combinations of aspects of features, components, or steps referenced in the summarized objects above, and/or other features, components, or steps as otherwise discussed in this application. Those of ordinary skill in the art will better appreciate the features and aspects of such embodiments, and others, upon review of the remainder of the specification, and will appreciate that the presently disclosed subject matter applies equally to corresponding methodologies as associated with practice of any of the present exemplary devices, and vice versa.
A full and enabling disclosure of the presently disclosed subject matter, including the best mode thereof, directed to one of ordinary skill in the art, is set forth in the specification, which makes reference to the appended Figures, in which:
Repeat use of reference characters in the present specification and drawings is intended to represent the same or analogous features or elements or steps of the presently disclosed subject matter.
It is to be understood by one of ordinary skill in the art that the present disclosure is a description of exemplary embodiments only, and is not intended as limiting the broader aspects of the disclosed subject matter. Each example is provided by way of explanation of the presently disclosed subject matter, not limitation of the presently disclosed subject matter. In fact, it will be apparent to those skilled in the art that various modifications and variations can be made in the presently disclosed subject matter without departing from the scope or spirit of the presently disclosed subject matter. For instance, features illustrated or described as part of one embodiment can be used with another embodiment to yield a still further embodiment. Thus, it is intended that the presently disclosed subject matter covers such modifications and variations as come within the scope of the appended claims and their equivalents.
The present disclosure is generally directed to methods and procedures for synchronization and over-the-air computation, such as providing general-purpose synchronization methodology. The present disclosure further relates to providing methodologies and procedures which enable an SDR-based network to realize over-the-air computation (OAC) for machine learning applications in a reliable way.
The presently disclosed subject matter relates in part to both (a) synchronization for CC-based baseband processing and (b) realization of OAC in practice for FEEL.
Synchronization for CC-based baseband processing: To maintain the time synchronization in an SDR-based network while maintaining the baseband in the CCs, we disclose a hard-coded block that is agnostic to the in-phase/quadrature (IQ) data desired to be communicated in the CC. We disclose the corresponding procedures, calibration, and synchronization waveform to address the hardware limitations.
Realization of OAC in practice for FEEL: We realize the disclosed method with an intellectual property (IP) core embedded into Adalm Pluto SDR. By using the presently disclosed synchronization method, we demonstrate the performance of frequency-shift keying (FSK)-based majority vote (MV) (FSK-MV) [12], [13], i.e., an OAC scheme for FEEL, for both homogeneous and heterogeneous data distribution scenarios. We also provide the corresponding procedures.
Notation: The complex and real numbers are denoted by ℂ and ℝ, respectively.
Consider a scenario where K EDs transmit a set of complex-valued vectors denoted by {xUL,k∈1×N
To address the scenario above, the main strategy that we adopt is to separate any signal processing blocks that maintain the synchronization from the ones that do not need to be implemented under strict timing requirements so that the baseband can still be kept in the CC. Based on this strategy, we disclose a hard-coded block that is solely responsible for time synchronization. As shown in FIG. 1(b), the disclosed block jointly controls the TX direct-memory access (DMA) and the RX DMA with the processing system (PS) (e.g., Linux on the SDR) as a function of the detection of the synchronization waveform, denoted by xSYNC, in the transmit or receive directions through the (active-high) digital signals etx [n]∈{0, 1} and erx [n]∈{0, 1}, respectively. TX DMA and RX DMA are responsible for transferring the IQ samples from the random access memory (RAM) to the transceiver IP or vice versa, respectively. They are programmed by the PS, not by the block.
We define two modes for the block:
Mode 1: The default values of etx [n] and erx [n] are 0, i.e., TX DMA and RX DMA cannot transfer the IQ samples. The block listens to the output of the transceiver IP (i.e., the IQ samples in the receive direction), denoted by xrx [n], and constantly searches for the synchronization waveform xSYNC. If the vector xSYNC is detected, it sequentially sets (etx [n], erx [n])=(0, 1) for TRX seconds to allow the RX DMA to move the received IQ samples to the RAM, sets (etx [n], erx [n])=(0, 0) for TPC seconds, and finally sets (etx [n], erx [n])=(1, 0) for TTX seconds to allow TX DMA to transfer the IQ samples from the RAM to the transceiver IP.
Mode 2: The default values of etx [n] and erx [n] are 1, i.e., TX DMA and RX DMA can transfer the IQ samples. However, the block listens to the output of the TX DMA (the IQ samples in the transmit direction), denoted by xtx [n]. It searches for the vector xSYNC. If the vector xSYNC is detected, it blocks the reception by setting erx [n]=0 for TPC seconds.
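By way of non-limiting illustration, the two modes above can be sketched as small sample-level state machines. The sketch below is written in Python rather than as a hard-coded FPGA block, and it expresses the durations TRX, TPC, and TTX as sample counts; the class names, the `step` method, and the choice of units are assumptions made for illustration only and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Mode1Gate:
    """Mode 1: both DMAs are gated off until x_SYNC is seen in the RX direction."""
    n_rx: int          # T_RX expressed in samples (hypothetical unit choice)
    n_pc: int          # T_PC expressed in samples
    n_tx: int          # T_TX expressed in samples
    elapsed: int = -1  # -1 means idle, i.e., still searching for x_SYNC

    def step(self, sync_detected: bool) -> Tuple[int, int]:
        """Advance by one sample and return (e_tx, e_rx)."""
        if self.elapsed < 0:
            if not sync_detected:
                return (0, 0)          # default: no transfer in either direction
            self.elapsed = 0           # detection starts the timed sequence
        t = self.elapsed
        self.elapsed += 1
        if self.elapsed >= self.n_rx + self.n_pc + self.n_tx:
            self.elapsed = -1          # sequence finished, search again
        if t < self.n_rx:
            return (0, 1)              # RX DMA may move received samples to RAM
        if t < self.n_rx + self.n_pc:
            return (0, 0)              # processing gap for the companion computer
        return (1, 0)                  # TX DMA may move samples to the transceiver


@dataclass
class Mode2Gate:
    """Mode 2: both DMAs are enabled; RX is blocked after x_SYNC is seen in TX."""
    n_pc: int           # T_PC expressed in samples
    remaining: int = 0  # samples left in the RX blocking window

    def step(self, sync_detected: bool) -> Tuple[int, int]:
        if sync_detected and self.remaining == 0:
            self.remaining = self.n_pc
        if self.remaining > 0:
            self.remaining -= 1
            return (1, 0)              # reception blocked
        return (1, 1)                  # default: both directions enabled
```

In this sketch the detection flag is supplied by the caller, so the gating logic stays independent of how the synchronization waveform is actually detected.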
Now, assume that the SDRs at the EDs and the ES are equipped with the disclosed block and operate at Mode 1 and Mode 2, respectively. We disclose the following procedure, illustrated in
Step 1 (EDs): The CC at each ED executes a command (i.e., refill(NED)) to fill the RAM with NED IQ samples in the receive direction for NED≥NDL. Since RX DMA is disabled by the disclosed block at this point, the CC waits for the RX DMA to be enabled by the block.
Step 2 (ES): After the CC at the ES synthesizes the vector xDL, it prepends xSYNC to initiate the procedure. It writes [xSYNC xDL] to the RAM and starts TX DMA by executing a command (i.e., transmit([xSYNC xDL])). As soon as the block detects the vector xSYNC in the transmit direction, it disables RX DMA for TPC,ES seconds. Subsequently, the CC issues another command, i.e., refill(NES), to fill its RAM in the receive direction, where NES is the number of IQ samples to be acquired. However, the reception does not start for TPC,ES seconds due to the disabled RX DMA.
Step 3 (EDs): The transceiver IP at each ED receives [xSYNC xDL]. As soon as the block detects xSYNC, it enables RX DMA. Assuming that TRX,ED is large enough to acquire NED samples, the RX DMA transfers NED samples to the RAM, as the PS requested NED IQ samples in Step 1. The CC reads the NED IQ samples in the RAM via a command, i.e., read(NED). As a result, xDL is received with precise timing.
Step 4 (EDs): The CC at the kth ED processes the vector xDL and synthesizes xUL,k as a response. It then writes xUL,k to the RAM and initiates TX DMA by executing transmit(xUL,k) before the block enables the TX DMA to transfer. Hence, xUL,k should be ready in the RAM within TRX,ED+TPC,ED seconds.
Step 5 (EDs): The disclosed block at the ED enables the TX DMA for TTX,ED seconds, where TTX,ED is assumed to be large enough to transmit NUL IQ samples. At this point, the EDs start their transmissions simultaneously.
Step 6 (ES): Assuming that TPC,ES=TRX,ED+TPC,ED−TΔ and NES≥NUL+[TΔ/Tsample], the RX DMA at the ES starts to transfer NES IQ samples (due to the request in Step 2) TΔ seconds before the EDs' transmissions, where Tsample is the sample period. After executing read(NES), the ES receives the superposed signal starting from the [TΔ/Tsample]th sample.
The procedure can be repeated after the ES waits for Twait,ES seconds to allow the EDs to be ready for the next communication cycle and complete its own internal signal processing, where each cycle takes TPC,ED+TRX,ED+TTX,ED+Twait,ES seconds. Note that the parameters TPC,ED, TRX,ED, TTX,ED, TPC,ES, TΔ and Twait,ES can be pre-configured or configured online by the CC (e.g., through an advanced extensible interface (AXI)). Their values depend on the (slowest) processing speed of the constituent CCs in the network. The timers for TPC,ED, TRX,ED, TTX,ED, and TPC,ES can be implemented as counters that count up on each FPGA clock tick. The distinct feature of the disclosed block and the corresponding procedure is that the timers are set up via xSYNC in the receive and transmit directions at both EDs and ES without using the CC.
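As a non-limiting sketch of how the timing parameters of the procedure relate to one another, the following Python helper derives TPC,ES, a lower bound on NES, the cycle duration, and the counter value for a given duration. The class and method names are hypothetical, the bracket in the NES condition is interpreted here as a ceiling, and exact rational arithmetic (`fractions.Fraction`) is used only to keep the illustration free of floating-point rounding.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class CycleTiming:
    """Timing parameters of one communication cycle (all durations in seconds)."""
    t_rx_ed: Fraction    # T_RX,ED
    t_pc_ed: Fraction    # T_PC,ED
    t_tx_ed: Fraction    # T_TX,ED
    t_delta: Fraction    # T_delta: how early the ES starts to receive
    t_wait_es: Fraction  # T_wait,ES
    f_sample: int        # sample rate in Hz (1 / T_sample)
    f_clk: int           # FPGA clock rate in Hz (1 / T_clk)

    def t_pc_es(self) -> Fraction:
        # T_PC,ES = T_RX,ED + T_PC,ED - T_delta (condition assumed in Step 6)
        return self.t_rx_ed + self.t_pc_ed - self.t_delta

    def min_n_es(self, n_ul: int) -> int:
        # N_ES >= N_UL + ceil(T_delta / T_sample); the ceiling is an assumption
        return n_ul + math.ceil(self.t_delta * self.f_sample)

    def cycle_duration(self) -> Fraction:
        # Each cycle takes T_PC,ED + T_RX,ED + T_TX,ED + T_wait,ES seconds
        return self.t_pc_ed + self.t_rx_ed + self.t_tx_ed + self.t_wait_es

    def counter_ticks(self, duration: Fraction) -> int:
        # Counter value for a timer that counts up on each FPGA clock tick
        return round(duration * self.f_clk)
```

For example, with TRX,ED = 0.1 s, TPC,ED = 2 s, and TΔ = 1 ms, the ES would block its reception for 2.099 s after detecting xSYNC in the transmit direction.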
The design of the synchronization waveform xSYNC and its detection under carrier frequency offset (CFO) with limited FPGA resources were two major issues that we dealt with in our implementation. We address these challenges by synthesizing xSYNC based on a single-carrier (SC) waveform with the roll-off factor of 0.5 by upsampling a repeated binary phase shift keying (BPSK) modulated sequence, i.e., 2[g g g g]−1, by a factor of Nup=2 and passing it through a root-raised cosine (RRC) filter, where g=[g1, . . . , g32]∈1×32 is a binary Golay sequence. As a result, the null-to-null bandwidth of xSYNC is equal to 0.75 fsample, where fsample is the sample rate.
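The synthesis described above can be sketched as follows, by way of non-limiting illustration. The particular Golay sequence (generated here with the standard recursive construction), the filter span of four symbols on each side, and the unit-energy normalization of the filter are assumptions; the disclosure specifies only the sequence length, the four repetitions, the BPSK mapping, Nup=2, and the roll-off factor of 0.5.

```python
import numpy as np


def golay_pair(m):
    """Complementary pair of +/-1 sequences of length 2**m (standard recursion)."""
    a, b = np.array([1]), np.array([1])
    for _ in range(m):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b


def rrc_taps(beta, sps, span):
    """Root-raised cosine taps; `span` symbols on each side, `sps` samples/symbol."""
    t = np.arange(-span * sps, span * sps + 1) / sps  # time in symbol periods
    h = np.empty_like(t)
    for i, ti in enumerate(t):
        if np.isclose(ti, 0.0):
            h[i] = 1.0 - beta + 4.0 * beta / np.pi
        elif np.isclose(abs(ti), 1.0 / (4.0 * beta)):
            # removable singularity of the closed-form expression
            h[i] = (beta / np.sqrt(2.0)) * (
                (1.0 + 2.0 / np.pi) * np.sin(np.pi / (4.0 * beta))
                + (1.0 - 2.0 / np.pi) * np.cos(np.pi / (4.0 * beta))
            )
        else:
            num = np.sin(np.pi * ti * (1.0 - beta)) + 4.0 * beta * ti * np.cos(
                np.pi * ti * (1.0 + beta)
            )
            den = np.pi * ti * (1.0 - (4.0 * beta * ti) ** 2)
            h[i] = num / den
    return h / np.sqrt(np.sum(h ** 2))  # unit-energy normalization (assumption)


def synthesize_xsync(beta=0.5, n_up=2, span=4):
    a, _ = golay_pair(5)
    g = (a + 1) // 2                    # binary Golay sequence g of length 32
    bpsk = 2 * np.tile(g, 4) - 1        # 2[g g g g] - 1
    up = np.zeros(bpsk.size * n_up)
    up[::n_up] = bpsk                   # upsample by N_up
    return np.convolve(up, rrc_taps(beta, n_up, span))


def null_to_null_bandwidth(beta=0.5, n_up=2):
    """Bandwidth as a fraction of f_sample: (1 + beta) / N_up."""
    return (1.0 + beta) / n_up
```

With the roll-off factor of 0.5 and Nup=2, `null_to_null_bandwidth()` evaluates to 0.75, which agrees with the bandwidth stated above.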
The rationale behind the design of xSYNC is as follows:
where b is based on the approximate SC waveform with the rectangular filter and equal to b=2[g32, g32, g31, g31, . . . , g1, g1]−1∈1×64 for Nup=2 and sn is [xrx[n−63], xrx[n−62], . . . , xrx[n]] for Mode 1 or [xtx[n−63], xtx[n−62], . . . , xtx[n]] for Mode 2.
The block declares a detection if m[n] exceeds 1/4 four times, 64 samples apart. Implemented IP is illustrated in
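A minimal sketch of the detection rule is given below, by way of non-limiting illustration. The metric is assumed here to be the normalized squared correlation between the last 64 samples and the ±1 template, which is equivalent to filtering the sample stream with an FIR filter whose taps are the time-reversed template b; this form of m[n], the function names, and the particular Golay sequence are assumptions and are not taken from the disclosure.

```python
import numpy as np


def golay32_binary():
    """Binary (0/1) Golay sequence of length 32 from the standard recursion."""
    a, b = np.array([1]), np.array([1])
    for _ in range(5):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return (a + 1) // 2


def template(g, n_up=2):
    """+/-1 template in time order: each BPSK symbol held for n_up samples."""
    return np.repeat(2 * g - 1, n_up).astype(float)


def metric(x, tmpl):
    """Assumed metric: |<s_n, template>|^2 / (||s_n||^2 ||template||^2)."""
    length = tmpl.size
    m = np.zeros(x.size)
    e_t = np.sum(tmpl ** 2)
    for n in range(length - 1, x.size):
        s = x[n - length + 1 : n + 1]          # s_n = [x[n-63], ..., x[n]]
        e_s = np.sum(np.abs(s) ** 2)
        if e_s > 0:
            m[n] = np.abs(np.vdot(tmpl, s)) ** 2 / (e_s * e_t)
    return m


def detect(m, threshold=0.25, repeats=4, spacing=64):
    """Return the first index n at which m exceeds the threshold `repeats`
    times at a spacing of `spacing` samples (ending at n), else None."""
    hits = m > threshold
    for n in range((repeats - 1) * spacing, m.size):
        if all(hits[n - r * spacing] for r in range(repeats)):
            return n
    return None
```

Because the metric is normalized by the energy of the window and uses the magnitude of the correlation, it does not depend on the received signal level or on a constant phase rotation, which is one reason such a form may be attractive under limited FPGA resources.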
The baseband processing (and the additional processing for FEEL) at the ED can take time on the order of seconds. In this case, TPC may need to be set to a large value accordingly. However, using a large value for TPC (also for TRX and TTX) results in a surprising time offset problem due to the inaccurate and unstable FPGA clock. To elaborate on this, we model the instantaneous FPGA clock period T′clk,k [n] at the kth ED as T′clk,k [n]=Tclk+ΔTclk,k+nclk,k [n] where Tclk is the ideal clock period and ΔTclk,k and nclk,k [n] are the offset and the jitter due to the imperfect oscillator on the SDR, respectively. The disclosed block at the kth ED measures TRX,ED+TPC,ED through a counter that counts up till Ncnt=(TRX,ED+TPC,ED)/Tclk with the FPGA clock ticks. Therefore, the difference between the measured duration and TRX,ED+TPC,ED can be calculated as
$$\sum_{n=1}^{N_{\rm cnt}} T'_{{\rm clk},k}[n]-\left(T_{\rm RX,ED}+T_{\rm PC,ED}\right)=N_{\rm cnt}\,\Delta T_{{\rm clk},k}+\sum_{n=1}^{N_{\rm cnt}} n_{{\rm clk},k}[n], \tag{1}$$
which implies that a large Ncnt causes not only a large time offset (the first term) but also a large jitter (the second term).
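To give a sense of scale, the clock model above can be evaluated numerically, by way of non-limiting illustration. Under that model a counter of Ncnt ticks accumulates a deterministic offset of Ncnt·ΔTclk,k; the sketch further assumes, for illustration only, that the jitter samples are independent and zero-mean so that their accumulated standard deviation grows with the square root of Ncnt, and that a frequency deviation given in parts per million translates to the same relative deviation of the clock period (a first-order approximation). The function name and arguments are hypothetical.

```python
import math


def counter_timing_error(duration_s, f_clk_hz, ppm, jitter_std_s=0.0):
    """Return (N_cnt, accumulated offset in s, accumulated jitter std in s)."""
    t_clk = 1.0 / f_clk_hz                     # ideal clock period T_clk
    n_cnt = round(duration_s / t_clk)          # N_cnt = duration / T_clk
    delta_t_clk = t_clk * ppm * 1e-6           # |Delta T_clk| (first-order)
    offset = n_cnt * delta_t_clk               # first term: grows linearly
    jitter = math.sqrt(n_cnt) * jitter_std_s   # second term (i.i.d. assumption)
    return n_cnt, offset, jitter
```

For instance, a one-second count at a 100 MHz clock with a 20 PPM deviation corresponds to an offset of about 20 microseconds, i.e., many sample periods at typical SDR sample rates, which is why a calibration step is needed when the timers are long.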
The jitter can be mitigated by reducing Ncnt or using a more stable oscillator in the SDR. To address the time offset, we disclose a closed-loop calibration procedure as represented in conjunction with
In this part of the disclosure, we implement FEEL based on the OAC scheme, i.e., FSK-MV, originally disclosed in [12] and extended in [13] with the absentee votes. To make the reader familiar with this scheme, let $\mathcal{D}_k$ denote the local data set containing the labeled data samples $(\mathbf{x}_\ell, y_\ell)$ at the kth ED for k=1, . . . , K, where $\mathbf{x}_\ell$ and $y_\ell$ are the ℓth data sample and its associated label, respectively. The main problem tackled with FEEL can be expressed as
$$\min_{\mathbf{w}}\ \frac{1}{|\mathcal{D}|}\sum_{(\mathbf{x},y)\in\mathcal{D}} f(\mathbf{w},\mathbf{x},y), \tag{2}$$
where $\mathcal{D}=\mathcal{D}_1\cup\cdots\cup\mathcal{D}_K$ and f (w, x, y) is the sample loss function measuring the labeling error for (x, y) for the parameter vector w=[w1, . . . , wQ]T∈RQ.
To solve (2) in a wireless network with OAC in a distributed manner (i.e., the global data set cannot be formed by uploading the local data sets to the ES), for the nth parameter-update round, the kth ED first calculates the local stochastic gradients as
$$\tilde{\mathbf{g}}_k^{(n)}=\frac{1}{n_{\rm b}}\sum_{(\mathbf{x},y)\in\tilde{\mathcal{D}}_k}\nabla f(\mathbf{w}^{(n)},\mathbf{x},y), \tag{3}$$
where $\tilde{\mathbf{g}}_k^{(n)}=[\tilde{g}_{k,1}^{(n)},\ldots,\tilde{g}_{k,Q}^{(n)}]$ is the gradient vector, $\tilde{\mathcal{D}}_k\subset\mathcal{D}_k$ is the selected data batch from the local data set, and $n_{\rm b}=|\tilde{\mathcal{D}}_k|$ is the batch size.
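Similar to the distributed training strategy by the MV with sign stochastic gradient descent (signSGD) [14], each ED then activates one of the two subcarriers determined by the time-frequency index pairs $(m^+, l^+)$ and $(m^-, l^-)$ for $m^+, m^-\in\{0, 1, \ldots, S-1\}$ and $l^+, l^-\in\{0, 1, \ldots, M-1\}$ with the symbols
$$t_{k,l^+,m^+}^{(n)}=\sqrt{E_{\rm s}}\,s_{k,q}^{(n)}\,\omega(\tilde{g}_{k,q}^{(n)})\,\mathbb{I}[\mathrm{sign}(\tilde{g}_{k,q}^{(n)})=1], \tag{4}$$
and
$$t_{k,l^-,m^-}^{(n)}=\sqrt{E_{\rm s}}\,s_{k,q}^{(n)}\,\omega(\tilde{g}_{k,q}^{(n)})\,\mathbb{I}[\mathrm{sign}(\tilde{g}_{k,q}^{(n)})=-1], \tag{5}$$
respectively, where $\omega(\tilde{g}_{k,q}^{(n)})=1$ if $|\tilde{g}_{k,q}^{(n)}|$ is greater than or equal to a threshold, otherwise it is 0, $E_{\rm s}=2$ is the normalization factor, $s_{k,q}^{(n)}$ is a random quadrature phase-shift keying (QPSK) symbol to reduce the peak-to-mean envelope power ratio (PMEPR), the function sign (·) results in 1, −1, or ±1 at random for a positive, a negative, or a zero-valued argument, respectively, and the function $\mathbb{I}[\cdot]$ results in 1 if its argument holds, otherwise it is 0.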
The K EDs then access the wireless channel on the same time-frequency resources simultaneously with S orthogonal frequency division multiplexing (OFDM) symbols consisting of M active subcarriers. In [13], it is shown that a threshold greater than 0 (i.e., enabling absentee votes) can improve the test accuracy by eliminating the converging EDs from the MV calculation when the data distribution is heterogeneous.
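Let $r_{l^+,m^+}^{(n)}$ and $r_{l^-,m^-}^{(n)}$ be the received symbols after the superposition for the qth gradient at the ES. The ES detects the MV for the qth gradient with an energy detector as
$$v_q^{(n)}=\mathrm{sign}(\Delta_q^{(n)}), \tag{6}$$
where $\Delta_q^{(n)}\triangleq e_q^+-e_q^-$ for $e_q^+\triangleq|r_{l^+,m^+}^{(n)}|^2$ and $e_q^-\triangleq|r_{l^-,m^-}^{(n)}|^2$.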
Finally, the ES transmits v(n)=[v1(n), . . . , vQ(n)]T to the EDs and the models at the EDs are updated as w(n+1)=w(n)−ηv(n), where η is the learning rate.
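One round of the scheme described above can be sketched numerically as follows, by way of non-limiting illustration. Each ED places a randomly phased QPSK symbol on the resource that encodes the sign of its local gradient (or stays silent if the gradient magnitude is below the threshold), the channel superposes the transmissions with coefficients unknown to the EDs and the ES, and the ES compares the energies on the two resources. The Rayleigh fading model, the amplitude scaling by the square root of Es, the noise model, and all function names are assumptions made for this sketch.

```python
import numpy as np


def fsk_mv_round(grads, threshold=0.0, noise_std=0.0, rng=None):
    """grads: array of shape (K, Q) with local gradients. Returns MV vector v."""
    rng = np.random.default_rng() if rng is None else rng
    grads = np.asarray(grads, dtype=float)
    n_par = grads.shape[1]
    e_s = 2.0                                     # normalization factor E_s

    signs = np.sign(grads)
    zero = signs == 0
    signs[zero] = rng.choice([-1.0, 1.0], size=int(zero.sum()))  # random for 0
    omega = np.abs(grads) >= threshold            # absentee votes when False
    s = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, size=grads.shape)))

    t_plus = np.sqrt(e_s) * s * omega * (signs == 1)    # symbol on (m+, l+)
    t_minus = np.sqrt(e_s) * s * omega * (signs == -1)  # symbol on (m-, l-)

    def fading():
        # hypothetical Rayleigh fading, unknown to both the EDs and the ES
        return (rng.standard_normal(grads.shape)
                + 1j * rng.standard_normal(grads.shape)) / np.sqrt(2)

    def noise():
        return noise_std * (rng.standard_normal(n_par)
                            + 1j * rng.standard_normal(n_par)) / np.sqrt(2)

    r_plus = np.sum(fading() * t_plus, axis=0) + noise()    # superposition
    r_minus = np.sum(fading() * t_minus, axis=0) + noise()
    delta = np.abs(r_plus) ** 2 - np.abs(r_minus) ** 2      # e_q^+ - e_q^-
    v = np.sign(delta)
    tie = v == 0
    v[tie] = rng.choice([-1.0, 1.0], size=int(tie.sum()))
    return v


def update_model(w, v, eta=0.001):
    """w^(n+1) = w^(n) - eta * v^(n)."""
    return np.asarray(w, dtype=float) - eta * np.asarray(v, dtype=float)
```

Since the decision relies only on the received energies, the sketch uses no channel state information and no phase alignment at any node; with fading, the outcome for a contested element is a noisy estimate of the majority rather than an exact count.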
In [12] and [13], the reception of the MV vector by the EDs is assumed to be perfect. In practice, the MVs can be communicated via traditional communication methods. Nevertheless, as it increases the complexity of the EDs, we also use the FSK in the DL in our implementation as done for the UL.
The signaling between EDs and ES in this disclosure is maintained over a custom PPDU as shown in
In this design, there are four different fields, i.e., frame synchronization, channel estimation (CHEST), header, and data fields, where each field is based on OFDM symbols. We express an OFDM symbol as
$$\mathbf{t}=\mathbf{A}\mathbf{F}_{N}^{\rm H}\mathbf{M}_{\rm f}\mathbf{d}, \tag{7}$$
where A ∈N
For all fields, we set the IDFT size and CP size for synthesizing the OFDM symbols to NIDFT=256 and Ncp=64, respectively. For CHEST, header, and data fields, we use M=192 active subcarriers along with 8 direct current (DC) subcarriers. For the frame synchronization field, the DC subcarriers are also utilized.
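The OFDM symbol synthesis for the CHEST, header, and data fields can be sketched as follows, by way of non-limiting illustration. The sketch reads the factors of (7) as a subcarrier mapping, an inverse DFT, and a cyclic prefix addition, and it places the 8 nulled DC subcarriers around bin 0 with 96 active subcarriers on each side; this reading of the matrices, the subcarrier layout, the unitary scaling of the inverse DFT, and the function names are assumptions.

```python
import numpy as np

N_IDFT, N_CP, M, N_DC = 256, 64, 192, 8


def active_bins():
    """Hypothetical layout: bins -100..-5 and 4..99 are active, -4..3 are null."""
    half = M // 2
    neg = np.arange(-(N_DC // 2) - half, -(N_DC // 2))
    pos = np.arange(N_DC // 2, N_DC // 2 + half)
    return np.concatenate([neg, pos]) % N_IDFT


def ofdm_symbol(d):
    """Map M modulation symbols to subcarriers, apply the IDFT, prepend the CP."""
    d = np.asarray(d, dtype=complex)
    if d.size != M:
        raise ValueError(f"expected {M} modulation symbols, got {d.size}")
    f = np.zeros(N_IDFT, dtype=complex)
    f[active_bins()] = d                        # subcarrier mapping
    x = np.fft.ifft(f) * np.sqrt(N_IDFT)        # inverse DFT, unitary scaling
    return np.concatenate([x[-N_CP:], x])       # cyclic prefix addition
```

Each symbol produced this way has NIDFT+Ncp=320 samples, so the duration of a field follows directly from its number of OFDM symbols and the sample rate.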
The frame synchronization field is a single OFDM symbol. Every other active subcarrier within the band is utilized with a Zadoff-Chu sequence of length 97. Therefore, the corresponding OFDM symbol has two repetitions in the time domain. While the repetitions are used to estimate the CFO at the receiver, the null subcarriers are utilized to estimate the noise variance.
The CHEST field is a single OFDM symbol. The modulation symbols are the elements of a pair of QPSK Golay sequences of length 96, denoted by (ga, gb). The d is the concatenation of ga and gb.
The header is a single OFDM symbol. It is based on BPSK symbols with a polar code of length 128 with the rate of 1/2. We reserve 56 bits for a sequence of signature bits, the number of codewords in the data field, i.e., Ncw, and the number of pre-padding bits, i.e., Npad. The remaining 8 bits are reserved for cyclic redundancy check (CRC). We also use QPSK-based phase tracking symbols for every other two subcarriers, where the tracking symbols are the elements of a QPSK Golay sequence of length 64.
Let Nbit be the number of information bits to be communicated. We calculate the number of codewords and the number of pre-padding bits as Ncw=⌈Nbit/56⌉ and Npad=56Ncw−Nbit. After the information bits are padded with Npad bits, they are grouped into Ncw messages of length 56 bits. The concatenation of each message sequence and its corresponding CRC is encoded with a polar code of length 128 with the rate of 1/2. We carry one codeword on each OFDM symbol with BPSK modulation. Hence, the number of OFDM symbols in the data field is also Ncw. Similar to the header, QPSK-based phase tracking symbols are used for every other two subcarriers.
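The data-field bookkeeping described above can be sketched as follows, by way of non-limiting illustration. The function names are hypothetical, and the sketch assumes that the pre-padding bits are zeros placed in front of the information bits; the CRC computation and the polar encoding are omitted because their details are not given here.

```python
MSG_BITS = 56  # information bits per codeword (before the 8-bit CRC)


def data_field_layout(n_bit):
    """Return (N_cw, N_pad) for N_bit information bits."""
    n_cw = -(-n_bit // MSG_BITS)        # integer ceiling of N_bit / 56
    n_pad = MSG_BITS * n_cw - n_bit
    return n_cw, n_pad


def group_messages(bits):
    """Pre-pad the bits (assumed zero padding) and split into 56-bit messages."""
    n_cw, n_pad = data_field_layout(len(bits))
    padded = [0] * n_pad + list(bits)
    return [padded[i * MSG_BITS : (i + 1) * MSG_BITS] for i in range(n_cw)]
```

For example, 100 information bits require two codewords and 12 pre-padding bits, and therefore two OFDM symbols in the data field.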
Throughout this disclosure, we use the information bits that are transmitted over the PPDU to signal tcal, tfeed, tgrd, tmv and user multiplexing. We dedicate 4 bits for signaling type and 25 bits for user multiplexing. If the signaling type is the calibration feedback, we define 32 bits for time offset and 8 bits for power control for each ED.
For the experiment/confirmation, we consider the learning task of handwritten-digit recognition with K=5 EDs and an ES, where each of them is implemented with an Adalm Pluto (Rev. C) SDR.
We develop the IP core for the disclosed synchronization method by using MATLAB HDL Coder and embed it into the FPGA (Xilinx Zynq XC7Z010) based on the guidelines provided in [15]. As shown in
We use the MNIST database that contains labeled handwritten-digit images. To prepare the data, we first choose |D|=25000 training images from the database, where each digit has 2500 distinct images. For homogeneous data distribution, each ED has 500 distinct images for each digit. For heterogeneous data distribution, the kth ED has the data samples with the labels {k−1, k, 1+k, 2+k, 3+k, 4+k}. For both distributions, the EDs do not have common training images. For the model, we consider a convolutional neural network (CNN) that consists of two 2D convolutional layers with the kernel size [5, 5], stride [1, 1], and padding [2, 2], where the former layer has 1 input and 16 output channels and the latter one has 16 input and 32 output channels. Each layer is followed by batch norm, rectified linear units, and max pooling layer with the kernel size 2. Finally, we use a fully-connected layer followed by softmax. Our model has Q=29034 learnable parameters that result in S=303 OFDM symbols for FSK-MV. We set η=0.001 and nb=100. For the test accuracy, we use 10000 test samples in the database.
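The stated values of Q and S can be cross-checked with a short calculation, by way of non-limiting illustration. The sketch assumes 28×28 single-channel input images and 10 output classes (as in MNIST), a learnable scale and shift per channel in each batch norm layer, and that FSK-MV occupies two subcarriers per parameter on M=192 active subcarriers so that S is the ceiling of 2Q/M; the function names and these modeling assumptions are not taken from the disclosure.

```python
def conv2d_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out      # weights plus biases


def batchnorm_params(channels):
    return 2 * channels                      # learnable scale and shift


def linear_params(n_in, n_out):
    return n_in * n_out + n_out


def model_parameter_count(image_size=28, n_classes=10):
    q = conv2d_params(1, 16, 5) + batchnorm_params(16)
    q += conv2d_params(16, 32, 5) + batchnorm_params(32)
    side = image_size // 2 // 2              # two max pooling layers, kernel 2
    q += linear_params(32 * side * side, n_classes)
    return q


def ofdm_symbols_for_fsk_mv(q, m_active=192):
    return -(-2 * q // m_active)             # integer ceiling of 2Q / M
```

Under these assumptions the calculation reproduces both Q=29034 and S=303.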
The experiment reveals many practical issues. The FPGA clock rate of Adalm Pluto SDR is 100 MHz, generated from a 40 MHz oscillator where the frequency deviation is rated at 20 PPM. Due to the large deviation and TPC,ED+TRX,ED, we observe a large time offset and a large jitter as discussed in Section II-B. Hence, the ES initiates the calibration procedure in
In the experiment, we observe that the magnitudes of the channel frequency coefficients do not change significantly due to the low mobility. However, their phases change in an intractable manner due to the random time offsets. Nevertheless, this is not an issue for FSK-MV as it does not require a phase alignment. Note that we also implement a closed-loop power control by using the calibration procedure to align the received signal powers. However, an ideal power alignment is challenging to maintain. For example, ED 3's channel in
For homogeneous data distribution, the test accuracy for each ED quickly reaches 97.5% for both cases as given in
We disclose a method that can maintain the synchronization in an SDR-based network without implementing the baseband as a hard-coded block. We also provide the corresponding procedure and discuss the design of the synchronization waveform to address the hardware limitations. Finally, by implementing the disclosed concept with Adalm Pluto SDRs, for the first time, we demonstrate the performance of an OAC, i.e., FSK-MV, for FEEL. Our experiment shows that FSK-MV provides robustness against time synchronization errors and can result in a high test accuracy in practice.
This written description uses examples to disclose the presently disclosed subject matter, including the best mode, and also to enable any person skilled in the art to practice the presently disclosed subject matter, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the presently disclosed subject matter is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they include structural and/or step elements that do not differ from the literal language of the claims, or if they include equivalent structural and/or elements with insubstantial differences from the literal languages of the claims.
The present application claims the benefit of priority of U.S. Provisional Patent Application No. 63/396,351, filed Aug. 9, 2022, and the benefit of priority of U.S. Provisional Patent Application No. 63/505,835, filed Jun. 2, 2023, both of which are titled Methods And Procedures For Synchronization And Over-The-Air Computation, and both of which are fully incorporated herein by reference for all purposes.
Number | Date | Country
---|---|---
63505835 | Jun 2023 | US
63396351 | Aug 2022 | US