The present application claims priority from UK patent application no. 2207152.6, filed May 16, 2022; the content of which is incorporated herein by reference for all purposes.
Feedback codes are a class of error correction codes that protect a message transmitted from a terminal A to another terminal B over a noisy communication channel. In contrast to classical forward error correction codes, feedback codes leverage the feedback signal from terminal B to terminal A to aid encoding at terminal A, and the encoding proceeds iteratively such that each transmitted symbol depends not only on the intended message but also on all the feedback signals received so far.
Examples will be described with reference to the accompanying drawings in which
The following notation is used:
The communication goal is to deliver a vector of K bits b∈{0,1}K 103 reliably from node A 102 to node B 104. Therefore, nodes A 102 and B 104 are arranged to communicate over T interactions. In the τth interaction, τ=1, 2, . . . , T, node A 102 transmits a packet of q(τ) symbols c(τ) to node B 104 over the forward channel. In turn, node B 104 feeds back a packet of {tilde over (q)}(τ) symbols {tilde over (c)}(τ) to node A 102 over the feedback channel. In the examples described herein, it will be assumed that q(τ)={tilde over (q)}(τ)=q, ∀τ.
Node A 102 transmits, via the forward channel 106, a vector of symbols c∈Rq 110. The received signal or symbols at node B 104 is given by y=c+n 112, where n∈Rq 114 is a vector of Additive White Gaussian Noise (AWGN), the elements of which have a Gaussian distribution N(0,σf2) in an independent and identically distributed (i.i.d.) manner.
Node B 104 feeds back a vector of symbols {tilde over (c)}∈Rq 116 to node A 102. The received signal or received symbols, at node A 102, are given by {tilde over (y)}={tilde over (c)}+ñ 118, where the elements of the AWGN ñ∈Rq 120 have a Gaussian distribution N(0,σb2) in an independent and identically distributed (i.i.d.) manner.
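As an illustration of this channel model, the following is a minimal simulation sketch in Python/NumPy; the packet length q and the noise standard deviations used below are illustrative assumptions rather than values taken from the examples.

```python
import numpy as np

def forward_channel(c, sigma_f):
    """Node A -> node B: y = c + n, with n ~ N(0, sigma_f^2) i.i.d. per element."""
    return c + sigma_f * np.random.randn(*c.shape)

def feedback_channel(c_tilde, sigma_b):
    """Node B -> node A: y_tilde = c_tilde + n_tilde, with n_tilde ~ N(0, sigma_b^2) i.i.d."""
    return c_tilde + sigma_b * np.random.randn(*c_tilde.shape)

# One interaction with q = 4 symbols per packet (illustrative values).
q, sigma_f, sigma_b = 4, 1.0, 0.1
c = np.random.randn(q)                         # symbols transmitted by node A
y = forward_channel(c, sigma_f)                # received at node B
c_tilde = y                                    # e.g. a passive relay of the received signal
y_tilde = feedback_channel(c_tilde, sigma_b)   # received back at node A
```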
Nodes A 102 and B 104 are arranged to operate at a code rate R=K/N, where N=Tq.
In the examples described, nodes A 102 and B 104 can be subject to average power constraints P and {tilde over (P)} respectively, that is, (1/N)E[Στ∥c(τ)∥2]≤P and (1/N)E[Στ∥{tilde over (c)}(τ)∥2]≤{tilde over (P)}, where E denotes expectation. Therefore, the signal-to-noise ratios (SNRs) of the feedforward 106 and feedback 108 channels are respectively given by SNRf=P/σf2 and SNRb={tilde over (P)}/σb2.
Node A 102 comprises an encoder 122, and an accumulator 124. The encoder 122 is arranged to produce the packet of q(τ) symbols c(τ) 110. The encoder 122 produces the q(τ) symbols c(τ) 110 from a number of inputs. Examples can be realised in which the number of inputs comprise an information vector Q(τ) 126. The information vector 126 is also known as a knowledge vector. The information vector Q(τ) 126 is derived from the accumulator 124 and the bitstream b 103 to be modulated. The accumulator 124 receives the feedback symbols {tilde over (y)}={tilde over (c)}+ñ 118 and constructs the information vector Q(τ) 126 for processing by the encoder 122 as follows:
Q(τ)=[b, c(1), . . . , c(τ-1), {tilde over (y)}(1), . . . , {tilde over (y)}(τ)]  (6).
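A minimal sketch of the accumulator 124, which builds Q(τ) per equation (6) by concatenating the bitstream with the symbols transmitted and the feedback received so far (Python/NumPy; the flat concatenation layout is an assumption made for illustration):

```python
import numpy as np

def accumulate_node_a(b, c_history, y_tilde_history):
    """Form Q(tau) = [b, c(1), ..., c(tau-1), y_tilde(1), ..., y_tilde(tau)]."""
    parts = [b.astype(float)] + list(c_history) + list(y_tilde_history)
    return np.concatenate(parts)

# Example after two interactions with K = 6 bits and q = 4 symbols (illustrative sizes).
b = np.random.randint(0, 2, 6)
c_history = [np.random.randn(4)]                             # c(1)
y_tilde_history = [np.random.randn(4), np.random.randn(4)]   # y_tilde(1), y_tilde(2)
Q = accumulate_node_a(b, c_history, y_tilde_history)
```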
Node B 104 comprises an accumulator 128. The accumulator 128 is arranged to produce the packet of {tilde over (q)}(τ) symbols {tilde over (c)}(τ) 116. The accumulator 128 produces the packet of {tilde over (q)}(τ) symbols {tilde over (c)}(τ) 116 from at least one input. Examples can be realised in which the at least one input comprises an information vector {tilde over (Q)}(τ) 130. The information vector 130 is also known as a knowledge vector. The information vector {tilde over (Q)}(τ) 130 is derived by accumulator 128 from the received signals y(1), . . . , y(τ) 112. Examples can be realised, using active feedback, in which the information vector {tilde over (Q)}(τ) 130 is derived from previous feedback symbols {tilde over (c)}(1), . . . , {tilde over (c)}(τ-1) as well as the received signals y(1), . . . , y(τ) 112. Examples that provide active feedback can be provided in which node B comprises an encoder 132. The encoder 132 is arranged to derive the feedback symbols {tilde over (c)}(1), . . . , {tilde over (c)}(τ) 116 from the information vector {tilde over (Q)}(τ) 130. Therefore, examples, in the case of active feedback, can be realised in which the information vector {tilde over (Q)}(τ) 130 is given by
{tilde over (Q)}(τ)=[{tilde over (c)}(1), . . . , {tilde over (c)}(τ-1), y(1), . . . , y(τ)]  (7).
Examples, in the case of passive feedback, can be realised in which the information vector {tilde over (Q)}(τ) 130 is given by
{tilde over (Q)}(τ)=[y(1), . . . , y(τ)].
The encoders 122 and 132 are arranged to encode for the forward and feedback channels via respective encoding mechanisms M(τ) and {tilde over (M)}(τ), which, for each communication block τ, are realised via, respectively,
M(τ): Q(τ)→c(τ)∈Rq  (8)
and
{tilde over (M)}(τ): {tilde over (Q)}(τ)→{tilde over (c)}(τ)∈R{tilde over (q)}  (9).
Node B 104 also comprises a decoder 134. The decoder 134 is arranged to predict the original bitstream b 103 from the information vector {tilde over (Q)}(τ) 130 following a decoding mechanism D given by
D: {tilde over (Q)}(τ)→{circumflex over (b)}∈{0,1}K  (10).
As indicated above, the encoder 132 at node B 104 actively processes the information vector {tilde over (Q)}(τ) 130 via {tilde over (M)}(τ) to generate the vector of symbols {tilde over (c)}(τ) 116 transmitted to node A 102 over the feedback channel 108. In the case of examples that use passive feedback, the encoder 132 at node B 104 implements a relay mechanism in which the information vector {tilde over (Q)}(τ) 130, via {tilde over (M)}(τ), generates the vector of symbols {tilde over (c)}(τ) 116 transmitted to node A 102 over the feedback channel 108 using
{tilde over (M)}(τ): {tilde over (Q)}(τ)→{tilde over (c)}(τ)=αy(τ)=αc(τ)+αn(τ)  (14)
where α is a scalar that can be used to scale the received vector y(τ) to satisfy the above average power constraint, if imposed. In the case of passive feedback, {tilde over (q)}(τ)=q(τ).
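A sketch of this passive relay is given below; choosing α = sqrt({tilde over (P)}/(P+σf2)) so that the relayed symbols meet the feedback power constraint is a standard choice stated here as an assumption, not a value taken from the text.

```python
import numpy as np

def passive_feedback(y, P_feedback, P_forward, sigma_f):
    """Relay the received vector y, scaled to the feedback power constraint (a sketch).

    Assumes E[c^2] = P_forward per symbol, so E[y^2] = P_forward + sigma_f**2.
    """
    alpha = np.sqrt(P_feedback / (P_forward + sigma_f ** 2))
    return alpha * y
```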
Referring to
A further layer normalisation layer 214 is provided to receive the bitstream b 103. The output of layer 214 is coupled to a position encoder 216. The output of the Linear-ReLu-Linear layer 212 is coupled to a further layer normalisation layer 218.
The encoder layers 202 of
An example implementation of systematic, passive, feedback encoding and decoding will be described with reference to
Although the above uses BPSK modulation to send the bitstream, examples are not limited thereto. Examples can be realised in which some other form of higher order modulation is used such as, for example, QPSK, 8-PSK, QAM etc. Since the encoder block comprises ds2s layers, the parity symbols can be generated in parallel by dividing the bitstream b 103 into multiple blocks with each layer of the encoder block being arranged to process a respective block of the bitstream b 103. The process of encoding the bitstream b 103 is shown in
Referring to
It will be appreciated that a total of l=┌K/m┐ symbols are transmitted at each iteration corresponding to the l blocks of information bits of the bitstream b 103. The foregoing is repeated until n parity symbols have been transmitted for each block. It will be appreciated that the above gives a coding rate of R=m/(m+n) and requires
communication blocks. Therefore, the rate of a code can be adjusted by changing the block size m and the number of parity symbols per block n. Repeating the process results in iterative parity symbol encoding, which is presented in Algorithm 2 below and also illustrated in, and described with reference to,
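The effect of m and n on the rate can be checked with a few lines; K = 51 and m = 3 match the later worked example, while n = 6 parity symbols per block is an assumption used purely to illustrate a rate-1/3 code.

```python
import math

K, m, n = 51, 3, 6
l = math.ceil(K / m)     # number of blocks: 17
R = m / (m + n)          # coding rate: 3/9 = 1/3
print(l, R)
```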
At line 1, a for loop is established so that a single parity symbol is generated per block at each pass. The knowledge vector Q(τ) is updated using the bitstream 103, the parity symbols transmitted thus far and the feedback symbols or signals received thus far. The knowledge vector Q(τ) is pre-processed at lines 4 to 13, as will be described below. Feature extraction occurs at lines 14 to 16, attention-based neural-encoding is implemented at line 17 and symbol mapping is determined at lines 19 and 20.
Referring to line 5, Se(⋅) defines how the knowledge vector is pre-processed and fed to the deep neural network (DNN) architecture. Firstly, Se(⋅) generates l equal-sized knowledge vectors, i.e., Se(Q(τ))={Qi(τ), . . . , Ql(τ)}, each of which corresponds to respective different blocks.
The above Algorithm 2 presents four ways to pre-process the knowledge vector, which are expressed in lines 7, 9, 11, and 13.
A first way to pre-process the knowledge vector is given in line 7, in which the knowledge vector Q(τ) is arranged to comprise a current or respective block, b((i-1)*m+1:i*m), of the bitstream b together with the thus far, or current, received feedback signals {tilde over (y)}i(1), . . . , {tilde over (y)}i(τ-1).
A second way to pre-process the knowledge vector Q(τ) is given in line 9. It should be noted that, by subtracting the vector of the symbols c(τ) 110 from the received noisy version of the feedback symbols {tilde over (y)}(τ), node A 102 obtains a cumulative noise vector {tilde over (y)}(τ)−c(τ)=n(τ)+ñ(τ). In the second example, the knowledge vector is constructed to comprise a current or respective block, b((i-1)*m+1:i*m), of the bitstream b together with the cumulative noise vectors obtained thus far.
A third way to pre-process the knowledge vector Q(τ) is given in line 11. In the third example, the knowledge vector is constructed to comprise a current or respective block, b((i-1)*m+1:i*m), of the bitstream b together with the thus far, or current, transmitted symbols ci(1), . . . , ci(τ-1) and the above-described cumulative noise vector {tilde over (y)}(τ)−c(τ)=n(τ)+ñ(τ).
A fourth way to construct the knowledge vector Q(τ) is given in line 13. In the fourth example, the knowledge vector is constructed to comprise a current, or respective, block, b((i-1)*m+1:i*m), of the bitstream b together with the thus far, or current, transmitted symbols ci(1), . . . , ci(τ-1) and together with the thus far, or current, received feedback signals {tilde over (y)}i(1), . . . , {tilde over (y)}i(τ-1).
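The four constructions in lines 7, 9, 11 and 13 of Algorithm 2 can be sketched per block as follows (Python/NumPy; the mode names and the list-based history layout are assumptions made for illustration):

```python
import numpy as np

def preprocess_block(i, tau, b_blocks, c_hist, y_tilde_hist, mode="disentangle"):
    """Build the per-block knowledge vector Q_i(tau) at iteration tau.

    b_blocks[i]        : the m bits of block i
    c_hist[t][i]       : symbol transmitted for block i at iteration t + 1
    y_tilde_hist[t][i] : feedback received for block i at iteration t + 1
    """
    b_i = b_blocks[i].astype(float)
    c_i = np.array([c_hist[t][i] for t in range(tau - 1)])
    y_i = np.array([y_tilde_hist[t][i] for t in range(tau - 1)])
    if mode == "feedback_only":       # line 7: bits and the feedback received so far
        return np.concatenate([b_i, y_i])
    if mode == "noise_only":          # line 9: bits and the cumulative noise y_tilde - c
        return np.concatenate([b_i, y_i - c_i])
    if mode == "disentangle":         # line 11: bits, transmitted symbols and cumulative noise
        return np.concatenate([b_i, c_i, y_i - c_i])
    return np.concatenate([b_i, c_i, y_i])   # line 13: bits, transmitted symbols and feedback
```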
The encoder architecture depicted in
Examples of the encoder can be realised in which the layer normalisation module 210 can be implemented with different orders, in particular, examples can provide either post-normalisation or pre-normalisation. Pre- and post-layer normalisation are described in detail in
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tie-Yan Liu. “On layer normalization in the transformer architecture”. CoRR, abs/2002.04745, 2020, which is incorporated herein by reference.
Examples can be realised in which the encoder uses pre-layer normalisation to stabilise gradient flow. Locating the layer normalisation inside any residual blocks allows training to be performed without a warm-up stage and supports faster training convergence.
It will be appreciated also that the mask used in transformers as described in "Attention is all you need" is not used in the example encoders and decoders described herein. The ability to remove the mask follows from not processing the input sequentially. Furthermore, in the examples described herein, the feature extractor 220 and symbol mapper 222 are realised as fully connected layers.
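A minimal PyTorch sketch of such a pre-layer-normalisation attention block with no attention mask is given below; the width d_model = 32, the single attention head and the feed-forward width are illustrative assumptions.

```python
import torch.nn as nn

class PreLNEncoderBlock(nn.Module):
    """Self-attention block with layer normalisation inside the residual paths and no mask."""
    def __init__(self, d_model=32, n_heads=1, d_ff=64):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x):                                    # x: (batch, blocks, d_model)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]    # no attention mask is applied
        return x + self.ff(self.norm2(x))
```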
Examples of the decoder 134 have an identical architecture to examples of the encoder 122 with the exception that decoding is not performed in an iterative manner.
Although examples can be realised that use a two-phase communication protocol, such as described above in Algorithm 1, examples are not limited to such an arrangement. Examples can be realised in which the first phase is omitted, that is, in which the modulated symbols corresponding to the bitstream b 103 are not transmitted in advance of generating symbols using feedback. In such examples, node A 102 directly communicates parity symbols and receives corresponding feedback symbols. Examples of the Block Attention Feedback codes that do not use an initial phase will be known as Generalised Block Attention Feedback (GBAF) codes.
For GBAF codes, since the initial phase is removed, examples use T=m+n communication blocks to obtain the same transmission or coding rate of R=m/(m+n). In examples of GBAF, all iterations use the IPSE algorithm, that is, Algorithm 2, including when τ=1, such that line 1 of Algorithm 2 becomes
for τ=1, . . . , T do # Generate 1 parity symbol per block at each pass.
Referring again to
Referring to
The accumulator 124 outputs the feature matrix Q(τ) 126 of dimension
to the encoder 122. The feature matrix Q(τ) 126 comprises:
the bitstream b 103 is divided into l=17 blocks each having a length of m=3, which gives a vector Fb=[s1, s2, . . . sl] 502,
a symbols vector Fc 504 of previously transmitted symbols, and
a vector Fn 506 of cumulative noise vectors.
The feature matrix Q(τ) 126 is fed to the encoder 122.
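A sketch of how the accumulator 124 might assemble this feature matrix for the K = 51, m = 3, l = 17 example (Python/NumPy); the number of interactions T = 9 follows the rate-1/3 assumption, and the zero-padding of future iterations follows clause 5 below; both are otherwise illustrative.

```python
import numpy as np

K, m, T = 51, 3, 9            # T = 9 interactions assumes a rate-1/3 code (n = 6)
l = K // m                    # 17 blocks

def build_feature_matrix(b, c_hist, noise_hist, tau):
    """Rows = blocks; columns = [m bits | T-1 past symbols | T-1 cumulative noise values]."""
    Fb = b.reshape(l, m).astype(float)                     # bit blocks s_1 .. s_l
    Fc = np.zeros((l, T - 1))                              # previously transmitted symbols
    Fn = np.zeros((l, T - 1))                              # cumulative noise y_tilde - c
    for t in range(tau - 1):
        Fc[:, t] = c_hist[t]
        Fn[:, t] = noise_hist[t]
    return np.concatenate([Fb, Fc, Fn], axis=1)            # shape (17, m + 2 * (T - 1))

b = np.random.randint(0, 2, K)
Q = build_feature_matrix(b, c_hist=[], noise_hist=[], tau=1)
```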
Referring to
The feature matrix Q(τ) 126 is input to the feature extractor 220. The feature extractor neural network 220 produces an extracted features matrix V(τ)∈Rbs×17×32 602. Further detail on the structure of the feature extractor neural network is given in
The extracted features matrix V(τ) 602 is processed by an attention-based sequence to sequence encoder to produce a possible symbols matrix W(τ) 606. The possible symbols matrix W(τ) 606 is output to a symbol mapping neural network 222 Hmapper. The structure of the symbol mapping neural network 222 Hmapper is described in greater detail with reference to
The code symbols 110 can be transmitted to node B 104 without further processing. However, preferred implementations also provide at least one, or both, of power normalisation and power reallocation as described above.
Referring to
In the example depicted in
In the example shown, the linear layers 702 to 706 are fully connected linear layers and the activation layers 708 to 710 use GeLu activation functions. Although the example illustrated in
In the example shown in
There are bs instances of the feature extractor neural network 220. The output 712 of the, or of each instance of the, first linear layer 702 is a matrix or tensor having dimensions bs×17×64. Each of the matrices of dimension bs×17×64 is fed into respective instances of the first activation layer 708. Each of the inputs to the activation layer is passed through a respective activation function. In the example depicted, the activation function is a GeLu activation function. The first activation layer 708 comprises l, where l=17 in the present example, GeLu activation functions. The output 714 of the first activation layer 708 is a matrix or tensor having dimensions bs×17×64 that is, bs instances of 2D matrices of dimension 17×64.
The output 714 matrix is input into a second linear layer 704. The second linear layer comprises a 64×64 neural network that produces an output matrix 716 having dimensions bs×17×64. The output matrix 716 is fed into the second activation layer 710, where each input is subjected to a GeLu activation function. The output 718 of the second activation layer 710 is also a matrix of dimensions bs×17×64.
The output 718 of the second activation layer 710 is input into the third linear layer 706. The third linear layer comprises a 64×32 neural network that produces an output matrix 720 having dimensions bs×17×32. The output matrix 720 corresponds to the above-described extracted features matrix V(τ)∈bs×17×32.
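A PyTorch sketch of this feature extractor is given below; the input width d_in depends on the chosen knowledge-vector construction and is therefore left as a parameter.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Linear(d_in, 64) -> GELU -> Linear(64, 64) -> GELU -> Linear(64, 32), applied per block."""
    def __init__(self, d_in):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, 64), nn.GELU(),    # layers 702 and 708
            nn.Linear(64, 64), nn.GELU(),      # layers 704 and 710
            nn.Linear(64, 32),                 # layer 706
        )

    def forward(self, q):                      # q: (bs, 17, d_in) -> (bs, 17, 32)
        return self.net(q)
```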
Still referring to
The possible symbols matrix W(τ) 606 is fed to the symbols mapper neural network 222. The symbols mapper neural network 222 comprises a linear layer neural network 730. The linear layer neural network 730 has dimensions 32×2 and produces an output matrix 732 of coded symbols. The matrix 732 corresponds to the above-described matrix of coded symbols 110.
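Composing the feature extractor, an attention-based sequence to sequence stage and the 32×2 mapper gives the following end-to-end sketch of the encoder (PyTorch); the use of a single pre-normalised transformer encoder layer and the input width d_in = 19 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GBAFEncoder(nn.Module):
    """Feature extractor 220 -> attention-based s2s stage -> symbol mapper 222 (a sketch)."""
    def __init__(self, d_in, d_model=32):
        super().__init__()
        self.extract = nn.Sequential(                        # per-block fully connected extractor
            nn.Linear(d_in, 64), nn.GELU(),
            nn.Linear(64, 64), nn.GELU(),
            nn.Linear(64, d_model),
        )
        self.s2s = nn.TransformerEncoderLayer(               # attention across the l blocks
            d_model=d_model, nhead=1, dim_feedforward=64,
            batch_first=True, norm_first=True,
        )
        self.mapper = nn.Linear(d_model, 2)                  # 32 x 2 linear layer 730

    def forward(self, Q):                  # Q: (bs, l, d_in)
        V = self.extract(Q)                # extracted features V(tau)
        W = self.s2s(V)                    # possible symbols matrix W(tau)
        return self.mapper(W)              # coded symbols, (bs, l, 2)

c = GBAFEncoder(d_in=19)(torch.randn(8, 17, 19))
```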
Referring to
The knowledge matrix {tilde over (Q)}(τ) 130 is output to the decoder neural network 134.
Referring to
The feature extraction neural network 902 processes the knowledge matrix {tilde over (Q)}(τ) 130 to generate an extracted features matrix {tilde over (V)}∈bs×17×32 908. The extracted features matrix {tilde over (V)} 908 is processed by the sequence to sequence neural network 904 to produce a candidate symbol matrix {tilde over (W)}∈bs×17×32 910. The candidate symbol matrix {tilde over (W)} 910 comprises a plurality of candidate symbols. The candidate symbol matrix {tilde over (W)} 910 is processed by the symbol mapping neural network 906 to produce a decoded bitstream vector {circumflex over (b)}∈bs×51×2 912 containing estimates of the initially transmitted bitstream b 103. It will be appreciated that the actual dimension of {circumflex over (b)} is {circumflex over (b)}∈bs×51×1. However, since for each bit, a binary distribution (p,1−p) is generated, the output of the neural network 906 is {circumflex over (b)}∈bs×51×2 from which {circumflex over (b)}∈bs×51×1 is decoded.
Returning to
Referring to
In the example depicted in
In the example shown, the linear layers 1002 to 1006 are fully connected linear layers and the activation layers 1008 to 1010 use GeLu activation functions. Although the example illustrated in
In the example shown in
The output 1014 matrix having dimensions bs×17×64 is input into a second neural network linear layer 1004. The second linear layer comprises a 64×64 neural network that produces an output matrix 1016 having dimensions bs×17×64. The output matrix 1016 is fed into the second activation layer 1010, where each input is subjected to a GeLu activation function. The output 1018 of the second activation layer 1010 is also a matrix of dimensions bs×17×64.
The output 1018 of the second activation layer 1010 is input into the third linear layer 1006. The third linear layer comprises a 64×32 neural network that produces an output matrix 1020 having dimensions bs×17×32. The output matrix 1020 corresponds to the above-described extracted features matrix {tilde over (V)}(τ)∈bs×17×32 908.
Still referring to
The candidate symbol matrix {tilde over (W)}(τ) 910 is fed to the symbols mapper neural network 906. The symbols mapper neural network 906 comprises a linear layer neural network 1030, a reshape function 1031 and a softmax function 1032. The linear layer neural network 1030 has dimensions 32×6 and produces an output matrix 1034 of coded symbols having dimensions bs×17×6. The reshaping function 1031 processes the output matrix 1034 to produce a reshaped matrix 1036 of candidate decoded symbols; the reshaped matrix 1036 comprises a rearrangement of the values of the output matrix 1034 and has dimensions bs×51×2. Each pair of values in the matrix 1036 of candidate decoded symbols is processed by the softmax function 1032 to generate a matrix 1038 of decoded symbols. The matrix 1038 corresponds to the above-described matrix of decoded symbols 912.
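A sketch of this decoder head in PyTorch, mapping per-block candidate symbols to per-bit binary distributions for the K = 51, m = 3 example (the hard-decision argmax at the end is added for illustration):

```python
import torch
import torch.nn as nn

class DecoderHead(nn.Module):
    """Symbol mapper 906: (bs, 17, 32) -> (bs, 51, 2) per-bit binary distributions."""
    def __init__(self, d_model=32, m=3):
        super().__init__()
        self.linear = nn.Linear(d_model, 2 * m)        # 32 x 6 linear layer 1030

    def forward(self, W):                              # W: (bs, 17, 32) candidate symbols
        out = self.linear(W)                           # (bs, 17, 6), matrix 1034
        out = out.reshape(W.shape[0], -1, 2)           # (bs, 51, 2), reshape 1031 -> matrix 1036
        return torch.softmax(out, dim=-1)              # softmax 1032 -> decoded distributions 912

probs = DecoderHead()(torch.randn(8, 17, 32))
b_hat = probs.argmax(dim=-1)                           # hard-decision bits, (bs, 51)
```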
It can be appreciated from the above that the neural-encoder at node A 102 simultaneously performs two tasks, namely, keeping track of a current belief regarding the original bits at the receiver and generating symbols accordingly in order to refine that belief. It can be appreciated that the above-described GBAF uses a single network for both tasks.
Referring again to
It will be appreciated from the above GBAF that the feedback information (parity symbols and combined noise values) and the original bitstream (modulated or unmodulated) are processed simultaneously. However, examples using belief feedback support learning the residual error between the original bits of the bitstream and the prediction at node B 104 based on symbols received so far. Therefore, examples can be realised that add another deep neural network for generating a belief vector on the original bitstream based on the feedback information comprising the parity symbols and the combined noise values. The belief vector can be concatenated with the vector of the original unmodulated bitstream 103 and conveyed to the encoder as part of the information vector or knowledge vector Q(τ) 126
Referring to
The feature extractor neural network 1106 has an input matrix {tilde over (Y)}(τ)∈Rbs×17×8 1112, that is, the totality of all feedback received from node B 104, comprising all received feedback signals, that is, feedback symbols and noise.
The feature extractor neural network 1106 processes the input matrix {tilde over (Y)}(τ) 1112 to produce a feature matrix {tilde over (V)}′∈Rbs×17×32 1114. The feature matrix {tilde over (V)}′ 1114 is input into the belief sequence to sequence neural network 1108. The belief sequence to sequence neural network 1108 processes the feature matrix {tilde over (V)}′ 1114 to produce a matrix of candidate beliefs {tilde over (W)}′∈bs×17×32 1116. The matrix of candidate beliefs {tilde over (W)}′ 1116 forms an input to the belief mapping neural network 1110. The belief mapping neural network 1110 processes the matrix of candidate beliefs {tilde over (W)}′ 1116 to form the matrix of beliefs B(τ)∈bs×17×32 1104. The architecture for the belief neural network 1102 is almost identical to the architecture for the encoder 122 described above with the exception that the belief mapping neural network 1110 uses an extra softmax layer to generate the beliefs in the form of output probabilities.
The matrix of beliefs B(τ)∈bs×17×32 1104 is fed to the accumulator 124 of node A 102 to form part of, or be used with, the information vector or knowledge vector Q(τ) 126, in particular, pre-processing of the knowledge vector is given by
Se(Q(τ),B(τ))={Qi(τ), . . . , Ql(τ)}. The overall architecture for feedback encoding and decoding incorporating beliefs is known as Unified Iterative Parity Symbol Encoding (UIPSE) and is shown below in detail in Algorithm 3.
Again, it can be appreciated that a for loop for τ=1, . . . , T is established, that is, a parity symbol is generated per block at each pass. The knowledge vector Q(τ) 126 is updated at line 3 as Q(τ)=[b, c(1), . . . , c(τ-1), {tilde over (y)}(1), . . . , {tilde over (y)}(τ-1)]. A determination is made at line 4 whether or not belief feedback is enabled. If belief processing is not enabled, processing continues at line 29, where Algorithm 2 is implemented. If belief feedback is enabled, processing continues with lines 5 to 27 as follows. At line 6, the knowledge vector Q(τ) 126 is established for the belief network as Sb(Q(τ))={{tilde over (Q)}i(τ), . . . , {tilde over (Q)}l(τ)} such that {tilde over (Q)}i(τ)=[{tilde over (y)}i(1), . . . , {tilde over (y)}i(τ-1)]. The features of the input vector 1112 are extracted at line 8 by Vi(τ)=Hextractbelief({tilde over (Q)}i(τ)). Attention-based neural encoding, that is, sequence to sequence encoding, is realised at line 9 using
{tilde over (V)}belief(τ)=Hencoderbelief(Vbelief(τ)). The belief feedback is generated at line 10 using Bi(τ)=Hmapbelief({tilde over (V)}i(τ)). The information vector or knowledge vector Q(τ) 126 is pre-processed at line 12 such that Se(Q(τ), B(τ))={Qi(τ), . . . , Ql(τ)}, where Qi(τ) is established according to one of the following conditions: feedback only, noise only, disentanglement or beliefs, symbols and feedback. If feedback only is selected, Qi(τ) is given by
Qi(τ)=[b((i-1)*m+1:i*m), Bi(τ), {tilde over (y)}i(1), . . . , {tilde over (y)}i(τ-1)]. If noise only is selected, Qi(τ) is given by Qi(τ)=[b((i-1)*m+1:i*m), Bi(τ), {tilde over (y)}i(1)−ci(1), . . . , {tilde over (y)}i(τ-1)−ci(τ-1)]. If disentanglement is enabled, Qi(τ) is given by Qi(τ)=[b((i-1)*m+1:i*m), Bi(τ), ci(1), . . . , ci(τ-1), {tilde over (y)}i(1)−ci(1), . . . , {tilde over (y)}i(τ-1)−ci(τ-1)]. Otherwise, Qi(τ) is given by Qi(τ)=[b((i-1)*m+1:i*m), Bi(τ), ci(1), . . . , ci(τ-1), {tilde over (y)}i(1), . . . , {tilde over (y)}i(τ-1)]. Feature extraction is then performed at lines 22 and 23 for all l blocks by establishing a for loop as for i∈[l] do, Vi(τ)=Hextract(Qi(τ)), followed by attention-based neural-encoding at line 24 given by {tilde over (V)}(τ)=Hencoder(V(τ)). Finally, symbol mapping is performed at lines 26 and 27 via for i∈[l] do, ci(τ)=Hmap({tilde over (V)}i(τ)).
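A self-contained PyTorch sketch of the belief network 1102 is given below; the input width of 8 follows the {tilde over (Y)}(τ)∈bs×17×8 dimensions above, while the internal widths, the single transformer layer and the belief width are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BeliefNetwork(nn.Module):
    """Belief feedback network: extractor 1106 -> attention s2s 1108 -> mapper 1110 with softmax."""
    def __init__(self, d_feedback=8, d_model=32, d_belief=32):
        super().__init__()
        self.extract = nn.Sequential(
            nn.Linear(d_feedback, 64), nn.GELU(), nn.Linear(64, d_model),
        )
        self.s2s = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=1, dim_feedforward=64,
            batch_first=True, norm_first=True,
        )
        self.mapper = nn.Linear(d_model, d_belief)

    def forward(self, Y_tilde):                            # Y_tilde: (bs, 17, d_feedback)
        W = self.s2s(self.extract(Y_tilde))
        return torch.softmax(self.mapper(W), dim=-1)       # beliefs B(tau)

# The beliefs are concatenated with the bit blocks (and any other enabled feedback
# features) to form the per-block knowledge vectors Q_i(tau) fed to the main encoder.
B = BeliefNetwork()(torch.randn(8, 17, 8))
```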
Referring again to
The unified architecture of
Inner feedback refers to the process of using generated parity symbols as inputs to the encoder 122 at consecutive iterations as will be described with reference to
Outer feedback refers to the feedback channel information received at node A 102 from node B 104, which enables the encoder to track noise realisations.
Belief feedback is the output of the additional deep neural network employed at node A 102 that is used to track node B's belief about the bitstream after each transmission block.
It will be appreciated that enabling and disabling the belief feedback supports switching between variations of GBAF and GBAF-BF. Still further, examples can be realised that disable or enable the inner or outer feedback mechanisms as well, which supports realising different variations of the unified GBAF. Therefore, examples can be realised in which the encoder comprises a selectable plurality of feedback mechanisms to support feedback encoding of a bitstream, the selectable plurality of feedback mechanisms comprising at least one of the following, taken jointly and severally in any and all permutations: inner feedback comprising processing generated parity symbols as inputs to the encoder, outer feedback comprising processing feedback channel to determine noise associated with at least one, or both, of a feedforward channel and a feedback channel, and belief feedback comprising data associated with the bitstream at a receiver after each transmission block.
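A sketch of how such a selectable configuration might be expressed, with the three mechanisms enabled or disabled independently (Python; the flag names and the concatenation order are assumptions):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FeedbackConfig:
    inner: bool = True      # include previously generated parity symbols
    outer: bool = True      # include feedback-channel (noise-tracking) information
    belief: bool = False    # include the belief-network output

def assemble_knowledge(b_i, c_i, noise_i, belief_i, cfg):
    """Concatenate only the enabled feedback components for one block (a sketch)."""
    parts = [b_i]
    if cfg.belief:
        parts.append(belief_i)
    if cfg.inner:
        parts.append(c_i)
    if cfg.outer:
        parts.append(noise_i)
    return np.concatenate(parts)
```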
Referring to
The iterative decoding module 1206 is invoked multiple times and is arranged to use previous decoding outputs 1220 to 1224 as inputs to the iterative decoding process by concatenating the latent representations 1214 to 1218 and the previous decoding outputs 1220 to 1224. The iterative decoding process forms a belief propagation mechanism through a multi-layer attention encoder 1226. The iterative decoding module 1206 comprises multiple fully connected layers. In the example depicted in
It will be appreciated that the iterative decoding module 1206 uses the output of the output fully connected layers 1234 to 1238 as beliefs to refine predictions in a manner similar to recurrent neural network architectures. The fully connected layers 1228 to 1238 are arranged to align the sizes of the latent representations 1214 to 1218.
In the example described, there are two layers, that is, there are two encoders. However, examples are not limited to two such layers. Examples can be realised in which two or more than two such layers are used. Furthermore, the iterative decoding module 1206 can be invoked multiple times. Examples can be realised in which the iterative decoding module 1206 is invoked three times. However, examples are not limited to the iterative decoding module 1206 being invoked three times. Examples can be realised in which the iterative decoding module 1206 is invoked two or more times. Accordingly, examples can be realised in which the iterative decoding module 1206 comprises a plurality of fully connected layers and a multi-layer attention encoder 1226 that are invoked two or more times.
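A PyTorch sketch of such an iterative decoding module, in which previous outputs are concatenated with the latent representations and refined through a multi-layer attention encoder over a fixed number of invocations (all widths, the two-layer encoder and the three invocations mirror the description above but should be read as assumptions):

```python
import torch
import torch.nn as nn

class IterativeDecoder(nn.Module):
    """Refine predictions by feeding previous outputs back through an attention encoder."""
    def __init__(self, d_latent=32, d_out=2, n_layers=2, n_iters=3):
        super().__init__()
        self.n_iters = n_iters
        self.align = nn.Linear(d_latent + d_out, d_latent)      # aligns concatenated sizes
        layer = nn.TransformerEncoderLayer(d_model=d_latent, nhead=1, dim_feedforward=64,
                                           batch_first=True, norm_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=n_layers)   # multi-layer encoder 1226
        self.out = nn.Linear(d_latent, d_out)                   # output fully connected layer

    def forward(self, latents):                 # latents: (bs, l, d_latent)
        bs, l, _ = latents.shape
        prev = torch.zeros(bs, l, self.out.out_features)        # initial (empty) belief
        for _ in range(self.n_iters):
            h = self.align(torch.cat([latents, prev], dim=-1))
            prev = torch.softmax(self.out(self.attn(h)), dim=-1)
        return prev

probs = IterativeDecoder()(torch.randn(8, 17, 32))
```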
It will be appreciated from
In training the neural networks of the examples, an AdamW optimizer was utilised, which is a variation of the Adam optimizer but with decoupled weight decay regularization. Also, a batch size of B=8192 was used, with an initial learning rate of 0.001 and a weight decay parameter of 0.01. Furthermore, gradient clipping was applied with a threshold of 0.5 and the neural networks were trained for 600,000 batches together with applying polynomial decay to the learning rate.
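A sketch of that training configuration in PyTorch is shown below; the stand-in model and dummy loss are placeholders, and PolynomialLR stands for the polynomial learning-rate decay (available in recent PyTorch releases; an equivalent LambdaLR schedule could be substituted).

```python
import torch
import torch.nn as nn

model = nn.Linear(19, 2)   # stand-in for the encoder/decoder networks described above

optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)
total_batches = 600_000
scheduler = torch.optim.lr_scheduler.PolynomialLR(optimizer, total_iters=total_batches)

for step in range(total_batches):
    batch = torch.randn(8192, 19)              # batch size B = 8192 (dummy data here)
    loss = model(batch).pow(2).mean()          # placeholder loss for illustration
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)   # gradient clipping at 0.5
    optimizer.step()
    scheduler.step()
```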
Referring to
The view 1300 shows:
a BLER performance curve 1302 for 5G NR LDPC, that is, a 5G New Radio Low-Density Parity-Check code,
a BLER performance curve 1304 for Deepcode as described in "Deepcode: Feedback Codes via Deep Learning" by Hyeji Kim, Yihan Jiang, Sreeram Kannan, Sewoong Oh, Pramod Viswanath, available from https://arxiv.org/pdf/1807.00801v1.pdf,
a BLER performance curve 1306 for Deep Extended Feedback Codes, as described by Anahid Robert Safavi, Alberto G. Perotti, Branislav M. Popović, Mahdi Boloursaz Mashhadi, and Deniz Gündüz, available from https://arxiv.org/pdf/2105.01365.pdf,
a BLER performance curve 1308 for DRFC coding, and
a BLER performance curve 1310 according to GBAF coding as described herein.
It can be appreciated that the performance of GBAF is significantly better than the above prior art feedback coding techniques.
Referring to
At 1402, the parameters K, N, m are selected, where K represents the number of bits in the bitstream 103, N represents the total number of bits to be transmitted and m represents the number of bits per block.
At 1404, a bitstream b 103 of K bits is generated; the K bits comprise randomly generated bits. The bitstream is reshaped, at 1406, to produce a bit matrix Fb∈{0,1}l×m, where l=┌K/m┐.
At 1408, real symbols s=Fbζ, where ζ=[2^(m-1), 2^(m-2), . . . , 2^1, 2^0]^T, are constructed for "transmission" to node B. The word "transmission" is in quotes since the channel and transmission are simulated such that actual transmission does not take place when generating the training data and using that data to train neural networks. Therefore, it will be appreciated that "transmission" within this training context means subjecting the bit matrix to a transfer function representing the channel conditions of a given channel to produce transmitted/received symbols.
Generating the bitstream 103 at 1406 and constructing the real symbols at 1408 is repeated a predetermined number of times, that is, a predetermined number of interactions take place. Examples can be realised in which the predetermined number of times is governed by T=N/l interactions. Accordingly, at 1412, a determination is made regarding the number of interactions that have taken place thus far. If the determination at 1412 is that T=N/l or fewer interactions have taken place, processing resumes at 1406 where a new bitstream is generated. If more than T=N/l interactions have taken place, processing proceeds to 1414, where the feature matrix {tilde over (Q)}(τ) is constructed and, at 1416, real symbols ŝ∈Rl are decoded.
Once the real symbols ŝ∈Rl have been decoded, the above-described encoders and decoders, that is, the above described neural networks, are trained, at 1418, to minimise the error between s and ŝ.
Alternatively, during actual encoding and decoding, that is, during actual feedback encoding and decoding once the neural networks have been trained, following decoding of the real symbols ŝ∈Rl at 1416, the estimated or decoded symbols ŝ are transformed into a corresponding bit matrix {circumflex over (F)}b∈{0,1}l×m at 1420, and, at 1422, the bit matrix {circumflex over (F)}b is reshaped to give a decoded or demodulated bitstream {circumflex over (b)}∈{0,1}K×1.
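The block-to-symbol mapping s = Fbζ and its inverse can be sketched as follows (Python/NumPy, K = 51 and m = 3 as in the worked example; the rounding-based inverse is for illustration only, the actual decoding being performed by the neural network described earlier).

```python
import numpy as np

K, m = 51, 3
l = int(np.ceil(K / m))
zeta = 2 ** np.arange(m - 1, -1, -1)            # [4, 2, 1] for m = 3

def bits_to_symbols(b):
    """s = Fb @ zeta: one real symbol in {0, ..., 2^m - 1} per block of m bits."""
    Fb = b.reshape(l, m)
    return Fb @ zeta

def symbols_to_bits(s_hat):
    """Round estimated symbols back to integers and unpack them into bits."""
    idx = np.clip(np.rint(s_hat), 0, 2 ** m - 1).astype(int)
    Fb_hat = (idx[:, None] >> np.arange(m - 1, -1, -1)) & 1
    return Fb_hat.reshape(-1)[:K]

b = np.random.randint(0, 2, K)
assert np.array_equal(symbols_to_bits(bits_to_symbols(b).astype(float)), b)
```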
Referring to
Referring to
Having established the selected mode of operation and having established the knowledge vector in response, feature extraction is performed at 1636 for all l blocks by establishing a for loop as for i∈[l] do, Vi(τ)=Hextract(Qi(τ)), followed by attention-based neural-encoding at 1638 given by {tilde over (V)}(τ)=Hencoder(V(τ)). Finally, symbol mapping is performed, at 1640, via for i∈[l] do, ci(τ)=Hmap({tilde over (V)}i(τ)).
Examples can be realised in which active feedback is used to generate modulated data. Referring again to
Referring to Algorithm 4, a for loop is established at line 1 so that a feedback symbol per block is generated at each pass.
A determination is made at line 2 regarding whether or not active feedback is enabled. If active feedback is not enabled, processing proceeds from line 19 where the τth feedback symbol {tilde over (c)}(τ) is determined from the received signal. Examples can be realised in which the τth feedback symbol {tilde over (c)}(τ) is determined as a scaled version of the (τ−1)th received signal αy(τ-1). If active feedback is enabled, the information vector or knowledge vector {tilde over (Q)}(τ) 130 is updated at line 4 as {tilde over (Q)}(τ)=[{tilde over (c)}(1), . . . , {tilde over (c)}(τ-1), y(1), . . . , y(τ)].
At lines 6 to 10, the knowledge vector {tilde over (Q)}(τ) 130 is pre-processed, that is, Se(⋅) generates l equal-sized knowledge vectors, i.e., Se({tilde over (Q)}(τ))={{tilde over (Q)}i(τ), . . . , {tilde over (Q)}l(τ)}, each of which corresponds to respective different blocks, such that each knowledge vector is determined according to whether or not Parity only should be taken into account or if previously transmitted feedback symbols should be taken into account as well as parity. It can be appreciated that from the perspective of the encoder 132 of node B 104, the received signals correspond to or represent parity symbols and the feedback symbols correspond to or represent transmitted signals.
If parity only is active, then each of the l knowledge vectors is determined from {tilde over (Q)}i(τ)=[yi(1), . . . , yi(τ)] as can be appreciated from line 8. If parity only is not active, then each of the l knowledge vectors also takes into account the previous feedback symbols such that each of the l knowledge vectors is determined from {tilde over (Q)}i(τ)=[{tilde over (c)}i(1), . . . , {tilde over (c)}i(τ-1), yi(1), . . . , yi(τ)] as can be appreciated at line 10.
The knowledge vectors {tilde over (Q)}i(τ), . . . , {tilde over (Q)}l(τ) are processed at lines 12 and 13 to extract the features {tilde over (V)}i(τ)={tilde over (H)}extractfeedback({tilde over (Q)}i(τ)). Sequence to Sequence processing, via an attention-based neural network, as described above in respect of the encoder 122 of node A 102, is performed at line 14 via {tilde over (W)}(τ)={tilde over (H)}s2sfeedback({tilde over (V)}(τ)).
Finally, feedback symbol mapping is performed at lines 16 and 17 to establish the feedback symbols 116 via {tilde over (c)}i(τ)={tilde over (H)}mapfeedback({tilde over (W)}i(τ)), which are outputs for transmission to node A 102.
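The pre-processing stage of this active-feedback encoder (lines 4 to 10 of Algorithm 4) can be sketched as follows; the list-based history layout is an assumption, and the resulting per-block vectors would then be passed through the {tilde over (H)}extract, {tilde over (H)}s2s and {tilde over (H)}map networks of lines 12 to 17.

```python
import numpy as np

def node_b_knowledge_vectors(y_hist, c_tilde_hist, parity_only=True):
    """Per-block knowledge vectors for the active-feedback encoder at node B (a sketch).

    y_hist       : received signals y(1) .. y(tau), each an array of length l
    c_tilde_hist : previously transmitted feedback symbols c~(1) .. c~(tau - 1)
    """
    l = len(y_hist[0])
    Q = []
    for i in range(l):
        y_i = [y[i] for y in y_hist]
        if parity_only:                        # line 8: received parity symbols only
            Q.append(np.array(y_i))
        else:                                  # line 10: previous feedback symbols as well
            c_i = [c[i] for c in c_tilde_hist]
            Q.append(np.array(c_i + y_i))
    return Q

# Example after tau = 2 passes with l = 4 blocks (illustrative values).
y_hist = [np.random.randn(4), np.random.randn(4)]
c_tilde_hist = [np.random.randn(4)]
Q = node_b_knowledge_vectors(y_hist, c_tilde_hist, parity_only=False)
```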
Referring to
At 1710, the knowledge vector {tilde over (Q)}(τ) 130 is pre-processed, that is, Se(⋅) generates l equal-sized knowledge vectors, i.e., Se({tilde over (Q)}(τ))={{tilde over (Q)}i(τ), . . . , {tilde over (Q)}l(τ)}, each of which corresponds to respective different blocks, such that each knowledge vector is determined according to whether or not Parity only should be taken into account or if previously transmitted feedback symbols should be taken into account as well as parity. It can be appreciated that from the perspective of the encoder 132 of node B 104, the received signals correspond to or represent parity symbols and the feedback symbols correspond to or represent transmitted signals.
Therefore, a determination is made, at 1712, as to whether parity only is active. If the determination at 1712 is that parity only is active, then each of the l knowledge vectors is determined, at 1714, from {tilde over (Q)}i(τ)=[yi(1), . . . , yi(τ)]. However, if parity only is not active, then each of the l knowledge vectors also takes into account the previous feedback symbols such that each of the l knowledge vectors is determined, at 1716, from {tilde over (Q)}i(τ)=[{tilde over (c)}i(1), . . . , {tilde over (c)}i(τ-1), yi(1), . . . , yi(τ)].
The knowledge vectors {tilde over (Q)}i(τ), . . . , {tilde over (Q)}l(τ) are processed, at 1718, to extract the features {tilde over (V)}i(τ)={tilde over (H)}extractfeedback({tilde over (Q)}i(τ)). Sequence to Sequence processing, via an attention-based neural network, as described above in respect of the encoder 122 of node A 102, is performed, at 1720, via {tilde over (W)}(τ)={tilde over (H)}s2sfeedback({tilde over (V)}(τ)).
Finally, feedback symbol mapping is performed, at 1722, to establish the feedback symbols 116 via {tilde over (c)}i(τ)={tilde over (H)}mapfeedback({tilde over (W)}i(τ)), which are output for transmission to node A 102.
The functionality of the system 100 and any parts thereof can be realised in the form of machine instructions that can be processed by a machine comprising or having access to the instructions. The machine can comprise a computer, processor, processor core, DSP, a special purpose processor implementing the instructions such as, for example, an FPGA or an ASIC, circuitry or other logic, compiler, translator, interpreter or any other instruction processor. Processing the instructions can comprise interpreting, executing, converting, translating or otherwise giving effect to the instructions. The instructions can be stored on a machine readable medium, which is an example of machine-readable storage. The machine-readable medium can store the instructions in a non-volatile, non-transient or non-transitory, manner or in a volatile, transient, manner, where the term ‘non-transitory’ does not encompass transitory propagating signals. The instructions can be arranged to give effect to any and all operations described herein taken jointly and severally in any and all permutations. The instructions can be arranged to give effect to any and all of the operations, devices, systems, flowcharts, protocols or methods described herein taken jointly and severally in any and all permutations. In particular, the machine instructions can give effect to, or otherwise implement, the operations of the algorithms and/or flowcharts depicted in, or described with reference to,
Therefore,
The machine instructions 1802 comprise at least one or more than one of:
Instructions 1808 to realise an encoder,
Instructions 1810 to implement an accumulator at node A 102,
Instructions 1812 to implement a belief neural network,
Instructions 1814 to realise an accumulator at node B 104,
Instructions 1816 to implement a decoder,
Instructions 1818 to realise an encoder at node B 104,
Instructions 1820 to implement Algorithm 1,
Instructions 1822 to implement Algorithm 2,
Instructions 1824 to implement Algorithm 3, and
Instructions 1826 to implement Algorithm 4.
the foregoing instructions 1808 to 1826 being taken jointly and severally in any and all permutations.
Advantageously, one or more than one of the examples described herein address or otherwise solve the following limitations of existing deep neural networks:
k∈Z+.
Examples can be realised in accordance with the following clauses:
Clause 1: An encoding method for a modulator of a transmitter to encode a source bitstream b∈{0,1}K×1 comprising K source bits using feedback encoding; the method comprising:
dividing the source bitstream into l=┌K/m┐ groups of size m, such that b=[s1T, s2T, . . . , slT];
constructing a feature matrix,
where Fb comprises the source bits, the feature matrix also comprising at least selectable ones of: pairs of previously transmitted coded symbols, Fc, and estimated noise realisations, Fn, optionally, selectable ones of tuples of source bits, previously transmitted coded symbols, Fc, and estimated noise realisations, Fn, received via a feedback signal transmitted by a receiver;
encoding the feature matrix, using attention-based neural sequence to sequence
(s2s) mapping, to generate a vector of l coded symbols, and
outputting the l coded symbols for transmitting to the receiver.
Clause 2: The method of clause 1, in which encoding the feature matrix to generate a vector of l coded symbols comprises:
preprocessing the feature matrix to extract a set of features that will influence encoding the feature matrix,
transforming (s2s), using an attention encoder, the feature matrix, Q(τ), into a sequence to establish new correlations between portions of the feature matrix using existing correlations between portions of the feature matrix, and
mapping the sequence into l coded symbols.
Clause 3: The method of clause 2, in which the transforming comprises transforming (s2s), using the attention encoder, the feature matrix, Q(τ), into the sequence to establish new column-wise correlations between columns of the feature matrix using existing column-wise correlations between columns of the feature matrix.
Clause 4: The method of any preceding clause, in which constructing the feature matrix,
comprising the source bits, previously transmitted coded symbols and estimated noise realisations received at the transmitter comprises:
generating a vector Fb∈{0,1}m×1, comprising the l groups of m source bits, Fb=[s1, s2, . . . sl].
Clause 5: The method of any preceding clause, in which constructing the feature matrix,
comprising the source bits, previously transmitted coded symbols and estimated noise realisations received at the transmitter comprises:
generating a vector Fc∈R(τ-1)×l comprising the previously transmitted coded symbols; each row of Fc comprising c(i), for i=1, . . . , τ−1, and zero-padded for i=τ, . . . , T−1, where τ is a temporal index of order of the previously transmitted symbols;
Clause 6: The method of any preceding clause, in which constructing the feature matrix,
comprising the source bits, previously transmitted coded symbols and estimated noise realisations received at the transmitter comprises:
generating a vector of estimated noise realisations Fn∈R(τ-1)×l observed at the feedback channel of the transmitter, such that
from the received feedback signal.
Clause 7: The method of any preceding clause in which outputting the l coded symbols for transmitting to the receiver comprises at least one, or both, of power normalisation and power reallocation to generate the l coded symbols c(τ)∈R1×l.
Clause 8: A decoding method for a demodulator of a receiver to decode a coded symbol stream comprising T symbols C(τ), τ=1, 2, . . . , T, iteratively derived from a bitstream b∈{0,1}K×1 comprising K source bits arranged into l=┌K/m┐ groups of size m, such that b=[s1T, s2T, . . . , slT] using feedback provided by the receiver; the decoding method comprising:
progressively/iteratively
receiving a current signal of a plurality of signals
y(τ)=c(τ)+n(τ); c(τ), n(τ)∈R1×l, comprising the T symbols; and
transmitting received symbols, c(τ), or the currently/most recently received signal, y(τ), comprising a currently/most recently received symbol, c(τ), to a transmitter associated with generating the symbols, c(τ);
constructing a feature matrix,
using the plurality of signals, y(τ), comprising the T symbols, c(τ), by progressively accumulating (y(τ))T for τ=1, 2, . . . , T; and
generating a decoded bitstream vector {circumflex over (b)}∈{0,1}K, comprising the l groups of m source bits, {circumflex over (b)}=[s1, s2, . . . sl], from the feature matrix, {tilde over (Q)}(τ), using a sequence to sequence neural network/attention neural network.
Clause 9: The method of clause 8, in which generating the decoded bitstream vector {circumflex over (b)}, comprising the l groups of m source bits, {circumflex over (b)}=[s1, s2, . . . sl], from the feature matrix, {tilde over (Q)}(τ), comprises:
preprocessing the feature matrix, {tilde over (Q)}(τ), to extract a set of features, {tilde over (V)}∈Rbsxlx, for influencing generating the decoded bitstream;
transforming (s2s), using an attention encoder, the feature matrix, {tilde over (Q)}(τ), into a sequence using correlations between portions of the feature matrix, and
mapping the sequence into the l decoded symbols.
Clause 10: The method of clause 9, in which transforming (s2s), using an attention encoder, the feature matrix, {tilde over (Q)}(τ), into a sequence using correlations between portions of the feature matrix comprises
transforming (s2s), using an attention encoder, the feature matrix, {tilde over (Q)}(τ), into a sequence using column-wise correlations between columns of the feature matrix.
Clause 11: The method of any preceding clause in which generating a decoded bitstream vector, {circumflex over (b)}, comprises reshaping the output from the sequence to sequence neural network/attention neural network.
Clause 12: Machine readable instructions arranged, when processed, to implement a method of any preceding clause.
Clause 13: Machine readable storage storing machine readable instructions of clause 12.
Clause 14: An encoder comprising circuitry arranged to implement a method of any of clauses 1 to 7.
Clause 15: A decoder comprising circuitry arranged to implement a method of any of clauses 8 to 11.