HETEROGENEOUS PRODUCT AUTOENCODERS FOR CHANNEL-ADAPTIVE NEURAL CODES OF LARGE DIMENSIONS

Information

  • Patent Application
  • Publication Number
    20240256864
  • Date Filed
    February 01, 2024
  • Date Published
    August 01, 2024
Abstract
A method of training an autoencoder that includes encoder neural networks and decoder neural networks. The method includes training the encoder neural networks while weights of the decoder neural networks are fixed. The method also includes iteratively training the decoder neural networks for a number of iterations. For each iteration of the training of the decoder neural networks, a pair of decoder neural networks is replaced by another pair of neural networks, and a second decoder neural network of the pair of decoder neural networks utilizes different parameters than a first decoder neural network of the pair of decoder neural networks.
Description
BACKGROUND
1. Field

Aspects of embodiments of the present disclosure relate to channel encoders and decoders implemented using trained neural networks.


2. Description of the Related Art

Channel encoders and decoders improve the reliability of communication systems when transmitting and receiving data over a noisy communication channel. Generally, an encoder implementing an error correction code (or error correcting code (ECC)) takes an original message as input and generates an encoded message, where the encoded message has some additional bits of data in comparison to the original message (e.g., the encoded message is longer than the original message). These additional bits of data provide redundancy such that, if the encoded message is corrupted or otherwise modified between being transmitted from an encoder and being received at a decoder, the decoder can use the additional data to reconstruct the original message, within some limits on the number of errors that can be corrected in accordance with the ECC that is applied to the original message. Examples of classical error correction codes include Reed-Solomon codes, Turbo codes, low-density parity-check (LDPC) codes, and polar codes.


Recently, the encoder and decoder (or some components within the encoder and decoder architectures) have been replaced with neural networks or other trainable models, which reduces the encoding and decoding complexity, improves upon the performance of classical channel codes, enables applications for realistic channel models and emerging use cases, and enables designing universal decoders that simultaneously decode several codes. However, the size of the code spaces (2^k distinct codewords for a binary linear code of dimension k) presents a major technical challenge in the channel coding context; for example, a code dimension of k=300 corresponds to 2^300 (more than 10^90) distinct codewords. Due to these large code spaces, only a small fraction of all codewords will be seen during the training phase, and thus the trained models for the encoders and decoders may fail to generalize to unseen codewords. Additionally, a straightforward design of neural encoders and decoders for large code dimensions and lengths requires using huge networks with an excessively large number of trainable parameters. Together, these factors make it prohibitively complex to design and train relatively large neural channel encoders and decoders. Another major challenge is the joint training of the encoder and decoder due to local optima that may occur as a result of non-convex loss functions.


The above information disclosed in this Background section is only for enhancement of understanding of the present disclosure, and therefore it may contain information that does not form the prior art that is already known to a person of ordinary skill in the art.


Summary

The present disclosure relates to various embodiments of a method of training an autoencoder that includes encoder neural networks and decoder neural networks. In one embodiment, the method includes training the encoder neural networks. Weights of the decoder neural networks are fixed during the training of the encoder neural networks. The method also includes iteratively training the decoder neural networks for a number of iterations. For each iteration of the training of the decoder neural networks, a pair of decoder neural networks is replaced by another pair of neural networks, and a second decoder neural network of the pair of decoder neural networks utilizes different parameters than a first decoder neural network of the pair of decoder neural networks.


The second decoder neural network may utilize a larger blocklength than the first decoder neural network.


The second decoder neural network may utilize a smaller rate than the first decoder neural network.


Iteratively training the decoder neural networks may include training all of the decoder neural networks for each of a first number of iterations; training only a single pair of decoder neural networks for each of a second number of iterations after the first number of iterations; and training all of the decoder neural networks for each of a third number of iterations after the second number of iterations.


The noisy channel may be a first type of channel, and the method may further include retraining the autoencoder on a second type of channel different than the first type of channel.


The first type of channel may be an additive white Gaussian noise (AWGN) channel, and the second type of channel may be a Rayleigh fading channel.


Retraining the autoencoder may include performing a single training epoch on the second type of channel.


The signal-to-noise ratio may be a first signal-to-noise ratio in a first range, and the method may further include retraining the autoencoder for a number of epochs over the noisy channel having a second signal-to-noise ratio different than the first signal-to-noise ratio.


The second signal-to-noise ratio may be larger than the first signal-to-noise ratio.


The second signal-to-noise ratio may be a wider range than the first signal-to-noise ratio.


The number of epochs may be 11 epochs.


The message may have a code dimension of at least 300 bits.


Training the encoder neural networks may include applying power normalization.


Training the decoder neural networks may include applying power normalization.


The present disclosure also relates to various embodiments of an autoencoder. In one embodiment, the autoencoder includes a number of encoder neural networks configured to map a message to a codeword and to transmit the codeword over a noisy channel having a signal-to-noise ratio, and a number of decoder neural networks configured to decode the message. A second decoder neural network of the pair of decoder neural networks utilizes different parameters than a first decoder neural network of the pair of decoder neural networks.


The second decoder neural network may utilize a larger blocklength than the first decoder neural network.


The second decoder neural network may utilize a smaller rate than the first decoder neural network.


The autoencoder may be trained on a first type of channel and on a second type of channel different than the first type of channel.


The first type of channel may be an additive white Gaussian noise (AWGN) channel, and the second type of channel may be a Rayleigh fading channel.


The autoencoder may be trained on a noisy channel having a first signal-to-noise ratio and a second signal-to-noise ratio different than the first signal-to-noise ratio.


This summary is provided to introduce a selection of features and concepts of embodiments of the present disclosure that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in limiting the scope of the claimed subject matter. One or more of the described features may be combined with one or more other described features to provide a workable system or method.





BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, together with the specification, illustrate example embodiments of the present invention, and, together with the description, serve to explain the principles of the present invention.



FIG. 1 is a block diagram of a communication system according to one embodiment of the present disclosure;



FIG. 2 is a block diagram of encoder architecture according to one embodiment of the present disclosure;



FIG. 3 is a block diagram of decoder architecture according to one embodiment of the present disclosure;



FIG. 4 is a flowchart illustrating tasks of a method of retraining a product autoencoder on a channel having a larger and wider range of signal-to-noise ratios according to one embodiment of the present disclosure; and



FIG. 5 is a flowchart illustrating tasks of a method of retraining a product autoencoder on a different channel type according to one embodiment of the present disclosure.





DETAILED DESCRIPTION

The present disclosure relates to various embodiments of a large-scale product autoencoder including an encoding artificial neural network and a decoding artificial neural network, and methods of training a product autoencoder on large code dimensions, such as 300 bits of information or more. The encoding artificial neural network and the decoding artificial neural network are trained for a large code using smaller code components (i.e., the present disclosure trains the encoding artificial neural network and the decoding artificial neural network on smaller code components rather than directly training them for a large code). Additionally, the training scheme of the present disclosure utilizes a heterogeneous decoder training scheme in which more powerful codes (e.g., codes with different parameters, such as larger blocklengths or smaller code rates) are utilized for one decoder neural network than for the other decoder neural network. The present disclosure also includes training schemes to reduce (e.g., remove) the error floor in the performance of the product autoencoder by retraining the product autoencoder with larger values and a wider range of the training signal-to-noise ratios (SNRs). The present disclosure further includes a method of fine-tuning the product autoencoder by retraining the product autoencoder on a different channel type than the one on which it was originally trained (e.g., retraining, on a Rayleigh fading channel model, a product autoencoder that was trained on an additive white Gaussian noise (AWGN) channel).



FIG. 1 depicts an autoencoder communication system 100 for transmitting and receiving information across a noisy channel 200 (e.g., an additive white Gaussian noise (AWGN) channel). In the illustrated embodiment, the autoencoder communication system 100 includes an encoder 101 including a first encoder 102 and a second encoder 103, and a decoder 104 that includes a first decoder 105 and a second decoder 106. In the illustrated embodiment, the first encoder 102, the second encoder 103, the first decoder 105, and the second decoder 106 are each fully-connected artificial neural networks (FCNNs) (i.e., the autoencoder communication system 100 includes a first encoder FCNN 102, a second encoder FCNN 103, a first decoder FCNN 105, and a second decoder FCNN 106). The first encoder FCNN 102 and the second encoder FCNN 103 are parametrized with sets of weights ϕ1 and ϕ2, respectively, and the first decoder FCNN 105 and the second decoder FCNN 106 are parametrized with sets of weights θ1 and θ2, respectively. Each of the first and second encoder FCNNs 102, 103 has an input size of kj, an output size of nj, and Lenc,j hidden layers of size Nenc,j, where j=1 for the first encoder FCNN 102 and j=2 for the second encoder FCNN 103.



FIG. 1 depicts the transmission of a length-k (e.g., a k-bit) sequence of information bits u across the noisy channel 200. The encoder FCNNs 102, 103 map the information bits u to a length-n sequence of coded symbols c=E(u), known as a codeword, where k and n are the code dimension and blocklength, respectively, and the resulting code is denoted an (n, k) code. FIG. 1 depicts an example of a two-dimensional (n, k) product code that is constructed from two smaller codes: C1:(n1, k1) and C2:(n2, k2), where n=n1n2 and k=k1k2. The first encoder FCNN 102 implements the (n1, k1) code C1, and the second encoder FCNN 103 implements the (n2, k2) code C2. As noted above, k=k1k2, and therefore the k-bit input message can be reshaped into a k2×k1 matrix. Because each row of the matrix has k1 symbols, the first encoder FCNN 102 applies the (n1, k1) code C1 independently to each of the k2 rows of the input to generate a k2×n1 matrix Uk2×n1(1). This first intermediate encoded message Uk2×n1(1) is supplied to the second encoder FCNN 103. Here, because each column of the first intermediate encoded message Uk2×n1(1) has k2 symbols, the second encoder FCNN 103 applies the (n2, k2) code C2 independently to each column to generate an n2×n1 output matrix Un2×n1(2). Because this is a two-dimensional product code, there are only two encoder FCNNs 102, 103 (i.e., there are no further stages in this pipeline of encoder stages), and this n2×n1 output matrix Un2×n1(2) is the length-n coded message to be transmitted on or over the channel 200. In one or more embodiments in which the product code is a three-dimensional product code or larger, an i-th stage encoder FCNN implements a code Ci:(ni, ki), where ni is a factor of n and ki is a factor of k.
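
As a concrete illustration of this row-then-column encoding, the following Python sketch applies two small fully-connected networks as stand-ins for the encoder FCNNs 102, 103. The component-code parameters, layer sizes, and batch size are illustrative assumptions rather than values taken from the disclosure.

import torch
import torch.nn as nn

# Illustrative component-code parameters (hypothetical values, not from the disclosure).
k1, n1 = 5, 10   # row code C1:(n1, k1)
k2, n2 = 5, 15   # column code C2:(n2, k2)

def make_encoder(k, n, hidden=64, layers=2):
    # Simple fully-connected encoder FCNN: input size k, output size n.
    mods, in_dim = [], k
    for _ in range(layers):
        mods += [nn.Linear(in_dim, hidden), nn.ReLU()]
        in_dim = hidden
    mods.append(nn.Linear(in_dim, n))
    return nn.Sequential(*mods)

enc1 = make_encoder(k1, n1)   # E1 with weights phi_1, implements C1
enc2 = make_encoder(k2, n2)   # E2 with weights phi_2, implements C2

B = 4                                             # batch size
u = torch.randint(0, 2, (B, k2, k1)).float()      # k-bit messages reshaped to k2 x k1

# Apply C1 to each of the k2 rows: (B, k2, k1) -> (B, k2, n1).
u1 = enc1(u)
# Apply C2 to each column (operate along dimension 1): (B, k2, n1) -> (B, n2, n1).
u2 = enc2(u1.transpose(1, 2)).transpose(1, 2)
print(u2.shape)   # torch.Size([4, 15, 10])

Flattening the resulting (B, n2, n1) tensor yields the length-n coded sequences that are subsequently power normalized and transmitted over the channel 200.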


Noise and other interference in the channel 200 can modify the data in the output matrix Un2×n1(2) transmitted by the second encoder FCNN 103. The noisy signal can be expressed as y=c+n, where n is the channel noise vector (independent from c) whose components are Gaussian random variables with mean zero and variance σ². The ratio of the average energy per coded symbol to the noise variance is the signal-to-noise ratio (SNR). In one or more embodiments, the encoder FCNNs 102, 103 satisfy a soft power constraint such that the average power per coded bit is equal to 1 and the SNR=1/σ².
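
Under this convention (unit average power per coded symbol and SNR = 1/σ²), the channel can be simulated as in the sketch below; the helper name and the decibel parameterization of the SNR are assumptions made for illustration.

import torch

def awgn(c: torch.Tensor, snr_db: float) -> torch.Tensor:
    # For unit-power codewords, SNR = 1/sigma^2, so sigma = 10^(-snr_db/20).
    sigma = 10.0 ** (-snr_db / 20.0)
    return c + sigma * torch.randn_like(c)   # y = c + n

y = awgn(torch.randn(4, 15, 10), snr_db=3.0)  # a batch of noisy codewords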


Additionally, as shown in FIG. 1, the first and second decoder FCNNs 105, 106 decode the message (i.e., the noisy signal y) in the reverse order in which the first and second encoder FCNNs 102, 103 applied the codes C1 and C2. In the example shown in FIG. 1, the first encoder FCNN 102 applied code C1 and then the second encoder FCNN 103 applied code C2. Accordingly, the second decoder FCNN 106 is applied first for code C2 (e.g., to decode each column of n2 symbols), and then the first decoder FCNN 105 is applied for code C1 (e.g., to decode each row of n1 symbols).


In the example shown in FIG. 1, applying the decoder FCNNs 105, 106 does not change the shape of the data. For example, applying the second decoder FCNN 106 to the columns of the n2×n1 input results in an n2×n1 second intermediate decoder output Yn2×n1(2) and applying the first decoder FCNN 105 to the rows of the second intermediate decoder output Yn2×n1(2) produces a first intermediate decoder output Yn2×n1(1) having dimensions n2×n1. As such, an extraction circuit may be used to extract a length-k sequence (e.g., k2×k1 matrix) from the first intermediate decoder output Yn2×n1(1).


In addition, in some embodiments, decoding performance can be improved by applying a soft-input soft-output (SISO) decoder and also performing several iterations, where the output of the product decoder 104 (e.g., the output of the last decoder stage, in this case the first decoder FCNN 105 as shown in FIG. 1) is fed back as input to the product decoder 104 (e.g., as the input to the second decoder FCNN 106 as shown in FIG. 1) for some set number of iterations (I iterations) before extracting the estimated message û from the output of the product decoder 104. These I iterations can improve the decoding performance because, in some circumstances, errors that were not corrected during a row decoding might be corrected in a later column decoding (or vice versa) or after several iterations (e.g., reconstructing some lost data may enable other, previously not reconstructable, lost data to be subsequently reconstructed).
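
A simplified sketch of this iterative, shape-preserving decoding loop is shown below. It feeds the product decoder's output back as its input for I iterations but omits the extrinsic-information bookkeeping of Algorithm 3; the linear layers and sizes are placeholders for the decoder FCNNs 105, 106, not the disclosed architecture.

import torch
import torch.nn as nn

# Shape-preserving decoder stages (illustrative): D2 acts on columns (length n2),
# D1 acts on rows (length n1); neither changes the n2 x n1 shape.
n1, n2, I = 10, 15, 3
D2 = nn.Linear(n2, n2)
D1 = nn.Linear(n1, n1)

y = torch.randn(4, n2, n1)        # batch of noisy n2 x n1 codewords
z = y
for _ in range(I):                # feed the product-decoder output back as its input
    z = D2(z.transpose(1, 2)).transpose(1, 2)   # column decoding with D2
    z = D1(z)                                   # row decoding with D1
# After I iterations, a k2 x k1 message estimate is extracted from z (not shown).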


While the above discussion of FIG. 1 relates to a two-dimensional code, product codes are not limited thereto, and the construction can also be applied to M-dimensional product codes having parameters (n, k, d) that use M binary linear block codes C1, C2, . . . , CM. Here, each l-th encoder encodes the l-th dimension of the M-dimensional input matrix, and each l-th decoder decodes the vectors along the l-th dimension of the M-dimensional output matrix. Given the l-th code Cl:(nl, kl, dl) with generator matrix Gl for l=1, 2, . . . , M, the parameters of the resulting product code are as follows (a small worked example appears after this list):

    • Blocklength: n = n1n2 ⋯ nM
    • Dimension: k = k1k2 ⋯ kM
    • Rate: R = R1R2 ⋯ RM
    • Minimum distance: d = d1d2 ⋯ dM
    • Generator matrix: G = G1 ⊗ G2 ⊗ ⋯ ⊗ GM, where ⊗ is the Kronecker product operator.
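
The sketch below computes these composite parameters for two small, well-known component codes (a length-3 repetition code and a length-3 single-parity-check code, chosen here purely for illustration) using NumPy's Kronecker product.

import numpy as np

# Illustrative component codes (not taken from the disclosure):
# C1: (n1, k1, d1) = (3, 1, 3) repetition code; C2: (n2, k2, d2) = (3, 2, 2) single parity check.
G1 = np.array([[1, 1, 1]])                    # 1 x 3 generator of C1
G2 = np.array([[1, 0, 1],
               [0, 1, 1]])                    # 2 x 3 generator of C2

n = G1.shape[1] * G2.shape[1]                 # blocklength n = n1 * n2 = 9
k = G1.shape[0] * G2.shape[0]                 # dimension k = k1 * k2 = 2
R = k / n                                     # rate R = R1 * R2 = 2/9
d = 3 * 2                                     # minimum distance d = d1 * d2 = 6
G = np.kron(G1, G2)                           # k x n generator matrix G = G1 (x) G2
print(n, k, R, d, G.shape)                    # 9 2 0.222... 6 (2, 9)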


In one or more embodiments, the encoder FCNNs 102, 103 and the decoder FCNNs 105, 106 may be components of the same computer system (e.g., integrated within a single enclosure, such as in the case of a smartphone or other mobile device, tablet computer, or laptop computer), may be separate components of a computer system (e.g., a desktop computer in communication with an external monitor), or may be separate computer systems (e.g., two independent computer systems communicating over the communication channel), or variations thereof (e.g., implemented within special purpose processing circuits such as microcontrollers configured to communicate over the communication channel, where the microcontrollers are peripherals within a computer system). In one or more embodiments, the encoder FCNNs 102, 103 and the decoder FCNNs 105, 106 may be implemented on different communication devices of the communication system that communicate with one another over a communication channel. For example, the encoder FCNNs 102, 103 may be implemented in user equipment such as a smartphone and the decoder FCNNs 105, 106 may be implemented in a base station. In one or more embodiments, these communication devices are transceivers that can both transmit and receive data. For example, a smartphone may include both the encoder FCNNs 102, 103 and the decoder FCNNs 105, 106 for transmitting and receiving data, respectively, where these may implement the same neural codes (e.g., same error correction codes) or different neural codes (e.g., different error correction codes).


In various embodiments, the encoder FCNNs 102, 103 and the decoder FCNNs 105, 106 may be implemented by processing circuits of a communication device. In one or more embodiments, the various processing circuits may be components of a same integrated circuit (e.g., as being components of a same system on a chip or SoC) or may be components of different integrated circuits that may be connected through pins and lines on a printed circuit board. Additionally, in one or more embodiments, the encoder circuit may be implemented using a different type of processing circuit than the decoder circuit. Examples of processing circuits include, but are not limited to, a general-purpose processor core (e.g., included within application processors, system-on-chip processors, and the like), a central processing unit (CPU), an application processor (AP) or application processing unit (APU), a field programmable gate array (FPGA which may include a general-purpose processor core), an application specific integrated circuit (ASIC) such as a display driver integrated circuit (DDIC), a digital signal processor (DSP), a graphics processing unit (GPU), a neural accelerator or neural processing unit, and combinations thereof (e.g., controlling an overall encoding or decoding process using a general-purpose processor core that controls a neural accelerator to perform neural network operations such as vector multiplications and accumulations and to apply non-linear activation functions). The encoder FCNNs 102, 103 and the decoder FCNNs 105, 106 may be defined in accordance with an architecture and a plurality of parameters such as weights and biases of connections between neurons of different layers of the neural networks of various neural stages of the encoder FCNNs 102, 103 and the decoder FCNNs 105, 106. In some embodiments of the present disclosure, these parameters may be stored in memory and accessed by the processing circuits during runtime to perform computations implementing the neural stages. In some embodiments of the present disclosure, the processing circuit is configured with these parameters (e.g., fixed in a lookup table or as constant values in a special-purpose DSP, ASIC, FPGA, or neural processing unit).


In one or more embodiments, the communication link may be a wireless communication link such as a cellular connection or a local wireless network connection (e.g., Wi-Fi connection) between a client mobile device (e.g., smartphone, laptop, or other user equipment (UE)) and a base station (e.g., gNodeB or gNB in the case of a 5G-NR base station or a Wi-Fi access point or Wi-Fi router), where the devices may transmit and receive data over the communication channel using processing circuits according to the present disclosure integrated into the respective devices, where the data is formatted or encoded and decoded in accordance with a communication protocol (e.g., a cellular communication protocol such as 6G wireless or a wireless networking communication protocol such as a protocol in the IEEE 802.11 family of protocols).


Algorithm 1 below depicts a process of training the autoencoder 100 according to one embodiment of the present disclosure. The weights ϕ1, ϕ2 and θ1, θ2 of the first and second encoder FCNNs 102, 103 and the first and second decoder FCNNs 105, 106, respectively, are initialized, and the process of training the autoencoder 100 includes performing E training epochs. For each training epoch e of the total number of training epochs E, the process includes a decoder training schedule in which the decoder FCNNs 105, 106 are trained for Tdec iterations, and an encoder training schedule in which the encoder FCNNs 102, 103 are trained for Tenc iterations. During the encoder training schedule, the decoder FCNNs 105, 106 remain fixed, and during the decoder training schedule the encoder FCNNs 102, 103 remain fixed (i.e., the weights θ1, θ2 of the decoder FCNNs 105, 106 remain fixed during the training of the encoder FCNNs 102, 103, and the weights ϕ1, ϕ2 of the encoder FCNNs 102, 103 remain fixed during the training of the decoder FCNNs 105, 106).


As illustrated below, for each training iteration id of the total number of training iterations Tdec of the decoder FCNNs 105, 106, a batch of B message words is generated and the encoder FCNNs 102, 103 are utilized to generate codewords C from the message words, in which the code dimensions are K={k1, k2, . . . , kM} and the blocklengths are N={n1, n2, . . . , nM}. The process of training the decoder FCNNs 105, 106 also includes generating a batch of B noise vectors with signal-to-noise ratios (SNRs) from the range [γd,1, γd,u] (i.e., generating N) and then generating a batch of noisy codewords (i.e., Y=C+N) using the noise vectors, which represent the noise of the communication channel 200. The process of training the decoder FCNNs 105, 106 also includes generating decoded sequences Û utilizing the decoder FCNNs 105, 106 and then applying an optimizer to update the weights Θ of the decoder FCNNs 105, 106 while keeping the weights ϕ of the encoder FCNNs 102, 103 fixed.


Additionally, as illustrated below, for each training iteration ie of the total number of training iterations Tenc of the encoder FCNNs 102, 103, a batch of B message words is generated and the encoder FCNNs 102, 103 are utilized to generate codewords C from the message words. The process of training the encoder FCNNs 102, 103 also includes generating a batch of B noise vectors with signal-to-noise ratio (SNR) γe (i.e., generating N) and then generating a batch of noisy codewords (i.e., Y=C+N) using the noise vectors. The process of training the encoder FCNNs 102, 103 also includes generating decoded sequences Û utilizing the decoder FCNNs 105, 106 and then applying an optimizer to update the weights ϕ of the encoder FCNNs 102, 103 while keeping the weights Θ of the decoder FCNNs 105, 106 fixed. In this manner, the weights ϕ and θ of the first and second encoder FCNNs 102, 103 and the first and second decoder FCNNs 105, 106 are determined.












Algorithm 1 ProductAE Training Algorithm

Input: Dimension of ProductAE M, set of code dimensions K := {k1, k2, . . . , kM} and blocklengths N := {n1, n2, . . . , nM}, number of epochs E, batch size B, number of encoder and decoder training iterations Tenc and Tdec, encoder training SNR γe, decoder training SNR range [γd,1, γd,u], and encoder and decoder learning rates lrenc and lrdec
Output: Encoder and decoder NN weights ϕ and Θ.

 1: Initialize ϕ and Θ
 2: for e = 1, . . . , E do                                ▷ perform E training epochs
 3:   for id = 1, . . . , Tdec do                          ▷ decoder training schedule
 4:     U ← generate a batch of B message words
 5:     C ← ProductAE_Enc(U, ϕ, K, N)
 6:     N ← generate a batch of B noise vectors with SNRs from the range [γd,1, γd,u]
 7:     Y ← C + N                                          ▷ batch of noisy codewords
 8:     Û ← ProductAE_Dec(Y, Θ, K, N)
 9:     Θ ← Optimizer(L(U, Û), Θ, lrdec)                   ▷ apply the optimizer to update Θ while keeping ϕ fixed
10:   end for
11:   for ie = 1, . . . , Tenc do                          ▷ encoder training schedule
12:     U ← generate a batch of B message words
13:     C ← ProductAE_Enc(U, ϕ, K, N)
14:     N ← generate a batch of B noise vectors with SNR γe
15:     Y ← C + N
16:     Û ← ProductAE_Dec(Y, Θ, K, N)
17:     ϕ ← Optimizer(L(U, Û), ϕ, lrenc)                   ▷ apply the optimizer to update ϕ while keeping Θ fixed
18:   end for
19: end for
20: return ϕ and Θ
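
The Python sketch below mirrors the alternating schedule of Algorithm 1 at a high level. The encoder, decoder, encode_fn, decode_fn, sample_messages, and add_noise arguments are placeholders standing in for the ProductAE components and the ProductAE_Enc/ProductAE_Dec roles, the binary cross-entropy loss is an assumed choice, and all hyperparameter defaults are illustrative.

import torch

def train_product_ae(encoder, decoder, encode_fn, decode_fn, sample_messages, add_noise,
                     E=10, T_dec=500, T_enc=100, B=512,
                     dec_snr_range=(0.0, 6.0), enc_snr=3.0, lr_enc=1e-4, lr_dec=1e-4):
    # encoder/decoder: torch.nn.Module stand-ins for the encoder and decoder FCNNs.
    # encode_fn/decode_fn: callables playing the roles of ProductAE_Enc / ProductAE_Dec.
    bce = torch.nn.BCEWithLogitsLoss()            # assumes decode_fn returns bit logits
    opt_enc = torch.optim.Adam(encoder.parameters(), lr=lr_enc)
    opt_dec = torch.optim.Adam(decoder.parameters(), lr=lr_dec)
    for _ in range(E):                            # E training epochs
        for _ in range(T_dec):                    # decoder training schedule
            u = sample_messages(B)                # batch of B message words (0/1 floats)
            with torch.no_grad():
                c = encode_fn(u)                  # encoder weights phi stay fixed
            snr = float(torch.empty(1).uniform_(*dec_snr_range))
            loss = bce(decode_fn(add_noise(c, snr)), u)
            opt_dec.zero_grad(); loss.backward(); opt_dec.step()
        for _ in range(T_enc):                    # encoder training schedule
            u = sample_messages(B)
            y = add_noise(encode_fn(u), enc_snr)
            loss = bce(decode_fn(y), u)           # decoder weights are not in opt_enc
            opt_enc.zero_grad(); loss.backward(); opt_enc.step()
    return encoder, decoder

In keeping with Algorithm 1, only the decoder weights Θ are updated during the decoder schedule and only the encoder weights ϕ during the encoder schedule.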








FIG. 2 is a block diagram illustrating a scheme for training the encoder FCNNs 102, 103 (e.g., of a two-dimensional product autoencoder (M=2) including a first encoder FCNN E1(ϕ1) and a second encoder FCNN E2(ϕ2)) of the communication system 100 for transmitting data over the noisy communication channel 200 (e.g., an AWGN channel) according to one embodiment of the present disclosure.


During a process of training the first and second encoder FCNNs 102, 103, a batch of B length-k1k2 binary information sequences 107 is reshaped to a tensor UB×k2×k1 108 to fit the first and second encoder FCNNs 102, 103. In response to receiving the tensor UB×k2×k1 108 as an input, the first encoder FCNN 102 maps each length-k1 row to a length-n1 real-valued vector, resulting in a tensor UB×k2×n1(1) that is sent or transmitted to the second encoder FCNN 103. In response to receiving the tensor UB×k2×n1(1), the second encoder FCNN 103 maps each real-valued length-k2 vector in the second dimension of the tensor UB×k2×n1(1) to a length-n2 real-valued vector. In one or more embodiments, the mappings may be nonlinear mappings such that the resulting code is a nonlinear and non-binary code.


In the illustrated embodiment, the length-n2 real-valued vector (i.e., the codeword) is output by the second encoder FCNN 103 to a power normalization task or process 109. The power normalization task 109 is configured to ensure that the average power per coded bit is equal to one and thus the average SNR is equal to the given SNR. In the illustrated embodiment, the length-n real-valued vector c=(c1, c2, . . . , cn) of the coded sequence at the output of the second encoder FCNN 103 is normalized as follows:







c → c′ := (c / ∥c∥₂) × √n.

Therefore, ∥c′∥₂² = n, and thus the average power per coded symbol is equal to one.
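
A direct transcription of this normalization might look like the following PyTorch sketch; the assumption here is that each codeword occupies the last dimension of the batch tensor.

import torch

def power_normalize(c: torch.Tensor) -> torch.Tensor:
    # c: (B, n) batch of real-valued codewords. Scale each codeword so that
    # ||c'||_2^2 = n, i.e. the average power per coded symbol equals one.
    n = c.shape[-1]
    return c / c.norm(dim=-1, keepdim=True) * (n ** 0.5)

c_norm = power_normalize(torch.randn(4, 150))
print((c_norm ** 2).mean(dim=-1))   # 1.0 for every codeword (up to floating-point error)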


After the input is encoded by the first and second encoder FCNNs 102, 103 and the coded sequence is normalized by the power normalization function 109, the real-valued codewords C are passed through the channel 200 (e.g., the AWGN channel) and then decoded using the decoder 104 to generate a batch of decoded codewords ÛB×k2k1 (after appropriate reshaping). As described above, the weights θ of the decoder 104 are fixed during this process. Additionally, as illustrated in FIG. 2, the process includes computing the loss between the transmitted and decoded sequences L(U, Û) 110, backpropagating the loss to compute its gradients 111, and then updating the weights ϕ1, ϕ2 of the first and second encoder FCNNs utilizing an encoder optimizer (Adam with learning rate lrenc) 112. This process is repeated Tenc times, each time updating only the weights ϕ1, ϕ2 of the first and second encoder FCNNs 102, 103 while the weights θ of the decoder 104 remain fixed.



FIG. 3 is a block diagram illustrating a scheme for training the decoder FCNNs 105, 106 (e.g., of a two-dimensional product autoencoder (M=2) including a first decoder FCNN D1(θ1) and a second decoder FCNN D2(θ2)) of the communication system 100 for receiving data over the noisy communication channel 200 (e.g., an AWGN channel) according to one embodiment of the present disclosure.


As illustrated in FIG. 3, a batch of B random sequences UB×k2k1 113 is reshaped into binary information sequences UB×k2×k1 114. For each epoch of the decoder training schedule, the reshaped binary information sequences UB×k2×k1 are encoded into real-valued codewords utilizing the encoder 101 (e.g., the first and second encoder FCNNs 102, 103). During this process, the weights ϕ1, ϕ2 of the encoder FCNNs 102, 103 are fixed. The real-valued codewords are then passed through the channel 200 (e.g., the AWGN channel), with a range of decoder training signal-to-noise ratios (SNRs), to generate a batch of noisy codewords YB×n2×n1. The noisy codewords YB×n2×n1 are then decoded using the decoder FCNNs 105, 106 to generate a batch of decoded codewords ÛB×k2k1. For each training epoch, the decoder FCNNs 105, 106 are trained for I iterations.


As illustrated in FIG. 3, the pair of decoder FCNNs 105, 106 is replaced at each i-th iteration (i=1, 2, . . . , I) with a pair of distinct decoder FCNNs (i.e., another pair of decoder FCNNs), which results in a total number of decoder FCNNs 105, 106 of 2×I (i.e., the process of training the decoder FCNNs 105, 106 includes performing I decoding iterations in which each iteration i utilizes a distinct pair of decoder FCNNs 105, 106). In FIG. 3, the decoder FCNNs 105, 106 are denoted Dj(i), where j=1, 2. Each of the decoder FCNNs 105, 106 is parametrized by a set of weights Θj(i). At each decoding iteration i, the decoding is first performed by the second decoder FCNN D2 106 and then by the first decoder FCNN D1 105. Because the decoding is first performed by the second decoder FCNN D2 106, the second decoder FCNN D2 106 is expected to observe noisier (less reliable) codewords than the first decoder FCNN D1 105. Accordingly, in one or more embodiments, the process of training the decoder FCNNs 105, 106 includes utilizing more powerful codes (e.g., codes with different parameters, such as larger blocklengths or smaller code rates) for C2 compared to C1. For example, in one or more embodiments, the training process may include utilizing larger blocklengths and/or smaller code rates for C2 compared to C1.


Additionally, as illustrated in FIG. 3, the process includes computing the loss between the transmitted and decoded sequences L(U, Û), backpropagating the loss to compute its gradients 115, and then updating the weights Θ1, Θ2 of the first and second decoder FCNNs utilizing a decoder optimizer (Adam with learning rate lrdec) 116. This process is repeated Tdec times, each time updating only the weights Θ1, Θ2 of the first and second decoder FCNNs 105, 106 while the weights ϕ of the encoder 101 (e.g., the first and second encoder FCNNs 102, 103) remain fixed.


As described above, the process of training the decoder FCNNs 105, 106 includes I decoding iterations in which each iteration i utilizes a distinct pair of decoder FCNNs 105, 106. As such, the decoder FCNNs 105, 106 have a relatively more complex network with more learnable parameters compared to the encoder FCNNs 102, 103. This greater complexity and number of learnable parameters may be utilized to improve the performance of the decoder FCNNs 105, 106 by performing separate decoder training schedules per epoch. That is, in one or more embodiments, the process of training the decoder FCNNs 105, 106 includes a multi-schedule training process. In one or more embodiments, the multi-schedule training process includes a first schedule in which all of the decoder FCNNs 105, 106 are trained together for Tdec,s iterations. During the next I schedules or iterations, only a single pair of decoder FCNNs 105, 106, corresponding to a given decoding iteration, is updated. Afterward, another full decoder training schedule takes place in which all of the decoder FCNNs 105, 106 are trained together for Tdec,e iterations. Finally, the encoder FCNNs 102, 103 are updated for Tenc iterations. In one or more embodiments, this multi-schedule training process speeds up the process of training the decoder FCNNs 105, 106 compared to a training process in which the decoder FCNNs 105, 106 are not trained in this manner.
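
One way to realize this multi-schedule procedure is to toggle which decoder pairs are trainable at each schedule, as in the sketch below. The representation of the 2×I decoder FCNNs as a list of (D1, D2) module pairs and the train_step callback are assumptions made for illustration rather than the disclosure's exact implementation.

import itertools

def set_trainable(modules, flag: bool):
    # Freeze or unfreeze the weights of the given decoder networks (torch.nn.Modules).
    for p in itertools.chain.from_iterable(m.parameters() for m in modules):
        p.requires_grad_(flag)

def decoder_schedules(decoder_pairs, train_step, T_dec_s, T_per_pair, T_dec_e):
    all_decoders = [d for pair in decoder_pairs for d in pair]
    # 1) First schedule: train all I decoder pairs together for T_dec_s iterations.
    set_trainable(all_decoders, True)
    for _ in range(T_dec_s):
        train_step()
    # 2) Next I schedules: update only the (D1, D2) pair of one decoding iteration at a time.
    for pair in decoder_pairs:
        set_trainable(all_decoders, False)
        set_trainable(pair, True)
        for _ in range(T_per_pair):
            train_step()
    # 3) Final full schedule over all decoder pairs for T_dec_e iterations.
    set_trainable(all_decoders, True)
    for _ in range(T_dec_e):
        train_step()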


Aspects of the present disclosure also relate to training methods for increasing the robustness and the adaptivity of the product autoencoder 100. Robustness refers to the ability of an autoencoder trained over a particular channel model to perform well on a different channel model without retraining, and adaptivity refers to the ability of an autoencoder to retrain for (i.e., adapt to) a different channel model with minimal retraining. For instance, applying a product autoencoder trained for AWGN channels to a fading channel model may result in error rate saturation for larger signal-to-noise ratio (SNR) values, which results in an error floor. This saturation of the performance is due to the training SNRs of the encoder and the decoder being in a range that does not include any (or at least does not include a significant number of) noisy samples with the large channel SNRs of the fading channel model.


In the embodiment illustrated in FIG. 4, the method 300 includes a task 310 of training a product autoencoder on a channel (e.g., an AWGN channel) having a first range of signal-to-noise ratios (SNRs), and a task 320 of re-training the product autoencoder on a channel with larger values and a wider range of training SNRs for a few epochs (e.g., a number of epochs from 2 to 20). In this manner, retraining the product autoencoder with larger values and a wider range of training SNRs eliminates or at least reduces the performance saturation behavior at higher SNRs (e.g., eliminates the error floor). In one or more embodiments, the task 320 of fine-tuning the product autoencoder trained for the AWGN channel with larger values and wider ranges of the training SNRs decreases the performance saturation behavior such that the error floor disappears (or substantially disappears) after 11 training epochs with the larger and wider ranges of the SNRs.
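
Reusing the train_product_ae sketch given with Algorithm 1 above, tasks 310 and 320 of method 300 could be sequenced roughly as follows; the SNR ranges and epoch counts are illustrative, not values prescribed by the disclosure.

def train_then_widen_snr(encoder, decoder, encode_fn, decode_fn, sample_messages, add_noise):
    # Task 310: initial training on the AWGN channel over a first range of training SNRs.
    train_product_ae(encoder, decoder, encode_fn, decode_fn, sample_messages, add_noise,
                     E=100, dec_snr_range=(0.0, 4.0), enc_snr=2.0)
    # Task 320: retrain for a few epochs with larger values and a wider range of training
    # SNRs to reduce the error floor at high SNR.
    train_product_ae(encoder, decoder, encode_fn, decode_fn, sample_messages, add_noise,
                     E=11, dec_snr_range=(0.0, 10.0), enc_snr=6.0)
    return encoder, decoder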


Aspects of the present disclosure also relate to methods for fine-tuning a product autoencoder that was trained on one channel type to fully adapt to another channel model type. For instance, in the embodiment depicted in FIG. 5, the method 400 includes a task 410 of training a product autoencoder on a first type of noisy channel (e.g., an AWGN channel) and a task 420 of training the autoencoder on a second type of noisy channel (e.g., a Rayleigh fading channel model) different than the first type of noisy channel. In one or more embodiments, the task 420 of training the product autoencoder on the second type of noisy channel may be performed for only a single training epoch, only two training epochs, or only a few training epochs. In one or more embodiments, a product autoencoder that was trained over the AWGN model was fine-tuned to achieve excellent performance over the Rayleigh fading channel model by training the product autoencoder with only one full epoch over the Rayleigh fading channel model. In this manner, a product autoencoder may be trained to perform well over multiple channel models by fine-tuning the product autoencoder that was trained on one channel type for one epoch (or two or more epochs) over a different channel model, rather than fully retraining the product autoencoder on the new channel model. In one or more embodiments, these fine-tuned product autoencoders deliver almost the same performance as fully re-trained product autoencoders. Additionally, in one or more embodiments, the task 420 of fine-tuning does not degrade the performance of the fine-tuned product autoencoder over the AWGN channel when the fine-tuning is performed for a relatively small number of epochs, such as 20 or fewer epochs (e.g., 15 or fewer epochs, or 10 or fewer epochs).
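
For task 420, the AWGN noise function can be swapped for a fading channel model during the fine-tuning epoch(s). The sketch below assumes flat, per-symbol Rayleigh fading with unit average power gain, which is one common way to model such a channel; the helper name is an assumption.

import torch

def rayleigh_fading(c: torch.Tensor, snr_db: float) -> torch.Tensor:
    # y = h * c + n, with a Rayleigh-distributed per-symbol gain h (E[h^2] = 1)
    # and additive white Gaussian noise of standard deviation 10^(-snr_db/20).
    sigma = 10.0 ** (-snr_db / 20.0)
    h = torch.sqrt(torch.randn_like(c) ** 2 + torch.randn_like(c) ** 2) / (2.0 ** 0.5)
    return h * c + sigma * torch.randn_like(c)

# Fine-tuning (task 420) could then reuse the earlier training sketch for a single epoch, e.g.:
# train_product_ae(encoder, decoder, encode_fn, decode_fn, sample_messages,
#                  add_noise=rayleigh_fading, E=1)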


Algorithm 2 below depicts the process of generating codewords utilizing the encoder FCNNs 102, 103 and the power normalizer 109 according to one embodiment of the present disclosure. As illustrated below, a batch of input information sequences U is reshaped to an M-dimensional array according to the set of code dimensions K={k1, k2, . . . , kM}. Additionally, the process includes applying the m-th encoder FCNN to the m-th dimension of the input information sequence U to map length-km vectors to length-nm vectors. The process also includes a task of reshaping each array in the batch of input information sequences U to a sequence of length n=n1n2 . . . nM, and then a task of applying the power normalizer function 109 to generate the codewords C.












Algorithm 2 ProductAE Encoder Function ProductAE_Enc

Input: Batch of input information sequences U, set of code dimensions K := {k1, k2, . . . , kM} and blocklengths N := {n1, n2, . . . , nM}, and encoder NNs
Output: Batch of codewords C.

 1: U ← reshape each sequence of bits in U to an M-dimensional array according to the dimensions set K
 2: for m = 1, . . . , M do
 3:   U ← apply the m-th NN encoder to the m-th dimension of U to map length-km vectors to length-nm vectors
 4: end for
 5: C1 ← reshape each array in the batch U to a sequence of length n = n1n2 . . . nM
 6: C ← PowerNormalizer(C1)
 7: return C









Algorithm 3 below depicts the process of decoding the noisy codewords Y (i.e., the codewords C output by the encoder FCNNs 102, 103 plus the noise from the noisy channel 200) with the decoder FCNNs 105, 106 according to one embodiment of the present disclosure. As illustrated below, the first I−1 pairs of decoder FCNNs 105, 106 operate on inputs and outputs whose lengths are the coded blocklengths nj, and the last pair of decoder FCNNs 105, 106 (i.e., the I-th pair of decoder FCNNs 105, 106) reverts the encoding operation by reducing the lengths from nj to kj. In one or more embodiments, some of the decoder FCNNs 105, 106 may take multiple length-nj vectors as input and output multiple copies of length-nj vectors. In Algorithm 3, the function "reshape" reshapes the tensor upon which the function is being applied to the shape specified by the function arguments. Moreover, the function "concatenate([X, Y], l)" constructs a larger tensor by concatenating two tensors X and Y in dimension l. Finally, the function "permute" returns a new tensor by permuting the dimensions of the original tensor according to the specified arguments. Although the decoding algorithm in Algorithm 3 is for two-dimensional product autoencoders (M=2), the decoding algorithm may be generalized to higher-dimensional (M-dimensional) product autoencoders.












Algorithm 3 Decoder Function of 2-Dimensional ProductAEs ProductAE_Dec_2D

Input: Batch of noisy codewords Y, set of code dimensions k1, k2 and blocklengths n1, n2, number of decoding iterations I, and decoder NNs
Output: Batch of decoded sequences Û.

 1: Y ← Y.reshape(B, n1, n2)                               ▷ reshape Y to a 3D tensor
 2: if I == 1 then
 3:   Y[…] ← Y
 4: else
 5:   for i = 1, . . . , I − 1 do
 6:     if i == 1 then
 7:       Y2 ← D2(1)(Y).reshape(B, Fn1, n2)
 8:     else
 9:       Y[…] ← D2(i)(Y[…])
10:       Y2 ← (Y[…] − Y[…]).reshape(B, Fn1, n2)
11:     end if
12:     Y[…] ← concatenate([Y, Y2], 1).permute(0, 2, 1)
13:     Y[…] ← D1(i)(Y[…]).permute(0, 2, 1)
14:     Y[…] ← (Y1 − Y2).reshape(B, n1, Fn2)
15:     Y[…] ← concatenate([Y, Y[…]], 2)
16:   end for
17: end if
18: Y2 ← D2(I)(Y[…]).reshape(B, Fn1, k2)
19: Y1 ← D1(I)(Y2).permute(0, 2, 1)
20: Û ← Y1.reshape(B, k1k2)
21: return Û

[…] indicates data missing or illegible when filed.







It should be understood that the sequence of steps of the processes described herein in regard to various methods and with respect to various flowcharts is not fixed, but can be modified, changed in order, performed differently, performed sequentially, concurrently, or simultaneously, or altered into any desired order consistent with dependencies between steps of the processes, as recognized by a person of skill in the art. Further, as used herein and in the claims, the phrase "at least one of element A, element B, or element C" is intended to convey any of: element A; element B; element C; elements A and B; elements A and C; elements B and C; and elements A, B, and C.


Embodiments of the present invention can be implemented in a variety of ways as would be appreciated by a person of ordinary skill in the art, and the term “processor” as used herein may refer to any computing device capable of performing the described operations, such as a programmed general purpose processor (e.g., an ARM processor) with instructions stored in memory connected to the general purpose processor, a field programmable gate array (FPGA), and a custom application specific integrated circuit (ASIC). Embodiments of the present invention can be integrated into a serial communications controller (e.g., a universal serial bus or USB controller), a graphical processing unit (GPU), an intra-panel interface, and other hardware or software systems configured to transmit and receive digital data.


While the present invention has been described in connection with certain example embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims
  • 1. A method of training an autoencoder comprising a plurality of encoder neural networks and a plurality of decoder neural networks, the method comprising: training the plurality of encoder neural networks, wherein weights of the plurality of decoder neural networks are fixed during the training of the plurality of encoder neural networks; iteratively training the plurality of decoder neural networks for a plurality of iterations, wherein, for each iteration of training the plurality of decoder neural networks: a pair of decoder neural networks of the plurality of decoder neural networks is replaced by another pair of neural networks of the plurality of decoder neural networks, and a second decoder neural network of the pair of decoder neural networks utilizes different parameters than a first decoder neural network of the pair of decoder neural networks.
  • 2. The method of claim 1, wherein the second decoder neural network utilizes a larger blocklength than the first decoder neural network.
  • 3. The method of claim 1, wherein the second decoder neural network utilizes a smaller rate than the first decoder neural network.
  • 4. The method of claim 1, wherein the iteratively training the plurality of decoder neural networks comprises: training all of the plurality of decoder neural networks for each of a first number of iterations of the plurality of iterations; training only a single pair of decoder neural networks for each of a second number of iterations of the plurality of iterations after the first number of iterations; and training all of the plurality of decoder neural networks for each of a third number of iterations of the plurality of iterations after the second number of iterations.
  • 5. The method of claim 1, wherein the plurality of encoder neural networks is configured to map a message to a codeword and to transmit the codeword over a noisy channel, wherein the noisy channel is a first type of channel, and wherein the method further comprises retraining the autoencoder on a second type of channel different than the first type of channel.
  • 6. The method of claim 5, wherein the first type of channel is an additive white Gaussian noise (AWGN) channel, and wherein the second type of channel is a Rayleigh fading channel.
  • 7. The method of claim 5, wherein the retraining of the autoencoder comprises performing a single training epoch on the second type of channel.
  • 8. The method of claim 1, wherein the plurality of encoder neural networks is configured to map a message to a codeword and to transmit the codeword over a noisy channel having a signal-to-noise ratio, wherein the signal-to-noise ratio is a first signal-to-noise ratio in a first range, and wherein the method further comprises retraining the autoencoder for a plurality of epochs over the noisy channel having a second signal-to-noise ratio different than the first signal-to-noise ratio.
  • 9. The method of claim 8, wherein the second signal-to-noise ratio is larger than the first signal-to-noise ratio.
  • 10. The method of claim 9, wherein the second signal-to-noise ratio is a wider range than the first signal-to-noise ratio.
  • 11. The method of claim 8, wherein the plurality of epochs comprises 11 epochs.
  • 12. The method of claim 1, wherein the plurality of encoder neural networks is configured to map a message to a codeword and to transmit the codeword over a noisy channel having a signal-to-noise ratio, and wherein the message has a code dimension of at least 300 bits.
  • 13. The method of claim 1, wherein training the plurality of encoder neural networks comprises applying power normalization.
  • 14. The method of claim 1, wherein training the plurality of decoder neural networks comprises applying power normalization.
  • 15. An autoencoder comprising: a plurality of encoder neural networks configured to map a message to a codeword and to transmit the codeword over a noisy channel having a signal-to-noise ratio; and a plurality of decoder neural networks configured to decode the message, wherein a second decoder neural network of the pair of decoder neural networks utilizes different parameters than a first decoder neural network of the pair of decoder neural networks.
  • 16. The autoencoder of claim 15, wherein the second decoder neural network utilizes a larger blocklength than the first decoder neural network.
  • 17. The autoencoder of claim 15, wherein the second decoder neural network utilizes a smaller rate than the first decoder neural network.
  • 18. The autoencoder of claim 15, wherein the autoencoder is trained on a first type of channel and on a second type of channel different than the first type of channel.
  • 19. The autoencoder of claim 18, wherein the first type of channel is an additive white Gaussian noise (AWGN) channel, and wherein the second type of channel is a Rayleigh fading channel.
  • 20. The autoencoder of claim 15, wherein the autoencoder is trained on a noisy channel having a first signal-to-noise ratio and a second signal-to-noise ratio different than the first signal-to-noise ratio.
CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims priority to and the benefit of U.S. Provisional Patent Application No. 63/442,675, filed in the United States Patent and Trademark Office on Feb. 1, 2023, the entire disclosure of which is incorporated by reference herein. The present application is related to U.S. patent application Ser. No. 17/942,064, filed in the United States Patent and Trademark Office on Sep. 9, 2022, the entire disclosure of which is incorporated by reference herein.

Provisional Applications (1)
Number Date Country
63442675 Feb 2023 US