Aspects of embodiments of the present disclosure relate to channel encoders and decoders implemented using trained neural networks.
Channel encoders and decoders improve the reliability of communication systems when transmitting and receiving data over a noisy communication channel. Generally, an encoder implementing an error correction code (or error correcting code ECC) takes an original message as input and generates an encoded message, where the encoded message has some additional bits of data in comparison to the original message (e.g., the encoded message is longer than the original message). These additional bits of data provide redundancy such that, if the encoded message is corrupted or otherwise modified between being transmitted from an encoder and being received at a decoder, the decoder can use the additional data to reconstruct the original message, within some limits on the number of errors that can be corrected in accordance with the ECC that is applied to the original message. Examples of classical error correction codes include Reed-Solomon codes, Turbo codes, low-density parity-check (LDPC) codes, and polar codes.
Recently, the encoder and decoder (or some components within the encoder and decoder architectures) have been replaced with neural networks or other trainable models, which reduces the encoding and decoding complexity, improves upon the performance of classical channel codes, enables applications for realistic channel models and emerging use cases, and enables designing universal decoders that simultaneously decode several codes. However, the size of the code space (2^k distinct codewords for a binary linear code of dimension k) presents a major technical challenge in the channel coding context. Due to these large code spaces, only a small fraction of all codewords will be seen during the training phase, and thus the trained models for the encoders and decoders may fail to generalize to unseen codewords. Additionally, a straightforward design of neural encoders and decoders for large code dimensions and lengths requires huge networks with an excessively large number of trainable parameters. Together, these factors make it prohibitively complex to design and train relatively large neural channel encoders and decoders. Another major challenge is the joint training of the encoder and decoder due to local optima that may occur as a result of non-convex loss functions.
The above information disclosed in this Background section is only for enhancement of understanding of the present disclosure, and therefore it may contain information that does not form the prior art that is already known to a person of ordinary skill in the art.
The present disclosure relates to various embodiments of a method of training an autoencoder that includes encoder neural networks and decoder neural networks. In one embodiment, the method includes training the encoder neural networks to map a message to a codeword and to transmit the codeword over a noisy channel having a signal-to-noise ratio. Weights of the decoder neural networks are fixed during the training of the encoder neural networks. The method also includes iteratively training the decoder neural networks for a number of iterations. For each iteration of the training of the decoder neural networks, a pair of decoder neural networks is replaced by another pair of neural networks, and a second decoder neural network of the pair of decoder neural networks utilizes different parameters than a first decoder neural network of the pair of decoder neural networks.
The second decoder neural network may utilize a larger blocklength than the first decoder neural network.
The second decoder neural network may utilize a smaller rate than the first decoder neural network.
Iteratively training the decoder neural networks may include training all of the decoder neural networks for each of a first number of iterations; training only a single pair of decoder neural networks for each of a second number of iterations after the first number of iterations; and training all of the decoder neural networks for each of a third number of iterations after the second number of iterations.
The noisy channel may be a first type of channel, and the method may further include retraining the autoencoder on a second type of channel different than the first type of channel.
The first type of channel may be an additive white Gaussian noise (AWGN) channel, and the second type of channel may be a Rayleigh fading channel.
Retraining the autoencoder may include performing a single training epoch on the second type of channel.
The signal-to-noise ratio may be a first signal-to-noise ratio in a first range, and the method may further include retraining the autoencoder for a number of epochs over the noisy channel having a second signal-to-noise ratio different than the first signal-to-noise ratio.
The second signal-to-noise ratio may be larger than the first signal-to-noise ratio.
The second signal-to-noise ratio may span a wider range than the first signal-to-noise ratio.
The number of epochs may be 11 epochs.
The message may have a code dimension of at least 300 bits.
Training the encoder neural networks may include applying power normalization.
Training the decoder neural networks may include applying power normalization.
The present disclosure also relates to various embodiments of an autoencoder. In one embodiment, the autoencoder includes a number of encoder neural networks configured to map a message to a codeword and to transmit the codeword over a noisy channel having a signal-to-noise ratio, and a number of decoder neural networks configured to decode the message. A second decoder neural network of a pair of the decoder neural networks utilizes different parameters than a first decoder neural network of the pair of decoder neural networks.
The second decoder neural network may utilize a larger blocklength than the first decoder neural network.
The second decoder neural network may utilize a smaller rate than the first decoder neural network.
The autoencoder may be trained on a first type of channel and on a second type of channel different than the first type of channel.
The first type of channel may be an additive white Gaussian noise (AWGN) channel, and the second type of channel may be a Rayleigh fading channel.
The autoencoder may be trained on a noisy channel having a first signal-to-noise ratio and a second signal-to-noise ratio different than the first signal-to-noise ratio.
This summary is provided to introduce a selection of features and concepts of embodiments of the present disclosure that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in limiting the scope of the claimed subject matter. One or more of the described features may be combined with one or more other described features to provide a workable system or method.
The accompanying drawings, together with the specification, illustrate example embodiments of the present invention, and, together with the description, serve to explain the principles of the present invention.
The present disclosure relates to various embodiments of a large-scale product autoencoder including an encoding artificial neural network and a decoding artificial neural network, and methods of training a product autoencoder on large code dimensions, such as 300 bits of information or more. The encoding artificial neural network and the decoding artificial neural network are trained for a large code using smaller code components (i.e., the present disclosure trains the encoding artificial neural network and the decoding artificial neural network on smaller code components rather than directly training the encoding artificial neural network and the decoding artificial neural network for a large code). Additionally, the training scheme of the present disclosure utilizes a heterogeneous decoder training scheme in which more powerful codes (e.g., different parameters, such as larger blocklengths or smaller code rates) are utilized for one decoder neural network than the other decoder neural network. The present disclosure also includes training schemes to reduce (e.g., remove) the error floor in the performance of the product autoencoder by retraining the product autoencoder with larger values and a wider range of the training signal-to-noise ratios (SNRs). The present disclosure further includes a method of fine-tuning the product autoencoder by retraining the product autoencoder on a different channel type than it was originally trained (e.g., retraining a product autoencoder that was trained on an additive white Gaussian noise (AWGN) channel on a Rayleigh fading channel model).
Noise and other interference in the channel 200 can modify the data in the output matrix U(2) of dimensions n2×n1 transmitted by the second encoder FCNN 103. The noisy signal can be expressed as y = c + n, where n is the channel noise vector (independent from c) whose components are Gaussian random variables with mean zero and variance σ². The ratio of the average energy per coded symbol to the noise variance is the signal-to-noise ratio (SNR). In one or more embodiments, the encoder FCNNs 102, 103 satisfy a soft power constraint such that the average power per coded bit is equal to 1 and the SNR = 1/σ².
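As an illustration of this channel model, the following sketch (Python/PyTorch; the function name, signature, and the assumption that the SNR is specified in decibels are illustrative, not taken from the disclosure) adds Gaussian noise to a power-normalized codeword using SNR = 1/σ²:

```python
import torch


def awgn_channel(c: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Add white Gaussian noise to a power-normalized codeword c.

    Assumes the average power per coded symbol is 1, so that SNR = 1 / sigma^2.
    """
    snr_linear = 10.0 ** (snr_db / 10.0)
    sigma = (1.0 / snr_linear) ** 0.5       # noise standard deviation
    noise = sigma * torch.randn_like(c)     # n ~ N(0, sigma^2), independent of c
    return c + noise                        # y = c + n
```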
Additionally, as shown in
In the example shown in
In addition, in some embodiments, decoding performance can be improved by applying a soft-input soft-output (SISO) decoder and also performing several iterations, where the output of the product decoder 104 (e.g., the output of the last decoder stage, in this case the first decoder FCNN 105 as shown in
While the above discussion of
In one or more embodiments, the encoder FCNNs 102, 103 and the decoder FCNNs 105, 106 may be components of the same computer system (e.g., integrated within a single enclosure, such as in the case of a smartphone or other mobile device, tablet computer, or laptop computer), may be separate components of a computer system (e.g., a desktop computer in communication with an external monitor), or may be separate computer systems (e.g., two independent computer systems communicating over the communication channel), or variations thereof (e.g., implemented within special purpose processing circuits such as microcontrollers configured to communicate over the communication channel, where the microcontrollers are peripherals within a computer system). In one or more embodiments, the encoder FCNNs 102, 103 and the decoder FCNNs 105, 106 may be implemented on different communication devices of the communication system that communicate with one another over a communication channel. For example, the encoder FCNNs 102, 103 may be implemented in user equipment such as a smartphone and the decoder FCNNs 105, 106 may be implemented in a base station. In one or more embodiments, these communication devices are transceivers that can both transmit and receive data. For example, a smartphone may include both the encoder FCNNs 102, 103 and the decoder FCNNs 105, 106 for transmitting and receiving data, respectively, where these may implement the same neural codes (e.g., same error correction codes) or different neural codes (e.g., different error correction codes).
In various embodiments, the encoder FCNNs 102, 103 and the decoder FCNNs 105, 106 may be implemented by processing circuits of a communication device. In one or more embodiments, the various processing circuits may be components of a same integrated circuit (e.g., as being components of a same system on a chip or SoC) or may be components of different integrated circuits that may be connected through pins and lines on a printed circuit board. Additionally, in one or more embodiments, the encoder circuit may be implemented using a different type of processing circuit than the decoder circuit. Examples of processing circuits include, but are not limited to, a general-purpose processor core (e.g., included within application processors, system-on-chip processors, and the like), a central processing unit (CPU), an application processor (AP) or application processing unit (APU), a field programmable gate array (FPGA which may include a general-purpose processor core), an application specific integrated circuit (ASIC) such as a display driver integrated circuit (DDIC), a digital signal processor (DSP), a graphics processing unit (GPU), a neural accelerator or neural processing unit, and combinations thereof (e.g., controlling an overall encoding or decoding process using a general-purpose processor core that controls a neural accelerator to perform neural network operations such as vector multiplications and accumulations and to apply non-linear activation functions). The encoder FCNNs 102, 103 and the decoder FCNNs 105, 106 may be defined in accordance with an architecture and a plurality of parameters such as weights and biases of connections between neurons of different layers of the neural networks of various neural stages of the encoder FCNNs 102, 103 and the decoder FCNNs 105, 106. In some embodiments of the present disclosure, these parameters may be stored in memory and accessed by the processing circuits during runtime to perform computations implementing the neural stages. In some embodiments of the present disclosure, the processing circuit is configured with these parameters (e.g., fixed in a lookup table or as constant values in a special-purpose DSP, ASIC, FPGA, or neural processing unit).
In one or more embodiments, the communication link may be a wireless communication link such as a cellular connection or a local wireless network connection (e.g., Wi-Fi connection) between a client mobile device (e.g., smartphone, laptop, or other user equipment (UE)) and a base station (e.g., gNodeB or gNB in the case of a 5G-NR base station or a Wi-Fi access point or Wi-Fi router), where the devices may transmit and receive data over the communication channel using processing circuits according to the present disclosure integrated into the respective devices, where the data is formatted or encoded and decoded in accordance with a communication protocol (e.g., a cellular communication protocol such as 6G wireless or a wireless networking communication protocol such as a protocol in the IEEE 802.11 family of protocols).
Algorithm 1 below depicts a process of training the autoencoder 100 according to one embodiment of the present disclosure. The weights ϕ1, ϕ2 and θ1, θ2 of the first and second encoder FCNNs 102, 103 and the first and second decoder FCNNs 105, 106, respectively, are initialized, and the process of training the autoencoder 100 includes performing E training epochs. For each training epoch e of the total number of training epochs E, the process includes a decoder training schedule in which the decoder FCNNs 105, 106 are trained for Tdec iterations, and an encoder training schedule in which the encoder FCNNs 102, 103 are trained for Tenc iterations. During the encoder training schedule, the decoder FCNNs 105, 106 remain fixed, and during the decoder training schedule the encoder FCNNs 102, 103 remain fixed (i.e., the weights θ1, θ2 of the decoder FCNNs 105, 106 remain fixed during the training of the encoder FCNNs 102, 103, and the weights ϕ1, ϕ2 of the encoder FCNNs 102, 103 remain fixed during the training of the decoder FCNNs 105, 106).
As illustrated below, for each training iteration id of the total number of training iterations Tdec of the decoder FCNNs 105, 106, a batch of B message words is generated and the encoder FCNNs 102, 103 are utilized to generate codewords C from the message words, in which the code dimensions are K = {k1, k2, . . . , kM} and the blocklengths are N = {n1, n2, . . . , nM}. The process of training the decoder FCNNs 105, 106 also includes generating a batch of B noise vectors with signal-to-noise ratios (SNRs) from the range [γd,l, γd,u] (i.e., generate N) and then generating a batch of noisy codewords (i.e., Y = C + N) using the noise vectors, which represent the noise of the communication channel 200. The process of training the decoder FCNNs 105, 106 also includes generating decoded sequences Û utilizing the decoder FCNNs 105, 106 and then applying an optimizer to update the weights Θ of the decoder FCNNs 105, 106 while keeping the weights ϕ of the encoder FCNNs 102, 103 fixed.
Additionally, as illustrated below, for each training iteration ie of the total number of training iterations Tenc of the encoder FCNNs 102, 103, a batch of B message words are generated and the encoder FCNNs 102, 103 are utilized to generate codewords C from the message words. The process of training the encoder FCNNs 102, 103 also includes generating a batch of B noise vectors with signal-to-noise ratio (SNR) γe (i.e., generate N) and then generating a batch of noisy codewords (i.e., Y=C+N) using the noise vectors. The process of training the encoder FCNNs 102, 103 also includes generating decoded sequences Û utilizing the decoder FCNNs 105, 106 and then applying an optimizer to update the weights ϕ of the encoder FCNNs 102, 103 while keeping the weights Θ of the decoder FCNNs 105, 106 fixed. In this manner, the weights ϕ and θ of the first and second encoder FCNNs 102, 103 and the first and second decoder FCNNs 105, 106 are determined.
Algorithm 1 (inputs include the blocklengths N = {n1, n2, . . . , nM}, the number of epochs E, the batch size B, and the numbers of encoder and decoder training iterations; the listing performs E training epochs of the alternating decoder and encoder training schedules described above; the remainder of the original listing is not legible).
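Because part of the Algorithm 1 listing is not legible, the following is a minimal sketch of the alternating training schedule described above (Python/PyTorch; the encode_fn, decode_fn, channel_fn, and loss_fn callables, the optimizer choice, and the learning rates are assumptions for illustration rather than the original listing):

```python
import random

import torch


def train_product_autoencoder(encode_fn, decode_fn, channel_fn, loss_fn,
                              enc_params, dec_params, message_shape,
                              E, T_dec, T_enc, B, snr_dec_range, snr_enc):
    """Alternating encoder/decoder training, loosely following Algorithm 1.

    encode_fn, decode_fn, and channel_fn are callables built from the encoder
    FCNNs, the decoder FCNNs, and the channel model; all names are illustrative.
    """
    opt_enc = torch.optim.Adam(enc_params, lr=1e-4)
    opt_dec = torch.optim.Adam(dec_params, lr=1e-4)

    for _epoch in range(E):
        # Decoder schedule: the encoder weights stay fixed (only opt_dec steps).
        for _ in range(T_dec):
            u = torch.randint(0, 2, (B, *message_shape)).float()  # message words
            with torch.no_grad():
                c = encode_fn(u)                                   # codewords C
            snr = random.uniform(*snr_dec_range)                   # SNR from the training range
            y = channel_fn(c, snr)                                 # noisy codewords Y = C + N
            loss = loss_fn(decode_fn(y), u)
            opt_dec.zero_grad()
            loss.backward()
            opt_dec.step()

        # Encoder schedule: the decoder weights stay fixed (only opt_enc steps;
        # gradients still flow through the decoders to reach the encoders).
        for _ in range(T_enc):
            u = torch.randint(0, 2, (B, *message_shape)).float()
            c = encode_fn(u)
            y = channel_fn(c, snr_enc)
            loss = loss_fn(decode_fn(y), u)
            opt_enc.zero_grad()
            loss.backward()
            opt_enc.step()
```

In this sketch, holding one side fixed is accomplished simply by stepping only the corresponding optimizer; during the encoder schedule, gradients still propagate through the frozen decoders so that the encoders receive a useful training signal.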
During a process of training the first and second encoder FCNNs 102, 103, a batch of B length-k1k2 binary information sequences 107 is reshaped to a tensor U of dimensions B×k1×k2.
In the illustrated embodiment, the length-n2 real-valued vector (i.e., the codeword) is output by the second encoder FCNN 103 to a power normalization task or process 109. The power normalization task 109 is configured to ensure that the average power per coded bit is equal to one and thus the average SNR is equal to the given SNR. In the illustrated embodiment, the length-n real-valued vector c = (c1, c2, . . . , cn) of the coded sequence at the output of the second encoder FCNN 103 is normalized as c′ = √n·c/∥c∥₂ (i.e., each coded symbol is scaled by √n/∥c∥₂).
Therefore, ∥c′∥₂² = n, and thus the average power per coded symbol is equal to one.
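A minimal sketch of this power normalization (Python/PyTorch; written for a single length-n coded sequence, while in practice it would be applied per codeword in a batch) is:

```python
import torch


def power_normalize(c: torch.Tensor) -> torch.Tensor:
    """Scale a length-n coded sequence so the average power per symbol equals 1."""
    n = c.numel()
    return (n ** 0.5) * c / torch.linalg.norm(c)   # ensures ||c'||_2^2 = n
```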
After the input is encoded by the first and second encoder FCNNs 102, 103 and the coded sequence is normalized by the power normalization function 109, the real-valued codewords C are passed through the channel 200 (e.g., the AWGN channel) and then decoded using the decoder 104 to generate a batch of decoded codewords Û of dimensions B×k1×k2.
As illustrated in
As illustrated in
Additionally, as illustrated in
As described above, the process of training the decoder FCNNs 105, 106 includes I decoding iterations in which each iteration i utilizes a distinct pair of decoder FCNNs 105, 106. As such, the decoder FCNNs 105, 106 have a relatively more complex network with more learnable parameters compared to the encoder FCNNs 102, 103. This greater complexity and number of learnable parameters may be utilized to improve the performance of the decoder FCNNs 105, 106 by performing separate decoder training schedules per epoch. That is, in one or more embodiments, the process of training the decoder FCNNs 105, 106 includes a multi-schedule training process. In one or more embodiments, the multi-schedule training process includes a first schedule in which all of the decoder FCNNs 105, 106 are trained together for Tdec,s iterations. During the next I schedules or iterations, only a single pair of decoder FCNNs 105, 106, corresponding to a given decoding iteration, is updated. Afterward, another full decoder training schedule takes place in which all of the decoder FCNNs 105, 106 are trained together for Tdec,e iterations. Finally, the encoder FCNNs 102, 103 are updated for Tenc iterations. In one or more embodiments, this multi-schedule training process speeds up the process of training the decoder FCNNs 105, 106 compared to a training process in which the decoder FCNNs 105, 106 are not trained in this manner.
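One way to organize this per-epoch multi-schedule is sketched below (Python; the train_decoders callable is hypothetical and stands in for the decoder training step of Algorithm 1, assumed to update only the listed modules):

```python
def decoder_multi_schedule_epoch(decoder_pairs, train_decoders,
                                 T_dec_s, T_single, T_dec_e):
    """One epoch of multi-schedule decoder training (illustrative sketch only).

    decoder_pairs: list of I (FCNN, FCNN) pairs, one pair per decoding iteration.
    train_decoders(modules, iterations): assumed helper that runs decoder
    training iterations while updating only the weights of `modules`.
    """
    all_decoders = [m for pair in decoder_pairs for m in pair]
    train_decoders(all_decoders, T_dec_s)      # full schedule over all I pairs
    for pair in decoder_pairs:                 # I schedules, one pair at a time
        train_decoders(list(pair), T_single)
    train_decoders(all_decoders, T_dec_e)      # closing full schedule
    # The encoder FCNNs are then updated for T_enc iterations (not shown here).
```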
Aspects of the present disclosure also relate to training methods for increasing the robustness and the adaptivity of the product autoencoder 100. Robustness refers to the ability of an autoencoder trained over a particular channel model to perform well on a different channel model without retraining, and adaptivity refers to the ability of an autoencoder to retrain for (i.e., adapt to) a different channel model with minimal retraining. For instance, applying a product autoencoder trained for AWGN channels to a fading channel model may result in error rate saturation for larger signal-to-noise ratio (SNR) values, which results in an error floor. This saturation of the performance is due to the training SNRs of the encoder and the decoder being in a range that does not include any (or at least does not include a significant number of) noisy samples with the large channel SNRs of the fading channel model.
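For reference, a flat Rayleigh fading channel of the kind mentioned above can be sketched as follows (Python/PyTorch; an assumed real-valued model y = h·c + n with unit-mean-square Rayleigh gains, not necessarily the exact fading model used in the disclosure):

```python
import torch


def rayleigh_fading_channel(c: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Apply per-symbol Rayleigh fading plus AWGN to a power-normalized codeword."""
    sigma = (10.0 ** (-snr_db / 10.0)) ** 0.5             # noise std for unit signal power
    # Rayleigh gains with E[h^2] = 1: magnitude of a complex Gaussian of unit variance.
    h = torch.sqrt(torch.randn_like(c) ** 2 + torch.randn_like(c) ** 2) / (2.0 ** 0.5)
    noise = sigma * torch.randn_like(c)
    return h * c + noise                                  # y = h * c + n
```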
In the embodiment illustrated in
Aspects of the present disclosure also relate to methods for fine-tuning a product autoencoder that was trained on one channel type to fully adapt to another channel model type. For instance, in the embodiment depicted in
Algorithm 2 below depicts the process of generating codewords utilizing the encoder FCNNs 102, 103 and the power normalizer 109 according to one embodiment of the present disclosure. As illustrated below, a batch of input information sequences U is reshaped to an M-dimensional array according to the set of code dimensions K = {k1, k2, . . . , kM}. Additionally, the process includes applying the m-th encoder FCNN 101 to the m-th dimension of the input information sequence U to map the length-km vectors to length-nm vectors. The process also includes a task of reshaping each array in the batch of input information sequences U to a sequence of length n = n1n2 . . . nM, and then a task of applying the power normalizer function 109 to generate the codewords C.
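Because the Algorithm 2 listing itself is not reproduced here, the following is a hedged two-dimensional (M = 2) sketch of the encoding steps just described (Python/PyTorch; the encoder modules enc1 and enc2, assumed to map length-k1 vectors to length-n1 vectors and length-k2 vectors to length-n2 vectors along the last tensor dimension, are illustrative):

```python
import torch


def product_encode(u: torch.Tensor, enc1, enc2) -> torch.Tensor:
    """Two-dimensional product encoding with power normalization (sketch).

    u: batch of messages of shape (B, k1, k2). enc1 and enc2 are assumed FCNNs
    acting on the last dimension (k1 -> n1 and k2 -> n2, respectively).
    """
    B = u.shape[0]
    x = enc1(u.transpose(1, 2))              # (B, k2, n1): encode along the k1 dimension
    x = enc2(x.transpose(1, 2))              # (B, n1, n2): encode along the k2 dimension
    c = x.reshape(B, -1)                     # flatten each codeword to length n = n1 * n2
    n = c.shape[1]
    scale = (n ** 0.5) / torch.linalg.norm(c, dim=1, keepdim=True)
    return scale * c                         # per-codeword power constraint ||c'||^2 = n
```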
Algorithm 3 below depicts the process of decoding the noisy codewords Y (i.e., the codewords C output by the encoder FCNNs 101 combined with the noise from the noisy channel 200) with the decoder FCNNs 105, 106 according to one embodiment of the present disclosure. As illustrated below, the first I−1 pairs of decoder FCNNs 105, 106 operate on inputs and outputs whose lengths equal the number of coded bits nj, and the last pair of decoder FCNNs 105, 106 (i.e., the I-th pair of decoder FCNNs 105, 106) reverts the encoding operation by reducing the lengths from nj to kj. In one or more embodiments, some of the decoder FCNNs 105, 106 may take multiple length-nj vectors as input and output multiple copies of length-nj vectors. In Algorithm 3, the function "reshape" reshapes the tensor upon which the function is being applied to the shape specified by the function arguments. Moreover, the function "concatenate([X, Y], l)" constructs a larger tensor by concatenating two tensors X and Y in dimension l. Finally, the function "permute" returns a new tensor by permuting the dimensions of the original tensor according to the specified arguments. Although the decoding algorithm in Algorithm 3 is for two-dimensional product autoencoders (M=2), the decoding algorithm may be generalized to higher-dimensional (M-dimensional) product autoencoders.
Algorithm 3 (the listing begins by reshaping Y to a 3-D tensor, e.g., of dimensions B×n1×n2 for M = 2; the remainder of the original listing is not legible).
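A simplified two-dimensional sketch of this iterative decoding flow is given below (Python/PyTorch; the per-iteration concatenation of the channel output with intermediate estimates, handled by the concatenate and permute operations in Algorithm 3, is omitted for brevity, and the dec_pairs modules are assumptions):

```python
import torch


def product_decode(y: torch.Tensor, dec_pairs) -> torch.Tensor:
    """Iterative two-dimensional product decoding (simplified Algorithm 3 sketch).

    y: noisy codewords of shape (B, n1, n2). dec_pairs is a list of I
    (dec_cols, dec_rows) module pairs; the first I-1 pairs preserve the
    blocklengths n1 and n2, and the final pair reduces n2 -> k2 and n1 -> k1.
    """
    x = y
    for dec_cols, dec_rows in dec_pairs:
        x = dec_cols(x)              # act on the length-n2 rows (last dimension)
        x = x.permute(0, 2, 1)       # bring the other dimension last
        x = dec_rows(x)              # act on length-n1 (reduced to k1 by the last pair)
        x = x.permute(0, 2, 1)       # restore (batch, rows, columns) ordering
    return x                         # shape (B, k1, k2) after the final pair
```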
It should be understood that the sequence of steps of the processes described herein in regard to various methods and with respect to various flowcharts is not fixed, but can be modified, changed in order, performed differently, performed sequentially, concurrently, or simultaneously, or altered into any desired order consistent with dependencies between steps of the processes, as recognized by a person of skill in the art. Further, as used herein and in the claims, the phrase "at least one of element A, element B, or element C" is intended to convey any of: element A; element B; element C; elements A and B; elements A and C; elements B and C; and elements A, B, and C.
Embodiments of the present invention can be implemented in a variety of ways as would be appreciated by a person of ordinary skill in the art, and the term “processor” as used herein may refer to any computing device capable of performing the described operations, such as a programmed general purpose processor (e.g., an ARM processor) with instructions stored in memory connected to the general purpose processor, a field programmable gate array (FPGA), and a custom application specific integrated circuit (ASIC). Embodiments of the present invention can be integrated into a serial communications controller (e.g., a universal serial bus or USB controller), a graphical processing unit (GPU), an intra-panel interface, and other hardware or software systems configured to transmit and receive digital data.
While the present invention has been described in connection with certain example embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
This application claims priority to and the benefit of U.S. Provisional Patent Application No. 63/442,675, filed in the United States Patent and Trademark Office on Feb. 1, 2023, the entire disclosure of which is incorporated by reference herein. The present application is related to U.S. patent application Ser. No. 17/942,064, filed in the United States Patent and Trademark Office on Sep. 9, 2022, the entire disclosure of which is incorporated by reference herein.