Binary low-density parity-check (LDPC) codes are sparse graph-based channel codes whose rates approach the capacity of symmetric binary-input channels. Due to their excellent error-correcting performance over noisy channels, LDPC codes have recently been standardized for error correction in 5G cellular new radio (NR) systems, such as those used by mobile phones. Error correction is performed by checking the status of parity bits. When a parity-check failure is detected for a data bit, information from the multiple parity bits associated with that data bit is used to retrieve the original/correct value of the data bit.
Tanner graphs of LDPC codes are sparse bipartite graphs whose vertex sets are partitioned into check nodes (CNs) and variable nodes (VNs). Typically, iterative decoding on an LDPC Tanner graph is carried out via flooding: all CNs and VNs are updated simultaneously in each iteration. In contrast, sequential LDPC decoding seeks to optimize the order of node updates to improve the convergence speed and/or the decoding performance with respect to the flooding scheme. One approach to sequential decoding of LDPC codes is to use a node-wise scheduling (NS) algorithm, where a single CN is scheduled per decoding iteration based on its residual, given by the magnitude of the difference between two successive messages emanating from that CN. Using sequential decoding and scheduling CNs with higher residuals is expected to lead to faster and more reliable decoding compared to the flooding scheme. To obviate the need for computing residuals, a reinforcement learning (RL)-based NS (RL-NS) scheme was previously proposed. Model-free RL methods have also been considered by (1) computing the Gittins index of each CN, and (2) utilizing standard Q-learning. In addition to model-free RL, a model-based RL-NS approach based on Thompson sampling has also been considered.
Embodiments of the present disclosure improve sequential decoding performance of low-density parity-check (LDPC) codes by implementing a reinforcement learning (RL) based process that sequentially updates clusters in each iteration, as opposed to a single check node (CN), until a stopping condition or a maximum number of iterations is reached. In each scheduling instant, a cluster's neighbors are updated via flooding based on the latest messages propagated by its neighboring clusters.
Embodiments of the present disclosure provide for RL-based sequential decoding processes to optimize the scheduling of CN clusters for moderate length LDPC codes. Embodiments of the present disclosure include a new state space model built using the collection of outputs of clusters. Deep reinforcement learning (DRL) can be applied for cluster size 3 and standard Q-learning for smaller clusters. Experimental results show that by learning the cluster scheduling order, embodiments of the present disclosure can outperform a random scheduling scheme, irrespective of the cluster size. The performance gains include lowering both bit error rate (BER) and message-passing complexity.
In accordance with embodiments of the present disclosure, systems, methods, and non-transitory computer-readable media are disclosed for sequentially decoding low-density parity-check codes encoded in a traffic channel of a communication signal received by a mobile communication device. The non-transitory computer-readable medium stores instructions for decoding low-density parity-check codes, and a processing device can execute the instructions to perform a method that includes training a reinforcement learning software agent of an LDPC decoder to learn to schedule each check node in a cluster and to schedule each cluster sequentially depending on a reward associated with an outcome of scheduling a particular cluster for each iteration; decoding scheduled check node clusters in each iteration; updating a posterior log-likelihood ratio of all variable nodes (VNs) based on the decoding; determining whether a specified maximum number of iterations has been reached or a stopping condition has been satisfied; and outputting a reconstructed signal corresponding to the communication signal received by the mobile communication device in response to determining that the specified maximum number of iterations has been reached or the stopping condition has been satisfied.
In accordance with embodiments of the present disclosure, systems, methods, and non-transitory computer-readable media are disclosed for sequentially decoding low-density parity-check codes encoded in a traffic channel of a communication signal received by a mobile communication device. The non-transitory computer-readable medium stores instructions for decoding low-density parity-check codes, and a processing device can execute the instructions to perform a method that includes generating a decoding schedule for a plurality of clusters of check nodes in response to execution of a reinforcement learning-based software agent of an LDPC decoder; sequentially decoding each of the plurality of clusters of check nodes according to the generated decoding schedule; updating a posterior log-likelihood ratio of all variable nodes (VNs) based on the decoding; determining whether a specified maximum number of iterations has been reached or a stopping condition has been satisfied; and, in response to determining that the specified maximum number of iterations has been reached or the stopping condition has been satisfied, outputting a reconstructed signal corresponding to the communication signal received by the mobile communication device.
In accordance with embodiments of the present disclosure, the reinforcement learning software agent can be trained to schedule sequential decoding of the plurality of clusters of check nodes based on a reward associated with an outcome of decoding each of the plurality of clusters of check nodes. The reward corresponds to a probability that corrupted bits of the communication signal are correctly reconstructed. A cluster scheduling policy is based on the training of the reinforcement learning software agent. The decoding schedule is determined based on the learned cluster scheduling policy.
In accordance with embodiments of the present disclosure, the check nodes can be clustered to minimize inter-cluster dependency.
In accordance with embodiments of the present disclosure, the reinforcement learning software agent can implement at least one of a Q-learning scheme or a deep reinforcement learning scheme to generate the decoding schedule.
Any combination and/or permutation of embodiments are envisioned. Other objects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed as an illustration only and not as a definition of the limits of the present disclosure.
Embodiments of the present disclosure provide for systems and methods for sequential decoding of moderate length low-density parity-check (LDPC) codes via reinforcement learning (RL). The sequential decoding process can be embodied in an LDPC decoder including a reinforcement learning software agent executed in a mobile communication device and can be modeled as a Markov decision process (MDP). An optimized cluster scheduling policy can be subsequently obtained via RL. In contrast to conventional approaches, where a software agent learns to schedule only a single check node (CN) within a group (cluster) of CNs per iteration, in embodiments of the present disclosure the software agent of the LDPC decoder is trained to schedule all CNs in a cluster, and all clusters in every iteration. That is, in accordance with embodiments of the present disclosure, in each RL step, the software agent of the LDPC decoder learns to schedule CN clusters sequentially depending on the reward associated with the outcome of scheduling a particular cluster.
Embodiments of the present disclosure provide an LDPC decoder with a new RL state space model, which has a significantly smaller number of states than previously proposed models, enabling embodiments of the RL-based LDPC decoder of the present disclosure to be suitable for much longer LDPC codes. As a result, embodiments of the RL-based LDPC decoder described herein exhibit a signal-to-noise ratio (SNR) gain of approximately 0.8 dB for fixed bit error probability over the conventional flooding approach.
With respect to LDPC codes, an [n, k] binary linear code is a k-dimensional subspace of F_2^n and can be defined as the kernel of a binary parity-check matrix H ∈ F_2^{m×n}, where m ≥ n−k. The code's block length is n, and its rate is (n−rank(H))/n. The Tanner graph of a linear code with parity-check matrix H is the bipartite graph G_H = (V ∪ C, E), where V = {v_0, . . . , v_{n−1}} is a set of variable nodes (VNs) corresponding to the columns of H, C = {c_0, . . . , c_{m−1}} is a set of check nodes (CNs) corresponding to the rows of H, and E contains an edge between VN v_j and CN c_i whenever the entry of H in row i and column j is a "1". LDPC codes are a class of highly competitive linear codes defined via sparse parity-check matrices or, equivalently, sparse Tanner graphs, and are amenable to low-complexity graph-based message-passing decoding algorithms, making them well suited to practical applications in telecommunications and other fields. One example of a decoding algorithm for which LDPC codes are suitable is belief propagation (BP) iterative decoding.
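For illustration, the following Python sketch (the helper name is illustrative, not part of the disclosure) builds the CN and VN adjacency lists of the Tanner graph G_H directly from a binary parity-check matrix H; this is the neighborhood structure traversed by the message-passing decoders described herein.

```python
import numpy as np

def tanner_graph(H):
    """Build CN/VN adjacency lists of the Tanner graph G_H from a binary
    parity-check matrix H (an m x n numpy array over {0, 1})."""
    m, n = H.shape
    cn_neighbors = [np.flatnonzero(H[i, :]) for i in range(m)]  # VNs adjacent to CN c_i
    vn_neighbors = [np.flatnonzero(H[:, j]) for j in range(n)]  # CNs adjacent to VN v_j
    return cn_neighbors, vn_neighbors

# Example with the [7, 4] Hamming code's parity-check matrix (dense; used only
# for illustration, since LDPC parity-check matrices are sparse).
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
cns, vns = tanner_graph(H)
print(cns[0])  # indices of the VNs connected to check node c_0
```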
Experimental results for embodiments of the LDPC decoder that utilize two particular classes of LDPC codes, namely (γ, k)-regular and array-based (AB-) LDPC codes, are described herein. A (γ, k)-regular LDPC code is defined by a parity-check matrix with constant column and row weights equal to γ and k, respectively. A (γ, p) AB-LDPC code, where p is prime, is a (γ, p)-regular LDPC code with additional structure in its parity-check matrix, H(γ, p). In particular,

H(γ, p) = [ I    I          I             …    I
            I    σ          σ^2           …    σ^{p−1}
            ⋮    ⋮          ⋮                   ⋮
            I    σ^{γ−1}    σ^{2(γ−1)}    …    σ^{(γ−1)(p−1)} ],   (1)

where σ^z denotes the circulant matrix obtained by cyclically left-shifting the entries of the p×p identity matrix I by z (mod p) positions. Notice that σ^0 = I. In embodiments of the present disclosure, lifted LDPC codes can be obtained by replacing non-zero (resp., zero) entries of the parity-check matrix with randomly generated permutation (resp., all-zero) matrices.
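As a rough sketch of the array structure in (1), the following code assembles H(γ, p) from circulant blocks. The function names and the shift direction are assumptions; any consistent cyclic-shift convention yields permutation blocks and hence the same (γ, p)-regular weight profile.

```python
import numpy as np

def circulant(p, z):
    """sigma^z: the p x p identity matrix cyclically shifted by z (mod p) positions."""
    return np.roll(np.eye(p, dtype=int), -(z % p), axis=1)

def ab_ldpc_parity_check(gamma, p):
    """H(gamma, p): a gamma*p x p*p array-based parity-check matrix whose
    (i, j)-th p x p block equals sigma^(i*j), as in equation (1)."""
    rows = []
    for i in range(gamma):
        rows.append(np.hstack([circulant(p, (i * j) % p) for j in range(p)]))
    return np.vstack(rows)

H = ab_ldpc_parity_check(3, 5)  # (3, 5) AB-LDPC code: a 15 x 25 matrix
# Column weight gamma = 3 and row weight p = 5, i.e., (3, 5)-regular.
assert H.sum(axis=0).max() == 3 and H.sum(axis=1).max() == 5
```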
In an RL problem, a software agent (learner) interacts with an environment whose state space can be modeled as a finite Markov decision process (MDP). The software agent takes actions that alter the state of the environment and receives a reward in return for each action, with the goal of maximizing the total reward over a series of actions. The optimized sequence of actions can be obtained by employing a cluster scheduling policy, which utilizes an action-value function to determine how beneficial an action is for maximizing the long-term expected reward. For embodiments described herein, let [[x]] = {0, . . . , x−1}, where x is a positive integer. As an example, an environment can allow m possible actions. A random variable A_l ∈ [[m]], with realization a, represents the index of the action taken by the software agent during learning step l. The current state of the environment before taking action A_l is represented as S_l, with integer realization s, and S_{l+1}, with realization s′, represents the new state of the MDP after executing action A_l. A state space S contains all possible state realizations. The reward yielded at step l after taking action A_l in state S_l is represented as R_l(S_l, A_l, S_{l+1}).
Optimal policies for MDPs can be estimated via model-free techniques such as Q-learning. The estimated action-value function Q_l(S_l, A_l) in Q-learning represents the expected long-term reward achieved by the software agent at step l after taking action A_l in state S_l. To improve the estimate in each step, the action-value function can be adjusted according to the recursion

Q_{l+1}(s, a) = (1 − α) Q_l(s, a) + α (R_l(s, a, s′) + β max_{a′} Q_l(s′, a′)),   (2)

where s′ represents the new state reached as a function of s and a, 0 < α < 1 is the learning rate, β is the reward discount rate, and Q_{l+1}(s, a) is the future action-value resulting from taking action a in the current state s. Note that the new state is updated with each action. The optimal cluster scheduling policy for the software agent, π(l), in state s is given by
π(l) = argmax_a Q_l(s, a),   (3)
where l is the total number of learning steps elapsed after observing the initial state S_0. In the case of a tie, an action can be uniformly chosen at random from all the maximizing actions.
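A minimal tabular sketch of the recursion in (2) and the policy in (3), assuming a Q-table indexed by the integer state and action (cluster) indices described above; the function names are illustrative.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, beta=0.9):
    """One step of the tabular Q-learning recursion (2):
    Q_{l+1}(s, a) = (1 - alpha) * Q_l(s, a) + alpha * (r + beta * max_a' Q_l(s', a'))."""
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * (r + beta * np.max(Q[s_next, :]))
    return Q

def greedy_action(Q, s, rng):
    """Policy (3): argmax_a Q(s, a), breaking ties uniformly at random."""
    q_row = Q[s, :]
    best = np.flatnonzero(q_row == q_row.max())
    return int(rng.choice(best))

# Usage sketch: a Q-table with 16 states and 4 actions.
rng = np.random.default_rng(0)
Q = np.zeros((16, 4))
Q = q_update(Q, s=3, a=1, r=0.75, s_next=7)
print(greedy_action(Q, s=3, rng=rng))
```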
An embodiment of the RL-based sequential decoding (RL-SD) process can include a belief propagation (BP) decoding algorithm in which the environment is given by the Tanner graph of the LDPC code, and the optimized sequence of actions, i.e., the scheduling of individual clusters, can be obtained using a suitable RL algorithm such as Q-learning. A single cluster scheduling step can be carried out by sending messages from all CNs of a cluster to their neighboring VNs, and subsequently sending messages from these VNs to their CN neighbors. That is, a selected cluster executes one iteration of flooding in each decoding instant. Every cluster is scheduled exactly once within a single decoder iteration. Sequential cluster scheduling can be carried out until a stopping condition is reached, or an iteration threshold is exceeded. The RL-SD method relies on a cluster scheduling policy based on an action-value function, which can be estimated using the RL techniques described herein.
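The following Python sketch outlines one possible organization of the RL-SD loop described above. All callables passed in (schedule_policy, flood_cluster, observe_state, syndrome_ok, hard_decisions) are hypothetical placeholders for the operations named in the text, not the disclosure's exact implementation.

```python
def rl_sd_decode(n_clusters, schedule_policy, flood_cluster, observe_state,
                 syndrome_ok, hard_decisions, max_iters=50):
    """High-level sketch of RL-based sequential decoding (RL-SD).
    schedule_policy(state, remaining) -> next cluster index;
    flood_cluster(a) -> one flooding step restricted to cluster a (CN-to-VN,
    then VN-to-CN messages), updating the posterior LLRs;
    observe_state(a) -> index of cluster a's hard-decision output;
    syndrome_ok() -> True when all parity checks are satisfied;
    hard_decisions() -> the reconstructed codeword."""
    state = None                              # no cluster output observed yet
    for _ in range(max_iters):
        remaining = set(range(n_clusters))
        while remaining:                      # every cluster is scheduled once per iteration
            a = schedule_policy(state, remaining)
            flood_cluster(a)
            state = observe_state(a)          # state s_a used for the next scheduling decision
            remaining.discard(a)
        if syndrome_ok():                     # stopping condition
            break
    return hard_decisions()
```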
The (first) mobile communication device 110 can encode (e.g., with LDPC codes) and modulate a radiofrequency (RF) signal and transmit the RF signal which can be routed through the network 130 and transmitted to the (second) communication device 120, which can demodulate and decode the received RF signal to extract the voice data. In an exemplary embodiment, the first mobile communication device 110 can use LDPC codes for channel coding on the traffic channel. When the second mobile communication device 120 receives the RF signal, the second mobile communication device can extract the LDPC codes from the RF signal and use the extracted LDPC codes to correct channel errors by maintaining parity bits for data bits transmitted via the traffic channel. When a parity check failure is detected by the second mobile communication device 120 for one or more data bits, information from the multiple parity bits of the LDPC codes associated with the one or more data bits can be used by the second mobile communication device 120 to determine the original/correct value for the one or more data bits.
The memory 206 can include any suitable, non-transitory computer-readable storage medium, e.g., read-only memory (ROM), erasable programmable ROM (EPROM), electrically-erasable programmable ROM (EEPROM), random access memory (RAM), flash memory, and the like. In exemplary embodiments, an operating system 226 and an embodiment of the LDPC decoder 228 can be embodied as computer-readable/executable program code stored on the non-transitory computer-readable memory 206 and implemented using any suitable, high- or low-level computing language, scripting language, or any suitable platform, such as, e.g., Java, C, C++, C#, assembly code, machine-readable language, Python, Rails, Ruby, and the like. The memory 206 can also store data to be used by and/or that is generated by the LDPC decoder 228. While memory 206 is depicted as a single component, those skilled in the art will recognize that the memory can be formed using multiple components and that separate non-volatile and volatile memory devices can be used.
One or more processing and logic devices 204 can be programmed and/or configured to facilitate an operation of the mobile communication device 200 and enable RF communications with other communication devices via a network (e.g., network 130). The processing and/or logic devices 204 can be programmed and/or configured to execute the operating system 226 and the LDPC decoder 228 to implement one or more processes to perform one or more operations (decoding of LDPC codes, error detection and correction). As an example, a microprocessor, micro-controller, central processing unit (CPU), or graphical processing unit (GPU) can be programmed to execute the LDPC decoder 228. As another example, the LDPC decoder 228 can be embodied and executed by an application-specific integrated circuit (ASIC). The processing and/or logic devices 204 can retrieve information/data from and store information/data to the memory 206. For example, the processing device 204 can retrieve and/or store LDPC codes and/or any other suitable information/data that can be utilized by the mobile communication device to perform error detection and correction using LDPC codes.
The LDPC decoder 228 can include a reinforcement learning (RL) software agent that can sequentially decode the low-density parity-check (LDPC) codes included in the RF signal via reinforcement learning (RL). The sequential decoding process implemented by the software agent can be trained to schedule all check nodes (CNs) in a cluster, and all clusters in every iteration, such that in each RL step, the software agent of the LDPC decoder 228 learns to schedule CN clusters sequentially depending on the reward associated with the outcome of scheduling a particular cluster.
The RF circuitry 214 can include an RF transceiver, one or more modulation circuits, one or more demodulation circuits, one or more multiplexers, and one or more demultiplexers. The RF circuitry 214 can be configured to transmit and/or receive wireless communications via an antenna 215 pursuant to, for example, the 3rd Generation Partnership Project (3GPP) specifications for 5G NR and/or the International Telecommunications Union (ITU) IMT-2020 requirements.
The display unit 208 can render user interfaces, such as graphical user interfaces (GUIs) to a user and in some embodiments can provide a mechanism that allows the user to interact with the GUIs. For example, a user may interact with the mobile communication device 200 through the display unit 208, which may be implemented as a liquid crystal touchscreen (or haptic) display, a light-emitting diode touchscreen display, and/or any other suitable display device, which may display one or more user interfaces that may be provided in accordance with exemplary embodiments.
The power source 212 can be implemented as a battery or capacitive elements configured to store an electric charge and power the mobile communication device 200. In exemplary embodiments, the power source 212 can be a rechargeable power source, such as a battery or one or more capacitive elements configured to be recharged via a connection to an external power supply.
The transmitted and the received words can be represented as x = [x_0, . . . , x_{n−1}] and y = [y_0, . . . , y_{n−1}], respectively, where for v ∈ [[n]], each transmitted bit satisfies x_v ∈ {0, 1} and each received value can be represented as y_v = (−1)^{x_v} + n_v, with n_v denoting additive white Gaussian noise (AWGN). The channel LLR observed by VN v is denoted L_v.
The posterior LLR computed by VN v during iteration I can be represented as L_v^{(I)} = Σ_{c∈N(v)} m_{c→v}^{(I)} + L_v, where N(v) is the set of CNs adjacent to VN v, L_v^{(0)} = L_v, and m_{c→v}^{(I)} is the message received by VN v from neighboring CN c in iteration I. Similarly, the posterior LLR computed during iteration I by VN j in the subgraph induced by the cluster with index a ∈ [[⌈m/z⌉]] can be represented as L_{j,a}^{(I)}. Hence, L_v^{(I)} = L_{j,a}^{(I)} if VN v in the Tanner graph is also the jth VN in the subgraph induced by the cluster with index a.
After scheduling cluster a during iteration I, the output x̂_a^{(I)} of cluster a, where l_a ≤ z·k_max is the number of VNs adjacent to cluster a, is obtained by taking hard decisions on the vector of posterior LLRs [L_{0,a}^{(I)}, . . . , L_{l_a−1,a}^{(I)}], computed according to the rule x̂_{j,a}^{(I)} = 1 if L_{j,a}^{(I)} < 0, and x̂_{j,a}^{(I)} = 0 otherwise. The output x̂_a^{(I)} of cluster a includes the bits reconstructed by the sequential decoder after scheduling cluster a during iteration I. An index of a realization of x̂_a^{(I)} in iteration I can be denoted by s_a^{(I)} ∈ [[2^{l_a}]], which can be obtained by interpreting the binary vector x̂_a^{(I)} as the binary representation of an integer.
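A small sketch of this hard-decision and state-indexing step, assuming the common convention that a negative posterior LLR maps to bit 1 and that the state index is the integer value of the binary output vector:

```python
import numpy as np

def cluster_state_index(posterior_llrs_cluster):
    """Hard-decide the posterior LLRs of the VNs adjacent to a cluster and map
    the resulting binary output x_hat_a to its state index s_a in [[2^l_a]]."""
    x_hat = (np.asarray(posterior_llrs_cluster) < 0).astype(int)   # LLR < 0 -> bit 1
    s_a = int("".join(map(str, x_hat)), 2)                         # binary vector -> integer index
    return x_hat, s_a

x_hat, s_a = cluster_state_index([2.3, -0.7, 1.1, -4.2])  # x_hat = [0, 1, 0, 1], s_a = 5
```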
During the learning/training phase, embodiments of the RL process inform the software agent of the current state of the LDPC decoder and the reward obtained after performing an action (decoding a cluster). Based on these observations, the software agent of the LDPC decoder 228 can take future actions to enhance the total reward earned, which alters the state of the environment as well as the future reward. Given that the transmitted communication signal x is known during the training phase, a vector containing the l_a bits of x that are reconstructed in the output x̂_a^{(I)} of a cluster can be represented as x_a = [x_{0,a}, . . . , x_{l_a−1,a}]. The reward for scheduling cluster a can then be computed as

R_a = (1/l_a) Σ_{j∈[[l_a]]} 1(x̂_{j,a}^{(I)} = x_{j,a}),

where 1(⋅) denotes the indicator function. Thus, the reward earned by the software agent after scheduling cluster a is identical to the probability that the corrupted bits corresponding to the transmitted bits x_{0,a}, . . . , x_{l_a−1,a} are correctly reconstructed.
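A minimal sketch of this reward computation, i.e., the fraction of the cluster's bits that are reconstructed correctly:

```python
import numpy as np

def cluster_reward(x_hat_a, x_a):
    """Reward R_a: fraction of the l_a transmitted bits x_a that are correctly
    reconstructed in the cluster output x_hat_a (the empirical probability of
    correct reconstruction)."""
    x_hat_a, x_a = np.asarray(x_hat_a), np.asarray(x_a)
    return float(np.mean(x_hat_a == x_a))

print(cluster_reward([0, 1, 0, 1], [0, 0, 0, 1]))  # 0.75: three of four bits match
```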
The RL-SD process is further illustrated in the accompanying drawings and described below.
With respect to the software agent learning a cluster scheduling policy, the state of the MDP after scheduling cluster index a during learning step l can be denoted as x̂_a^{(l)}, and the index of a realization of x̂_a^{(l)} can be referred to as s_a^{(l)}, or simply s_a. Thus, s_a also refers to the state of the MDP. The state space of the MDP contains all Σ_{a∈[[⌈m/z⌉]]} 2^{l_a} possible realizations of the cluster outputs.
As an example using deep reinforcement learning (DRL), for MDPs with very large state spaces, the action-value function Q_l(s, a) can be approximated as Q_l(s, a; W) using a deep learning model with tensor W representing the weights connecting all layers in the neural network (NN). In each learning step l, a separate NN can be used, with weights W_l^{(a)} for each cluster, since a single NN cannot distinguish between the signals x̂_0^{(l)}, . . . , x̂_{⌈m/z⌉−1}^{(l)}, and hence cannot distinguish between the rewards R_0, . . . , R_{⌈m/z⌉−1} generated by the ⌈m/z⌉ different clusters. The target of the NN corresponding to cluster a is given by

T_l^{(a)} = R_l(s_a, a, s′) + β max_{a′∈[[⌈m/z⌉]]} Q_l(s′, a′; W_l^{(a)}),

where the reward R_l(s_a, a, s′) = R_a. Also, let Q_l(s_a, a; W_l^{(a)}) be the NN's prediction. In each DRL step, the mean squared error loss between T_l^{(a)} and Q_l(s_a, a; W_l^{(a)}) can be minimized using a gradient descent method. The NN corresponding to each cluster learns to map the cluster output x̂_a^{(l)} to a vector of ⌈m/z⌉ predicted action-values, [Q_l(s_a, 0; W_l^{(a)}), . . . , Q_l(s_a, ⌈m/z⌉−1; W_l^{(a)})].
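The following is a minimal PyTorch sketch of one such DRL update for the NN associated with a single cluster. The layer sizes, optimizer settings, and the use of the same network for the target and the prediction are illustrative assumptions, not the disclosure's exact architecture.

```python
import torch
import torch.nn as nn

n_clusters, l_a = 9, 15                       # illustrative sizes
q_net = nn.Sequential(                        # NN of cluster a: maps x_hat_a to
    nn.Linear(l_a, 64), nn.ReLU(),            # ceil(m/z) predicted action-values
    nn.Linear(64, n_clusters))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def drl_step(x_hat_a, a, reward, x_hat_next, beta=0.9):
    """One DRL update: form the target T = R_a + beta * max_a' Q(s', a'),
    then take a gradient step on the MSE between target and prediction.
    x_hat_a, x_hat_next: float tensors of length l_a (cluster hard-decision outputs)."""
    with torch.no_grad():
        target = reward + beta * q_net(x_hat_next).max()
    pred = q_net(x_hat_a)[a]                  # predicted action-value Q(s_a, a)
    loss = loss_fn(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```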
During inference, the optimized cluster scheduling policy, π_i*(I), for scheduling the ith cluster during decoder iteration I is expressed as

π_i*(I) = argmax_{a∈[[⌈m/z⌉]]} Q*(s_{a_{i−1}}, a),

where s_{a_{i−1}} is the state observed after scheduling the previously selected cluster, and Q* denotes the action-values predicted by the trained NNs.
As another example, using standard Q-learning for MDPs with moderately large state spaces, the optimal cluster scheduling order can be determined with the action-value for choosing cluster a in state s_a given by Q_l(s_a, a), which is updated via the recursion in (2). In each learning step l, cluster a can be selected via an ε-greedy approach according to

A_l = π_Q(l) with probability 1 − ε, and A_l drawn uniformly at random from [[⌈m/z⌉]] with probability ε,

where π_Q(l) = argmax_{a∈[[⌈m/z⌉]]} Q_l(s_a, a). For ties (as in the first iteration of the standard Q-learning algorithm), an action can be chosen uniformly at random from all the maximizing actions. During inference, the optimized cluster scheduling policy, π_i*(I), for scheduling the ith cluster during decoder iteration I can be expressed as

π_i*(I) = argmax_{a∈[[⌈m/z⌉]]} Q*(s_{a_{i−1}}, a),

where Q*(s_{a_{i−1}}, a) denotes the action-value obtained after training, and s_{a_{i−1}} is the state observed after scheduling the previously selected cluster.
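A tabular sketch of the ε-greedy selection used during training and the greedy schedule used during inference; observe_state is a hypothetical placeholder for observing the state s_a produced by decoding the selected cluster.

```python
import numpy as np

def epsilon_greedy_cluster(Q, s, remaining, eps, rng):
    """Select the next cluster: with probability eps explore uniformly among the
    clusters not yet scheduled this iteration, otherwise exploit argmax_a Q(s, a),
    breaking ties uniformly at random."""
    remaining = list(remaining)
    if rng.random() < eps:
        return int(rng.choice(remaining))
    q_vals = Q[s, remaining]
    best = [remaining[i] for i in np.flatnonzero(q_vals == q_vals.max())]
    return int(rng.choice(best))

def inference_schedule(Q, s0, n_clusters, observe_state):
    """Greedy cluster schedule for one decoder iteration using a trained table Q;
    observe_state(a) returns the state index s_a produced by decoding cluster a."""
    rng = np.random.default_rng(0)
    s, order, remaining = s0, [], set(range(n_clusters))
    while remaining:
        a = epsilon_greedy_cluster(Q, s, remaining, eps=0.0, rng=rng)  # pure exploitation
        order.append(a)
        remaining.discard(a)
        s = observe_state(a)
    return order
```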
Experiments were performed to test the performance of the RL-SD process described herein.
The LLR vectors used for training are sampled uniformly at random over a range of A equally spaced SNR values for a given code. Hence, there are |ℒ|/A LLR vectors in the training set ℒ for each SNR value considered. For both considered codes (the [384, 256]-WRAN and (3, 5) AB-LDPC codes), the learning parameters can be as follows: α=0.1, β=0.9, ε=0.6, l_max=50, and |ℒ|=5×10^5, where |ℒ| is chosen to ensure that the training is as accurate as possible without incurring excessive run-time for the standard Q-learning algorithm.
For both training and inference, the AWGN channel is considered and all-zero codewords are transmitted using BPSK modulation. Training with the all-zero codeword is sufficient as, due to the symmetry of the BP decoder and the channel, the decoding error is independent of the transmitted signal.
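As an illustration of this setup, the following sketch generates channel LLR vectors for the all-zero codeword transmitted with BPSK over an AWGN channel. The LLR scaling L_v = 2y_v/σ² is the standard expression for this channel, and the mapping of Eb/N0 to the noise variance via the code rate is an assumption included only for illustration.

```python
import numpy as np

def channel_llrs_all_zero(n, snr_db, rate, rng):
    """Simulate transmission of the all-zero codeword over BPSK/AWGN and return
    the channel LLRs L_v = 2 * y_v / sigma^2 used as decoder input."""
    ebn0 = 10.0 ** (snr_db / 10.0)
    sigma2 = 1.0 / (2.0 * rate * ebn0)                   # noise variance for the given Eb/N0
    y = 1.0 + rng.normal(0.0, np.sqrt(sigma2), size=n)   # all-zero codeword -> all +1 symbols
    return 2.0 * y / sigma2

rng = np.random.default_rng(0)
llrs = channel_llrs_all_zero(n=384, snr_db=3.0, rate=256 / 384, rng=rng)
```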
The performance metrics considered are the bit error rate (BER) and the frame error rate (FER), given by Pr[x̂ ≠ x]. In the case of the WRAN LDPC code, only z=1 is considered, as this code has several degree-11 CNs which render both learning schemes too computationally intensive for z>1. On the other hand, for the AB code, multiple cluster sizes z ∈ {1, 2, 3} are chosen for both the random and RL-SD schemes. For z ∈ {1, 2}, standard Q-learning can be employed to learn the cluster scheduling policy. For z=3, deep reinforcement learning (DRL) can be utilized, as standard Q-learning is not feasible due to the significantly increased state space. The same number of training examples is used for both standard Q-learning and DRL.
The BER versus channel signal-to-noise ratio (SNR), in terms of E_b/N_0 in dB, for the [384, 256]-WRAN and (3, 5) AB-LDPC codes under these decoding techniques is shown in the accompanying drawings.
Table 1 compares the average number of CN-to-VN messages propagated by the considered decoding schemes to attain the results shown in the accompanying drawings.
Table 1: Average number of CN-to-VN messages propagated in various decoding schemes for a [384, 256]-WRAN (left) and a (3, 5) AB- (right) LDPC code to attain the results shown in the accompanying drawings.
Exemplary flowcharts are provided herein for illustrative purposes and are non-limiting examples of methods. One of ordinary skill in the art will recognize that exemplary methods may include more or fewer steps than those illustrated in the exemplary flowcharts, and that the steps in the exemplary flowcharts may be performed in a different order than the order shown in the illustrative flowcharts.
The foregoing description of the specific embodiments of the subject matter disclosed herein has been presented for purposes of illustration and description and is not intended to limit the scope of the subject matter set forth herein. It is fully contemplated that other various embodiments, modifications and applications will become apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such other embodiments, modifications, and applications are intended to fall within the scope of the following appended claims. Further, those of ordinary skill in the art will appreciate that the embodiments, modifications, and applications that have been described herein are in the context of particular environment, and the subject matter set forth herein is not limited thereto but can be beneficially applied in any number of other manners, environments, and purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the novel features and techniques as disclosed herein.
The present application claims priority to and the benefit of U.S. Provisional Application No. 63/249,412, filed on Sep. 28, 2021, which is incorporated by reference herein in its entirety.
This invention was made with government support under Grant No. ECCS-1711056 awarded by the U.S. National Science Foundation. The government has certain rights in the invention.