CRISP: CURRICULUM BASED SEQUENTIAL NEURAL DECODERS FOR POLAR CODE FAMILY

Abstract

Polar codes are widely used state-of-the-art codes for reliable communication that have recently been included in the 5th-generation wireless standards (5G). However, there remains room for the design of polar decoders that are both efficient and reliable in the short-blocklength regime. Motivated by recent successes of data-driven channel decoders, we introduce a novel CurRIculum based Sequential neural decoder for Polar codes (CRISP)¹. We design a principled curriculum, guided by information-theoretic insights, to train CRISP and show that it outperforms the successive-cancellation (SC) decoder and attains near-optimal reliability performance on the Polar(32, 16) and Polar(64, 22) codes. The choice of the proposed curriculum is critical in achieving the accuracy gains of CRISP, as we show by comparing against other curricula. More notably, CRISP can be readily extended to Polarization-Adjusted-Convolutional (PAC) codes, where existing SC decoders are significantly less reliable. To the best of our knowledge, CRISP constitutes the first data-driven decoder for PAC codes and attains near-optimal performance on the PAC(32, 16) code.

1. INTRODUCTION

Error-correcting codes (codes) are the backbone of modern digital communication. Codes, composed of (encoder, decoder) pairs, ensure reliable data transmission even under noisy conditions. Since the groundbreaking work of Shannon (1948), several landmark codes have been proposed: convolutional codes, low-density parity-check (LDPC) codes, Turbo codes, Polar codes, and, more recently, Polarization-Adjusted-Convolutional (PAC) codes (Richardson & Urbanke, 2008). In particular, polar codes, introduced by Arikan (2009), are widely used in practice owing to their reliable performance in the short-blocklength regime. A family of variants of polar codes, known as PAC codes, further improves performance, nearly achieving the fundamental lower bound on the performance of any code at finite lengths, albeit at a higher decoding complexity (Arıkan, 2019). In this paper, we focus on the decoding of these two classes of codes, jointly termed the "Polar code family". The Polar family exhibits several crucial information-theoretic properties; its practical finite-length performance, however, depends on high-complexity decoders. The design of efficient and reliable decoders for the Polar family has thus been the focus of substantial research over the past decade.

(a) Polar codes: The classical successive cancellation (SC) decoder achieves information-theoretic capacity asymptotically, but performs poorly at finite blocklengths compared to the optimal maximum a posteriori (MAP) decoder (Arıkan, 2019). To improve upon the reliability of SC, several polar decoders have been proposed in the literature (Sec. 6). One notable result is the celebrated Successive-Cancellation-with-List (SCL) decoder (Tal & Vardy, 2015). SCL improves upon the reliability of SC and approaches that of the MAP decoder with increasing list size (and complexity).
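As background for the decoders discussed above, recall how a Polar(n, k) codeword is formed: the k message bits are placed at the most reliable of the n synthesized positions, the remaining positions are frozen to zero, and the result is multiplied by the Kronecker-power transform F^{⊗log₂ n} of the kernel F = [[1, 0], [1, 1]] over GF(2). A minimal NumPy sketch follows; the reliability ordering `info_positions` is assumed to be supplied externally (it is not computed here), and the function names are illustrative, not the paper's API.

```python
import numpy as np

F = np.array([[1, 0], [1, 1]], dtype=np.uint8)

def polar_generator(m):
    """m-fold Kronecker power of the 2x2 polar kernel F."""
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(m):
        G = np.kron(G, F)
    return G

def polar_encode(message_bits, info_positions, n):
    """Place k message bits at info_positions, freeze the rest to 0,
    then apply the polar transform over GF(2)."""
    u = np.zeros(n, dtype=np.uint8)
    u[list(info_positions)] = message_bits
    G = polar_generator(int(np.log2(n)))
    return (u @ G) % 2
```

For example, with n = 4 and (illustrative) unfrozen positions [1, 2, 3], the message [1, 0, 1] maps to the codeword [0, 0, 1, 1].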
(b) PAC codes: The sequential "Fano decoder" (Fano, 1963) allows PAC codes to perform information-theoretically near-optimally; however, the decoding time is long and variable (Rowshan et al., 2020a). Although SC is efficient, O(n log n), its performance with PAC codes is significantly worse than that of the Fano decoder. Several works (Yao et al., 2021; Rowshan et al., 2020b; Zhu et al., 2020; Rowshan & Viterbo, 2021b;a; Sun et al., 2021) propose ameliorations; it is safe to say that constructing efficient and reliable decoders for the Polar family remains an active area of research.

In this paper, we introduce a novel CurRIculum based Sequential neural decoder for the Polar code family (CRISP). When the proposed curriculum is applied to neural decoder training, the resulting decoders outperform existing baselines and attain near-MAP reliability on the Polar(64, 22), Polar(32, 16), and PAC(32, 16) codes while maintaining low computational complexity (Figs. 1, 5, Table 1). CRISP builds upon an inherent nested hierarchy of polar codes: a Polar(n, k) code subsumes all the codewords of its lower-rate subcodes Polar(n, i), 1 ≤ i ≤ k (Sec. 2.2). We provide a principled curriculum that trains on examples from a sequence of subcodes along this hierarchy, and demonstrate that the proposed curriculum is critical in attaining near-optimal performance (Sec. 4).

Curriculum learning (CL) is a strategy for training machine learning models that starts with easier subtasks and gradually increases the difficulty of the tasks (Wang et al., 2021). The seminal work of Elman (1993) was one of the first to employ CL for supervised tasks, highlighting the importance of "starting small". Later, Bengio et al. (2009) formalized the notion of CL and studied when and why CL helps in the context of visual and language learning (Wu et al., 2020; Wang et al., 2021).
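The nested hierarchy that underpins the curriculum (Sec. 2.2) can be checked directly: a subcode Polar(n, i) uses a subset of the generator rows used by Polar(n, i+1), so its codewords are a subset of the larger code's. The self-contained sketch below enumerates the codewords of a chain of small subcodes; the position ordering `order` is purely illustrative, not a computed reliability ranking.

```python
import numpy as np
from itertools import product

F = np.array([[1, 0], [1, 1]], dtype=np.uint8)

def polar_generator(m):
    """m-fold Kronecker power of the 2x2 polar kernel F."""
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(m):
        G = np.kron(G, F)
    return G

def subcode_codewords(n, positions):
    """All codewords of the polar subcode that unfreezes only `positions`."""
    G = polar_generator(int(np.log2(n)))
    words = set()
    for msg in product([0, 1], repeat=len(positions)):
        u = np.zeros(n, dtype=np.uint8)
        u[list(positions)] = msg
        words.add(tuple((u @ G) % 2))
    return words

# Illustrative unfrozen-position ordering for n = 8.
order = [7, 6, 5, 3]
subcodes = [subcode_codewords(8, order[:i]) for i in range(1, len(order) + 1)]

# Each subcode nests inside the next: Polar(8,1) ⊂ Polar(8,2) ⊂ Polar(8,3) ⊂ Polar(8,4).
assert all(a <= b for a, b in zip(subcodes, subcodes[1:]))
```

This nesting is what lets the curriculum feed training examples from progressively larger subcodes without ever leaving the codebook of the target code.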
In recent years, many empirical studies have shown that CL improves the generalization and convergence rate of various models in domains such as computer vision (Pentina et al., 2015; Jesson et al., 2017; Morerio et al., 2017; Guo et al., 2018; Wang et al., 2019), natural language processing (Cirik et al., 2016; Platanios et al., 2019), speech processing (Amodei et al., 2016; Gao et al., 2016; 2018), generative modeling (Karras et al., 2017; Wang et al., 2018), and neural program generation (Zaremba & Sutskever, 2014; Reed & De Freitas, 2015). Viewed from this context, our results add the decoding of algebraic codes (the Polar family) to the list of domains where supervised CL succeeds.

In summary, we make the following contributions:

• We introduce CRISP, a novel curriculum-based sequential neural decoder for the Polar code family. Guided by information-theoretic insights, we propose CL-based techniques to train CRISP that are crucial for its superior performance (Sec. 3).

• We demonstrate that CRISP attains near-optimal reliability on Polar(64, 22) and Polar(32, 16) codes while achieving good throughput (Sec. 4.1 and Sec. 4.2).

• Compared to Fano's decoder, the CRISP decoder has significantly higher throughput and attains near-MAP reliability on the PAC(32, 16) code. To the best of our knowledge, this is the first learning-based PAC decoder to achieve this performance (Sec. 4.4).



¹ Source code available at the following link.



Figure 1: (a) CRISP achieves near-MAP reliability for Polar(64, 22) code on the AWGN channel. (b) Our proposed curriculum is crucial for the gains CRISP attains over the baselines; details in Sec. 4.

