GRAPH PERMUTATION SELECTION FOR DECODING OF ERROR CORRECTION CODES USING SELF-ATTENTION

Anonymous authors
Paper under double-blind review

Abstract

Error correction codes are an integral part of communication applications and boost the reliability of transmission. The optimal decoding rule for transmitted codewords is maximum likelihood, which is NP-hard. For practical realizations, suboptimal decoding algorithms are employed; however, the lack of theoretical insight currently impedes the exploitation of the full potential of these algorithms. One key insight is the choice of permutation in permutation decoding. We present a data-driven framework for permutation selection, combining domain knowledge with machine learning concepts such as node embedding and self-attention. Significant and consistent improvements in the bit error rate are shown for the simulated Bose-Chaudhuri-Hocquenghem (BCH) code as compared to the baseline decoders. To the best of our knowledge, this work is the first to leverage the benefits of self-attention networks in physical layer communication systems.

1. INTRODUCTION

Shannon's well-known channel coding theorem (Shannon, 1948) states that for every channel there exists a code such that encoded messages can be transmitted and decoded with arbitrarily low error, as long as the transmission rate remains below the channel's capacity. In practical applications, latency and computational complexity constrain the code size. Thus, structured codes with low-complexity encoding and decoding schemes were devised. Some structured codes possess a key feature known as the permutation group (PG): each permutation in the PG maps every codeword to some distinct codeword. This property is crucial to decoders such as the parallelizable soft-decision Belief Propagation (BP) decoder (Pearl, 2014), and its usefulness stems from empirical evidence that while decoding a corrupted word may fail, decoding a permuted version of the same word may succeed (MacWilliams, 1964). For instance, this is exploited in the mRRD (Dimnik & Be'ery, 2009) and BPL (Elkelesh et al., 2018) algorithms, which perform multiple runs over different permuted versions of the same corrupted codeword, trading off complexity for higher decoding gains. Nonetheless, there is room for improvement, since not all permutations are required for successful decoding of a given word: a single fitting one suffices. Our work deals with obtaining the best-fit permutation per word, removing redundant runs and thus preserving computational resources. However, it remains unclear how to obtain such a permutation, as indicated by the authors of (Elkelesh et al., 2018), who state in their Section III.A that "there exists no clear evidence on which graph permutation performs best for a given input". Explicitly, the goal is to approximate a function mapping a single word to its most probable-to-decode permutation. While an analytical derivation of this function is hard, advances in the machine learning field may be of use in approximating it.
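To make the permutation-decoding mechanism concrete, the following is a minimal sketch of the multiple-run scheme described above: permute the received word, run a decoder on each permuted version, and undo the permutation once a valid codeword is found. The names (`H`, `toy_decoder`, `permutation_decode`) are illustrative assumptions, and the single-bit-flip syndrome decoder is a hypothetical stand-in for a soft-decision BP decoder, used here only so that the example is self-contained.

```python
import numpy as np

# Parity-check matrix of the (7, 4) Hamming code; its columns are all
# distinct nonzero length-3 binary vectors, so any single bit error
# produces a unique syndrome and is correctable.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def is_codeword(word):
    # A word is a valid codeword iff its syndrome H w (mod 2) is zero.
    return not np.any(H @ word % 2)

def toy_decoder(word):
    # Hypothetical stand-in for a BP decoder: try to reach a zero
    # syndrome by flipping at most one bit of the hard-decision word.
    if is_codeword(word):
        return word, True
    for i in range(len(word)):
        candidate = word.copy()
        candidate[i] ^= 1
        if is_codeword(candidate):
            return candidate, True
    return word, False

def permutation_decode(word, permutations):
    # Run the decoder on permuted versions of the received word; on
    # success, apply the inverse permutation to recover the estimate.
    for perm in permutations:
        decoded, ok = toy_decoder(word[perm])
        if ok:
            return decoded[np.argsort(perm)], perm
    return word, None
```

In the actual schemes discussed above, the permutations are drawn from the code's PG (so each permuted word is decoded against the same parity-check structure), and the loop over candidate permutations is exactly the redundancy that a learned per-word permutation selector aims to remove.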
The recent emergence of Deep Learning (DL) has demonstrated the advantages of Neural Networks (NN) in a myriad of communication and information theory applications where no analytical solutions exist (Simeone, 2018; Zappone et al., 2019). For instance, in (Belghazi et al., 2018), a tight lower bound on the mutual information between two high-dimensional continuous variables was estimated with a NN. Another recurring motivation for the use of NN in communications has to do with the amount of data at hand. Several data-driven solutions were described in (Caciularu & Burshtein, 2018; Lin et al., 2019) for scenarios with small amounts of data, since obtaining data samples in

