FEW-SHOT DOMAIN ADAPTATION FOR END-TO-END COMMUNICATION

Abstract

End-to-end learning of a communication system using an autoencoder, consisting of an encoder, channel, and decoder modeled with neural networks, has recently been shown to be an effective approach. A challenge for the practical adoption of this learning approach is that, under changing channel conditions (e.g., a wireless link), it requires frequent retraining of the autoencoder in order to maintain a low decoding error rate. Since retraining is both time-consuming and requires a large number of samples, it becomes impractical when the channel distribution is changing quickly. We propose to address this problem using a fast and sample-efficient (few-shot) domain adaptation method that does not change the encoder and decoder networks. Different from conventional training-time unsupervised or semi-supervised domain adaptation, here we have a trained autoencoder from a source distribution that we want to adapt (at test time) to a target distribution using only a small labeled dataset, and no unlabeled data. We focus on a generative channel model based on the Gaussian mixture density network (MDN), and propose a regularized, parameter-efficient adaptation of the MDN using a set of affine transformations. The learned affine transformations are then used to design an optimal transformation at the decoder input that compensates for the distribution shift and effectively presents to the decoder inputs close to the source distribution. Experiments on many simulated distribution changes common in the wireless setting, and on a real mmWave FPGA testbed, demonstrate the effectiveness of our method at adaptation using very few target-domain samples.

1. INTRODUCTION

End-to-end (e2e) learning of a communication system using an autoencoder has recently been shown to be a promising approach for designing the next generation of wireless networks (O'Shea & Hoydis, 2017; Dörner et al., 2018; Aoudia & Hoydis, 2019; O'Shea et al., 2019; Ye et al., 2018; Wang et al., 2017). This new paradigm is a viable alternative for optimizing communication in diverse applications, hardware, and environments (Hoydis et al., 2021). It is particularly promising for dense deployments of low-cost transceivers, where there is interference between the devices and hardware imperfections that are difficult to model analytically. The key idea of e2e learning for a communication system is to use an autoencoder architecture to model and learn the transmitter and receiver jointly using neural networks in order to minimize the e2e symbol error rate (SER). The channel (i.e., propagation medium and transceiver imperfections) can be represented as a stochastic transfer function that transforms its input z ∈ R^d to an output x ∈ R^d. It can be regarded as a black box that is typically non-linear and non-differentiable due to hardware imperfections (e.g., quantization and amplifiers). Since autoencoders are trained using stochastic gradient descent (SGD)-based optimization (O'Shea & Hoydis, 2017), it is challenging to work with a black-box channel that is not differentiable. One approach to address this problem is to use a known mathematical model of the channel (e.g., additive Gaussian noise), which would enable the computation of gradients with respect to the autoencoder parameters via backpropagation. However, such standard channel models do not capture realistic channel effects well, as shown in Aoudia & Hoydis (2018).
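To make the encoder–channel–decoder pipeline and the e2e SER concrete, the following is a minimal sketch of one forward pass. All specifics here are illustrative assumptions, not the paper's architecture: the "encoder" is a fixed unit-energy constellation standing in for a trained neural encoder, the channel is the standard additive-Gaussian surrogate that the text notes is often too simplistic, and the "decoder" is a nearest-point rule standing in for a trained neural decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 4, 2  # assumed: m messages, d channel-use dimensions

# Fixed unit-energy constellation (stand-in for a trained NN encoder).
_angles = 2 * np.pi * np.arange(m) / m
_constellation = np.stack([np.cos(_angles), np.sin(_angles)], axis=1)

def encoder(msg):
    """Map message indices to points z in R^d."""
    return _constellation[msg]

def channel(z, sigma=0.1):
    """Differentiable surrogate channel: additive Gaussian noise.
    A real channel is a non-differentiable black box."""
    return z + sigma * rng.normal(size=z.shape)

def decoder(x):
    """Nearest-constellation-point rule (stand-in for a trained NN decoder)."""
    d2 = ((x[:, None, :] - _constellation[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

msgs = rng.integers(0, m, size=1000)
x = channel(encoder(msgs), sigma=0.1)
ser = (decoder(x) != msgs).mean()  # end-to-end symbol error rate
```

In the actual e2e learning setup, both encoder and decoder are neural networks, and it is precisely the `channel` step that must be differentiable (or replaced by a differentiable generative model) for SGD training of the whole chain.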
Alternatively, recent works have proposed to learn the channel using deep generative models that approximate p(x | z), the conditional probability density of the channel, using Generative Adversarial Networks (GANs) (O'Shea et al., 2019; Ye et al., 2018), Mixture Density Networks (MDNs) (García Martí et al., 2020), and conditional Variational Autoencoders (VAEs) (Xia et al., 2020). The use of a differentiable generative model of the channel enables SGD-based training of the autoencoder, while also capturing realistic channel effects better than standard models. Although this e2e optimization with a generative channel model learned from data can improve the physical-layer design for communication systems, in reality, channels often change, requiring collection of a large number of samples and frequent retraining of the channel model and autoencoder. For this reason, adapting the generative channel model and the autoencoder as often as possible, using only a small number of samples, is required for good communication performance. Prior works have (to the best of our knowledge) not addressed the adaptation problem for autoencoder-based e2e learning, which is crucial for real-time deployment of such a system under frequently-changing channel conditions. In this paper, we study the problem of domain adaptation (DA) of autoencoders using an MDN as the channel model. In contrast to conventional DA, where the target domain has a large unlabeled dataset and sometimes also a small labeled dataset (semi-supervised DA) (Ben-David et al., 2006), here we consider a few-shot DA setting where the target domain has only a small labeled dataset, and no unlabeled data. This setting applies to our problem since we only get to collect a small number of labeled samples at a time from the changing target domain (here, the channel).
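An MDN models the conditional channel density p(x | z) as a Gaussian mixture whose parameters (mixing weights, means, variances) are predicted from the input z. The sketch below evaluates such a mixture density; the single linear "head" and the sizes `K`, `d` are assumptions for illustration, whereas an actual MDN predicts these parameters with a deep network.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 3, 2  # assumed: K mixture components, d channel dimensions

# Stand-ins for trained MDN weights (a real MDN uses a deep network).
W_pi = rng.normal(size=(d, K))
W_mu = rng.normal(size=(d, K * d))
W_sigma = 0.1 * rng.normal(size=(d, K))

def mdn_params(z):
    """Map channel input z (shape (d,)) to Gaussian-mixture parameters."""
    logits = z @ W_pi
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()                      # K mixing weights, sum to 1
    mu = (z @ W_mu).reshape(K, d)       # K component means
    sigma = np.exp(z @ W_sigma)         # K positive std devs (isotropic)
    return pi, mu, sigma

def mdn_density(x, z):
    """Conditional channel density p(x | z) under the mixture model."""
    pi, mu, sigma = mdn_params(z)
    diff2 = ((x - mu) ** 2).sum(axis=1)
    comp = np.exp(-0.5 * diff2 / sigma**2) / (2 * np.pi * sigma**2) ** (d / 2)
    return float(pi @ comp)

z = np.array([0.5, -1.0])          # a transmitted symbol
p = mdn_density(z + 0.1, z)        # density of a nearby received point
```

Because every operation above is differentiable in the weights and in z, such a model can serve as the differentiable channel surrogate through which autoencoder gradients flow.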
Towards addressing this important practical problem, we make the following contributions:
• We propose a parameter- and sample-efficient method for adapting a generative MDN (used for modeling the channel) based on the properties of Gaussian mixtures (§ 3.1 and § 3.2).
• Based on the MDN adaptation, we propose an optimal input-transformation method at the decoder that compensates for changes in the channel distribution, and decreases or maintains the error rate of the autoencoder without any modification to the encoder and decoder networks (§ 3.3).
• Experiments on a mmWave FPGA platform and a number of simulated distribution changes show strong performance improvements for our method. For instance, in the FPGA experiment, our method improves the SER by 69% with only 10 samples per class from the target distribution (§ 4).
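The second contribution can be illustrated with a toy sketch of the decoder-side compensation idea: if the shift of a Gaussian channel component from source to target is captured by an affine map, the decoder input can be transformed by its inverse so the frozen decoder again sees inputs close to the source distribution. The affine parameters `A`, `b`, the dimension, and the noise level below are all assumed for illustration; in the paper they are learned from the few labeled target samples via the regularized MDN adaptation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2

# One source-domain MDN component mean (illustrative value).
mu_src = np.array([1.0, -0.5])

# Hypothetical affine shift of this component from source to target:
# mu_tgt = A @ mu_src + b. In the paper, A and b are estimated from a
# small labeled target-domain dataset.
A = np.array([[0.9, 0.1], [-0.1, 1.1]])
b = np.array([0.2, -0.3])

def compensate(x):
    """Decoder-input transformation: invert the learned affine shift so
    the unmodified decoder sees inputs close to the source distribution."""
    return np.linalg.solve(A, (x - b).T).T

# Simulated target-domain received samples around the shifted mean.
x_tgt = (A @ mu_src + b) + 0.05 * rng.normal(size=(500, d))
x_comp = compensate(x_tgt)
# The sample mean of x_comp is now close to mu_src, i.e., close to what
# the decoder was trained on, with no change to its weights.
```

This is what allows the method to avoid retraining the encoder and decoder networks: only the low-dimensional affine parameters are updated per adaptation round.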

Related Work.

Recent approaches for DA such as DANN (Ganin et al., 2016), based on adversarial learning of a shared representation between the source and target domains (Ganin & Lempitsky, 2015; Ganin et al., 2016; Long et al., 2018; Saito et al., 2018; Zhao et al., 2019; Johansson et al., 2019), have achieved much success in computer vision and natural language processing. Their high-level idea is to adversarially learn a shared feature representation for which inputs from the source and target distributions are nearly indistinguishable to a domain-discriminator DNN, such that a label-predictor DNN using this representation and trained with labeled data from only the source domain also generalizes well to the target domain. Adversarial DA methods are not suitable for our problem, which requires fast and frequent test-time DA, because of their high computational and sample complexity and the imbalance in the number of source and target domain samples.



Code for our work: https://github.com/jayaram-r/domain-adaptation-autoencoder
In our problem, labels correspond to the transmitted messages and are essentially obtained for free (see § 3).



Related frameworks such as transfer learning (Long et al., 2015; 2016), model-agnostic meta-learning (Finn et al., 2017), domain-adaptive few-shot learning (Zhao et al., 2021; Sun et al., 2019), and supervised DA (Motiian et al., 2017a;b) also deal with the problem of adaptation using a small number of samples. Most of them are not applicable to our problem because they primarily address novel classes (with potentially different distributions) and knowledge transfer from existing to novel tasks. Motiian et al. (2017a) is closely related since they also deal with a target domain that has only a small labeled dataset and the same label space. The key difference is that Motiian et al. (2017a) address the training-time few-shot DA problem, while we focus on test-time few-shot DA. Specifically, their adversarial DA method requires both the source and target domain datasets at training time, and can be computationally expensive to retrain for every new batch of target domain data (a key motivation for this work is to avoid frequent retraining).

