ADVERSARIAL PERTURBATION BASED LATENT RECONSTRUCTION FOR DOMAIN-AGNOSTIC SELF-SUPERVISED LEARNING

Abstract

Most self-supervised learning (SSL) methods rely on domain-specific pretext tasks and data augmentations to learn high-quality representations from unlabeled data. Developing those pretext tasks and augmentations requires expert domain knowledge, and it is often unclear why solving a particular pretext task leads to useful representations. These two limitations hinder wider application of SSL across domains. To overcome them, we propose adversarial perturbation based latent reconstruction (APLR) for domain-agnostic self-supervised learning. In APLR, a neural network is trained to generate adversarial noise that perturbs an unlabeled training sample, so that domain-specific augmentations are not required. The pretext task in APLR is to reconstruct the latent representation of the clean sample from the perturbed sample. We show that representation learning via latent reconstruction is closely related to multi-dimensional Hirschfeld-Gebelein-Rényi (HGR) maximal correlation and enjoys theoretical guarantees on the linear probe error. To demonstrate its effectiveness, we apply APLR to diverse domains such as tabular data, images, and audio. Empirical results indicate that APLR not only outperforms existing domain-agnostic SSL methods but also narrows the performance gap to domain-specific SSL methods. In many cases, APLR even outperforms training the full network in a supervised manner.

1. INTRODUCTION

Unsupervised deep learning has been highly successful in discovering useful representations in natural language processing (NLP) (Devlin et al., 2019; Brown et al., 2020) and computer vision (Chen et al., 2020; He et al., 2020). These methods define pretext tasks on unlabeled data so that representation learning can be done in a self-supervised manner without explicit human annotations. The success of self-supervised learning (SSL) depends on domain-specific pretext tasks as well as domain-specific data augmentations. However, the development of semantic-preserving data augmentations requires expert domain knowledge, and such knowledge may not be readily available for certain data types such as tabular data (Ucar et al., 2021). Furthermore, the theoretical understanding of why certain pretext tasks lead to useful representations remains fairly elusive (Tian et al., 2021). These two limitations hinder wider application of SSL beyond the fields of NLP and computer vision.

Self-supervised algorithms benefit from the inductive biases of domain-specific designs, but they do not generalize across domains. For example, masked language models like BERT (Devlin et al., 2019) are not directly applicable to untokenized data. Although contrastive learning does not require tokenized data, its success in computer vision cannot be easily leveraged in other domains due to its sensitivity to image-specific data augmentations (Chen et al., 2020). Indeed, in contrastive learning the quality of representations degrades significantly without those hand-crafted augmentations (Grill et al., 2020). Inspired by denoising auto-encoding (Vincent et al., 2008; 2010; Pathak et al., 2016), perturbation of natural samples with Gaussian, Bernoulli, and mixup noise (Verma et al., 2021; Yoon et al., 2020) has been utilized as a domain-agnostic data augmentation for self-supervised representation learning on images, graphs, and tabular data.
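The perturb-then-reconstruct idea behind APLR can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the encoder here is a fixed linear map standing in for a neural network, the adversarial noise is found by projected gradient ascent rather than by a trained noise generator, and all names (`W`, `encode`, `latent_loss`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical, not from the paper): a linear "encoder" W
# plays the role of the feature network, x is one unlabeled sample.
W = rng.normal(size=(8, 16))   # maps a 16-dim input to an 8-dim latent
x = rng.normal(size=16)

def encode(v):
    return W @ v

def latent_loss(delta):
    # Latent-reconstruction objective: distance between the latent code of
    # the perturbed sample and that of the clean sample.
    diff = encode(x + delta) - encode(x)
    return 0.5 * float(diff @ diff)

# Craft an adversarial perturbation by projected gradient ascent on the
# latent loss, keeping ||delta|| = eps. For this linear encoder the
# gradient w.r.t. delta is W.T @ (W @ delta).
eps = 0.1
delta_adv = rng.normal(size=16)
for _ in range(50):
    grad = W.T @ (W @ delta_adv)
    delta_adv = eps * grad / np.linalg.norm(grad)

# A random perturbation of the same magnitude, for comparison.
u = rng.normal(size=16)
delta_rand = eps * u / np.linalg.norm(u)

# The adversarial noise moves the latent representation more than random
# noise of equal norm, which is what makes it a harder pretext task.
print(latent_loss(delta_adv) > latent_loss(delta_rand))  # prints True
```

In the actual method, a separate neural network produces the perturbation and the encoder itself is trained so that the latent of the perturbed sample reconstructs the latent of the clean sample; the sketch only shows why a learned perturbation is a stronger augmentation than uniform random noise.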
However, random noise may be less effective than learned perturbations, since uniformly perturbing uninformative features may not achieve the intended goal of augmentation. Specifically, convex combinations in mixup noises (Zhang

