INSTANCE-SPECIFIC AUGMENTATION: CAPTURING LOCAL INVARIANCES

Abstract

We introduce InstaAug, a method for automatically learning input-specific augmentations from data. Previous data augmentation methods have generally assumed independence between the original input and the transformation applied to that input. This can be highly restrictive, as the invariances that the augmentations are based on are themselves often highly input dependent; e.g., we can change a leaf from green to yellow while maintaining its label, but not a lime. InstaAug instead allows for input dependency by introducing an invariance module that maps inputs to tailored transformation distributions. It can be simultaneously trained alongside the downstream model in a fully end-to-end manner, or separately learned for a pre-trained model. We empirically demonstrate that InstaAug learns meaningful input-dependent augmentations for a wide range of transformation classes, which in turn provides better performance on both supervised and self-supervised tasks.

1. INTRODUCTION

Data augmentation is an important tool in deep learning (Shorten & Khoshgoftaar, 2019). It allows one to incorporate inductive biases and invariances into models (Chen et al., 2019; Lyle et al., 2020), providing a highly effective regularization technique that aids generalization (Goodfellow et al., 2016). It has proved particularly successful for computer vision tasks, forming an essential component of many modern supervised (Perez & Wang, 2017; Krizhevsky et al., 2012; Cubuk et al., 2020; Mikołajczyk & Grochowski, 2018) and self-supervised (Bachman et al., 2019; Chen et al., 2020; Tian et al., 2020; Foster et al., 2021) approaches.

Algorithmically, data augmentation applies a random transformation τ : X → X , τ ∼ p(τ ), to each input data point x ∈ X , before feeding this augmented data into the downstream model. These transformations are resampled each time the data point is used (e.g., at each training epoch), effectively populating the training set with additional samples. Augmentation is also sometimes used at test time by ensembling predictions from multiple transformations of the input.

A particular augmentation is defined by the choice of the transformation distribution p(τ ), whose construction thus forms the key design choice. Good transformation distributions induce substantial and wide-ranging changes to the input, while preserving the information relevant to the task at hand. Data augmentation necessarily relies on exploiting problem-specific expertise: though aspects of p(τ ) can be learned from data (Benton et al., 2020), trying to learn p(τ ) from the set of all possible transformation distributions is not only unrealistic, but actively at odds with the core motivations of introducing inductive biases and capturing invariances. One therefore restricts τ to transformations that reflect how we desire our model to generalize, such as cropping and color jitter for image data.
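The mechanism described above can be illustrated with a minimal sketch. Here a transformation τ is sampled from a fixed p(τ ) (a uniform brightness jitter, an illustrative choice, not a specific method from the literature) with no dependence on the input x, and is resampled on every use of the data point:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_transform():
    """Sample tau ~ p(tau): here, a uniform brightness shift.

    Note that p(tau) is fixed: it has no dependence on the input x.
    """
    delta = rng.uniform(-0.2, 0.2)  # jitter magnitude (illustrative choice)
    return lambda x: np.clip(x + delta, 0.0, 1.0)

def augment(x):
    """Apply a freshly sampled transformation to the input."""
    tau = sample_transform()
    return tau(x)

x = np.full((4, 4), 0.5)   # a toy 'image' with pixel values in [0, 1]
x_aug = augment(x)         # a new tau is drawn each time x is used
print(x_aug.shape)         # (4, 4)
```

In a training loop, `augment` would be called afresh at every epoch, so the model effectively sees a different transformed copy of each data point each time.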
Current approaches (Cubuk et al., 2018; Lim et al., 2019; Benton et al., 2020) are generally limited to learning augmentations where the transformation is sampled independently of the input it is applied to, such that p(τ ) has no dependence on x. This means that they are only able to learn global invariances, severely limiting their flexibility and potential impact. For example, when using color jittering, changing the color of a leaf from yellow to green would likely preserve its label, but the same transformation would change a lemon to a lime (see Figure 1b). This transformation cannot be usefully applied as a global augmentation, even though it is a useful invariance for the specific input instance of a leaf. Similar examples regularly occur for other transformations, as shown in Figure 1. To address this shortfall, we introduce InstaAug, a new approach that allows one to learn instance-specific augmentations that encapsulate local invariances of the underlying data generating process.
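The idea of conditioning the transformation distribution on the input can be sketched as follows. This is only a toy stand-in for InstaAug's invariance module, which in the actual method is a learned neural network; the hand-coded rule below (wider brightness jitter for mid-brightness inputs) is a hypothetical example of an input-dependent p(τ | x):

```python
import numpy as np

rng = np.random.default_rng(0)

def invariance_module(x):
    """Toy stand-in for an invariance module mapping x to parameters of p(tau | x).

    Returns the half-width of a uniform brightness-jitter range. The rule
    here is hand-coded for illustration; in InstaAug this mapping is a
    neural network trained alongside (or for) the downstream model.
    """
    mean_brightness = float(x.mean())
    # Inputs near the extremes of [0, 1] get a narrower jitter range.
    return 0.4 * min(mean_brightness, 1.0 - mean_brightness)

def augment_instance_specific(x):
    """Sample tau ~ p(tau | x) and apply it to x."""
    half_width = invariance_module(x)
    delta = rng.uniform(-half_width, half_width)
    return np.clip(x + delta, 0.0, 1.0)

dark = np.full((4, 4), 0.05)  # near-black input: narrow jitter range
mid = np.full((4, 4), 0.5)    # mid-brightness input: wide jitter range
print(invariance_module(dark) < invariance_module(mid))  # True
```

The key contrast with the input-independent setting is that the distribution's parameters are now a function of x, so different inputs receive differently sized (or shaped) transformation distributions.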

