TRADEOFFS IN DATA AUGMENTATION: AN EMPIRICAL STUDY

Abstract

Though data augmentation has become a standard component of deep neural network training, the mechanism underlying its effectiveness remains poorly understood. In practice, augmentation policies are often chosen using heuristics of either distribution shift or augmentation diversity. Inspired by these heuristics, we conduct an empirical study to quantify how data augmentation improves model generalization. We introduce two interpretable and easy-to-compute measures: Affinity and Diversity. We find that augmentation performance is predicted not by either measure alone but by both jointly.

1. INTRODUCTION

Models that achieve state-of-the-art results in image classification often rely on heavy data augmentation, with the best strategies applying multiple transforms sequentially and stochastically. Though the effectiveness of these strategies is well established, the mechanism through which the transformations work is not well understood.



Figure 1: Affinity and Diversity parameterize the performance of a model trained with augmentation. (a, b) Each point represents a different augmentation that yields test accuracy above a threshold (CIFAR-10: 84.7%; ImageNet: 71.1%). Color shows the final test accuracy relative to the baseline trained without augmentation (CIFAR-10: 89.7%; ImageNet: 76.1%). (c) Schematic of how clean and augmented data are related in the space of these two metrics: higher diversity is represented by a larger bubble, while distributional similarity is depicted by bubble overlap. Test accuracy generally improves toward the upper right of this space, where adding genuinely new real data to the training set is expected to lie.
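The two measures are not formally defined in this excerpt. As a rough sketch only, under assumed definitions (Affinity as the accuracy of a clean-trained model on augmented validation data relative to its clean-data accuracy, and Diversity as the final training loss under augmentation relative to clean training), they might be computed as follows; all function names here are illustrative, not from the paper:

```python
import numpy as np

# Hypothetical helpers for illustration; the definitions below are
# assumptions, not the paper's exact formulations.

def accuracy(model, xs, ys):
    """Fraction of examples the model classifies correctly."""
    preds = np.array([model(x) for x in xs])
    return float(np.mean(preds == ys))

def affinity(clean_model, val_x, val_y, augment):
    """Assumed definition: accuracy of a model trained on clean data,
    evaluated on augmented validation data, divided by its accuracy on
    the clean validation data. Values near 1 mean the augmentation
    stays close to the original data distribution."""
    aug_x = np.array([augment(x) for x in val_x])
    return accuracy(clean_model, aug_x, val_y) / accuracy(clean_model, val_x, val_y)

def diversity(final_train_loss_aug, final_train_loss_clean):
    """Assumed definition: final training loss of a model trained with
    the augmentation, relative to clean training. Higher values mean
    the augmented data is harder to fit, i.e. more diverse."""
    return final_train_loss_aug / final_train_loss_clean
```

Under these assumed definitions, an augmentation in the upper-right region of Figure 1 would score near 1 on affinity while still raising the relative training loss, consistent with the caption's observation that test accuracy improves toward that corner.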

