NOSE AUGMENT: FAST AND EFFECTIVE DATA AUGMENTATION WITHOUT SEARCHING

Abstract

Data augmentation is widely used to enhance the diversity of training data and improve model generalization. Departing from traditional handcrafted methods, recent research has introduced automated search for optimal data augmentation policies and achieved state-of-the-art results on image classification tasks. However, these search-based implementations typically incur high computation cost and long search times because of large search spaces and complex search algorithms. We revisit automated augmentation from alternative perspectives, such as increasing diversity and manipulating the overall usage of augmented data. In this paper, we present an augmentation method without policy search called NOSE Augment (NO SEarch Augment). Our method skips policy search entirely; instead, it jointly applies a multi-stage augmentation strategy and introduces additional augmentation operations on top of a simple stochastic augmentation mechanism. With more augmentation operations, we boost the data diversity of stochastic augmentation; with the phased, complexity-driven strategy, we ensure that the whole training process converges smoothly to a high-quality model. Extensive experiments show that our method matches or surpasses the accuracies of state-of-the-art search-based methods. Without the need for policy search, our method is far more efficient than the existing AutoAugment family of methods. Beyond image classification, we also examine the generality of our method by applying it to face recognition and to text detection for Optical Character Recognition (OCR). The results establish our method as a fast and competitive data augmentation strategy applicable across various computer vision (CV) tasks.

1. INTRODUCTION

Data is an essential and dominant factor in training AI models, especially in the deep-learning era, where deep neural networks typically require large volumes of training data. Data augmentation techniques artificially create new samples to increase the diversity of training data and, in turn, the generalization of AI models. For example, image transformations such as rotation, flip, and shear have been used to generate variations of original samples in image classification and other computer vision tasks. More intricate augmentation operations have also been proposed, such as Cutout (Devries & Taylor, 2017), Mixup (Zhang et al., 2018), CutMix (Yun et al., 2019), and SamplePairing (Inoue, 2018). How to formulate effective augmentation strategies from these basic operations is crucial to the success of data augmentation. Recent works (Cubuk et al., 2019; Lim et al., 2019; Ho et al., 2019) introduced automated search or optimization techniques for augmentation policies. Their common assumption is that a selected subset of better-fit augmentation policies will produce more relevant augmented data, which will in turn yield a better-trained model. Here an augmentation policy is defined as an ordered sequence of augmentation operations, such as image transformations, each parameterized by a probability and a magnitude. Although these methods achieve state-of-the-art accuracies on image classification tasks, they generally incur high computational cost due to large search spaces and extra training steps. More importantly, it is worth asking whether finding the best-fit subset of policies, with specific probability and magnitude values, is really necessary. RandAugment (Cubuk et al., 2020) simplified the parameters and scaled down the search space defined by AutoAugment (Cubuk et al., 2019), but it still relies on grid search to optimize the simplified parameters.

Our method aims to avoid policy search and its cost entirely while maintaining or improving model performance in terms of both accuracy and training efficiency. We show that simple stochastic augmentation policies, drawn from the same sampling space and trained under otherwise identical settings, achieve performance equal or very close to that of search-based augmentation methods. Another advantage of stochastic policies is that adding operations to the pool incurs no additional cost, whereas in search-based methods more operations cause an exponential increase of the search space. The second part of our method therefore adds more operations to the pool to increase data diversity; in practice, we introduce a new category of operations, such as Mixup and CutMix, into the operation pool. Furthermore, we tackle automated augmentation from the perspective of overall data usage, in contrast to the data-creation perspective emphasized by policy-search methods. Inspired by Curriculum Learning (CL) (Bengio et al., 2009), which presents training samples in increasing order of difficulty, our method defines augmentation strategies at various complexity levels and applies them in order over phased training stages. To avoid the confounding overfitting problem of the original Curriculum Learning in practice, our method inverts the order: it presents the hardest augmentation strategies from the beginning and gradually decreases the complexity level. In summary, our augmentation method replaces policy search with stochastic policy generation, introduces more operations for better diversity, and applies a phased augmentation strategy with decreasing complexity for smooth convergence; as an integrated solution, it achieves better results.
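The two components described above, stochastic policy generation and a phased schedule of decreasing complexity, can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the operation names, the (operation, magnitude) policy format, and the linear complexity decay are all assumptions for the sake of the example.

```python
import random

# Hypothetical operation pool; the names are illustrative assumptions.
OPERATION_POOL = [
    "rotate", "shear_x", "shear_y", "translate_x", "translate_y",
    "color", "contrast", "brightness", "cutout", "mixup", "cutmix",
]

def sample_policy(num_ops, max_magnitude, pool=OPERATION_POOL):
    """Draw a stochastic policy: an ordered sequence of operations with
    uniformly sampled magnitudes. No search is involved."""
    ops = random.choices(pool, k=num_ops)
    return [(op, random.uniform(0.0, max_magnitude)) for op in ops]

def phased_schedule(epoch, total_epochs, max_ops=3, max_magnitude=1.0):
    """Inverted curriculum: start training at the highest augmentation
    complexity and decay it linearly toward the end of training."""
    remaining = 1.0 - epoch / total_epochs
    num_ops = max(1, round(max_ops * remaining))
    magnitude = max_magnitude * remaining
    return num_ops, magnitude
```

Under this sketch, each epoch first fixes its complexity level via the schedule, and a fresh policy is then drawn per batch, e.g. `sample_policy(*phased_schedule(epoch, total_epochs))`; because policies are sampled rather than searched, enlarging the pool adds no cost.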
Figure 1 illustrates our method and its differences from search-based methods. The main contributions of this paper can be summarized as follows:


1. We present a no-search (NOSE) augmentation method as an alternative to computation-intensive search-based auto-augmentation methods. By jointly applying a phased augmentation strategy and introducing more augmentation operations on top of a simple stochastic augmentation mechanism, NOSE Augment achieves state-of-the-art (SOTA) accuracies on CIFAR-10 and CIFAR-100 (Krizhevsky, 2009) and close-to-SOTA results on other benchmark datasets. Our ablation study demonstrates that all components of our method must be combined to achieve the best performance.

2. We demonstrate that a stochastic augmentation approach can obtain accuracies comparable to those of search-based methods while holding an overwhelming advantage in overall augmentation and training efficiency, since the search phase is avoided entirely.

3. Beyond image classification, we apply NOSE Augment to face recognition and text detection (OCR) tasks and obtain competitive or better results than search-based methods, further demonstrating the advantage and generality of NOSE Augment.
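As a concrete example of the sample-mixing operations referred to above, the following is a minimal NumPy sketch of Mixup (Zhang et al., 2018). The Beta(alpha, alpha) mixing coefficient follows the original Mixup paper; the function signature, and how such an operation is folded into the stochastic operation pool, are left abstract here and are not the paper's exact implementation.

```python
import numpy as np

def mixup(batch_x, batch_y, alpha=0.2, rng=None):
    """Mixup: a convex combination of pairs of samples and of their
    one-hot labels, using a Beta(alpha, alpha) mixing coefficient."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)           # mixing coefficient in [0, 1]
    perm = rng.permutation(len(batch_x))   # partner sample for each image
    mixed_x = lam * batch_x + (1.0 - lam) * batch_x[perm]
    mixed_y = lam * batch_y + (1.0 - lam) * batch_y[perm]
    return mixed_x, mixed_y
```

Unlike per-pixel transformations such as rotation or shear, operations of this category also modify the labels, which is why they form a distinct category when added to the operation pool.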



Figure 1: No Search (NOSE) Augment vs Search-based Augment

