BYPASSING THE RANDOM INPUT MIXING IN MIXUP

Anonymous

Abstract

Mixup and its variants have prompted a surge of interest due to their capability of boosting the accuracy of deep models. For a random sample pair, such approaches generate a set of synthetic samples by interpolating both the inputs and their corresponding one-hot labels. Current methods either interpolate random features from an input pair or learn to mix salient features from the pair. Nevertheless, the former methods can create misleading synthetic samples or remove important features from the given inputs, and the latter strategies incur significant computation cost for selecting descriptive input regions. In this paper, we show that the effort needed for the input mixing can be bypassed. For a given sample pair, averaging the features of the two inputs and then assigning the result a set of soft labels can effectively regularize the training. We empirically show that the proposed approach performs on par with state-of-the-art strategies in terms of predictive accuracy.
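As a minimal sketch of the averaging scheme described above: the global soft-label component is generated dynamically during training and its construction is not specified here, so this sketch shows only the fixed input average and the local soft label; the function name is hypothetical.

```python
import numpy as np

def average_mix(x1, y1_onehot, x2, y2_onehot):
    """Average a sample pair and form its local soft label.

    x1, x2: input arrays of the same shape.
    y1_onehot, y2_onehot: one-hot label vectors.
    Returns the averaged input and the averaged (soft) target.
    """
    x_mixed = 0.5 * (x1 + x2)               # fixed 50/50 input average
    y_local = 0.5 * (y1_onehot + y2_onehot)  # local soft label
    return x_mixed, y_local
```

Note that, unlike Mixup, no mixing coefficient is sampled per pair: the input combination is always the plain average.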

1. INTRODUCTION

Deep neural networks have demonstrated profound successes in many challenging real-world applications, including image classification (Krizhevsky et al., 2012), speech recognition (Graves et al., 2013), and machine translation (Bahdanau et al., 2015; Sutskever et al., 2014). One key factor contributing to such successes is the deployment of effective model regularization techniques, which enable the learning to avoid overfitting the training data and to generalize well to unseen samples. This is because current deep models typically embrace high modeling freedom with a very large number of parameters. To this end, many regularizers for deep models have been introduced, including weight decay (Hanson & Pratt, 1988), dropout (Srivastava et al., 2014), stochastic depth (Huang et al., 2016), batch normalization (Ioffe & Szegedy, 2015), and data augmentation schemes (Cubuk et al., 2019; Hendrycks et al., 2020; Inoue, 2018; Lecun et al., 1998; Simard et al., 1998).



Figure 1: Illustration of the proposed method. The mixed input is the average of the two inputs; the training target for the averaged input is the combination of the local soft label (the average of the two one-hot targets) and the global soft label (dynamically generated during training).

Among those effective regularizers, Mixup (Zhang et al., 2018) is a simple yet effective, data-augmentation based regularizer for enhancing deep classification models. Through linearly interpolating random input pairs and their training targets in one-hot representation, Mixup generates a set of synthetic examples with soft labels to regularize the training. Such pairwise, label-variant data augmentation techniques (Guo, 2020; Guo et al., 2019; Kim et al., 2020; Li et al., 2020a; Tokozume et al., 2018a;b; Verma et al., 2019; Yun et al., 2019; Zhang et al., 2018) have attracted a surge of interest and shown their effectiveness in boosting the accuracy of deep networks.
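For reference, the Mixup interpolation of Zhang et al. (2018) can be sketched as follows, with the mixing coefficient drawn from a Beta distribution as in the original formulation; the function name and default hyperparameter value here are illustrative choices.

```python
import numpy as np

def mixup(x1, y1_onehot, x2, y2_onehot, alpha=0.2, rng=None):
    """Standard Mixup: linearly interpolate an input pair and
    their one-hot targets with a Beta-distributed coefficient."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing coefficient in [0, 1]
    x_mixed = lam * x1 + (1.0 - lam) * x2
    y_mixed = lam * y1_onehot + (1.0 - lam) * y2_onehot
    return x_mixed, y_mixed
```

The proposed method replaces this per-pair random coefficient with a fixed average of the inputs, so the sampling step above is exactly the cost that is bypassed.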

