DECOUPLED MIXUP FOR DATA-EFFICIENT LEARNING

Abstract

Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data. Recently, dynamic mixup methods have effectively improved upon earlier static policies (e.g., linear interpolation) by maximizing salient regions or preserving the target in mixed samples. The key difference is that mixed samples generated by dynamic policies are more instance-discriminative than those from static policies; for example, foreground objects are decoupled from the background. However, optimizing mixup policies with dynamic methods in input space is computationally expensive compared to static methods. Hence, we attempt to transfer the decoupling mechanism of dynamic methods from the data level to the objective-function level and propose the general decoupled mixup (DM) loss. In effect, DM can adaptively focus on discriminative features without losing the original smoothness of mixup, while avoiding heavy computational overhead. As a result, DM enables static mixup methods to match or even exceed the performance of dynamic methods. This also raises an interesting objective-design question for mixup training: the objective should both smooth decision boundaries and identify discriminative features. Extensive experiments on supervised and semi-supervised learning benchmarks across seven classification datasets validate the effectiveness of DM when it is combined with various mixup methods.

1. INTRODUCTION

Deep learning has become the bedrock of modern AI for many tasks in machine learning (Bishop, 2006), such as computer vision (He et al., 2016; 2017) and natural language processing (Devlin et al., 2018). Using a large number of learnable parameters, deep neural networks (DNNs) can recognize subtle dependencies in large training datasets that are later leveraged to make accurate predictions on unseen data. However, without constraints or enough data, models might overfit the training set (Srivastava et al., 2014). To this end, regularization techniques have been deployed to improve generalization (Wan et al., 2013); they can be categorized as data-independent or data-dependent (Guo et al., 2019). Some data-independent strategies constrain the model by penalizing the norms of its parameters, such as weight decay (Loshchilov & Hutter, 2017). Among data-dependent strategies, data augmentations (Shorten & Khoshgoftaar, 2019) are widely used. Mixup (Zhang et al., 2017; Yun et al., 2019), a data-dependent augmentation technique, generates virtual samples by linearly combining pairs of inputs and their corresponding labels with a mixing ratio λ ∈ [0, 1]. DNNs trained with this technique are typically more generalizable and better calibrated (Thulasidasan et al., 2019), i.e., their prediction accuracy tends to be consistent with their confidence. The main reason is that mixup heuristically smooths the decision boundary to improve
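The mixup operation described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it draws the mixing ratio λ from a Beta(α, α) distribution, as in Zhang et al. (2017), and convexly combines a pair of inputs and their one-hot labels.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0):
    """Vanilla mixup: convex combination of an input pair and their
    one-hot labels with ratio lam ~ Beta(alpha, alpha) (Zhang et al., 2017)."""
    lam = np.random.beta(alpha, alpha)   # mixing ratio lambda in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2      # mixed input
    y = lam * y1 + (1.0 - lam) * y2      # mixed (soft) label
    return x, y, lam
```

Because the label is mixed with the same ratio as the input, the resulting soft label still sums to one; the label-mismatch problem illustrated in Figure 1 arises when the visible content of the mixed sample no longer reflects this ratio.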

[Figure: mixed samples whose labels do not match their visible content, annotated "inconsistency between mixed labels and sample".]

Figure 1: Illustration of the label mismatch problem. The red mixed labels are the ground truth.

