PATCH-LEVEL NEIGHBORHOOD INTERPOLATION: A GENERAL AND EFFECTIVE GRAPH-BASED REGULARIZATION STRATEGY

Abstract

Regularization plays a crucial role in machine learning models, especially deep neural networks. Existing regularization techniques mainly rely on the i.i.d. assumption and consider only knowledge from the current sample, without leveraging the neighborhood relationships between samples. In this work, we propose a general regularizer called Patch-level Neighborhood Interpolation (Pani) that introduces non-local representations into the network computation. Our proposal explicitly constructs patch-level graphs at different network layers and then linearly interpolates neighborhood patch features, serving as a general and effective regularization strategy. Further, we customize our approach to two popular families of regularization methods: Virtual Adversarial Training (VAT), and MixUp together with its variants. The first derived method, Pani VAT, presents a novel way to construct non-local adversarial smoothness by employing patch-level interpolated perturbations. The second derived method, Pani MixUp, extends the original MixUp regularization and its variants to the Pani version, yielding significant performance improvements. Finally, extensive experiments verify the effectiveness of our Patch-level Neighborhood Interpolation approach in both supervised and semi-supervised settings.

1. INTRODUCTION

In statistical learning theory, regularization techniques are typically leveraged to achieve a trade-off between empirical error minimization and control of model complexity (Vapnik & Chervonenkis, 2015). In contrast to classical convex empirical risk minimization, where regularization can rule out trivial solutions, regularization plays a rather different role in deep learning owing to its highly non-convex optimization landscape (Zhang et al., 2016). Among explicit and implicit regularizers, those based on stochastic transformations, perturbations, and randomness, such as adversarial training (Goodfellow et al., 2014), dropout, and MixUp (Zhang et al., 2017), play a key role in deep learning models due to their superior performance (Berthelot et al., 2019b; Zhang et al., 2017; Miyato et al., 2018; Berthelot et al., 2019a). In this section, we first review two effective and well-established branches of regularization for deep neural networks, both of which generalize elegantly from supervised learning to the semi-supervised setting.

Adversarial Training (Goodfellow et al., 2014; Madry et al., 2017) provides regularization beyond that offered by generic strategies such as dropout, pretraining, and model averaging. However, recent works (Zhang et al., 2019; Tsipras et al., 2018) demonstrated that this kind of training exhibits a trade-off between robustness and accuracy, limiting the efficacy of adversarial regularization. In addition, Virtual Adversarial Training (VAT) (Miyato et al., 2018) can be regarded as a natural extension of adversarial training to the semi-supervised setting: it adversarially smooths the posterior output distribution by leveraging unlabeled data. This strategy has achieved great success in image classification (Miyato et al., 2018), text classification (Miyato et al., 2016), and node classification (Sun et al., 2019).
Tangent-Normal Adversarial Regularization (TNAR) (Yu et al., 2019) extended VAT by taking the data manifold into consideration, applying VAT separately along the tangent space and the orthogonal normal space of the data manifold, and outperformed previous semi-supervised approaches.
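To make the VAT mechanism described above concrete, the following is a minimal NumPy sketch of the standard power-iteration approximation of the virtual adversarial perturbation r_adv, which maximizes the KL divergence between the model's output at x and at x + r. The linear softmax model, the single power-iteration step, and all numeric settings (eps, xi) are illustrative assumptions for this toy example, not the configuration used in this paper or in Miyato et al. (2018).

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    # KL(p || q) for discrete distributions, with a small floor for stability
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def vat_perturbation(W, x, eps=0.5, xi=1e-6, n_iter=1):
    """Approximate the virtual adversarial direction via power iteration.

    For the toy model p(y|x) = softmax(Wx), the gradient of
    KL(p(y|x) || p(y|x+r)) with respect to r is W^T (p - p0).
    """
    p0 = softmax(W @ x)
    d = np.random.randn(x.size)
    d /= np.linalg.norm(d)
    for _ in range(n_iter):
        p = softmax(W @ (x + xi * d))          # probe with a tiny step xi*d
        g = W.T @ (p - p0)                      # gradient of the KL w.r.t. r
        d = g / (np.linalg.norm(g) + 1e-12)     # renormalize the direction
    return eps * d

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))   # toy 3-class linear classifier
x = rng.standard_normal(5)        # one (possibly unlabeled) input
r_adv = vat_perturbation(W, x)
# VAT loss: divergence between clean and adversarially perturbed outputs
loss = kl(softmax(W @ x), softmax(W @ (x + r_adv)))
```

Note that, unlike standard adversarial training, no label is used anywhere: the perturbation is defined purely by the model's own output distribution, which is what allows VAT to exploit unlabeled data in the semi-supervised setting.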

