PATCH-LEVEL NEIGHBORHOOD INTERPOLATION: A GENERAL AND EFFECTIVE GRAPH-BASED REGULARIZATION STRATEGY

Abstract

Regularization plays a crucial role in machine learning models, especially for deep neural networks. Existing regularization techniques mainly rely on the i.i.d. assumption and consider only the knowledge from the current sample, without leveraging the neighboring relationships between samples. In this work, we propose a general regularizer called Patch-level Neighborhood Interpolation (Pani) that introduces a non-local representation into the network computation. Our proposal explicitly constructs patch-level graphs in different network layers and then linearly interpolates neighboring patch features, serving as a general and effective regularization strategy. Further, we customize our approach into two popular regularization methods, namely Virtual Adversarial Training (VAT) and MixUp together with its variants. The first derived method, Pani VAT, presents a novel way to construct non-local adversarial smoothness by employing patch-level interpolated perturbations. The second derived method, Pani MixUp, extends the original MixUp regularization and its variant to the Pani version, achieving significant performance improvements. Finally, extensive experiments verify the effectiveness of our Patch-level Neighborhood Interpolation approach in both supervised and semi-supervised settings.

1. INTRODUCTION

In statistical learning theory, regularization techniques are typically leveraged to achieve a trade-off between empirical error minimization and control of model complexity (Vapnik & Chervonenkis, 2015). In contrast to classical convex empirical risk minimization, where regularization can rule out trivial solutions, regularization plays a rather different role in deep learning due to its highly non-convex optimization nature (Zhang et al., 2016). Among all explicit and implicit regularization methods, those based on stochastic transformations, perturbations, and randomness, such as adversarial training (Goodfellow et al., 2014), dropout, and MixUp (Zhang et al., 2017), play a key role in deep learning models due to their superior performance (Berthelot et al., 2019b; Zhang et al., 2017; Miyato et al., 2018; Berthelot et al., 2019a). In this section, we first review two effective and prominent families of regularization for deep neural networks that generalize elegantly from supervised learning to the semi-supervised setting.

Adversarial Training (Goodfellow et al., 2014; Madry et al., 2017) provides additional regularization beyond that offered by generic strategies such as dropout, pretraining, and model averaging. However, recent works (Zhang et al., 2019; Tsipras et al., 2018) demonstrated that this kind of training entails a trade-off between robustness and accuracy, limiting the efficacy of adversarial regularization. Virtual Adversarial Training (VAT) (Miyato et al., 2018) can be regarded as a natural extension of adversarial training to the semi-supervised setting: it adversarially smooths the posterior output distribution by leveraging unlabeled data. This strategy has achieved great success in image classification (Miyato et al., 2018), text classification (Miyato et al., 2016), and node classification (Sun et al., 2019).
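To make the VAT smoothness term concrete, the following minimal numpy sketch approximates the virtual adversarial perturbation with one power iteration. The function names are ours, and the finite-difference gradient is an illustrative stand-in for the backpropagation used in practice (Miyato et al., 2018); the constants `xi` and `eps_norm` are hypothetical choices.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for two probability vectors."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def vat_loss(f, x, eps_norm=0.5, xi=0.1, h=1e-4, seed=0):
    """Sketch of the VAT smoothness term: find (via one power iteration,
    with a finite-difference gradient) the perturbation r, ||r|| = eps_norm,
    that most increases KL(f(x) || f(x + r)), and return that KL value."""
    rng = np.random.default_rng(seed)
    p = f(x)
    d = rng.normal(size=x.shape)
    d /= np.linalg.norm(d)
    # one power iteration: gradient of KL(p || f(x + xi * d)) w.r.t. d
    g = np.zeros_like(d)
    for i in range(d.size):
        dp = d.copy(); dp[i] += h
        dm = d.copy(); dm[i] -= h
        g[i] = (kl_div(p, f(x + xi * dp)) - kl_div(p, f(x + xi * dm))) / (2 * h)
    r_adv = eps_norm * g / (np.linalg.norm(g) + 1e-12)
    return kl_div(p, f(x + r_adv))
```

Since the smoothness term needs no labels, minimizing it over unlabeled inputs is what makes VAT applicable in the semi-supervised setting.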
Tangent-Normal Adversarial Regularization (TNAR) (Yu et al., 2019) extended VAT by taking the data manifold into consideration, applying VAT along both the tangent space and the orthogonal normal space of the data manifold and outperforming previous semi-supervised approaches. MixUp (Zhang et al., 2017) augments the training data by incorporating the prior knowledge that linear interpolation of input vectors should lead to linear interpolation of the associated targets, yielding consistent generalization improvements on image, speech, and tabular data. MixMatch (Berthelot et al., 2019b) extended MixUp to semi-supervised tasks by guessing low-entropy labels for augmented unlabeled examples and mixing labeled and unlabeled data using MixUp. In contrast with VAT, MixMatch (Berthelot et al., 2019b) utilizes a specific form of consistency regularization, i.e., standard data augmentation for images such as random horizontal flips, rather than computing adversarial perturbations to smooth the posterior distribution of the classifier.

Nevertheless, the vast majority of regularization methods, including those above, assume that training samples are drawn independently and identically from an unknown data-generating distribution. For instance, Support Vector Machines (SVM), Back-Propagation (BP) for neural networks, and many other common algorithms implicitly make this assumption as part of their derivation. However, the i.i.d. assumption is commonly violated in realistic scenarios, where batches or sub-groups of training samples are likely to have internal correlations. In particular, Dundar et al. (2007) demonstrated that accounting for the correlations in real-world training data leads to statistically significant improvements in accuracy.
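The interpolation prior behind MixUp, reviewed above, can be sketched in a few lines; the function name and the default `alpha` are illustrative, but the Beta-distributed mixing coefficient follows Zhang et al. (2017).

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Sample lam ~ Beta(alpha, alpha) and linearly interpolate
    both the inputs and the (one-hot) targets with the same lam."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

Because both inputs and targets are mixed with the same coefficient, the mixed target remains a valid probability vector whenever the originals are one-hot.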
Similarly, Peer-Regularized Networks (PeerNet) (Svoboda et al., 2018) applied graph convolutions (Velickovic et al., 2017; Kipf & Welling, 2016) to harness information from peer samples, and verified its effectiveness in defending against adversarial attacks. Motivated by these facts, we aim to design a general regularization strategy that fully utilizes the internal relationships between samples by explicitly constructing a graph within each mini-batch, in order to consistently improve the generalization of deep neural networks in both supervised and semi-supervised settings.

In this paper, we propose Patch-level Neighborhood Interpolation (Pani) for deep neural networks, a simple yet effective non-local regularization. We first construct a patch-level graph in each mini-batch during stochastic gradient descent training. We then apply linear interpolation to neighboring patch features; the resulting non-local representation additionally captures the relationships between neighboring patch features in different layers, serving as a general and effective regularization. Furthermore, to demonstrate the generality and superiority of our Pani method, we explicitly customize it into two popular and general regularization strategies, i.e., virtual adversarial regularization and MixUp, resulting in Pani VAT and Pani MixUp. For Pani VAT, we reformulate the construction of adversarial perturbations, transforming them from depending solely on the current sample to a linear interpolation of neighboring patch features. These non-local adversarial perturbations can leverage the neighboring correlations among all samples within a batch, providing more informative adversarial smoothness in the semi-supervised setting. In Pani MixUp, we extend MixUp and its variant MixMatch from the image level to the patch level by mixing fine-grained patch features and the corresponding supervised signals.
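The core patch-level interpolation step can be sketched as follows for single-channel inputs. This is our illustrative simplification, not the paper's exact formulation: non-overlapping patches, a Euclidean k-NN graph over all patches in the batch, and a single scalar mixing weight `gamma` are hypothetical choices standing in for the learned interpolation coefficients.

```python
import numpy as np

def pani_interpolate(x, k=2, n_neighbors=2, gamma=0.1):
    """Build a k-NN graph over all k x k patches in a batch x of shape
    (B, H, W) (H, W divisible by k) and linearly interpolate each patch
    with the mean of its nearest neighbors from the whole batch."""
    B, H, W = x.shape
    ph, pw = H // k, W // k
    # cut each image into non-overlapping k x k patches; one row per patch
    patches = (x.reshape(B, ph, k, pw, k)
                .transpose(0, 1, 3, 2, 4)
                .reshape(B * ph * pw, k * k))
    # pairwise Euclidean distances between patch features across the batch
    d = np.linalg.norm(patches[:, None, :] - patches[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :n_neighbors]
    mixed = (1 - gamma) * patches + gamma * patches[nn].mean(axis=1)
    # reassemble patches back into images
    return (mixed.reshape(B, ph, pw, k, k)
                 .transpose(0, 1, 3, 2, 4)
                 .reshape(B, H, W))
```

With `gamma = 0` the input is returned unchanged, so the interpolation strength interpolates smoothly between the identity and a fully neighborhood-averaged representation; the same operation can be applied to intermediate feature maps rather than raw inputs.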
Finally, we conduct extensive experiments to demonstrate that both derived regularization strategies outperform other state-of-the-art approaches in supervised and semi-supervised tasks. More importantly, these successful cases verify the generality and superiority of our Patch-level Neighborhood Interpolation method. Our contributions can be summarized as follows:

• We propose a general interpolation strategy applicable in either input or feature space, i.e., Patch-level Neighborhood Interpolation, which helps to improve the generalization of deep neural networks in both supervised and semi-supervised scenarios. This strategy serves as an effective graph-based representation method and has much potential to be leveraged in a wider range of tasks.

• Based on our method, the customized approaches Pani VAT and Pani MixUp as well as Pani MixMatch boost generalization performance significantly, providing guidance for the deployment of our Pani strategy into more regularization methods.

2. OUR METHOD: PATCH-LEVEL NEIGHBORHOOD INTERPOLATION

Before introducing our approach, we recommend that readers review the preliminary material on VAT (Miyato et al., 2017), MixUp (Zhang et al., 2017), and PeerNet (Svoboda et al., 2018) in Appendix A. The most closely related work is PeerNet (Svoboda et al., 2018), which designed graph-based layers to defend against adversarial attacks; unfortunately, the construction of pixel-level K-NN graphs in PeerNet is computationally expensive. By contrast, our motivation is to develop a general regularization that can consistently boost the performance of

