COMFORT ZONE: A VICINAL DISTRIBUTION FOR REGRESSION PROBLEMS

Abstract

Domain-dependent data augmentation methods generate artificial samples using transformations suited to the underlying data domain, such as rotations for images and time warping for time series. In contrast, domain-independent approaches, e.g., mixup, are applicable across data modalities, which makes them general and versatile. While mixup-based techniques are used extensively in classification problems, their effect on regression tasks is far less explored. To bridge this gap, we study the problem of domain-independent augmentation for regression, and we introduce comfort-zone: a new data-driven, domain-independent data augmentation method. Essentially, our approach samples new examples from the tangent planes of the training distribution. Augmenting data in this way aligns with the network's tendency to capture the dominant features of its input signals. Evaluating comfort-zone on regression and time series forecasting benchmarks, we show that it improves the generalization of several neural architectures. We also find that mixup and noise injection are less effective than comfort-zone.

1. INTRODUCTION

Classification and regression problems differ primarily in their output domain. In classification, we have a finite set of labels, whereas in regression, the range is an infinite set of quantities, either discrete or continuous (Goodfellow et al., 2016). In classical work (Devroye et al., 2013), classification is argued to be "easier" than regression, and more generally, many agree that classification and regression problems should be treated differently (Muthukumar et al., 2021). In particular, the differences between classification and regression are actively explored in the context of regularization.

Regularizing neural networks to improve their performance on new samples has received a lot of attention in the past few years. One of the main reasons for this increased interest is that most recent successful neural models are overparameterized: the number of learnable parameters is significantly larger than the number of available training samples (Allen-Zhu et al., 2019a;b), and regularization is thus often necessary to alleviate overfitting. Recent studies on overparameterized linear models identify conditions under which overfitting is "benign" in regression (Bartlett et al., 2020), and uncover the relationship between the choice of loss functions in classification and regression tasks (Muthukumar et al., 2021). Still, the regularization of deep neural regression networks is not well understood.

In this work, we focus on a common regularization approach known as data augmentation (DA), in which data samples are artificially generated and used during training. In general, DA techniques can be categorized into domain-dependent (DD) methods and domain-independent (DI) approaches. The former are specific to a certain data modality, such as images, whereas the latter typically do not depend on the data modality.
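As a concrete point of reference, the two DI baselines mentioned in the abstract, mixup and noise injection, can be sketched for regression as follows. This is an illustrative sketch, not the paper's comfort-zone method: the mixup formulation follows Zhang et al. (2018), and the hyperparameters `alpha` and `sigma` are arbitrary choices made for the example.

```python
# Minimal sketch of two domain-independent augmentations for regression,
# assuming a tabular input matrix X of shape (n, d) and continuous targets
# y of shape (n,). Hyperparameters are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

def mixup_regression(X, y, alpha=0.2):
    """Convexly combine random pairs of samples and their targets (mixup)."""
    n = X.shape[0]
    lam = rng.beta(alpha, alpha, size=n)   # per-sample mixing coefficients in [0, 1]
    idx = rng.permutation(n)               # random partner for each sample
    X_mix = lam[:, None] * X + (1 - lam[:, None]) * X[idx]
    y_mix = lam * y + (1 - lam) * y[idx]   # targets are mixed with the same weights
    return X_mix, y_mix

def noise_injection(X, y, sigma=0.1):
    """Perturb inputs with isotropic Gaussian noise; targets stay unchanged."""
    return X + sigma * rng.standard_normal(X.shape), y

X = rng.standard_normal((8, 3))
y = rng.standard_normal(8)
X_mix, y_mix = mixup_regression(X, y)
X_noisy, _ = noise_injection(X, y)
```

Note that because each mixed target is a convex combination of two training targets, mixup never produces targets outside the observed range, a property that distinguishes interpolation-style DI augmentations from noise injection on the inputs.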
Numerous DD- and DI-DA approaches are available for classification tasks (Shorten & Khoshgoftaar, 2019; Shorten et al., 2021), and many of them consistently improve over non-augmented models. Unfortunately, DI-DA for regression problems is a significantly less explored topic. Recent works on linear models study the connection between the DA policy and optimization (Hanin & Sun, 2021), as well as the generalization effects of linear DA transformations (Wu et al., 2020). We contribute to this line of work by proposing and analyzing a new domain-independent data augmentation method for nonlinear deep regression, and by extensively evaluating our approach against existing baselines. Many strong data augmentation methods have been proposed in the past few years. Particularly relevant to our study is the family of mixup-based techniques that are commonly used in classification

