NEAR-OPTIMAL LINEAR REGRESSION UNDER DISTRIBUTION SHIFT

Abstract

Transfer learning is an essential technique when abundant data is available from the source domain but data from the target domain is scarce or absent. We develop estimators that achieve minimax linear risk for linear regression problems under distribution shift. Our algorithms cover several settings, including covariate shift and model shift, and allow for data generated from either linear or general nonlinear models. We show that affine minimax rules are within an absolute constant of the minimax risk, even among nonlinear rules, for various source/target distributions.

1. INTRODUCTION

The success of machine learning crucially relies on the availability of labeled data. The data labeling process usually requires substantial human labor and can be very expensive and time-consuming, especially for large datasets like ImageNet (Deng et al., 2009). On the other hand, models trained on one dataset, despite performing well on test data from the same distribution they were trained on, are often sensitive to distribution shifts, i.e., they do not adapt well to related but different distributions. Even a small distribution shift can result in substantial performance degradation (Recht et al., 2018; Lu et al., 2020). Transfer learning has become an essential paradigm for tackling the challenges associated with insufficient labeled data (Pan & Yang, 2009; Weiss et al., 2016; Long et al., 2017). The main idea is to make use of a source domain with abundant labeled data (e.g., ImageNet) and to learn a model that performs well on a target domain (e.g., medical images) where few or no labels are available. Despite the lack of labeled data, we may still use unlabeled data from the target domain, which is usually much easier to obtain and can provide helpful information about the target domain. Although this approach has been integral to many applications, many fundamental questions remain open even in very basic settings. In this work, we focus on linear regression under distribution shift and ask the fundamental question of how to optimally learn a linear model for a target domain, using labeled data from a source domain and unlabeled data (and possibly some labeled data) from the target domain. For various settings, including covariate shift (i.e., when p(x) changes) and model shift (i.e., when p(y|x) changes), we develop estimators that achieve near-minimax risk (up to universal constant factors) among all linear estimation rules.
Here, linear estimators refer to all estimators that depend linearly on the label vector; this class includes nearly all popular estimators in linear regression, such as ridge regression and its variants. When the input covariances of the source and target domains commute, we prove that our estimators achieve near-minimax risk among all possible estimators.
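To make the notion of a linear estimation rule concrete, the following is a minimal NumPy sketch (not from the paper; the data and regularization parameter are illustrative) showing that ridge regression outputs an estimate of the form M y, where the matrix M depends only on the design matrix X and not on the labels y — which is exactly what makes it linear in the label vector:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 5, 0.1  # illustrative sample size, dimension, ridge penalty

# Source-domain design matrix and labels from a noisy linear model.
X = rng.normal(size=(n, d))
beta = rng.normal(size=d)
y = X @ beta + 0.1 * rng.normal(size=n)

# Ridge regression: beta_hat = (X^T X + lam I)^{-1} X^T y.
# Written as beta_hat = M @ y, where M depends only on X and lam.
M = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
beta_hat = M @ y

# Linearity in the label vector: M applied to a linear combination
# of label vectors equals the same combination of the estimates.
y2 = rng.normal(size=n)
assert np.allclose(M @ (2 * y + 3 * y2), 2 * (M @ y) + 3 * (M @ y2))
```

Any estimator of this form — ordinary least squares (lam = 0), ridge, or generalized ridge with a matrix penalty — falls inside the class over which the minimax linear risk is computed.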

