INTERPOLATING COMPRESSED PARAMETER SUBSPACES

Abstract

Though distribution shifts are a growing concern for machine learning scalability, solutions tend to specialize to a specific type of shift: methods for label shift may not succeed against domain or task shift, and vice versa. We find that constructing a Compressed Parameter Subspace (CPS), a geometric structure of distance-regularized parameters mapped to a set of train-time distributions, can maximize average accuracy over a broad range of distribution shifts concurrently. We show that sampling parameters within a CPS can mitigate backdoor, adversarial, permutation, stylization, and rotation perturbations. We also show that training a hypernetwork representing a CPS can adapt to seen tasks as well as unseen interpolated tasks.

1. INTRODUCTION

Recent work on the geometry of the loss landscape, such as neural subspaces (Wortsman et al., 2021) and mode connectivity (Fort & Jastrzebski, 2019; Draxler et al., 2019; Garipov et al., 2018), discovered robustness properties shared between multiple parameters. Departing from constructing subspaces w.r.t. a single, unperturbed input distribution, we investigate the construction of subspaces w.r.t. multiple perturbed distributions, and find improved mappability between shifted distributions and low-loss parameters contained in these subspaces.

Contributions. We share a method to construct a compressed parameter subspace such that a parameter sampled from this subspace is more likely to map to a shifted input distribution. We demonstrate high average accuracy across distribution shifts in single and multiple test-time settings (Figure 1). We show improved robustness across perturbation types, reduced catastrophic forgetting on Split-CIFAR10/100, and strong capacity for multitask solutions and unseen/distant tasks.
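To make the sampling step concrete, the sketch below illustrates one plausible reading of "sampling parameters within a CPS": drawing a convex combination of anchor parameter vectors, each trained on one train-time distribution, with mixing weights drawn uniformly from the simplex. The function name `sample_cps_parameter` and the use of Dirichlet weights are illustrative assumptions, not the paper's specified procedure.

```python
import numpy as np

def sample_cps_parameter(anchors, rng=None):
    """Sample a parameter vector from the convex hull of trained anchors.

    anchors: list of 1-D numpy arrays, each a flattened parameter vector
             trained on one (possibly perturbed) train-time distribution.
    Note: this is a hypothetical sketch; the paper's CPS also applies a
    distance regularizer between anchors during training, omitted here.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Dirichlet(1, ..., 1) gives mixing weights uniform over the simplex
    weights = rng.dirichlet(np.ones(len(anchors)))
    return sum(w * a for w, a in zip(weights, anchors))

# Toy usage: three 4-dimensional "parameter vectors" as anchors
anchors = [np.array([1.0, 0.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0, 0.0]),
           np.array([0.0, 0.0, 1.0, 0.0])]
theta = sample_cps_parameter(anchors, rng=np.random.default_rng(0))
```

At test time, one would evaluate such sampled parameters (or an average of several) against the shifted input distribution.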

