DON'T FORGET THE NULLSPACE! NULLSPACE OCCUPANCY AS A MECHANISM FOR OUT OF DISTRIBUTION FAILURE

Abstract

Out of distribution (OoD) generalization has received considerable interest in recent years. In this work, we identify a particular failure mode of OoD generalization for discriminative classifiers, in which test data (from a new domain) lies in the nullspace of features learnt from source data. We demonstrate the existence of this failure mode across multiple networks trained on the RotatedMNIST, PACS, TerraIncognita, DomainNet and ImageNet-R datasets. We then study different choices for characterizing the feature space and show that projecting intermediate representations onto the span of directions that obtain maximum training accuracy provides consistent improvements in OoD performance. Finally, we show that such nullspace behavior also provides insight into neural networks trained on poisoned data. We hope our work galvanizes interest in the relationship between the nullspace occupancy failure mode and generalization.

1. INTRODUCTION

Neural networks often succeed in learning rich function approximators that generalize remarkably well to the distribution they are trained on, but are often brittle when exposed to inputs that come from a different distribution (Gulrajani & Lopez-Paz, 2020). With the rapid adoption of neural networks in safety-critical applications such as autonomous driving and healthcare, more attention is being paid to the question of robustness under domain shift (Alcorn et al., 2018; Dai & Van Gool, 2018; AlBadawy et al., 2018). Recent findings from Huh et al. (2021) hint that overparameterized, deep neural networks are biased to learn functions with (approximately) low-rank covariance structure, and posit that this might be related to the phenomenon of implicit regularization (Galanti & Poggio, 2022) that has been used to explain in-distribution generalization of deep networks. How might such low-rank structure relate to out-of-distribution generalization? As a simple thought experiment, consider a setting where the training data D_train is embedded in a three-dimensional space (v1, v2, v3) and exhibits variance only along the first two dimensions (fig. 1, left), with v3 = 1 fixed^1. Let us train a neural network f_θ on this data using a loss functional L(f, D_train). Since v3 does not contribute to any reduction in training error, standard empirical risk minimization (ERM) (Vapnik, 1999) need not differentiate between functions f that handle v3 in different ways. Now consider an out-of-distribution (OoD) dataset which has the same structure as the original dataset along v1 and v2, but where v3 takes a different value, e.g. 10. In this case, one incurs an error (fig. 1, right) if one learns a function f for which f(·, ·, v3 = 1) ≠ f(·, ·, v3 = 10). Thus, the low-rank simplicity bias, while beneficial for IID generalization (Huh et al., 2021), can potentially cause issues for OoD generalization.
In cases where removing the "additional" features observed at test time improves performance (such as in fig. 1), we say that the network incurs nullspace error, and we call this failure mode "nullspace occupancy".
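The thought experiment can be sketched numerically. The snippet below is an illustrative toy, not the paper's implementation: the synthetic data, the minimum-norm least-squares predictor standing in for a trained network, and the SVD-based projection step are all simplifications we introduce for exposition. It fits a linear predictor on data whose third coordinate is fixed at v3 = 1, evaluates it on OoD data where v3 = 10, and then projects the test points onto the directions that carried variance during training:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: variance only along the first two dims, third dim fixed at 1.
n = 200
X01 = rng.normal(size=(n, 2))
y = (X01[:, 0] + X01[:, 1] > 0).astype(float)
X_train = np.hstack([X01, np.ones((n, 1))])  # v3 = 1

# Minimum-norm least-squares fit; the weight on v3 absorbs a bias-like term,
# so the learned function depends on the zero-variance coordinate.
w = np.linalg.pinv(X_train) @ y

# OoD test data: same structure along v1 and v2, but v3 = 10.
X01_test = rng.normal(size=(n, 2))
y_test = (X01_test[:, 0] + X01_test[:, 1] > 0).astype(float)
X_test = np.hstack([X01_test, 10 * np.ones((n, 1))])

acc = lambda X: ((X @ w > 0.5).astype(float) == y_test).mean()
print("OoD accuracy, raw features:", acc(X_test))

# Project test points onto the span of training-time variance directions,
# then restore the training mean; this puts v3 back at its training value.
Xc = X_train - X_train.mean(0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt[s > 1e-8]                      # directions with nonzero training variance
P = V.T @ V                           # projector onto that span
X_proj = (X_test - X_train.mean(0)) @ P + X_train.mean(0)
print("OoD accuracy, projected features:", acc(X_proj))
```

On the raw OoD features, the shifted v3 coordinate pushes every prediction past the decision threshold and accuracy collapses to chance; after projection, the predictor recovers its training-time behavior.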



^1 This is a special case of Huh et al. (2021) where the third eigenvalue is exactly 0, instead of being very small.

