AN INVESTIGATION OF DOMAIN GENERALIZATION WITH RADEMACHER COMPLEXITY

Abstract

The domain generalization (DG) setting challenges a model trained on multiple known data distributions to generalise well to unseen data distributions. Due to its practical importance, many methods have been proposed to address this challenge. However, much work in general-purpose DG is heuristically motivated, as the DG problem is hard to model formally; and recent evaluations have cast doubt on existing methods' practical efficacy, in particular compared to a well-tuned empirical risk minimisation baseline. We present a novel learning-theoretic generalisation bound for DG that bounds unseen-domain generalisation performance in terms of the model's empirical risk and Rademacher complexity, providing a sufficient condition for DG. Based on this insight, we empirically analyse the performance of several methods and show that their performance is indeed influenced by model complexity in practice. Algorithmically, our analysis suggests that tuning for domain generalisation should be achieved by simply performing regularised ERM with a leave-one-domain-out cross-validation objective. Empirical results on the DomainBed benchmark corroborate this.

1. INTRODUCTION

Machine learning systems have shown exceptional performance on numerous tasks in computer vision and beyond. However, performance drops rapidly when the standard assumption of i.i.d. training and testing data is violated. This domain-shift phenomenon occurs widely in many applications of machine learning (14; 37; 25), and often leads to disappointing results in practical machine learning deployments, since data 'in the wild' is almost inevitably different from training sets. Given the practical significance of this issue, numerous methods have been proposed that aim to improve models' robustness to deployment under train-test domain shift (37), a problem setting known as domain generalisation (DG). These span diverse approaches including specialised neural architectures, data augmentation strategies, and regularisers. Nevertheless, the DG problem setting is difficult to model formally for principled derivation and theoretical analysis of algorithms, since the target domain(s) of interest cannot be observed during training, and cannot be directly approximated by the training domains due to unknown distribution shift. Therefore the many popular approaches (37) are based on poorly understood empirical heuristics, a problem highlighted by (20), who found that no DG methods reliably outperform a well-tuned empirical risk minimisation (ERM) baseline. Our first contribution is to present an intuitive learning-theoretic bound for DG performance. Intuitively, while the held-out domain of interest is unobservable during training, we can bound its performance using learning-theoretic tools similar to the standard ones used to bound the performance on (unobserved) testing data given (observed) training data. In particular, we show that the performance on a held-out target domain is bounded by the performance on known source domains, plus two additional model complexity terms that describe how much a model can possibly have overfitted to the training domains.
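Schematically, a bound of this form can be written as follows; the symbols here are illustrative rather than taken from the formal theorem statement, and the precise constants and complexity functionals differ in the full result. With $m$ source domains, $n$ samples per domain, and hypothesis class $\mathcal{H}$:

```latex
% Illustrative sketch only: target-domain risk bounded by average
% empirical source risk plus two complexity terms, one accounting
% for overfitting within each source domain (shrinking in n) and
% one for overfitting to the particular set of observed domains
% (shrinking in m).
\mathcal{R}_{T}(h) \;\le\;
  \frac{1}{m}\sum_{i=1}^{m} \hat{\mathcal{R}}_{S_i}(h)
  \;+\; \underbrace{C_1(\mathcal{H}, n)}_{\text{within-domain complexity}}
  \;+\; \underbrace{C_2(\mathcal{H}, m)}_{\text{across-domain complexity}}
```

Read this way, the bound mirrors standard single-domain generalisation bounds: the empirical term favours fitting the sources well, while the complexity terms penalise hypothesis classes rich enough to overfit either the samples within each domain or the sampled set of domains itself.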
This provides a sufficient condition for DG and leads to several insights. Firstly, our theory suggests that DG performance is influenced by a trade-off between empirical risk and model complexity that is analogous to the corresponding and widely understood trade-off that explains generalisation in standard i.i.d. learning as an overfitting-underfitting trade-off (17). Based on this, we conjecture that the efficacy of the plethora of available strategies (37), from data augmentation to specialised optimisers, is largely influenced by explicitly or implicitly choosing
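The tuning recipe suggested by this analysis, regularised ERM with model selection by leave-one-domain-out cross-validation, can be sketched as below. This is a minimal illustration on ridge regression with NumPy, not the paper's experimental protocol; the function names and the synthetic setup are ours.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Regularised ERM: closed-form ridge regression solution
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def lodo_select(domains, lams):
    """Pick the regularisation strength minimising the average
    leave-one-domain-out validation risk.

    domains: list of (X, y) pairs, one per source domain.
    lams:    candidate regularisation strengths.
    """
    best_lam, best_risk = None, np.inf
    for lam in lams:
        risks = []
        for i in range(len(domains)):
            # Train on all source domains except the held-out one
            train = [d for j, d in enumerate(domains) if j != i]
            X = np.vstack([d[0] for d in train])
            y = np.concatenate([d[1] for d in train])
            w = ridge_fit(X, y, lam)
            # Evaluate on the held-out domain as a proxy for an
            # unseen target domain
            Xv, yv = domains[i]
            risks.append(np.mean((Xv @ w - yv) ** 2))
        risk = float(np.mean(risks))
        if risk < best_risk:
            best_lam, best_risk = lam, risk
    return best_lam, best_risk
```

The held-out domain plays the role of the unseen target: hyperparameters that overfit the training domains incur a large leave-one-domain-out risk, so the selection implicitly trades empirical risk against model complexity.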

