LEARNING A NON-REDUNDANT COLLECTION OF CLASSIFIERS

Abstract

Supervised learning models constructed under the i.i.d. assumption have often been shown to exploit spurious or brittle predictive signals instead of more robust ones present in the training data. Inspired by Quality-Diversity algorithms, in this work we train a collection of classifiers to learn distinct solutions to a classification problem, with the goal of exploiting a variety of predictive signals present in the training data. We propose an information-theoretic measure of model diversity based on minimizing an estimate of the conditional total correlation of final-layer representations across models given the label. We consider datasets with synthetically injected spurious correlations and evaluate our framework's ability to rapidly adapt to a change in distribution that destroys the spurious correlation. Under this evaluation protocol, our method is competitive with a variety of baselines while being more successful at isolating distinct signals. We also show that our model is competitive with Invariant Risk Minimization (IRM) without requiring the environment information that IRM needs to discriminate between spurious and robust signals.

1. INTRODUCTION

The Empirical Risk Minimization (ERM) principle (Vapnik, 2013), which underpins many machine learning models, is built on the assumption that training and testing samples are drawn i.i.d. from some hypothetical distribution. It has been demonstrated that certain violations of this assumption (in conjunction with potential misalignments between the formulation of the learning objectives and the underlying task of interest) lead to models that exploit spurious or brittle correlations in the training data. Examples include learning to exploit image backgrounds instead of objects in the foreground due to data biases (such as using grassy backgrounds to predict the presence of cows (Beery et al., 2018)), using textural as opposed to shape information to classify objects (Geirhos et al., 2018), and using signals not robust to small adversarial perturbations (Ilyas et al., 2019). Implicit in work that attempts to address these phenomena is the assumption that more robust predictive signals are indeed present in the training data, even if for various reasons current models do not tend to leverage them.

In this work, drawing inspiration from Quality-Diversity algorithms (Pugh et al., 2016), which seek to construct a collection of high-performing, diverse solutions to a task, we aim to learn a collection of models, each incentivized to find a distinct, high-performing solution to a given supervised learning problem from a fixed training set. Informally, our motivation is that a sufficiently large collection of such distinct models would exploit robust signals present in the training data in addition to the brittle signals that current models tend to exploit. Thus, given the representations computed by such a collection, it may be possible to rapidly adapt to test-time shifts in distribution that destroy the predictive power of brittle features. Addressing this problem hinges on defining and enforcing an appropriate measure of model diversity.
To this end, we make the following contributions:

• We propose and motivate a novel measure of model diversity based on conditional total correlation (across models) of final-layer representations given the label. Informally, this

