PROVABLY AUDITING ORDINARY LEAST SQUARES IN LOW DIMENSIONS

Abstract

Auditing the stability of a machine learning model to small changes in the training procedure is critical for engendering trust in practical applications. For example, a model should not be overly sensitive to removing a small fraction of its training data. However, algorithmically validating this property seems computationally challenging, even for the simplest of models: Ordinary Least Squares (OLS) linear regression. Concretely, recent work defines the stability of a regression as the minimum number of samples that need to be removed so that rerunning the analysis overturns the conclusion (Broderick et al., 2020), specifically meaning that the sign of a particular coefficient of the OLS regressor changes. But the only known approach for estimating this metric, besides the obvious exponential-time algorithm, is a greedy heuristic that may produce severe overestimates and therefore cannot certify stability. We show that stability can be efficiently certified in the low-dimensional regime: when the number of covariates is a constant but the number of samples is large, there are polynomial-time algorithms for estimating (a fractional version of) stability, with provable approximation guarantees. Applying our algorithms to the Boston Housing dataset, we exhibit regression analyses where our estimator outperforms the greedy heuristic, and can successfully certify stability even in the regime where a constant fraction of the samples are dropped.

1. INTRODUCTION

A key facet of interpretability of machine learning models is understanding how different subsets of the training data influence the learned model and its predictions. Computing the influences of individual training points has been shown to be a useful tool for enhancing trust in the model (Zhou et al., 2019), for tracing the origins of model bias (Brunet et al., 2019), and for identifying mislabelled training data and other model debugging (Koh & Liang, 2017). Modelling the influence of groups of training points has applications to measuring fairness (Chen et al., 2018), vulnerability to contamination of multi-source training data (Hayes & Ohrimenko, 2018), and (most relevant to this paper) identification of unstable predictions (Ilyas et al., 2022) and models (Broderick et al., 2020). In a high-stakes machine learning application, it would likely be alarming if some data points were so influential that the removal of, say, 1% of the training data dramatically changed the model. An ideal, trustworthy machine learning pipeline therefore should include validation that this does not happen. But the obvious algorithm for checking if a model trained on n data points exhibits this instability would require computing the group influences of (n choose n/100) different subsets of the data, which is computationally infeasible even for fairly small n. Instead, current methods for estimating the stability of a model simply use the first-order approximation of group influence: namely, the sum of individual influences of data points in the group. With this approximation, vulnerability of a model to dropping αn data points is heuristically estimated by dropping the αn most individually influential data points (Broderick et al., 2020; Ilyas et al., 2022). This heuristic can be thought of as using "local" stability as a proxy for "global" stability, and it has found substantial anecdotal success in diagnosing unstable models.
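To make the greedy idea concrete, the following is a minimal sketch in NumPy of a greedy sample-removal search for the OLS sign-flip notion of stability described above. For simplicity it refits the regression exactly at each step rather than using the first-order influence approximation from the cited works, and the function and variable names are illustrative, not taken from any of the referenced papers. Like the heuristic discussed above, it returns an upper bound on the true minimum number of removals, so by itself it can diagnose instability but cannot certify stability.

```python
import numpy as np

def ols_coef(X, y):
    # Ordinary least squares fit; pseudo-inverse handles rank deficiency.
    return np.linalg.pinv(X) @ y

def greedy_flip_estimate(X, y, coef_index, max_drop):
    """Greedily drop, one at a time, the sample whose removal pushes the
    chosen coefficient furthest toward a sign change (exact refit at each
    step, not the influence-function approximation). Returns the number of
    removals needed to flip the sign, or None if no flip is found within
    the budget. The value is an UPPER bound on the true stability metric."""
    keep = np.ones(len(y), dtype=bool)
    target_sign = np.sign(ols_coef(X, y)[coef_index])
    for k in range(1, max_drop + 1):
        best_i, best_val = None, np.inf
        for i in np.flatnonzero(keep):
            keep[i] = False
            # Coefficient re-signed so that <= 0 means the sign has flipped.
            val = target_sign * ols_coef(X[keep], y[keep])[coef_index]
            keep[i] = True
            if val < best_val:
                best_i, best_val = i, val
        keep[best_i] = False
        if best_val <= 0:
            return k
    return None
```

For example, on five points lying on the line y = x except for one large outlier, removing that single outlier flips the slope's sign, and the greedy search finds this with one removal; on perfectly collinear data it exhausts its budget and reports no flip.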
Unfortunately, for correlated groups of data points, the first-order approximation of the group influence is often an underestimate (Koh et al., 2019), so large local stability does not actually certify that a model is provably stable to

