TOWARDS EXPLAINING DISTRIBUTION SHIFT

Abstract

A distribution shift can have fundamental consequences, such as signaling a change in the operating environment or significantly reducing the accuracy of downstream models. Thus, understanding distribution shifts is critical for examining and, ideally, mitigating the effect of such a shift. Most prior work has focused on merely detecting whether a shift has occurred and assumes that any detected shift can be understood and handled appropriately by a human operator. We hope to aid in these manual mitigation tasks by explaining the distribution shift using interpretable transportation maps from the original distribution to the shifted one. We derive our interpretable mappings from a relaxation of the optimal transport problem, where the candidate mappings are restricted to a set of interpretable mappings. We then use quintessential examples of distribution shift in simulated and real-world cases to showcase how our explanatory mappings provide a better balance between detail and interpretability than the de facto standard mean shift explanation, as judged by both visual inspection and our PercentExplained metric.

1. INTRODUCTION

Most real-world environments are constantly changing, and in many situations, understanding how a specific operating environment has changed is crucial to making decisions in response to that change. Such a change might be a new data distribution seen in deployment that causes a machine learning model to begin to fail. Another example is a decrease in monthly sales, which could be due to a temporary supply chain issue in distributing a product or could mark a shift in consumer preferences. When these changes are encountered, the burden is often placed on a human operator to investigate the shift and determine the appropriate reaction, if any. In this work, our goal is to aid these operators by providing an explanation of such a change.

This ubiquitous phenomenon of a difference between related distributions is known as distribution shift. Much prior work focuses on detecting distribution shifts; however, little prior work looks into understanding a detected distribution shift. Because it is usually solely up to an operator investigating a flagged distribution shift to decide what to do next, understanding the shift is critical for the operator to efficiently mitigate any harmful effects. Without a defined approach to this task, the current de facto standard for analyzing a shift is to look at how the mean of the original (source) distribution shifted to that of the new (target) distribution. However, this simple explanation can miss crucial shift information because it is a coarse summary (e.g., Fig. 2). Further, in high-dimensional regimes, a shift in means could itself be uninterpretable due to its high dimensionality.
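
To make the coarseness of a mean-based summary concrete, the following is a minimal sketch rather than the method of this paper. It assumes a one-dimensional toy dataset in which two groups move in opposite directions; the data values and the helper name `mean_shift_map` are hypothetical and chosen only for illustration. The mean shift explanation translates every source point by the difference of the sample means, so shifts that cancel on average become invisible.

```python
from statistics import fmean


def mean_shift_map(source, target):
    """Build the mean shift map T(x) = x + (mean(target) - mean(source)).

    Hypothetical helper for 1-D samples; returns the map and the offset.
    """
    delta = fmean(target) - fmean(source)
    return (lambda x: x + delta), delta


# Assumed toy data: the left group moves by -2 and the right group by +2.
source = [-1.0, -1.0, 1.0, 1.0]
target = [-3.0, -3.0, 3.0, 3.0]

T, delta = mean_shift_map(source, target)
mapped = [T(x) for x in source]

print("mean shift offset:", delta)  # 0.0: the opposite group shifts cancel
print("mapped source:   ", mapped)
print("target:          ", target)
```

In this toy setting the offset is zero, so the mapped source is identical to the original source even though every point has moved. A summary at the level of groups would instead report the two opposite shifts.
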
Instead, if, after flagging that a shift has occurred, we could automatically provide more detailed information about the shift while remaining at a level that is interpretable, we could reduce the manual load on the operator to understand the shift and, ultimately, to take action if necessary. Therefore, we propose a novel framework for explaining distribution shifts, such as showing how features have changed or how groups within the distributions have shifted. Since a distribution shift can be seen as a movement from a source distribution $x \sim P_{src}$ to a target distribution $y \sim P_{tgt}$, we define a distribution shift explanation as a transport map $T(x)$ which maps a point in our source distribution onto a point in our target distribution. For example, under this framework, the typical distribution shift explanation via mean shift can be written as $T(x) = x + (\mu_y - \mu_x)$, where $\mu_x$ and $\mu_y$ denote the source and target means. Intuitively, these transport maps can be thought of as a functional approximation of how the source distribution could have moved in a distribution space to become our target distribution. However, without making assumptions on the type of shift, there exist many possible mappings that explain the shift (see subsection A.2 for examples). Thus, we leverage prior optimal transport work to define an

