INTERPRETABILITY WITH FULL COMPLEXITY BY CONSTRAINING FEATURE INFORMATION

Abstract

Interpretability is a pressing issue for machine learning. Common approaches to interpretable machine learning constrain interactions between features of the input, sacrificing model complexity in order to make the effects of those features on the model's output more comprehensible. We approach interpretability from a new angle: constrain the information about the features without restricting the complexity of the model. We use the Distributed Information Bottleneck to optimally compress each feature so as to maximally preserve information about the output. The learned information allocation, by feature and by feature value, provides rich opportunities for interpretation, particularly in problems with many features and complex feature interactions. The central object of analysis is not a single trained model, but rather a spectrum of models serving as approximations that leverage variable amounts of information about the inputs. Information is allocated to features by their relevance to the output, thereby solving the problem of feature selection by constructing a learned continuum of feature inclusion-to-exclusion. The optimal compression of each feature, at every stage of approximation, allows fine-grained inspection of the distinctions among feature values that are most impactful for prediction. We develop a framework for extracting insight from the spectrum of approximate models and demonstrate its utility on a range of tabular datasets.

1. INTRODUCTION

Interpretability is a pressing issue for machine learning (ML) (Doshi-Velez & Kim, 2017; Fan et al., 2021; Rudin et al., 2022). As models continue to grow in complexity, machine learning is increasingly integrated into fields where flawed decisions can have serious ramifications (Caruana et al., 2015; Rudin et al., 2018; Rudin, 2019). Interpretability is not a binary property that machine learning methods either have or lack: rather, it is the degree to which a learning system can be probed and comprehended (Doshi-Velez & Kim, 2017). Importantly, interpretability can be attained along many distinct routes (Lipton, 2018). Various constraints can be incorporated into the learning system, such as forcing feature effects to combine in a simple (e.g., linear) manner, restricting the space of possible models in exchange for a degree of comprehensibility (Molnar et al., 2020; Molnar, 2022). In contrast to explainable AI, which happens post hoc after a black-box model is trained, interpretable methods engineer the constraints into the model from the outset (Rudin et al., 2022).

In this work, we introduce a novel route to interpretability that places no restrictions on model complexity, and instead tracks how much and what information is most important for prediction. By identifying optimal information from features of the input, the method grants a measure of salience to each feature, produces a spectrum of models utilizing different amounts of optimal information about the input, and provides a learned compression scheme for each feature that highlights where the information comes from in fine-grained detail. The central object of interpretation is not a single optimized model, but rather a family of models that reveals how predictive information is distributed across features at every level of fidelity.

The information constraint that we incorporate is a variant of the Information Bottleneck (IB) (Tishby et al., 2000; Asoodeh & Calmon, 2020). The IB extracts relevant information in a relationship, though it generally lacks interpretability because the extracted information is free to be any black-box function of the input. Here, we use the Distributed IB (Estella Aguerri & Zaidi, 2018; Murphy & Bassett, 2022), which siloes the information from each feature and applies a sum-rate constraint at an appropriate point in the model's processing: after every feature is processed and before any interaction effects between the features can be included. Before and after the bottlenecks, arbitrarily complex processing can occur, allowing the Distributed IB to be used in scenarios with many features and complex feature interactions.

In this paper, we develop a framework for extracting insight from optimization with the Distributed IB. Through experiments on a variety of tabular datasets and comparisons to other approaches in interpretable ML and feature selection, we demonstrate the following strengths of the approach:

1. Capacity to find the features and feature values that are most important for prediction. The Distributed IB optimally allocates information across the features for every degree of approximation. Each feature is compressed during optimization, and the compression schemes reveal the distinctions among feature values that are most important for prediction.

2. Full complexity of feature processing and interactions. Compatible with arbitrary neural network architectures before and after the bottlenecks, the method gains interpretability without sacrificing predictive accuracy.

3. Intuitive to implement, minimal additional overhead. The method adds a probabilistic encoder for each feature, employing the same loss penalty as the β-VAE (Higgins et al., 2017). A single model is trained once while annealing the magnitude of the information bottleneck.
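The per-feature bottleneck described above can be made concrete with a short sketch. The code below is our illustrative rendering, not the authors' reference implementation: each feature gets its own probabilistic Gaussian encoder (here a tiny hand-rolled network with placeholder weights), the KL divergence of each encoding to a standard-normal prior is summed across features, and that sum is the rate penalty that would be scaled by an annealed β, the same penalty form as the β-VAE. All function and weight names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_feature(x, W_mu, W_logvar):
    """Probabilistic encoder for one feature: maps a column of scalar
    feature values to the mean and log-variance of a Gaussian in the
    bottleneck space (a one-hidden-layer stand-in for a real network)."""
    h = np.tanh(x @ W_mu["hidden"])      # shared hidden layer
    mu = h @ W_mu["out"]
    logvar = h @ W_logvar["out"]
    return mu, logvar

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over bottleneck dims
    and averaged over the batch -- the beta-VAE rate penalty."""
    kl = 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar)
    return kl.sum(axis=-1).mean()

def distributed_ib_penalty(features, encoders):
    """Sum of per-feature bottleneck rates. Each feature is compressed
    independently, so information is siloed before any interactions;
    the sampled U_i are concatenated and fed to the downstream model."""
    total = 0.0
    us = []
    for x, (W_mu, W_logvar) in zip(features, encoders):
        mu, logvar = encode_feature(x, W_mu, W_logvar)
        eps = rng.standard_normal(mu.shape)
        us.append(mu + np.exp(0.5 * logvar) * eps)  # reparameterized U_i
        total += kl_to_standard_normal(mu, logvar)
    return total, np.concatenate(us, axis=-1)
```

During training, the total loss would be the prediction error plus β times this penalty, with β annealed from large (all features fully compressed, no information passes) toward zero (full information), tracing out the spectrum of approximate models described above.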

2. RELATED WORK

We propose a method that analyzes the information content of the features of an input with respect to an output. The method's generality bears relevance to several broad areas of research.

Feature selection and variable importance. Feature selection methods isolate the most important features in a relationship (Fig. 1a), both to increase interpretability and to reduce computational burden (Chandrashekar & Sahin, 2014; Cai et al., 2018). When higher-order feature interactions are present, the large combinatorial search space presents an NP-hard optimization problem (Chandrashekar & Sahin, 2014). Modern feature selection methods navigate the search space with feature inclusion weights, or masks, that can be trained as part of the full stack with the machine learning model (Fong & Vedaldi, 2017; Balın et al., 2019; Lemhadri et al., 2021; Kolek et al., 2022).
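The trainable-mask pattern mentioned above can be sketched generically (this is a common formulation, not the specific mechanism of any one cited method; the names are hypothetical): a sigmoid gate per feature multiplies the input, and a sparsity penalty on the gates pushes most of them toward zero so that only the most informative features survive.

```python
import numpy as np

def masked_features(X, logits):
    """Soft feature-inclusion mask: one sigmoid gate per feature,
    broadcast over the batch and trained jointly with the model."""
    gates = 1.0 / (1.0 + np.exp(-logits))
    return X * gates, gates

def sparsity_penalty(gates, lam=0.1):
    """L1 penalty on the gates, encouraging few features to survive;
    added to the prediction loss during joint training."""
    return lam * np.abs(gates).sum()
```

Unlike the hard binary masks of classical feature selection, these gates are differentiable, which is what lets the search over feature subsets ride along with gradient-based training; the Distributed IB replaces the single scalar gate with a learned compression scheme per feature.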



Figure 1: Restricting information flow from features grants interpretability with full complexity. (a) Feature selection methods optimize a binary mask on features to uncover the most informative ones. (b) Interpretable ML methods sacrifice model complexity in exchange for comprehensible feature interactions (shown is a generalized additive model (GAM) with a neural network to transform each feature). (c) Our method places an information bottleneck penalty on each feature, with no constraints on complexity before or after the bottlenecks. The features X_i are compressed to optimal representations U_i, which communicate the most relevant information to the rest of the model. Compressing the features in this way allows feature inclusion/exclusion to exist on a continuum. The learned compression scheme for each feature grants insight into feature effects on the model output.

