DEBIASING CONCEPT-BASED EXPLANATIONS WITH CAUSAL ANALYSIS

Abstract

The concept-based explanation approach is a popular model-interpretability tool because it expresses the reasons for a model's predictions in terms of concepts that are meaningful to domain experts. In this work, we study the problem of concepts being correlated with confounding information in the features. We propose a new causal prior graph for modeling the impact of unobserved variables, and a method that removes the impact of confounding information and noise using a two-stage regression technique borrowed from the instrumental variable literature. We also model the completeness of the concept set and show that our debiasing method works even when the concepts are incomplete. Our synthetic and real-world experiments demonstrate the success of our method in removing biases and in improving the ranking of the concepts in terms of their contribution to the explanation of the predictions.

1. INTRODUCTION

Explaining the predictions of neural networks through higher-level concepts (Kim et al., 2018; Ghorbani et al., 2019; Brocki & Chung, 2019; Hamidi-Haines et al., 2018) enables model interpretation on data with complex manifold structure, such as images. It also allows the use of domain knowledge during the explanation process. Concept-based explanation has been used in medical imaging (Cai et al., 2019), breast cancer histopathology (Graziani et al., 2018), cardiac MRIs (Clough et al., 2019), and meteorology (Sprague et al., 2019). When the set of concepts is carefully selected, we can estimate a model in which the discriminative information flows from the feature vectors x through the concept vectors c to the labels y. To this end, we train two models: one predicting the concept vectors from the features, denoted c(x), and one predicting the labels from the predicted concept vector, denoted y(c). This estimation process ensures that for each prediction we have its reasons stated in terms of the predicted concept vector c(x). In reality, however, noise and confounding information (due to, e.g., non-discriminative context) can influence both the feature and concept vectors, resulting in confounded correlations between them. Figure 1 provides evidence of noise and confounding in the CUB-200-2011 dataset (Wah et al., 2011). We train two predictors for the concept vectors, one based on the features c(x) and one based on the labels c(y), and compare the Spearman correlation coefficients between their predictions and the true ordinal values of the concepts. Concepts for which c(x) is more accurate than c(y) could be due to noise, or due to hidden variables independent of the labels that spuriously correlate c and x, leading to undesirable explanations that include confounding or noise.
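The diagnostic above can be illustrated on synthetic data. The sketch below (a toy stand-in, not the paper's CUB experiment; all variable names are hypothetical) builds an observed concept that mixes a label-driven signal with a feature-borne confounder. A predictor with access to the features recovers the confounded part as well, so its Spearman correlation with the observed concept exceeds that of a label-only predictor, which is exactly the signature of confounding described in the text.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in: the annotated concept mixes a label-driven signal
# with a confounder that lives in the features but not in the labels.
label_signal = rng.normal(size=n)
confounder = rng.normal(size=n)
observed_concept = label_signal + confounder

# c(x): sees the features, hence also the confounder.
c_from_x = label_signal + confounder + 0.3 * rng.normal(size=n)
# c(y): sees only the labels, hence only the label-driven part.
c_from_y = label_signal + 0.3 * rng.normal(size=n)

rho_x, _ = spearmanr(c_from_x, observed_concept)
rho_y, _ = spearmanr(c_from_y, observed_concept)
print(f"Spearman rho of c(x): {rho_x:.2f}")  # higher: rides on the confounder
print(f"Spearman rho of c(y): {rho_y:.2f}")
```

The gap between the two coefficients flags concepts whose feature-based predictions may encode confounding rather than genuine label information.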
In this work, using Concept Bottleneck Models (CBM) (Koh et al., 2020; Losch et al., 2019), we demonstrate a method for removing the confounding and noise from (i.e., debiasing) the explanation with concept vectors, and we extend the results to the Testing with Concept Activation Vectors (TCAV) technique (Kim et al., 2018). We provide a new causal prior graph that accounts for the confounding information and for concept completeness (Yeh et al., 2020). We describe the challenges in estimating our causal prior graph and propose a new learning procedure. Our estimation technique defines and predicts debiased concepts such that the predictive information of the features flows maximally through them. We show that, using a two-stage regression technique from the instrumental variables literature, we can successfully remove the impact of confounding and noise from the predicted concept vectors. Our proposed procedure has three steps: (1) debias the concept vectors using the labels, (2) predict
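The two-stage regression idea can be sketched in a linear toy setting (a minimal illustration under assumed linear models, not the paper's full procedure; all variable names are hypothetical). Stage 1 regresses the observed concept on the label, so the fitted values keep only the label-explained part of the concept; stage 2 fits the feature-to-concept model against this debiased target, which drives the weight on the confounding direction in the features toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Synthetic setup: the label y drives the true concept, while a
# confounder u contaminates both the features x and the observed concept.
y = rng.normal(size=n)
u = rng.normal(size=n)
c_obs = 1.5 * y + u + 0.3 * rng.normal(size=n)
x = np.column_stack([y + 0.5 * rng.normal(size=n), u])

# Stage 1: regress the observed concept on the label. The fitted values
# are the debiased concept -- only the part of c_obs explained by y.
Y = np.column_stack([np.ones(n), y])
beta1, *_ = np.linalg.lstsq(Y, c_obs, rcond=None)
c_debiased = Y @ beta1

# Stage 2: fit the feature->concept model against the debiased target.
X = np.column_stack([np.ones(n), x])
beta2, *_ = np.linalg.lstsq(X, c_debiased, rcond=None)

# Naive baseline: regress the raw observed concept directly on features.
beta_naive, *_ = np.linalg.lstsq(X, c_obs, rcond=None)

print(f"weight on confounder, two-stage: {beta2[2]:.3f}")   # near zero
print(f"weight on confounder, naive:     {beta_naive[2]:.3f}")  # near one
```

The naive feature-to-concept regression assigns substantial weight to the confounder column, while the two-stage fit suppresses it, mirroring the role of the instrument (here, the label) in classical two-stage least squares.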