DEBIASING CONCEPT-BASED EXPLANATIONS WITH CAUSAL ANALYSIS

Abstract

Concept-based explanation is a popular model interpretability approach because it expresses the reasons for a model's predictions in terms of concepts that are meaningful to domain experts. In this work, we study the problem of the concepts being correlated with confounding information in the features. We propose a new causal prior graph for modeling the impacts of unobserved variables, and a method to remove the impact of confounding information and noise using a two-stage regression technique borrowed from the instrumental variable literature. We also model the completeness of the concept set and show that our debiasing method works even when the concepts are not complete. Our synthetic and real-world experiments demonstrate the success of our method in removing biases and improving the ranking of the concepts in terms of their contribution to the explanation of the predictions.

1. INTRODUCTION

Explaining the predictions of neural networks through higher-level concepts (Kim et al., 2018; Ghorbani et al., 2019; Brocki & Chung, 2019; Hamidi-Haines et al., 2018) enables model interpretation on data with complex manifold structure, such as images. It also allows the use of domain knowledge during the explanation process. Concept-based explanation has been used for medical imaging (Cai et al., 2019), breast cancer histopathology (Graziani et al., 2018), cardiac MRIs (Clough et al., 2019), and meteorology (Sprague et al., 2019). When the set of concepts is carefully selected, we can estimate a model in which the discriminative information flows from the feature vectors x through the concept vectors c and reaches the labels y. To this end, we train two models: one predicting the concept vectors from the features, denoted by c(x), and one predicting the labels from the predicted concept vector, denoted by y(c(x)). This estimation process ensures that for each prediction we have the reasons for the prediction stated in terms of the predicted concept vector c(x). However, in reality, noise and confounding information (due to, e.g., non-discriminative context) can influence both the feature and concept vectors, resulting in confounded correlations between them. Figure 1 provides evidence of noise and confounding in the CUB-200-2011 dataset (Wah et al., 2011). We train two predictors for the concept vectors, one based on the features, c(x), and one based on the labels, c(y), and compare the Spearman correlation coefficients between their predictions and the true ordinal values of the concepts. Having concepts for which c(x) is more accurate than c(y) could be due to noise, or due to hidden variables independent of the labels that spuriously correlate c and x, leading to undesirable explanations that include confounding or noise.
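The diagnostic behind Figure 1 can be sketched as follows. This is a minimal illustration on synthetic stand-in data, not the paper's actual experimental pipeline; the arrays and noise scales are hypothetical:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical stand-ins for one concept: true ordinal values c, a
# label-based predictor's output c_hat_y, and a feature-based
# predictor's output c_hat_x.
n = 500
c_true = rng.integers(0, 4, size=n).astype(float)
c_hat_y = c_true + rng.normal(0.0, 1.0, size=n)   # prediction from labels
c_hat_x = c_true + rng.normal(0.0, 1.0, size=n)   # prediction from features

# Spearman rank correlation of each predictor with the true concept.
rho_x, _ = spearmanr(c_hat_x, c_true)
rho_y, _ = spearmanr(c_hat_y, c_true)

# A concept is flagged as potentially confounded or noisy when the
# feature-based predictor tracks it better than the label-based one,
# i.e. when rho_x > rho_y.
print(f"rho(c(x), c) = {rho_x:.3f}, rho(c(y), c) = {rho_y:.3f}")
```

In the real experiment this comparison is repeated per concept over the 312 CUB-200-2011 concepts, and the concepts are sorted by rho(c(y), c) to produce the figure.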
In this work, using Concept Bottleneck Models (CBMs) (Koh et al., 2020; Losch et al., 2019), we demonstrate a method for removing the confounding and noise from (i.e., debiasing) the explanations given by concept vectors, and we extend the results to the Testing with Concept Activation Vectors (TCAV) (Kim et al., 2018) technique. We provide a new causal prior graph to account for confounding information and concept completeness (Yeh et al., 2020). We describe the challenges in estimating our causal prior graph and propose a new learning procedure. Our estimation technique defines and predicts debiased concepts such that the predictive information of the features maximally flows through them. We show that using a two-stage regression technique from the instrumental variables literature, we can successfully remove the impact of confounding and noise from the predicted concept vectors. Our proposed procedure has three steps: (1) debias the concept vectors using the labels, (2) predict the debiased concept vectors using the features, and (3) use the predicted concept vectors from the second step to predict the labels. Optionally, we can also find the residual predictive information in the features that is not captured by the concepts. We validate the proposed method using a synthetic dataset and the CUB-200-2011 dataset. On the synthetic data, we have access to the ground truth and show that in the presence of confounding and noise, our debiasing procedure improves the accuracy of recovering the true concepts.
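The three-step procedure above can be sketched with linear models in the style of two-stage regression. This is a minimal illustration under simplifying assumptions (linear stages, toy synthetic data, one-hot label encoding), not the paper's actual architecture:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical toy data: features x, labels y, and observed concepts c
# corrupted by noise that is independent of the labels.
n, d, k = 1000, 10, 3
x = rng.normal(size=(n, d))
c_clean = x[:, :k] @ rng.normal(size=(k, k))            # true concept signal
y = (c_clean.sum(axis=1) > 0).astype(int)
c = c_clean + rng.normal(scale=1.0, size=(n, k))        # noisy observed concepts

# Step 1: debias the concepts by regressing them on the labels (first
# stage); the fitted values retain only the label-relevant component.
y_onehot = np.eye(2)[y]
stage1 = LinearRegression().fit(y_onehot, c)
c_debiased = stage1.predict(y_onehot)

# Step 2: predict the debiased concepts from the features (second stage).
stage2 = LinearRegression().fit(x, c_debiased)
c_hat = stage2.predict(x)

# Step 3: predict the labels from the predicted debiased concepts.
clf = LogisticRegression().fit(c_hat, y)
print("train accuracy:", clf.score(c_hat, y))
```

Because the first-stage fitted values are functions of the labels alone, noise and label-independent confounding in c cannot pass through to the second stage.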
On the CUB-200-2011 dataset, we use the RemOve And Retrain (ROAR) framework (Hooker et al., 2019) to show that our debiasing procedure ranks the concepts by their contribution to the explanation more accurately than regular concept bottleneck models. We also show that our debiasing technique improves the accuracy of CBMs in predicting the labels. Finally, using several examples, we qualitatively show when the debiasing helps improve the quality of concept-based explanations.

2. METHODOLOGY

Notations. We follow the notation of Goodfellow et al. (2016) and denote random vectors by bold letters x and their values by bold symbols x. The notation p(x) is a probability measure on x, and dp(x = x) is the infinitesimal probability mass at x = x. We use y(x) to denote the prediction of y given x. In the graphical models, we show the observed and unobserved variables using filled and hollow circles, respectively.

Problem Statement. We assume that during the training phase, we are given triplets (x_i, c_i, y_i) for i = 1, . . . , n data points. In addition to the regular features x and labels y, we are given a human-interpretable concept vector c for each data point. Each element of the concept vector measures the degree of existence of the corresponding concept in the features; thus, the concept vector typically has binary or ordinal values. Our goal is to learn to predict y as a function of x and use c to explain the predictions. We proceed in two steps: we first learn a function c(x) and then learn another function y(c(x)). The prediction c(x) is the explanation for our prediction y. At test time, only the features are given, and the prediction+explanation algorithm predicts both y and c. In this paper, we aim to remove the bias and noise components from the estimated concept vector c so that it explains the reasons for the prediction of the labels more accurately. To this end, we first propose a new causal prior graph that includes the potential unobserved confounders.
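The baseline two-step prediction+explanation pipeline described above can be sketched as follows. All data here is a hypothetical toy construction, and the linear models stand in for whatever concept and label predictors one would actually train:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical training triplets (x_i, c_i, y_i): features x, binary
# concept vector c, and labels y determined by the concepts.
n, d, k = 800, 8, 4
x = rng.normal(size=(n, d))
c = (x[:, :k] > 0).astype(float)       # concepts: presence indicators
y = (c.sum(axis=1) >= 2).astype(int)

# Step 1: learn c(x), the concept predictor from features.
concept_model = LinearRegression().fit(x, c)
c_hat = concept_model.predict(x)

# Step 2: learn y(c(x)), the label predictor from predicted concepts.
label_model = LogisticRegression().fit(c_hat, y)

# At test time only x is observed; c_hat serves as the explanation
# accompanying each label prediction.
x_test = rng.normal(size=(5, d))
c_test = concept_model.predict(x_test)
y_test = label_model.predict(c_test)
```

This is the vanilla concept bottleneck setup; the debiasing procedure of this paper inserts an additional label-based regression stage before Step 1.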

2.1. A NEW CAUSAL PRIOR GRAPH FOR CBMS

Figure 2a shows the ideal situation in explanation via high-level concepts. The generative model corresponding to Figure 2a states that for generating each feature x i we first randomly draw the label y i . Given the label, we draw the concepts c i . Given the concepts, we generate the features. The



Figure 1: Spearman correlation coefficients (ρ) of the predictors of the concepts given the features, c(x), and the labels, c(y), for the 312 concepts in the test partition of the CUB-200-2011 dataset (Wah et al., 2011). 112 concepts can be predicted more accurately with the features than with the labels. Concept ids on the x-axis are sorted in increasing order of ρ(c(y), c). We provide the detailed steps to obtain the figure in Section 4.2.

