CONFOUNDER IDENTIFICATION-FREE CAUSAL VISUAL FEATURE LEARNING

Abstract

Confounders in deep learning are generally detrimental to a model's generalization once they infiltrate feature representations. Therefore, learning causal features that are free of interference from confounders is important. Most previous causal learning-based approaches employ the back-door criterion to mitigate the adverse effect of certain specific confounders, which requires the explicit identification of confounders. However, in real scenarios, confounders are typically diverse and difficult to identify. In this paper, we propose a novel Confounder Identification-free Causal Visual Feature Learning (CICF) method, which obviates the need for identifying confounders. CICF models the interventions among different samples based on the front-door criterion, and then approximates the global-scope intervening effect from instance-level interventions from the perspective of optimization. In this way, we aim to find a reliable optimization direction, which eliminates confounding effects, to learn causal features. Furthermore, we uncover the relation between CICF and the popular meta-learning strategy MAML (Finn et al., 2017), and provide an interpretation of why MAML works from the theoretical perspective of causal learning for the first time. Thanks to the effective learning of causal features, our CICF enables models to have superior generalization capability. Extensive experiments on domain generalization benchmark datasets demonstrate the effectiveness of our CICF, which achieves state-of-the-art performance.

1. INTRODUCTION

Deep learning excels at capturing correlations between inputs and labels in a data-driven manner, and has achieved remarkable success on various tasks, such as image classification, object detection, and question answering (Liu et al., 2021; He et al., 2016; Redmon et al., 2016; He et al., 2017; Antol et al., 2015). Even so, in statistics, correlation is not equivalent to causation (Pearl et al., 2016). For example, when tree branches usually appear together with birds in the training data, deep neural networks (DNNs) easily mistake features of tree branches for features of birds. A close association between two variables does not imply that one of them causes the other. Capturing/modeling correlations instead of causation runs a high risk of allowing various confounders to infiltrate the learned feature representations. When affected by the intervening effects of confounders, a network may still make correct predictions when the testing and training data follow the same distribution, but fails when the testing data is out of distribution. This harms the generalization capability of the learned feature representations. Thus, learning causal features, in which the interference of confounders is excluded, is important for achieving reliable results. As shown in Fig. 1, confounders C introduce a spurious (non-causal) connection X ← C → Y between samples X and their corresponding labels Y. A classical example that sheds light on this instantiates X, Y, and C as the sales volume of ice cream, violent crime, and hot weather, respectively. Seemingly, an increase in ice cream sales X is correlated with an increase in violent crime Y. However, hot weather is the common cause of both, which makes an increase in ice cream sales a misleading factor when analyzing violent crime.
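The ice cream example above can be reproduced in a few lines: a minimal, illustrative simulation (all coefficients are hypothetical and chosen only for demonstration) in which a hidden confounder C drives both X and Y, producing a strong marginal correlation between X and Y even though neither causes the other, while adjusting for C removes it.

```python
# Illustrative simulation of the confounding structure X <- C -> Y:
# C (hot weather) causally drives both X (ice cream sales) and
# Y (violent crime); there is no direct X -> Y edge.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

c = rng.normal(size=n)                 # confounder: temperature
x = 2.0 * c + rng.normal(size=n)       # ice cream sales, caused by C only
y = 3.0 * c + rng.normal(size=n)       # violent crime, caused by C only

# The marginal X-Y correlation is strong despite no causal link.
marginal_corr = np.corrcoef(x, y)[0, 1]

# Adjusting for C (residualizing both variables on C) removes it.
x_res = x - np.polyval(np.polyfit(c, x, 1), c)
y_res = y - np.polyval(np.polyfit(c, y, 1), c)
adjusted_corr = np.corrcoef(x_res, y_res)[0, 1]

print(f"marginal corr(X, Y) = {marginal_corr:.2f}")   # large, spurious
print(f"corr(X, Y | C)      = {adjusted_corr:.2f}")   # near zero
```

The same pattern underlies the back-door adjustment mentioned above: conditioning on (or residualizing out) the confounder removes the spurious association, but doing so requires knowing C, which is exactly the identification requirement that CICF aims to avoid.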
Analogously, in deep learning, once misleading features/confounders are captured, the introduced biases may be mistakenly fitted by neural networks, thus harming the generalization capability of the learned features. In theory, we expect DNNs to model the causation between X and Y. Deviating from this expectation, the interventions of confounders C make the learned model implicitly condition on C. This makes that

