CAUSAL ATTENTION TO EXPLOIT TRANSIENT EMERGENCE OF CAUSAL EFFECT

Anonymous

Abstract

We propose a causal reasoning mechanism called causal attention that can improve the performance of machine learning models on a class of causal inference tasks by revealing the generation process behind the observed data. We consider the problem of reconstructing causal networks (e.g., biological neural networks) connecting large numbers of variables (e.g., nerve cells), whose evolution is governed by nonlinear dynamics consisting of a weak coupling-drive (i.e., the causal effect) and a strong self-drive (which dominates the evolution). The core difficulty is the sparseness of the causal effect, which emerges (i.e., the coupling force becomes significant) only momentarily and otherwise remains quiescent in the neural activity sequence. Causal attention is designed to guide the model to focus its inference on the critical regions of time series data where causality may manifest. Specifically, attention coefficients are assigned autonomously by a neural network trained to maximise the Attention-extended Transfer Entropy, a novel generalization of the iconic transfer entropy metric. Our results show that, without any prior knowledge of the dynamics, causal attention explicitly identifies regions where the strength of the coupling-drive is distinctly greater than zero. This innovation substantially improves reconstruction performance on both synthetic and real causal networks, using data generated by neuronal models widely used in neuroscience.
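For reference, the transfer entropy that the proposed objective generalizes is Schreiber's (2000) measure of directed information flow from a candidate parent $Y$ to a child $X$. Its standard (first-order) form is

$$T_{Y \to X} \;=\; \sum_{x_{t+1},\, x_t,\, y_t} p(x_{t+1}, x_t, y_t)\, \log \frac{p(x_{t+1} \mid x_t, y_t)}{p(x_{t+1} \mid x_t)},$$

which vanishes exactly when the past of $Y$ carries no information about the next state of $X$ beyond what $X$'s own past already provides. As stated above, the attention-extended variant departs from this baseline by letting learned attention coefficients emphasize the transient regions where the coupling is actually active; its precise definition is given later in the paper.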

1. INTRODUCTION

In this work, our task is to infer causal relationships between observed variables from time series data and to reconstruct the causal network connecting large numbers of these variables. Assume the time series $x_{it}$ records the time evolution of variable $i$ governed by coupled nonlinear dynamics, as represented by the general differential equation $\dot{x}_{it} = g(x_{it}) + \sum_{j} B_{ji}\, f(x_{it}, x_{jt})$, where $g$ and $f$ are the self- and coupling functions respectively. A parent variable influences the dynamic evolution of its child variable via the coupling function $f$. Note that these two functions are hidden and usually unknown for real systems. The asymmetric adjacency matrix $B$ represents the causal, i.e., directional, coupling relationships between variables: if $B_{ij} = 1$, variable $i$ is a coupling driver (parent variable) of variable $j$; otherwise the entry is zero. Hence, the goal is to infer the matrix $B$ from the observed time series $x_{it}$, $i = 1, 2, \ldots, N$, where $N$ is the number of variables in the system. The key challenge is that the causal effect in neural dynamics (e.g., biological neural systems observed via neuronal activity sequences) is too weak to be detected, rendering powerless the classic unsupervised techniques of causal inference developed across multiple research communities Granger (1969); Schreiber (2000); Sugihara et al. (2012); Sun et al. (2015); Nauta et al. (2019); Runge et al. (2019); Gerhardus & Runge (2020); Tank et al. (2021); Mastakouri et al. (2021). This difficulty manifests in three aspects. First, the dynamics contains both self-drive and coupling-drive. The strength of the coupling $f(x_{it}, \cdot)$ is usually many orders of magnitude smaller than that of the self-drive $g(x_{it})$, and the latter dominates the evolution. Second, the behavior of the coupling-drive is chaotic, unlike in linear models Shimizu et al. (2006); Xie et al. (2020). The resulting unpredictability and variability of the system state mean that the coupling force can be significant momentarily and otherwise almost vanish, as illustrated in Figure 3 (gray lines). This dilutes the information in the time series that is useful for inferring causal relationships. Third, in the heterogeneous networks common in applications, some variables are hubs coupled to many parent variables, among which it is difficult to distinguish individual causes. When causal effects are weak, we do not clearly observe the principle of Granger Causality, whereby the parent variable helps to explain the future change of its child variable Pfister et al.
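To make the setting concrete, the following minimal sketch simulates the data-generating process above. The specific choices of $g$, $f$, coupling strength, and noise level are illustrative placeholders only, not the neuronal models used in our experiments:

import numpy as np

# Minimal illustration of the generation process described above:
# x_dot_i = g(x_i) + sum_j B_ji * f(x_i, x_j), with a strong self-drive g
# and a weak coupling-drive f. The concrete g, f, and scales below are
# illustrative assumptions, not the paper's neuronal models.

rng = np.random.default_rng(0)

N, T, dt = 10, 5000, 0.01
eps = 1e-2                     # coupling strength, orders weaker than self-drive

# Ground-truth causal adjacency: B[i, j] = 1 means i drives (is a parent of) j.
B = (rng.random((N, N)) < 0.2).astype(float)
np.fill_diagonal(B, 0.0)

def g(x):                      # self-drive (dominates the evolution)
    return x - x**3            # bistable toy dynamics

def f(x_child, x_parent):      # coupling-drive (weak, transient causal effect)
    return np.sin(x_parent - x_child)

x = rng.standard_normal(N)
traj = np.empty((T, N))
for t in range(T):
    # drive[j] = sum over parents i of f(x_j, x_i); B[i, j] = 1 selects parents.
    drive = np.array([f(x[j], x) @ B[:, j] for j in range(N)])
    x = x + dt * (g(x) + eps * drive) + np.sqrt(dt) * 0.05 * rng.standard_normal(N)
    traj[t] = x
# `traj` plays the role of the observed series x_it from which B must be inferred.

Note that the coupling term enters at strength eps, far below the self-drive, so the causal signature of B is visible only in the brief intervals where f happens to be large; this is precisely the transient emergence of causal effect that causal attention is designed to exploit.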

