INTERPRETABLE MODELS FOR GRANGER CAUSALITY USING SELF-EXPLAINING NEURAL NETWORKS

Abstract

Exploratory analysis of time series data can yield a better understanding of complex dynamical systems. Granger causality is a practical framework for analysing interactions in sequential data, applied in a wide range of domains. In this paper, we propose a novel framework for inferring multivariate Granger causality under nonlinear dynamics based on an extension of self-explaining neural networks. This framework is more interpretable than other neural-network-based techniques for inferring Granger causality, since in addition to relational inference, it also allows detecting signs of Granger-causal effects and inspecting their variability over time. In comprehensive experiments on simulated data, we show that our framework performs on par with several powerful baseline methods at inferring Granger causality and that it achieves better performance at inferring interaction signs. The results suggest that our framework is a viable and more interpretable alternative to sparse-input neural networks for inferring Granger causality.

1. INTRODUCTION

Granger causality (GC) (Granger, 1969) is a popular practical approach for the analysis of multivariate time series and has become instrumental in exploratory analysis (McCracken, 2016) in various disciplines, such as neuroscience (Roebroeck et al., 2005), economics (Appiah, 2018), and climatology (Charakopoulos et al., 2018). Recently, methodological research has focused on inferring GC under nonlinear dynamics (Tank et al., 2018; Nauta et al., 2019; Wu et al., 2020; Khanna & Tan, 2020; Löwe et al., 2020), causal structures varying across replicates (Löwe et al., 2020), and unobserved confounding (Nauta et al., 2019; Löwe et al., 2020). To the best of our knowledge, the latest powerful techniques for inferring GC do not target effect sign detection (see Section 2.1 for a formal definition) or the exploration of effect variability over time and thus have limited interpretability. This drawback defeats the purpose of GC analysis as an exploratory statistical tool. In some nonlinear interactions, one variable may have an exclusively positive or negative effect on another, i.e. it consistently drives the other variable up or down, respectively. Negative and positive causal relationships are common in many real-world systems: for example, gene regulatory networks feature inhibitory effects (Inoue et al., 2011), and in metabolomics, certain compounds may inhibit or promote the synthesis of other metabolites (Rinschen et al., 2019). Differentiating between the two types of interactions would allow inferring and understanding such inhibition and promotion relationships in real-world dynamical systems and would facilitate a more comprehensive and insightful exploratory analysis. We therefore see a need for a framework for inferring nonlinear GC that is more amenable to interpretation than previously proposed methods (Tank et al., 2018; Nauta et al., 2019; Khanna & Tan, 2020).
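To make the notion of a signed effect concrete, consider the following toy bivariate simulation (our own illustrative sketch, not part of the proposed framework; all coefficients and variable names are hypothetical) in which one variable has an exclusively negative Granger-causal effect on another:

```python
import numpy as np

# Toy bivariate system in which x1 has a negative Granger-causal
# effect on x2: past values of x1 consistently drive x2 down.
rng = np.random.default_rng(0)
T = 1000
x1 = np.zeros(T)
x2 = np.zeros(T)
for t in range(1, T):
    x1[t] = 0.5 * x1[t - 1] + rng.normal(scale=0.1)
    x2[t] = -0.8 * np.tanh(x1[t - 1]) + rng.normal(scale=0.1)

# A least-squares fit of x2[t] on x1[t-1] recovers the sign of the
# interaction: the estimated slope is negative.
slope = np.polyfit(x1[:-1], x2[1:], deg=1)[0]
print(slope)
```

Because the map from the past of x1 to the future of x2 is monotone decreasing, the fitted slope comes out negative, matching the informal notion of a negative effect described above.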
To this end, we introduce a novel method for detecting nonlinear multivariate Granger causality that is interpretable, in the sense that it allows detecting effect signs and exploring influences among variables throughout time. The main contributions of this paper are as follows:

1. We extend self-explaining neural network models (Alvarez-Melis & Jaakkola, 2018) to time series analysis. The resulting autoregressive model, named generalised vector autoregression (GVAR), is interpretable and allows exploring GC relations between variables, signs of Granger-causal effects, and their variability through time.

2. We propose a framework for inferring nonlinear multivariate GC that relies on a GVAR model with sparsity-inducing and time-smoothing penalties. Spurious associations are mitigated by finding relationships that are stable across original and time-reversed (Winkler et al., 2016) time series data.

3. We comprehensively compare the proposed framework with the powerful baseline methods of Tank et al. (2018), Nauta et al. (2019), and Khanna & Tan (2020) on a range of synthetic time series datasets with known Granger-causal relationships. We evaluate the ability of the methods to infer the ground-truth GC structure and effect signs.

2.1. GRANGER CAUSALITY

Assume that causal relationships between variables are given by the following structural equation model:

$$x^i_t := g_i\left(x^1_{1:(t-1)}, \ldots, x^j_{1:(t-1)}, \ldots, x^p_{1:(t-1)}\right) + \varepsilon^i_t, \quad \text{for } 1 \leq i \leq p,$$

where $x^j_{1:(t-1)}$ is shorthand for $x^j_1, x^j_2, \ldots, x^j_{t-1}$; $\varepsilon^i_t$ are additive innovation terms; and $g_i(\cdot)$ are potentially nonlinear functions specifying how the future values of variable $x^i$ depend on the past values of all variables. We then say that variable $x^j$ does not Granger-cause variable $x^i$, denoted $x^j \not\rightarrow x^i$, if and only if $g_i(\cdot)$ is constant in $x^j_{1:(t-1)}$.

Depending on the form of the functional relationship $g_i(\cdot)$, we can also differentiate between positive and negative Granger-causal effects. In this paper, we define the effect sign as follows: if $g_i(\cdot)$ is increasing in all $x^j_{1:(t-1)}$, then we say that variable $x^j$ has a positive effect on $x^i$; if $g_i(\cdot)$ is decreasing in all $x^j_{1:(t-1)}$, then $x^j$ has a negative effect on $x^i$. Note that an effect may be neither positive nor negative: for example, $x^j$ may contribute both positively and negatively to the future of $x^i$ at different delays, or the effect of $x^j$ on $x^i$ may depend on another variable.

Granger-causal relationships can be summarised by a directed graph $G = (V, E)$, referred to as the summary graph (Peters et al., 2017), where $V = \{1, \ldots, p\}$ is the set of vertices corresponding to variables and $E = \{(i, j) : x^i \rightarrow x^j\}$ is the set of edges corresponding to Granger-causal relationships. Let $A \in \{0, 1\}^{p \times p}$ denote the adjacency matrix of $G$. The inference problem is then to estimate $A$ from observations $\{x_t\}_{t=1}^{T}$, where $T$ is the length of the observed time series. In practice, one usually fits a time series model that explicitly or implicitly infers dependencies between variables and subsequently performs a statistical test for GC. A conventional approach (Lütkepohl, 2007) to testing for linear Granger causality is the linear vector autoregression (VAR) (see Appendix A).

2.2.1. TECHNIQUES FOR INFERRING NONLINEAR GRANGER CAUSALITY

Relational inference in time series has been studied extensively in statistics and machine learning. Early techniques for inferring undirected relationships include time-varying dynamic Bayesian networks (Song et al., 2009) and time-smoothed, regularised logistic regression with time-varying coefficients (Kolar et al., 2010). Recent approaches to inferring Granger-causal relationships leverage the expressive power of neural networks (Montalto et al., 2015; Wang et al., 2018; Tank et al., 2018; Nauta et al., 2019; Khanna & Tan, 2020; Wu et al., 2020; Löwe et al., 2020) and are often based on regularised autoregressive models, reminiscent of the Lasso Granger method (Arnold et al., 2007).
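To make the regularised-autoregression idea concrete, the Lasso Granger approach can be sketched as an L1-penalised linear autoregression in which nonzero lagged coefficients indicate Granger-causal links. The sketch below is our own simplified illustration, not the original implementation of Arnold et al. (2007): the ISTA solver, toy system, and hyperparameters are illustrative choices.

```python
import numpy as np

def lasso_granger(x, lag, lam, n_iter=2000):
    """For each target variable, fit an L1-penalised linear autoregression
    on the lagged values of all variables (solved with plain ISTA).
    Returns an adjacency matrix A with A[i, j] = 1 iff variable j is
    inferred to Granger-cause variable i."""
    T, p = x.shape
    # Design matrix: row t holds the previous `lag` observations, flattened.
    X = np.stack([x[t - lag:t].ravel() for t in range(lag, T)])
    lr = 1.0 / np.linalg.norm(X, 2) ** 2  # step size from the spectral norm
    coefs = np.zeros((p, p * lag))
    for i in range(p):
        y = x[lag:, i]
        b = np.zeros(p * lag)
        for _ in range(n_iter):
            b = b - lr * (X.T @ (X @ b - y))               # gradient step
            b = np.sign(b) * np.maximum(np.abs(b) - lr * lam, 0.0)  # soft-threshold
        coefs[i] = b
    # Aggregate over lags: an edge exists if any lagged coefficient survives.
    A = (np.abs(coefs.reshape(p, lag, p)).max(axis=1) > 1e-3).astype(int)
    return A, coefs

# Toy system: x0 is an autonomous AR(1) process, x0 drives x1,
# and x2 is independent noise.
rng = np.random.default_rng(1)
T, p = 500, 3
x = np.zeros((T, p))
for t in range(1, T):
    x[t, 0] = 0.6 * x[t - 1, 0] + 0.1 * rng.normal()
    x[t, 1] = 0.8 * x[t - 1, 0] + 0.1 * rng.normal()
    x[t, 2] = 0.1 * rng.normal()

A, _ = lasso_granger(x, lag=1, lam=2.0)
print(A)  # A[i, j] = 1 iff x^j is inferred to Granger-cause x^i
```

The L1 penalty shrinks the coefficients of uninformative lagged predictors to exactly zero, so the support of the coefficient vector directly yields the estimated summary graph; the neural approaches cited above replace the linear regression with a more expressive function class while retaining this sparsity principle.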

