RHINO: DEEP CAUSAL TEMPORAL RELATIONSHIP LEARNING WITH HISTORY-DEPENDENT NOISE

Abstract

Discovering causal relationships between different variables from time series data has been a long-standing challenge for many domains such as climate science, finance, and healthcare. Given the complexity of real-world relationships and the nature of observations in discrete time, causal discovery methods need to consider non-linear relations between variables, instantaneous effects and history-dependent noise (the change of noise distribution due to past actions). However, previous works do not offer a solution addressing all these problems together. In this paper, we propose a novel causal relationship learning framework for time-series data, called Rhino, which combines vector auto-regression, deep learning and variational inference to model non-linear relationships with instantaneous effects while allowing the noise distribution to be modulated by historical observations. Theoretically, we prove the structural identifiability of Rhino. Our empirical results from extensive synthetic experiments and two real-world benchmarks demonstrate better discovery performance compared to relevant baselines, with ablation studies revealing its robustness under model misspecification.

1. INTRODUCTION

Time series data is a collection of data points recorded at different timestamps describing a pattern of chronological change. Identifying the causal relations between different variables and their interactions through time (Spirtes et al., 2000; Berzuini et al., 2012; Guo et al., 2020; Peters et al., 2017) is essential for many applications, such as climate science and health care. Randomized controlled trials are the gold standard for discovering such relationships, but may be unavailable due to cost and ethical constraints. Therefore, causal discovery from observational data alone is important and fundamental to many real-world applications (Löwe et al., 2022; Bussmann et al., 2021; Moraffah et al., 2021; Wu et al., 2020; Runge, 2018; Tank et al., 2018; Hyvärinen et al., 2010; Pamfil et al., 2020). The task of temporal causal discovery can be challenging for several reasons: (1) relations between variables can be non-linear in the real world; (2) with a slow sampling interval, everything that happens between two observations is aggregated into the same timestamp, i.e. instantaneous effects; (3) the noise may be non-stationary (its distribution depends on the past observations), i.e. history-dependent noise. For example, in stock markets, the announcement of a decision by a leading company after the market closes may have complex effects (i.e. non-linearity) on its stock price immediately after the market opens (i.e. slow sampling interval and instantaneous effect), and its price volatility may also change (i.e. history-dependent noise). Similarly, in education, students who recently earned good marks on algebra tests should also score well on an upcoming algebra exam with little variation (i.e. history-dependent noise). To the best of our knowledge, the performance of existing frameworks suffers in many real-world scenarios because they cannot address these aspects in a satisfactory way. In particular, history-dependent noise has rarely been considered in the past.
A large category of the preceding works, called Granger causality (Granger, 1969), is based on the fact that cause-effect relationships can never go against time. Despite many recent advances (Wu et al., 2020; Shojaie & Michailidis, 2010; Siggiridou & Kugiumtzis, 2015; Amornbunchornvej et al., 2019; Löwe et al., 2022; Tank et al., 2018; Bussmann et al., 2021; Dang et al., 2018; Xu et al., 2019), they all assume the absence of instantaneous effects and a fixed noise distribution. Constraint-based methods have also been extended to time series causal discovery (Runge, 2018; 2020), and are commonly applied by folding the time series. This introduces new assumptions and translates the aforementioned requirements into challenges in conditional independence testing (Shah & Peters, 2020). Additionally, they require a stronger faithfulness assumption and can only identify the causal graph up to a Markov equivalence class, without detailed functional relationships.
An alternative line of research leverages the development of causal discovery with functional causal models (Hyvärinen et al., 2010; Pamfil et al., 2020; Peters et al., 2013). These models can capture both instantaneous and lagged effects as long as they have theoretically guaranteed structural identifiability. Unfortunately, they do not consider history-dependent noise. One central challenge of modelling this dependency is that noise depending on the lagged parents may break the structural identifiability of the model. For static data, Khemakhem et al. (2021) prove structural identifiability only when this dependency is restricted to a simple functional form. Thus, the key research question is whether identifiability can be preserved with complex historical dependencies in the temporal setting.
Motivated by these requirements, we propose a novel temporal discovery framework called Rhino (deep causal temporal relationship learning with history-dependent noise), which can model non-linear lagged and instantaneous effects with flexible history-dependent noise.
Our contributions are:
• A novel causal discovery framework called Rhino, consisting of a novel functional form of its SEMs and a variational training framework, where the proposed form of its SEM combines vector auto-regression and deep learning to model non-linear lagged and instantaneous effects with history-dependent noise.
• We prove that Rhino SEMs with the proposed form are structurally identifiable. To achieve this, we provide general conditions for structural identifiability with history-dependent noise, of which the form of Rhino SEMs is a special case. Furthermore, we clarify relations to several previous works.
• We conduct extensive synthetic experiments with ablation studies to demonstrate the advantages of Rhino and its robustness across different settings. Additionally, we compare its performance to a wide range of baselines in two real-world discovery benchmarks.

2. BACKGROUND

In this section, we briefly introduce the necessary prerequisite knowledge. In particular, we focus on structural equation models, Granger causality (Granger, 1969) and vector auto-regression. For a review of more recent related work, please refer to Section 5.
Structural Equation Models (SEMs). Consider $X \in \mathbb{R}^D$ with $D$ variables. An SEM describes the causal relationships between them given a causal graph $G$:
$$X^i = f_i(\mathrm{Pa}^i_G, \epsilon^i), \tag{1}$$
where $\mathrm{Pa}^i_G$ are the parents of node $i$ and $\epsilon^i$ are mutually independent noise variables. In the context of multivariate time series, $X_t = (X^i_t)_{i \in V}$, where $V$ is a set of nodes with size $D$, the corresponding SEM given a temporal causal graph $G$ is
$$X^i_t = f_{i,t}\big(\mathrm{Pa}^i_G(<t), \mathrm{Pa}^i_G(t), \epsilon^i_t\big), \tag{2}$$
where $\mathrm{Pa}^i_G(<t)$ contains the parent values specified by $G$ at previous times (lagged parents) and $\mathrm{Pa}^i_G(t)$ are the parents at the current time $t$ (instantaneous parents). The above SEM induces a joint distribution over the stationary time series $\{X_t\}_{t=0}^{T}$ (see Assumption 1 in Appendix B for the definition). However, functional causal models of the above general form cannot be directly used for causal discovery because they are structurally unidentifiable (Lemma 1, Zhang et al. (2015)). One way to solve this is to sacrifice flexibility by restricting the functional class. One example is the additive noise model (ANM) (Hoyer et al., 2008),
$$X^i = f_i(\mathrm{Pa}_G(X^i)) + \epsilon^i, \tag{3}$$
which has recently been used for causal reasoning with non-temporal data (Geffner et al., 2022).
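To make the temporal SEM with lagged parents, instantaneous parents and additive noise more concrete, the following minimal sketch simulates a toy two-variable series. Everything in it is an assumption chosen for illustration: the function name `simulate_temporal_sem`, the lag of one step, the specific functions (`tanh`, `sin`), the coefficients, and the way the noise scale depends on the past. It is not the functional form proposed for Rhino, only an example of data in which the noise of one variable is history-dependent.

```python
import numpy as np


def simulate_temporal_sem(T=500, seed=0, history_dependent=True):
    """Simulate a toy 2-variable temporal SEM with lag 1 (illustrative only).

    Hypothetical structure:
      X0_t = tanh(0.8 * X0_{t-1}) + 0.3 * eps0_t            # lagged parent only
      X1_t = 0.5 * X1_{t-1} + sin(X0_t) + s_t * eps1_t       # lagged + instantaneous parent
    with s_t = 0.1 + 0.5 * |X0_{t-1}| when history_dependent is True,
    and the constant s_t = 0.3 otherwise.
    """
    rng = np.random.default_rng(seed)
    X = np.zeros((T, 2))
    for t in range(1, T):
        eps = rng.standard_normal(2)  # mutually independent base noise
        # Variable 0: non-linear function of its own lagged value.
        X[t, 0] = np.tanh(0.8 * X[t - 1, 0]) + 0.3 * eps[0]
        # Noise scale of variable 1: modulated by the past, or fixed.
        scale = 0.1 + 0.5 * abs(X[t - 1, 0]) if history_dependent else 0.3
        # Variable 1: lagged self-effect plus instantaneous effect from X0_t.
        X[t, 1] = 0.5 * X[t - 1, 1] + np.sin(X[t, 0]) + scale * eps[1]
    return X


if __name__ == "__main__":
    X = simulate_temporal_sem(T=2000)
    # Residual of variable 1 after removing its (known) structural part.
    resid = X[1:, 1] - 0.5 * X[:-1, 1] - np.sin(X[1:, 0])
    corr = np.corrcoef(np.abs(resid), np.abs(X[:-1, 0]))[0, 1]
    print(f"corr(|residual|, |lagged X0|) = {corr:.3f}")
```

With `history_dependent=True`, the magnitude of the residual of the second variable is positively correlated with the magnitude of the lagged first variable, which a model with a fixed noise distribution cannot represent. Setting `history_dependent=False` recovers an ordinary additive noise model with constant noise scale.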

