CUTS: NEURAL CAUSAL DISCOVERY FROM IRREGULAR TIME-SERIES DATA

Abstract

Causal discovery from time-series data has been a central task in machine learning. Recently, Granger causality inference has gained momentum due to its good explainability and high compatibility with emerging deep neural networks. However, most existing methods assume structured input data and degrade greatly when encountering data with randomly missing entries or non-uniform sampling frequencies, which hampers their application in real scenarios. To address this issue, we present CUTS, a neural Granger causal discovery algorithm that jointly imputes unobserved data points and builds causal graphs by plugging two mutually boosting modules into an iterative framework: (i) a latent data prediction stage, which designs a Delayed Supervision Graph Neural Network (DSGNN) to hallucinate and register irregular data that may be high-dimensional and have complex distributions; (ii) a causal graph fitting stage, which builds a causal adjacency matrix from the imputed data under a sparsity penalty. Experiments show that CUTS effectively infers causal graphs from irregular time-series data, with significantly superior performance to existing methods. Our approach constitutes a promising step towards applying causal discovery to real applications with non-ideal observations.

1. INTRODUCTION

Causal interpretation of observed time-series data can help answer fundamental causal questions and advance scientific discovery in disciplines such as medicine and finance. To enable causal reasoning and counterfactual prediction, researchers in the past decades have been dedicated to discovering causal graphs from observed time-series and have made great progress (Gerhardus & Runge, 2020; Tank et al., 2022; Khanna & Tan, 2020; Wu et al., 2022; Pamfil et al., 2020; Löwe et al., 2022; Runge, 2021). This task is called causal discovery or causal structure learning, and it usually formulates causal relationships as Directed Acyclic Graphs (DAGs). Among these causal discovery methods, Granger causality (Granger, 1969; Marinazzo et al., 2008) is attracting wide attention and has proved advantageous due to its high explainability and compatibility with emerging deep neural networks (Tank et al., 2022; Khanna & Tan, 2020; Nauta et al., 2019). In spite of this progress, most existing causal discovery methods assume well-structured time-series, i.e., completely sampled at an identical dense frequency. However, in real-world scenarios the observed time-series might suffer from random data missing (White et al., 2011) or have non-uniform sampling periods. The former is usually caused by sensor limitations or transmission loss, while the latter occurs when multiple sensors have distinct sampling frequencies. Robustness to such data imperfections is urgently demanded, but has not yet been well explored. When confronted with unobserved data points, some straightforward solutions fill the points with zero padding, interpolation, or other imputation algorithms, such as Gaussian Process Regression or neural-network-based approaches (Cini et al., 2022; Cao et al., 2018; Luo et al., 2018).
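For concreteness, the two simplest pre-processing imputations mentioned above can be sketched as follows (a minimal NumPy illustration; the function names are ours):

```python
import numpy as np

def zero_fill(x, mask):
    """Zero-pad missing entries of a 1-D series (mask: 1 = observed, 0 = missing)."""
    return np.where(mask.astype(bool), x, 0.0)

def linear_interp(x, mask):
    """Linearly interpolate missing entries of a 1-D series from its observed points."""
    t = np.arange(len(x))
    obs = mask.astype(bool)
    return np.interp(t, t[obs], x[obs])
```

Both treat imputation as a fixed pre-processing step, decoupled from any causal model of the data.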
We will show in the experiments section that addressing missing entries by performing such trivial data imputation as pre-processing leads to hampered causal conclusions. To push causal discovery towards real applications, we attempt to infer reliable causal graphs from irregular time-series data. Fortunately, for data assumed to be generated by certain causal structural models (Pamfil et al., 2020; Tank et al., 2022), a well-designed neural network can decently fill a small proportion of missing entries given a plausible causal graph, which in turn improves the causal discovery, and so forth. Leveraging this benefit, we propose to conduct causal discovery and data completion in a mutually boosting manner under an iterative framework, instead of by sequential processing. Specifically, the algorithm alternates between two stages: (a) a latent data prediction stage that hallucinates missing entries with a Delayed Supervision Graph Neural Network (DSGNN), and (b) a causal graph fitting stage that infers causal graphs from the filled data under a sparsity constraint, utilizing an extended nonlinear Granger causality scheme. We name our algorithm Causal discovery from irregUlar Time-Series (CUTS), and its main contributions are as follows:
• We propose CUTS, a novel framework for causal discovery from irregular time-series data, which to the best of our knowledge is the first to address irregular time-series in causal discovery under this paradigm. Theoretically, CUTS can recover the correct causal graph under fair assumptions, as proved in Theorem 1.
• In the data imputation stage we design a deep neural network, DSGNN, which successfully imputes the unobserved entries in irregular time-series data and boosts the subsequent causal discovery stage and later iterations.
• We conduct extensive experiments demonstrating our superior performance over state-of-the-art causal discovery methods combined with widely used data imputation methods, the advantage of the mutually boosting strategy over sequential processing, and the robustness of CUTS (Appendix Section A.4).
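As an illustration of the alternating scheme (a hypothetical toy simplification, not the paper's DSGNN-based implementation), stage (b) can be realized as an L1-penalized lag-1 VAR fit and stage (a) as filling unobserved points with the current model's predictions:

```python
import numpy as np

def cuts_toy(X, mask, n_iters=5, lam=0.05, lr=0.1):
    """Toy linear sketch of the two-stage alternation (hypothetical simplification).

    X    : (T, N) series with arbitrary values at missing points
    mask : (T, N) indicator, 1 = observed, 0 = missing
    Returns a causal adjacency estimate A (entry (i, j): j -> i) and imputed data.
    """
    T, N = X.shape
    Xf = X.copy()                 # working copy with imputed entries
    A = np.zeros((N, N))
    for _ in range(n_iters):
        # Stage (b) causal graph fitting: ISTA on a lag-1 VAR with L1 penalty.
        for _ in range(200):
            resid = Xf[:-1] @ A.T - Xf[1:]              # one-step prediction error
            A -= lr * (resid.T @ Xf[:-1]) / (T - 1)     # gradient step
            A = np.sign(A) * np.maximum(np.abs(A) - lr * lam, 0.0)  # soft-threshold
        # Stage (a) latent data prediction: fill unobserved points with predictions.
        pred = Xf[:-1] @ A.T
        Xf[1:] = mask[1:] * Xf[1:] + (1 - mask[1:]) * pred
    return A, Xf
```

The point of the alternation is visible even in this toy: a better graph yields better imputations, and better imputations yield a better graph on the next pass.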

2. RELATED WORKS

Causal Structural Learning / Causal Discovery. Causal structural learning (or causal discovery) is a fundamental and challenging task in the field of causality and machine learning, and existing approaches can be categorized into five classes. (i) Constraint-based approaches, which build causal graphs via conditional independence tests; the two most widely used algorithms are PC (Spirtes & Glymour, 1991) and Fast Causal Inference (FCI) (Spirtes et al., 2000), later extended to time-series data by Entner & Hoyer (2010). Recently, Runge et al. proposed PCMCI, combining the above two constraint-based algorithms with linear/nonlinear conditional independence tests (Gerhardus & Runge, 2020; Runge, 2018b) and achieving high scalability on large-scale time-series data. (ii) Score-based learning algorithms, based on penalized Neural Ordinary Differential Equations (Bellot et al., 2022) or acyclicity constraints (Pamfil et al., 2020). (iii) Convergent Cross Mapping (CCM), first proposed by Sugihara et al. (2012), which tackles nonseparable weakly connected dynamic systems by reconstructing the nonlinear state space; CCM was later extended to the cases of synchrony (Ye et al., 2015), confounding (Benkő et al., 2020), and sporadic time series (Brouwer et al., 2021). (iv) Approaches based on the Additive Noise Model (ANM), which infer the causal graph under an additive noise assumption (Shimizu et al., 2006; Hoyer et al., 2008); Hoyer et al. (2008) extended ANM to nonlinear models with almost any nonlinearity. (v) The Granger causality approach proposed by Granger (1969), which has been widely used to analyze temporal causal relationships by testing whether one time-series aids the prediction of another. Granger causal analysis originally assumes linear models, so that the causal structure can be discovered by fitting a Vector Autoregressive (VAR) model; the idea was later extended to nonlinear settings (Marinazzo et al., 2008). Thanks to its high compatibility with emerging deep neural networks, Granger causal analysis is gaining momentum and is adopted in our work, where a neural network imputes irregular data with high complexity.
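To make the classical linear formulation in (v) concrete, a minimal VAR-style Granger test can be sketched as below (illustrative only; the function name and the plain variance-ratio criterion are our simplifications of the standard F-test):

```python
import numpy as np

def granger_var_test(x, y, lag=2):
    """Does the past of x help predict y? Fits two least-squares AR models for y,
    with and without x's lags, and returns the ratio of their residual sums of
    squares; a ratio well above 1 suggests x Granger-causes y."""
    T = len(y)
    rows, targets = [], []
    for t in range(lag, T):
        rows.append(np.concatenate([y[t - lag:t], x[t - lag:t]]))
        targets.append(y[t])
    Z, t_ = np.asarray(rows), np.asarray(targets)

    def rss(cols):
        beta, *_ = np.linalg.lstsq(Z[:, cols], t_, rcond=None)
        r = t_ - Z[:, cols] @ beta
        return r @ r

    restricted = rss(list(range(lag)))      # y's own lags only
    full = rss(list(range(2 * lag)))        # y's lags plus x's lags
    return restricted / full
```

Since the full model's regressors are a superset of the restricted model's, the ratio is always at least 1; a proper test would turn this into an F statistic with the appropriate degrees of freedom.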

Granger Causal Discovery. With the rapid progress and wide application of deep neural networks (NNs), researchers have begun to utilize RNNs (or other NNs) to infer nonlinear Granger causality. Wu et al. (2022) used individual pair-wise Granger causal tests, while Tank et al. (2022) inferred Granger causality directly from component-wise NNs by enforcing sparse input layers. Building on Tank et al. (2022)'s idea, Khanna & Tan (2020) explored the possibility of inferring Granger causality with Statistical Recurrent Units (SRUs, Oliva et al. (2017)). Later, Löwe et al. (2022) extended the neural Granger causality idea to causal discovery on multiple samples with different causal relationships.
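The sparse-input-layer idea of Tank et al. (2022) can be illustrated with a deliberately simplified linear stand-in for the component-wise networks (a hypothetical sketch, not the actual cMLP implementation): each target series gets its own predictor over all series' lags, and a group-lasso proximal step zeroes out the entire input group of any series that does not aid prediction.

```python
import numpy as np

def component_granger(X, lag=1, lam=0.05, lr=0.05, steps=500):
    """Component-wise Granger selection via sparse input groups (linear sketch).

    Returns an (N, N) matrix G of input-group norms; G[i, j] > 0 is read as
    "series j Granger-causes series i"."""
    T, N = X.shape
    # Design matrix: lag-1..lag-L values of every series, groups of size `lag`.
    Z = np.concatenate([X[lag - 1 - k:T - 1 - k] for k in range(lag)], axis=1)
    Y = X[lag:]
    G = np.zeros((N, N))
    for i in range(N):                       # one predictor per target series
        w = np.zeros(N * lag)
        for _ in range(steps):
            grad = Z.T @ (Z @ w - Y[:, i]) / len(Y)
            w -= lr * grad
            for j in range(N):               # group soft-threshold per source series
                idx = [j + k * N for k in range(lag)]
                nrm = np.linalg.norm(w[idx])
                if nrm > 0:
                    w[idx] *= max(0.0, 1.0 - lr * lam / nrm)
        for j in range(N):
            G[i, j] = np.linalg.norm([w[j + k * N] for k in range(lag)])
    return G
```

In the neural versions, the same group penalty is applied to the first-layer weights of a per-component network, so that an entire input series is switched off rather than individual coefficients.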

