GRACE-C: Generalized Rate Agnostic Causal Estimation via Constraints

Abstract

Graphical structures estimated by causal learning algorithms from time series data can provide misleading causal information if the causal timescale of the generating process fails to match the measurement timescale of the data. Existing algorithms provide limited resources to respond to this challenge, and so researchers must either use models that they know are likely misleading, or else forgo causal learning entirely. Existing methods face up to four distinct shortfalls, as they might a) require that the difference between causal and measurement timescales is known; b) handle only a very small number of random variables when the timescale difference is unknown; c) apply only to pairs of variables; or d) be unable to find a solution given statistical noise in the data. This paper addresses these challenges. Our approach combines constraint programming with both theoretical insights into the problem structure and prior information about admissible causal interactions to achieve speed-ups of multiple orders of magnitude. The resulting system maintains theoretical guarantees while scaling to significantly larger sets of random variables (> 100) without knowledge of timescale differences. This method is also robust to edge misidentification and can use parametric connection strengths, while optionally finding the optimal solution among many possible ones.

1. Introduction

Dynamic causal models play a pivotal role in modeling real-world systems in diverse domains, including economics, education, climatology, and neuroscience. Given a sufficiently accurate causal graph over random variables, one can predict, explain, and potentially control some system; more generally, one can understand it. In practice, however, specifying or learning an accurate causal model of a dynamical system can be challenging for both statistical and theoretical reasons. One particular challenge arises when data are not measured at the speed of the underlying causal connections. For example, fMRI scanning of the brain indirectly measures dynamical neural activity by measuring the resulting blood flow and oxygen level changes in different brain regions. However, fMRI measurements occur (at most) every second, while the brain's actual dynamics are known to proceed at a faster rate (Oram & Perrett, 1992), though we do not know how much faster. In general, when the measurement timescale is significantly slower than the causal timescale (as with fMRI), learning can output vastly incorrect causal information. For instance, if we only measure every other timestep in Figure 1, then the true graph (top left) would differ from the data graph (top right). We might thus conclude that variable 2 directly influences variable 5, when variable 3 is the actual direct cause. These errors can lead to inefficient or costly attempts at control. More generally, understanding of a system depends on the timescale of the causal relations, not the timescale of measurements. In this paper, we consider the problem of learning the causal structure at the causal timescale from data collected at an unknown measurement timescale.
This challenge has received significant attention in recent years (Plis et al., 2015b; Gong et al., 2015; Hyttinen et al., 2017; Plis et al., 2015a), but all current algorithms have significant limitations (see Section 2) that make them unusable for many real-world scientific challenges. Current algorithms show the theoretical possibility of causal learning from undersampled data, but their practical applicability is limited to small graph sizes, in some cases only a pair of variables (Gong et al., 2015). In contrast, we present a provably correct and complete solution for learning causal timescale structure from undersampled data that can operate on 100-node graphs, and hence is potentially applicable in biological and other domains.

2. Related Work and Notation

A directed dynamic causal model is a generalization of "regular" causal models (Pearl et al., 2000; Spirtes et al., 1993): graph $G$ includes $n$ distinct nodes for random variables $V = \{V_1, V_2, \ldots, V_n\}$ at both the current timestep $t$ ($V^t$), and also previous timesteps ($V^{t-k}$) for which there is a direct cause of some $V^t_i$. We assume that the "true" underlying causal structure is first-order Markov: the independence $V^t \perp\!\!\!\perp V^{t-k} \mid V^{t-1}$ holds for all $k > 1$.¹ $G$ is thus over $2V$ (that is, over $V^{t-1}$ and $V^t$), and the only permissible edges are $V^{t-1}_i \to V^t_j$, where possibly $i = j$. The quantitative component of the dynamic causal model is fully specified by parameters for $P(V^t \mid V^{t-1})$. We assume that these conditional probabilities are stationary over time, but the marginal $P(V^t)$ need not be stationary.

We denote the timepoints of the underlying causal structure as $\{t_0, t_1, t_2, \ldots, t_k, \ldots\}$. The data are said to be undersampled at rate $u$ if measurements occur at $\{t_0, t_u, t_{2u}, \ldots, t_{ku}, \ldots\}$. We denote the undersampling rate with superscripts: the true causal graph (i.e., undersampled at rate 1) is $G^1$; the same graph undersampled at rate $u$ is $G^u$.
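As a minimal illustration of the sampling pattern in this definition, the following Python sketch keeps only the measurements at every u-th causal timestep. The function name `undersample` and the toy series are hypothetical and introduced here purely for illustration; they are not part of the method described in this paper.

```python
from typing import List, Sequence


def undersample(series: Sequence[Sequence[float]], u: int) -> List[Sequence[float]]:
    """Keep the rows observed at t_0, t_u, t_2u, ... of a causal-timescale series."""
    if u < 1:
        raise ValueError("undersampling rate u must be >= 1")
    return list(series[::u])


# Hypothetical toy series: one row per causal timestep, one column per variable.
causal_series = [[t, 10 * t] for t in range(7)]  # timesteps t_0 .. t_6

rate_1 = undersample(causal_series, 1)  # rate 1 means no undersampling
rate_2 = undersample(causal_series, 2)  # rows for t_0, t_2, t_4, t_6
print(rate_1)
print(rate_2)
```

Note that the learner only ever sees the undersampled rows; in the setting studied here the rate u itself is unknown.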
To determine the implied $G^u$ for $u > 1$, the graph is first "unrolled" by adding instantiations of $G^1$ at previous timesteps, where $V^{t-2}$ bear the same causal relationships to $V^{t-1}$ that $V^{t-1}$ bear to $V^t$, and so forth. In this unrolled (time-indexed by $t$) graph, all $V$ at intermediate timesteps are unmeasured; this lack of measurement is equivalent to marginalizing out (the variables in) those timesteps to yield $G^u$. The problem of moving from $G^1$ to $G^u$ was structurally addressed by Danks & Plis (2013), and parametrically addressed (for 2-variable systems) by Gong et al. (2015).

Various representations have been developed for graphs with latent confounders, including partially observed ancestral graphs (PAGs) (Zhang, 2008) and maximal ancestral graphs (MAGs) (Richardson & Spirtes, 2002). However, these graph types cannot easily capture the types of latents produced by undersampling (Mooij & Claassen, 2020). Instead, we use compressed graphs, along with properties that were previously proven for this representation (Danks & Plis, 2013). A compressed graph includes only $V$, where temporal information is implicitly encoded in the edges. In particular, a compressed graph $\mathcal{G}$ for dynamic causal graph $G$ has $V_i \to V_j$ in $\mathcal{G}$ iff $V^{t-1}_i \to V^t_j$ is in $G$.

Undersampling (i.e., marginalizing intermediate timesteps) is a straightforward operation for compressed graphs: (1) $V_i \to V_j$ in $G^u$ iff there is a length-$u$ directed path from $V_i$ to $V_j$ in $G^1$, iff there is a directed path from $V^{t-u}_i$ to $V^t_j$ in the unrolled $G^1$; and (2) $V_i \leftrightarrow V_j$ in $G^u$ iff there exist length-$s$ directed paths, with $s < u$, from some $V_k$ to $V_i$ and to $V_j$ in $G^1$ (i.e., $V_k$ is an unobserved common cause in $G^1$ fewer than $u$ timesteps back). (See Appendix A for additional lemmas and proofs.) The bottom row of Figure 1 shows compressed graphs for the unrolled ones on the top row; the left shows the causal timescale and the right shows the graphs undersampled at rate 2.
(See Appendix B for more examples of graphs through undersampling.)
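The two undersampling rules for compressed graphs can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the algorithm proposed in this paper: the names `step` and `undersample_graph`, the encoding of a compressed graph as a dictionary from each node to its set of children, and the toy graph are all hypothetical. The sketch also assumes that the two paths from the common cause have the same length s with 1 <= s < u, and that bidirected edges are recorded only between distinct nodes.

```python
from typing import Dict, FrozenSet, Set, Tuple

Graph = Dict[int, Set[int]]  # node -> set of direct effects (children) in G^1


def step(g1: Graph, reach: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Extend every path by one edge of g1 (length-s endpoints -> length-(s+1))."""
    return {k: {c for v in ends for c in g1.get(v, ())} for k, ends in reach.items()}


def undersample_graph(g1: Graph, u: int) -> Tuple[Set[Tuple[int, int]], Set[FrozenSet[int]]]:
    """Return (directed, bidirected) edges of G^u for a compressed graph G^1."""
    if u < 1:
        raise ValueError("undersampling rate u must be >= 1")
    # reach[k] = nodes at the end of a length-s directed path from k (s = 1 here).
    reach = {k: set(g1.get(k, ())) for k in g1}
    bidirected: Set[FrozenSet[int]] = set()
    for _ in range(1, u):  # s = 1, ..., u - 1
        # Rule (2): a common cause k with equal-length paths (assumed) to i and j.
        for ends in reach.values():
            for i in ends:
                for j in ends:
                    if i != j:
                        bidirected.add(frozenset((i, j)))
        reach = step(g1, reach)
    # Rule (1): reach now holds the endpoints of length-u directed paths.
    directed = {(i, j) for i, ends in reach.items() for j in ends}
    return directed, bidirected


# Hypothetical toy graph (not the graph of Figure 1): 1->2, 1->3, 2->3, 3->1.
g1 = {1: {2, 3}, 2: {3}, 3: {1}}
directed_2, bidirected_2 = undersample_graph(g1, 2)
print(sorted(directed_2))
print(sorted(tuple(sorted(e)) for e in bidirected_2))
```

In this toy graph, node 1 is a direct cause of both 2 and 3, so at rate 2 it acts as an unmeasured common cause and induces a bidirected edge between 2 and 3. At rate 1 the function returns the original edges and no bidirected edges.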



This assumption is relatively weak, as we do not assume that we measure at this causal timescale. The causal timescale could be arbitrarily fast. This assumption is a form of causal sufficiency (Spirtes et al., 2000).



Figure 1: Causal graph $G^1$ and its undersampled version $G^2$: unrolled and compressed versions.

