GRACE-C: Generalized Rate Agnostic Causal Estimation via Constraints

Abstract

Graphical structures estimated by causal learning algorithms from time series data can provide misleading causal information if the causal timescale of the generating process fails to match the measurement timescale of the data. Existing algorithms provide limited resources to respond to this challenge, and so researchers must either use models that they know are likely misleading or forgo causal learning entirely. Existing methods face up to four distinct shortfalls, as they might a) require that the difference between causal and measurement timescales be known; b) handle only a very small number of random variables when the timescale difference is unknown; c) apply only to pairs of variables; or d) be unable to find a solution given statistical noise in the data. This paper addresses these challenges. Our approach combines constraint programming with both theoretical insights into the problem structure and prior information about admissible causal interactions to achieve speed-ups of multiple orders of magnitude. The resulting system maintains theoretical guarantees while scaling to significantly larger sets of random variables (> 100) without knowledge of timescale differences. This method is also robust to edge misidentification and can use parametric connection strengths, while optionally finding the optimal solution among many possible ones.

1. Introduction

Dynamic causal models play a pivotal role in modeling real-world systems in diverse domains, including economics, education, climatology, and neuroscience. Given a sufficiently accurate causal graph over random variables, one can predict, explain, and potentially control a system; more generally, one can understand it. In practice, however, specifying or learning an accurate causal model of a dynamical system can be challenging for both statistical and theoretical reasons. One particular challenge arises when data are not measured at the rate of the underlying causal connections. For example, fMRI scanning of the brain indirectly measures dynamical neural activity by measuring the resulting changes in blood flow and oxygen level in different brain regions. However, fMRI measurements occur (at most) once per second, while the brain's actual dynamics are known to proceed at a faster rate (Oram & Perrett, 1992), though we do not know how much faster. In general, when the measurement timescale is significantly slower than the causal timescale (as with fMRI), learning can output vastly incorrect causal information. For instance, if we only measure every other timestep

