DIFFUSION MODELS FOR CAUSAL DISCOVERY VIA TOPOLOGICAL ORDERING

Abstract

Discovering causal relations from observational data becomes possible under additional assumptions, such as constraining the functional relations to be nonlinear with additive noise (ANM). Even with strong assumptions, causal discovery involves an expensive search over the space of directed acyclic graphs (DAGs). Topological ordering approaches reduce the optimisation space of causal discovery by searching over permutations rather than graphs. For ANMs, the Hessian of the data log-likelihood can be used to identify leaf nodes in a causal graph, allowing its topological ordering. However, existing computational methods for obtaining the Hessian do not scale as the number of variables and the number of samples increases. Therefore, inspired by recent innovations in diffusion probabilistic models (DPMs), we propose DiffAN, a topological ordering algorithm that leverages DPMs to learn a Hessian function. We introduce theory for updating the learned Hessian without re-training the neural network, and we show that computing with a subset of samples gives an accurate approximation of the ordering, which allows scaling to datasets with more samples and variables. We show empirically that our method scales exceptionally well to datasets with up to 500 nodes and up to 10^5 samples, while still performing on par with state-of-the-art causal discovery methods on small datasets.

1. INTRODUCTION

Figure 1: Run time in seconds for different sample sizes, for discovery of causal graphs with 500 nodes. Most causal discovery methods have prohibitive run time and memory cost for datasets with many samples; the previous state-of-the-art SCORE algorithm (Rolland et al., 2022), included in this graph, cannot be computed beyond 2000 samples on a machine with 64GB of RAM. By contrast, our method DiffAN has a reasonable run time even for sample sizes two orders of magnitude larger than most existing methods can handle.

Understanding the causal structure of a problem is important for areas such as economics, biology (Sachs et al., 2005) and healthcare (Sanchez et al., 2022), especially when reasoning about the effect of interventions. When interventional data from randomised trials are not available, causal discovery methods (Glymour et al., 2019) may be employed to discover the causal structure of a problem solely from observational data. Causal structure is typically modelled as a directed acyclic graph (DAG) G in which each node is associated with a random variable and each edge represents a causal mechanism, i.e. how one variable influences another. However, learning such a model from data is NP-hard (Chickering, 1996). Traditional methods search the DAG space
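The leaf-identification principle mentioned in the abstract can be illustrated numerically. For a nonlinear ANM with Gaussian noise, a variable is a leaf of the causal graph exactly when the corresponding diagonal entry of the Hessian of the data log-likelihood is constant across samples (Rolland et al., 2022). The following is a minimal sketch, not the paper's method: it uses a hypothetical two-variable model x2 = x1^2 + noise for which the Hessian diagonal is available in closed form, whereas DiffAN estimates this quantity with a learned diffusion model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 5000, 0.5

# Hypothetical two-variable ANM: x1 ~ N(0, 1), x2 = x1^2 + sigma * noise.
x1 = rng.normal(size=n)
x2 = x1**2 + sigma * rng.normal(size=n)

# Closed-form diagonal of the Hessian of log p(x1, x2) for this model:
#   d^2/dx1^2 log p = -1 + 2*(x2 - x1^2)/sigma^2 - 4*x1^2/sigma^2  (varies with x)
#   d^2/dx2^2 log p = -1/sigma^2                                   (constant)
h11 = -1 + 2 * (x2 - x1**2) / sigma**2 - 4 * x1**2 / sigma**2
h22 = np.full(n, -1 / sigma**2)

# Leaf criterion: the leaf node is the one whose Hessian diagonal entry
# has (near-)zero variance across samples.
variances = [h11.var(), h22.var()]
leaf = int(np.argmin(variances))
print(leaf)  # prints 1: x2 is correctly identified as the leaf
```

Iterating this criterion, i.e. removing the identified leaf and repeating on the remaining variables, yields a full topological ordering; the hard part in practice is estimating the Hessian diagonal without an analytic likelihood, which is where the diffusion model comes in.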

