GEASS: NEURAL CAUSAL FEATURE SELECTION FOR HIGH-DIMENSIONAL BIOLOGICAL DATA

Abstract

Identifying nonlinear causal relationships in high-dimensional biological data is an important task. However, current neural-network-based causality detection approaches for such data suffer from poor interpretability and do not scale well to the high-dimensional regime. Here we present GEASS (Granger fEAture Selection of Spatiotemporal data), which identifies sparse Granger causal interacting features of high-dimensional spatiotemporal data with a single neural network. GEASS maximizes sparsity-regularized modified transfer entropy with a theoretical guarantee of recovering features with spatial/temporal Granger causal relationships. The sparsity regularization is achieved by a novel combinatorial stochastic gate layer that selects sparse non-overlapping feature subsets. We demonstrate the efficacy of GEASS on several synthetic datasets and on real biological data from single-cell RNA sequencing and spatial transcriptomics.

1. INTRODUCTION

Advances in single-cell omics research enable full characterization of high-dimensional gene dynamics in biological systems on either a temporal or a spatial scale. An example of the temporal case is single-cell RNA sequencing (scRNA-seq) trajectories, where cells are sampled from a dynamical biological process, sequenced, and ordered based on either real sampled time or inferred pseudo-time (Cannoodt et al., 2016; Saelens et al., 2019). Gene dynamics along the specified cell order encode information about causal regulation in the underlying biological process. An example of the spatial case is single-cell-level spatial transcriptomics (e.g. SeqFISH+ (Eng et al., 2019), MERFISH (Fang et al., 2022)), in which cells from a tissue slice are sequenced with their spatial coordinates preserved (Moses and Pachter, 2022; Rao et al., 2021; Palla et al., 2022). Spatial profiling allows investigation of cellular interplay, corresponding to conditional gene expression changes caused by neighborhood phenotypic states. However, despite its potential significance, data-driven causal discovery for such data remains largely unexplored, especially for spatial omics data. Identifying causal regulatory patterns in such data can be reformulated as the general task of causal feature selection in observational data with intrinsic structure, e.g. spatial or temporal data. Identifying causal interactions in time series has led to valuable findings in multiple disciplines, including but not limited to economics, climate science, and biology (Hoover, 2006; Kamiński et al., 2001; Runge et al., 2019a). Learning directed causal relationships in temporal/spatial data is feasible because time and space both induce asymmetric dependencies. In time-series data, a feature in the future cannot affect past values of other features. For spatial data, a similar definition of causal dependency can be established (Herrera Gómez et al., 2014).
The concept of Granger causality was proposed to uncover such asymmetric causal dependencies (Granger, 1969; Shojaie and Fox, 2022). In time-series data, this translates to identifying one variable's causal relationship with other variables based on how well the historical observations of the other variables predict the variable's present value. Applying Granger causality in a spatial context corresponds to identifying significant predictive relationships between neighboring observations of other variables and the specified variable (Mielke et al., 2020), which is a key insight used in recent works that aim to discover cellular interaction patterns in spatial omics data (Fischer et al., 2021; Valdés-Sosa et al., 18). In the nonlinear regime, information-theoretic measures such as directed information, transfer entropy (Schreiber, 2000), and partial transfer entropy (Staniek and Lehnertz, 2008) are used as counterparts of linear Granger causality. Moreover, some works model conditional independence (CI) in time-series data to identify the underlying causal graph (Entner and Hoyer, 2010; Malinsky and Spirtes, 2018; Moneta et al., 2011; Runge et al., 2019a; Pfister et al., 2019; Mastakouri et al., 2021). Two examples are VarLINGAM (Hyvärinen et al., 2010) and PCMCI (Runge et al., 2019b), which are generalizations of LINGAM (Shimizu et al., 2006) and PC (Spirtes et al., 2000), respectively. Finally, multiple recent works have proposed neural network approaches to model nonlinear Granger causality, including MLP, LSTM, and neural-ODE based approaches, resulting in improved prediction power for nonlinear time-series dynamics (Li et al., 2017; Tank et al., 2021; Nauta et al., 2019; Yin and Barucca, 2022; Bellot et al., 2021). Despite the success of these methods in various systems of interest, multiple challenges limit their use in high-dimensional biological datasets.
• Although linear methods (LINGAM, linear Granger causality) have succeeded in various settings and can potentially scale to large numbers of features, they may fail completely when the feature dependencies in the data are highly complex and nonlinear.
• As the number of conditional independencies generally scales exponentially, or at least polynomially, with the number of features, applying causal discovery methods based on CI tests to high-dimensional data is not realistic. In contrast, Granger-causality based methods are built with a prediction model for each feature in the data. The time complexity of solving the stacked prediction model for all features is polynomial in the number of features.
• Previous methods assume that the number of causal edges between features is sparse (edge sparsity) to maximize the interpretability of the identified causal graph. However, biological data contain a large proportion of nuisance features. Also, one functional gene may activate a large number of downstream genes in neighboring cells. Sparsifying the number of interacting features (feature sparsity) has the potential to improve causal discovery in biological systems, which remains to be explored.
• While a large number of methods are designed for causal discovery in time-series data, only a limited number of existing works aim at causal discovery in general graph-structured data. Time-series based methods cannot be directly applied to data with multi-branch trajectory dynamics or spatial structures.
Our contribution. In this work, we present GEASS (Granger fEAture Selection of Spatiotemporal data), which identifies causally interacting features of high-dimensional temporal/spatial data with a single neural network. GEASS considers the aforementioned feature sparsity instead of edge sparsity, and thus selects the most significant interacting features for downstream causal discovery. Our contributions are threefold.
1. Instead of direct causal discovery in data, we formulate the task as two steps: causal feature selection and causal graph identification. We provide a novel solution to the causal feature selection problem in general graph-structured data through modified transfer entropy maximization with theoretical guarantees.
2. To solve our proposed optimization problem, we design a novel combinatorial stochastic gate layer that selects non-overlapping sparse feature sets, together with a newly designed initialization procedure.
3. We demonstrate the power of our method by benchmarking it on both temporal and spatial data in multiple settings. Our method gives accurate and robust causal feature identification and reveals novel biology in real datasets.
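As a point of reference for the Granger-causality idea described above (how well the past of one variable helps predict the present of another), the following is a minimal sketch of the classical linear, two-variable test. It is not the GEASS method. The function names `lagged_design` and `granger_f_stat`, the simulated series, and the coefficient values are all illustrative assumptions.

```python
import numpy as np


def lagged_design(series, lag):
    """Columns are series[t-1], ..., series[t-lag] for t = lag, ..., T-1."""
    T = len(series)
    return np.column_stack([series[lag - k : T - k] for k in range(1, lag + 1)])


def granger_f_stat(source, target, lag=1):
    """F-statistic comparing two least-squares models of target[t]:
    restricted (target's own past) vs. full (own past plus source's past)."""
    y = target[lag:]
    intercept = np.ones((len(y), 1))
    restricted = np.hstack([intercept, lagged_design(target, lag)])
    full = np.hstack([restricted, lagged_design(source, lag)])

    def rss(design):
        # Residual sum of squares of the ordinary least-squares fit.
        coef = np.linalg.lstsq(design, y, rcond=None)[0]
        return np.sum((y - design @ coef) ** 2)

    rss_restricted, rss_full = rss(restricted), rss(full)
    dof = len(y) - full.shape[1]
    return ((rss_restricted - rss_full) / lag) / (rss_full / dof)


# Hypothetical toy system in which x drives y with a one-step delay.
rng = np.random.default_rng(0)
T = 500
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

f_x_to_y = granger_f_stat(x, y)  # past of x should help predict y
f_y_to_x = granger_f_stat(y, x)  # past of y should add little for x
print(f"F(x -> y) = {f_x_to_y:.1f}, F(y -> x) = {f_y_to_x:.1f}")
```

The asymmetry between the two statistics is what gives the test its direction. The sketch fits one pairwise linear model per direction, so it illustrates only the baseline notion and none of the nonlinear, high-dimensional, or feature-sparse aspects discussed above.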

1.1. RELATED WORKS

Neural Granger causality. Despite the large body of work based on linear Granger causal discovery, neural Granger causality remains an active area of research. Various neural network architectures, such as MLPs, sequential models, and attention-based architectures (Tank et al., 2021; Nauta et al., 2019; Khanna and Tan, 2019; Sun et al., 2021), have been proposed for nonlinear Granger causality

