MAXIMAL CORRELATION-BASED POST-NONLINEAR LEARNING FOR BIVARIATE CAUSAL DISCOVERY

Abstract

Bivariate causal discovery aims to determine the causal relationship between two random variables from passive observational data, since interventions are not affordable in many scientific fields; the problem is considered both fundamental and challenging. Algorithms based on the post-nonlinear (PNL) model have attracted much attention because of the model's generality. However, state-of-the-art (SOTA) PNL-based algorithms involve highly non-convex objectives due to the use of neural networks and non-convex losses; optimizing such objectives is often time-consuming and often fails to produce meaningful solutions with finite samples. In this paper, we propose a novel method that incorporates maximal correlation into PNL model learning (MC-PNL for short) so that the underlying nonlinearities can be accurately recovered. Owing to the benign structure of our objective function, when the nonlinearities are modeled as linear combinations of random Fourier features, the target optimization problem can be solved efficiently via block coordinate descent. We also compare MC-PNL with SOTA methods on downstream synthetic and real causal discovery tasks to show its superiority in time and accuracy.

1. INTRODUCTION AND RELATED WORKS

Causal discovery, which aims to find causal relationships among variables, is both an old and a new topic for the machine learning community. Many recent applications have emerged in various scientific domains, such as climate science (Ebert-Uphoff & Deng, 2012; Runge et al., 2019), bioinformatics (Choi et al., 2020; Foraita et al., 2020; Shen et al., 2020), etc. The gold standard for causal discovery is to conduct randomized experiments (via interventions); however, interventions are often expensive, unethical, or impractical. It is therefore highly desirable to discover causal relationships purely from passive observational data. In the past three decades, many pioneering algorithms for directed acyclic graph (DAG) search have been developed for multivariate causal discovery to reduce computational complexity and improve accuracy. For example, there are constraint/independence-based algorithms such as IC, PC, FCI (Pearl, 2009; Spirtes et al., 2000), and RFCI (Colombo et al., 2012), among many others, as well as score-based methods such as GES (Chickering, 2002), NOTEARS (Zheng et al., 2018), etc. However, the algorithms mentioned above can only return a Markov equivalence class (MEC), whose members encode the same set of conditional independencies, leaving many edge directions undetermined; moreover, the discovered DAG may not necessarily be causal.

In this paper, we focus on a fundamental problem, namely bivariate causal discovery, which aims to determine the causal direction between two random variables X and Y. Bivariate causal discovery is a promising route toward further identification of the underlying causal DAG (Peters et al., 2017). It is a challenging task that cannot be solved directly using the existing methodologies for the multivariate case, as the two candidate DAGs, X → Y and X ← Y, are in the same MEC.

Additional assumptions must be imposed to make bivariate causal discovery feasible, as summarized by Peters et al. (2017). One assumption is an a priori restriction of the model class, e.g., the linear non-Gaussian acyclic model (LiNGAM) (Shimizu et al., 2006), the nonlinear additive noise model (ANM) (Mooij et al., 2016), the post-nonlinear (PNL) model (Zhang & Hyvärinen, 2009), etc. The other assumption is the "independence of cause and mechanism", which leads to algorithms such as the trace condition (Janzing et al., 2010), IGCI (Janzing et al., 2012), distance correlations (Liu & Chan, 2016), meta-transfer (Bengio et al., 2020), CDCI (Duong & Nguyen, 2022), etc. There are


