DIFFERENTIABLE AND TRANSPORTABLE STRUCTURE LEARNING

Abstract

Directed acyclic graphs (DAGs) encode substantial information about a distribution in their structure. However, the compute required to infer these structures is typically super-exponential in the number of variables, as inference requires sweeping a combinatorially large space of potential structures. That was the case until recent advances made it possible to search this space using a differentiable metric, drastically reducing search time. While this technique, named NOTEARS, is widely considered a seminal work in DAG discovery, it concedes an important property in favour of differentiability: transportability. To be transportable, the structures discovered on one dataset must apply to another dataset from the same domain. In this paper, we introduce D-Struct, which recovers transportability in the discovered structures through a novel architecture and loss function while remaining completely differentiable. Because D-Struct remains differentiable, our method can be easily adopted in existing differentiable architectures, as was previously done with NOTEARS. In our experiments, we empirically validate D-Struct with respect to edge accuracy and structural Hamming distance in a variety of settings.

1. INTRODUCTION

Machine learning has proven to be a crucial tool in many disciplines. With successes in medicine [1] [2] [3] [4] [5], economics [6] [7] [8], physics [9] [10] [11] [12] [13] [14], robotics [15] [16] [17] [18], and even entertainment [19] [20] [21], machine learning is transforming the way in which experts interact with their field. These successes are in large part due to increased accuracy of diagnoses, marketing campaigns, analyses of experiments, and so forth. However, machine learning has much more to offer than improved accuracy alone. Indeed, recent advances support this claim, as machine learning is slowly being recognised as a tool for scientific discovery [22] [23] [24] [25]. In these successes, machine learning helped to uncover previously unknown relationships between variables. Discovering such relationships is the first step of the long process of scientific discovery, and it is the focus of our paper: D-Struct, the model we propose in this paper, aims to help through differentiable and transportable structure learning.

The structures. We focus on discovering directed acyclic graphs (DAGs) in a domain X. A DAG helps us understand how the different variables in X interact with each other. Consider a three-variable domain X := {X, Y, Z}, governed by a joint distribution P_X. A DAG explicitly models the variable interactions in P_X. For example, consider the chain DAG G: X → Z → Y, which depicts P_X as a DAG. Such a DAG allows useful analysis of the dependence and independence of variables in P_X [26, 27]. From G, we learn that X does not directly influence Y, and that X ⊥⊥ Y | Z, as X gives us no additional information about Y once we know Z. While DAGs are the model of choice in causality [28], it is impossible to discover a causal DAG from observational data alone [29] [30] [31] [32]. As we only wish to assume access to observational data, our goal is not causal discovery.
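To make the chain example concrete, the following sketch simulates data from a linear-Gaussian structural equation model consistent with the chain X → Z → Y (the coefficients are illustrative, not taken from the paper) and checks that X and Y are marginally dependent yet conditionally independent given Z, as read off from G:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative linear SEM consistent with the chain DAG X -> Z -> Y.
X = rng.normal(size=n)
Z = 2.0 * X + rng.normal(size=n)
Y = -1.5 * Z + rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    design = np.column_stack([c, np.ones_like(c)])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]

# Marginally, X and Y are strongly correlated (information flows via Z).
print(np.corrcoef(X, Y)[0, 1])
# Conditioning on Z blocks the chain: the partial correlation is ~0,
# matching X independent of Y given Z as implied by the DAG.
print(partial_corr(X, Y, Z))
```

For linear-Gaussian models, a vanishing partial correlation is exactly the conditional independence a CIT-based structure learner would test for.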
The above forms the basis for conventional DAG-structure learning [33]. In particular, X ⊥⊥ Y | Z strongly limits the set of possible DAGs that model P_X; given more independence statements, we limit the potential DAGs further. However, independence tests are computationally expensive, which is problematic as the number of potential DAGs increases super-exponentially in |X| [34]. This limitation strongly impacted the adoption of structure learning, until Zheng et al. [35] proposed NOTEARS, which incorporates a differentiable metric to evaluate whether or not a discovered structure is a DAG [35, 36]. Using automatic differentiation, NOTEARS learns a DAG structure far more efficiently than earlier methods based on conditional independence tests (CITs).
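The differentiable metric in question is the NOTEARS acyclicity function h(W) = tr(e^{W∘W}) − d, where W is a weighted adjacency matrix over d variables and ∘ is the elementwise product; h(W) = 0 exactly when W encodes a DAG, so acyclicity becomes a smooth equality constraint instead of a combinatorial search [35]. A minimal sketch (the example matrices are our own):

```python
import numpy as np
from scipy.linalg import expm

def notears_h(W: np.ndarray) -> float:
    """NOTEARS acyclicity measure h(W) = tr(exp(W * W)) - d.

    Equals zero iff the weighted adjacency matrix W encodes a DAG;
    strictly positive whenever W contains a directed cycle.
    """
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)

# Chain X -> Z -> Y: strictly upper-triangular, hence acyclic, h = 0.
W_dag = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])

# Adding the edge Y -> X closes a 3-cycle, so h becomes positive.
W_cyc = W_dag.copy()
W_cyc[2, 0] = 1.0

print(notears_h(W_dag))  # 0 up to floating-point error
print(notears_h(W_cyc))  # strictly positive
```

Because h is smooth in W, its gradient ((e^{W∘W})^T ∘ 2W in the original formulation) can be fed to any gradient-based optimiser, which is what lets NOTEARS-style methods, and by extension D-Struct, avoid sweeping the discrete space of graphs.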

