DIRECTED ACYCLIC GRAPH NEURAL NETWORKS

Abstract

Graph-structured data ubiquitously appears in science and engineering. Graph neural networks (GNNs) are designed to exploit the relational inductive bias exhibited in graphs; they have been shown to outperform other forms of neural networks in scenarios where structure information supplements node features. The most common GNN architecture aggregates information from neighborhoods based on message passing. Its generality has made it broadly applicable. In this paper, we focus on a special, yet widely used, type of graphs (DAGs) and inject a stronger inductive bias (partial ordering) into the neural network design. We propose the directed acyclic graph neural network, DAGNN, an architecture that processes information according to the flow defined by the partial order. DAGNN can be considered a framework that entails earlier works as special cases (e.g., models for trees and models updating node representations recurrently), but we identify several crucial components that prior architectures lack. We perform comprehensive experiments, including ablation studies, on representative DAG datasets (i.e., source code, neural architectures, and probabilistic graphical models) and demonstrate the superiority of DAGNN over simpler DAG architectures as well as general graph architectures.

1. INTRODUCTION

Graph-structured data is ubiquitous across various disciplines (Gilmer et al., 2017; Zitnik et al., 2018; Sanchez-Gonzalez et al., 2020). Graph neural networks (GNNs) use both the graph structure and node features to produce a vectorial representation, which can be used for classification, regression (Hu et al., 2020), and graph decoding (Li et al., 2018; Zhang et al., 2019). Most popular GNNs update node representations through iterative message passing between neighboring nodes, followed by pooling (either flat or hierarchical (Lee et al., 2019; Ranjan et al., 2020)), to produce a graph representation (Li et al., 2016; Kipf & Welling, 2017; Gilmer et al., 2017; Veličković et al., 2018; Xu et al., 2019). The relational inductive bias (Santoro et al., 2017; Battaglia et al., 2018; Xu et al., 2020), namely neighborhood aggregation, empowers GNNs to outperform graph-agnostic neural networks. To facilitate subsequent discussions, we formalize a message-passing neural network (MPNN) architecture, which computes representations $h_v^{\ell}$ for all nodes $v$ in a graph $G$ in every layer $\ell$ and a final graph representation $h_G$, as (Gilmer et al., 2017):

$$h_v^{\ell} = \text{COMBINE}^{\ell}\left(h_v^{\ell-1}, \; \text{AGGREGATE}^{\ell}\left(\{h_u^{\ell-1} \mid u \in \mathcal{N}(v)\}\right)\right), \quad \ell = 1, \dots, L, \qquad (1)$$

$$h_G = \text{READOUT}\left(\{h_v^{L}, \; v \in \mathcal{V}\}\right), \qquad (2)$$

where $h_v^0$ is the input feature of $v$, $\mathcal{N}(v)$ denotes a neighborhood of node $v$ (sometimes including $v$ itself), $\mathcal{V}$ denotes the node set of $G$, $L$ is the number of layers, and $\text{AGGREGATE}^{\ell}$, $\text{COMBINE}^{\ell}$, and $\text{READOUT}$ are parameterized neural networks. For notational simplicity, we omit edge attributes; but they can be straightforwardly incorporated into the framework (1)-(2).

Directed acyclic graphs (DAGs) are a special type of graphs, yet broadly seen across domains.
Examples include parsing results of source code (Allamanis et al., 2018), logical formulas (Crouse et al., 2019), and natural language sentences, as well as probabilistic graphical models (Zhang et al., 2019), neural architectures (Zhang et al., 2019), and automated planning problems (Ma et al., 2020).
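The MPNN framework (1)-(2) can be sketched concretely. The following minimal Python example uses an elementwise sum as AGGREGATE, an average of the previous representation and the aggregated message as COMBINE, and a mean over nodes as READOUT; these are simple illustrative choices for exposition, not the operators of any particular GNN, and the function name `mpnn` is ours.

```python
# A minimal sketch of the message-passing framework (equations (1)-(2)).
# Illustrative choices: AGGREGATE = elementwise sum, COMBINE = average of
# old representation and message, READOUT = mean over node representations.

def mpnn(features, neighbors, num_layers):
    """features: {node: [float]} gives h^0_v; neighbors: {node: [node]} gives N(v)."""
    h = dict(features)                      # h^0_v = input feature of v
    dim = len(next(iter(h.values())))
    for _ in range(num_layers):             # layers ell = 1, ..., L
        new_h = {}
        for v in h:
            msgs = [h[u] for u in neighbors.get(v, [])]
            # AGGREGATE: elementwise sum over {h^{ell-1}_u : u in N(v)}
            agg = [sum(col) for col in zip(*msgs)] if msgs else [0.0] * dim
            # COMBINE: average the previous representation with the message
            new_h[v] = [(a + b) / 2 for a, b in zip(h[v], agg)]
        h = new_h
    # READOUT: mean over all node representations -> graph representation h_G
    n = len(h)
    return [sum(col) / n for col in zip(*h.values())]

# Toy path graph 1 -- 2 -- 3 with undirected neighborhoods
feats = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
nbrs = {1: [2], 2: [1, 3], 3: [2]}
h_G = mpnn(feats, nbrs, num_layers=2)
print(h_G)  # prints [0.8333333333333334, 1.0]
```

In a real GNN, AGGREGATE and COMBINE would be learned (e.g., linear maps with nonlinearities) and layer-indexed, but the information flow is exactly that of (1)-(2): each layer refreshes every node from its neighborhood, and a final readout pools the node states into one vector.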

