EFFICIENT GRAPH NEURAL ARCHITECTURE SEARCH

Abstract

Recently, graph neural networks (GNN) have been demonstrated to be effective in various graph-based tasks. To obtain state-of-the-art (SOTA) data-specific GNN architectures, researchers have turned to neural architecture search (NAS) methods. However, it remains challenging to conduct efficient architecture search for GNN. In this work, we present a novel framework for Efficient GrAph Neural architecture search (EGAN). On top of a novel and expressive search space, an efficient one-shot NAS method based on stochastic relaxation and natural gradient is proposed. Further, to enable architecture search on large graphs, a transfer learning paradigm is designed. Extensive experiments, covering both node-level and graph-level tasks, are conducted. The results show that the proposed EGAN can obtain SOTA data-specific architectures while reducing the search cost by two orders of magnitude compared to existing NAS baselines.

1. INTRODUCTION

Recent years have witnessed the success of graph neural networks (GNN) (Gori et al., 2005; Battaglia et al., 2018) in various graph-based tasks, e.g., recommendation (Ying et al., 2018a), chemistry (Gilmer et al., 2017), circuit design (Zhang et al., 2019), subgraph counting (Liu et al., 2020), and SAT generation (You et al., 2019). To adapt to different graph-based tasks, various GNN models, e.g., GCN (Kipf & Welling, 2016), GAT (Veličković et al., 2018), and GIN (Xu et al., 2019), have been designed in the past five years. Most existing GNN models follow a neighborhood aggregation (or message passing) schema (Gilmer et al., 2017), as shown in the left part of Figure 1, in which the representation of a node is learned by iteratively aggregating the features of its neighbors. Despite the broad applications of GNN models, researchers have to make efforts to design proper GNN architectures for different tasks by imposing different relational inductive biases (Battaglia et al., 2018). As pointed out by Battaglia et al. (2018), GNN architectures can support one form of combinatorial generalization across different tasks, i.e., graphs. A natural and interesting question then arises: can we automatically design state-of-the-art (SOTA) GNN architectures for graph-based tasks?

A straightforward solution is to adopt neural architecture search (NAS) approaches, which have shown promising results in automatically designing architectures for convolutional neural networks (CNN) (Zoph & Le, 2017; Pham et al., 2018; Liu et al., 2019a; Tan & Le, 2019; You et al., 2020a). However, it is nontrivial to adopt NAS for GNN. The first challenge is to define the search space. One could design a dummy search space that includes as many related parameters as possible, e.g., aggregation functions, number of layers, activation functions, etc., on top of the message passing framework (Eq. (1)). However, this leads to a very large discrete space: for example, 315,000 possible GNN architectures are generated by including just 12 types of model parameters in You et al. (2020b), which is challenging for any search algorithm. The second challenge is to design an effective and efficient search algorithm. In the literature, reinforcement learning (RL) based and evolutionary algorithms have been explored for GNN architecture search (Gao et al., 2020; Zhou et al., 2019; Lai et al., 2020; Nunes & Pappa, 2020). However, they are inherently computationally expensive due to their stand-alone training manner. In the NAS literature, by adopting the weight sharing strategy, one-shot NAS methods are orders of magnitude more efficient than RL-based ones (Pham et al., 2018; Liu et al., 2019a; Xie et al., 2019; Guo et al., 2019). However, one-shot methods cannot be directly applied to the aforementioned dummy search space, since it remains unknown how to search for model parameters such as the number of layers and activation functions under the weight sharing strategy. Therefore, it remains a challenging problem to conduct effective and efficient architecture search for GNN.

In this work, we propose a novel framework, called EGAN (Efficient GrAph Neural architecture search), to automatically design SOTA GNN architectures. Motivated by two well-established works (Xu et al., 2019; Garg et al., 2020) showing that the expressive capabilities of GNN models rely heavily on the properties of their aggregation functions, a novel search space consisting of node and layer aggregators is designed, which can emulate many popular GNN models. Then, by representing the search space as a directed acyclic graph (DAG) (Figure 1(c)), we design a one-shot framework using stochastic relaxation and the natural gradient method, which can optimize the architecture selection and model parameters in a differentiable manner.
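To make the neighborhood aggregation schema concrete, the following is a minimal sketch of a multi-layer GNN forward pass in NumPy. The sum aggregator with self-loops and the ReLU update are illustrative simplifications, not the exact update rule of any particular model discussed above.

```python
import numpy as np

def gnn_forward(adj, X, weights):
    """Minimal neighborhood-aggregation (message passing) sketch.

    adj:     (N, N) binary adjacency matrix of the graph
    X:       (N, d) node feature matrix
    weights: list of (d_in, d_out) weight matrices, one per layer
    """
    H = X
    for W in weights:
        # Aggregate: each node sums the representations of its
        # neighbors (plus itself, via an added self-loop).
        agg = (adj + np.eye(adj.shape[0])) @ H
        # Update: linear transform followed by a ReLU nonlinearity.
        H = np.maximum(agg @ W, 0.0)
    return H
```

Each layer widens a node's receptive field by one hop, so a 3-layer model (as in Figure 1(b)) aggregates information from up to 3-hop neighborhoods.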
To enable architecture search in large graphs, we further design a transfer learning paradigm, which first constructs a proxy graph from the large graph while preserving its properties, then searches for GNN architectures in the proxy graph, and finally transfers the searched architecture back to the large graph.

To demonstrate the effectiveness and efficiency of the proposed framework, we apply EGAN to various tasks, from node-level to graph-level ones. The experimental results on ten different datasets show that EGAN can obtain SOTA data-specific architectures for different tasks, and at the same time reduce the search cost by two orders of magnitude. Moreover, the transfer learning paradigm is, to the best of our knowledge, the first framework to enable architecture search in large graphs.

Notations. Let G = (V, E) be a simple graph with node features X ∈ R^{N×d}, where V and E represent the node and edge sets, respectively, N represents the number of nodes, and d is the dimension of node features. We use N(v) to represent the first-order neighbors of a node v in G, i.e., N(v) = {u ∈ V | (v, u) ∈ E}. Following the literature, we also define Ñ(v) as the neighbor set of v including v itself, i.e., Ñ(v) = {v} ∪ {u ∈ V | (v, u) ∈ E}.
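The stochastic relaxation mentioned above can be illustrated by sampling a discrete architecture from categorical distributions over the candidate operations on each edge of the DAG (cf. Figure 1(d)). The operation names below are assumptions chosen for illustration, not necessarily the exact candidate sets used in EGAN's search space.

```python
import numpy as np

# Illustrative candidate operation sets (assumed names, for the sketch only):
NODE_AGGS = ["sum", "mean", "max", "gat"]     # node aggregators, O_n
LAYER_AGGS = ["concat", "max", "lstm"]        # layer aggregators, O_l
SKIPS = ["identity", "zero"]                  # skip-connections, O_s

def sample_one_hot(theta, rng):
    """Sample a one-hot mask from a categorical distribution with
    unnormalized logits theta (the stochastic relaxation step)."""
    p = np.exp(theta - theta.max())
    p /= p.sum()
    idx = rng.choice(len(theta), p=p)
    z = np.zeros(len(theta))
    z[idx] = 1.0
    return z

def sample_architecture(num_layers, rng):
    """Draw one discrete architecture: a node aggregator and a skip
    decision per layer, plus a single layer aggregator."""
    # Uniform logits here; during search these would be learned.
    node = [NODE_AGGS[int(sample_one_hot(np.zeros(len(NODE_AGGS)), rng).argmax())]
            for _ in range(num_layers)]
    skip = [SKIPS[int(sample_one_hot(np.zeros(len(SKIPS)), rng).argmax())]
            for _ in range(num_layers)]
    layer = LAYER_AGGS[int(sample_one_hot(np.zeros(len(LAYER_AGGS)), rng).argmax())]
    return {"node": node, "skip": skip, "layer": layer}
```

In a one-shot setting, the sampled one-hot masks select which operations of the shared supernet are active in a given training step, while the distribution parameters are updated (e.g., by natural gradient) toward better-performing architectures.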

2. RELATED WORKS

GNN was first proposed in (Gori et al., 2005), and in the past five years different GNN models (Kipf & Welling, 2016; Hamilton et al., 2017; Veličković et al., 2018; Gao et al., 2018; Battaglia et al.,



Figure 1: An illustrative example of a GNN model and the proposed EGAN (best viewed in color). (a) An example graph with five nodes. The gray rectangle represents the input features of each node; (b) A typical 3-layer GNN model following the message passing neighborhood aggregation schema, which computes the embeddings of node "2"; (c) The DAG representing the search space for a 3-layer GNN, where α_n, α_l, α_s represent, respectively, the weight vectors for node aggregators, layer aggregators, and skip-connections on the corresponding edges. The rectangles denote the representations: the three green ones represent the hidden embeddings, the gray one (h_v^0) and yellow one (h_v^5) represent the input and output embeddings, respectively, and the blue one (h_v^4) represents the set of output embeddings of the three node aggregators fed to the layer aggregator. (d) At the t-th epoch, an architecture is sampled from p(Z_n), p(Z_s), p(Z_l), whose rows Z_{i,j} are one-hot random vectors indicating the masks applied to edges (i, j) in the DAG. The columns of these matrices represent the operations from O_n, O_s, O_l.

