EFFICIENT AUTOMATIC GRAPH LEARNING VIA DESIGN RELATIONS

Anonymous authors

Paper under double-blind review

Abstract

Despite the success of automated machine learning (AutoML), which aims to find the best design, including the neural network architecture and hyper-parameters, conventional AutoML methods are computationally expensive and provide little insight into the relations between different model design choices. This work focuses on AutoML for graph tasks. To tackle these challenges, we propose FALCON, an efficient sample-based method that searches for the optimal model design on graph tasks. Our key insight is to model the space of possible model designs as a design graph, where the nodes represent design choices and the edges denote design similarities. FALCON features 1) a task-agnostic module, which performs message passing on the design graph via a Graph Neural Network (GNN), and 2) a task-specific module, which conducts label propagation of the known model performance information on the design graph. The two modules are combined to predict design performance across the design space, guiding the search direction. We conduct extensive experiments on 27 node and graph classification tasks from various application domains. We empirically show that FALCON efficiently finds well-performing designs for each task while exploring only 30 nodes. Specifically, FALCON has a time cost comparable to one-shot approaches while achieving an average improvement of 3.3% over the best baselines.

1. INTRODUCTION

Automated machine learning (AutoML) (Liu et al., 2019; Pham et al., 2018; Bender et al., 2018; Real et al., 2019; Zoph & Le, 2017; Cai et al., 2019; 2021; Gao et al., 2019; You et al., 2020b; Zhang et al., 2021) has demonstrated great success in various domains, including computer vision (Chu et al., 2020; Ghiasi et al., 2019; Chen et al., 2019), language modeling (Zoph & Le, 2017; So et al., 2019), and recommender systems (Chen et al., 2022). It is an essential component of state-of-the-art deep learning models (Liu et al., 2018; Baker et al., 2017; Xu et al., 2020; Chen et al., 2021). Given a graph learning task, e.g., node or graph classification, our AutoML goal is to search a design space for the model architecture and hyper-parameter setting that yield the best test performance on the task. Following previous work (You et al., 2020b), we define a design as a set of architecture and hyper-parameter choices (e.g., 3 layers, 64 embedding dimensions, batch normalization, skip connections between consecutive layers), and the design space as the space of all possible designs for a given task.

However, AutoML is very computationally intensive. The design space of interest often involves millions of possible designs (Elsken et al.; You et al., 2020a). Sample-based AutoML (Zoph & Le, 2017; Gao et al., 2019; Bergstra et al., 2011; Liu et al., 2017; Luo et al., 2018) performs the search by sampling candidate designs from the design space to explore. One central challenge for existing sample-based AutoML solutions is sample efficiency: they must train as few models as possible to identify the best-performing model in the vast design space. To improve efficiency, existing research focuses on developing good search algorithms to navigate the design space (White et al., 2021; Shi et al., 2020; Ma et al., 2019).
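To make the definitions of a design and a design space concrete, the following minimal Python sketch treats a design space as the Cartesian product of per-choice options and a design as one option per choice. The choice names and option lists in `DESIGN_CHOICES` are hypothetical placeholders, not the design space used in this work; the sketch only illustrates that the space size is the product of the option counts, so it grows multiplicatively with every added choice.

```python
from itertools import product
from math import prod

# Hypothetical design choices, for illustration only; they are NOT the
# design space used in the paper.
DESIGN_CHOICES = {
    "num_layers": [1, 2, 3, 4, 5, 6],
    "embedding_dim": [16, 32, 64, 128, 256],
    "batch_norm": [True, False],
    "skip_connection": ["none", "skip_sum", "skip_cat"],
    "learning_rate": [0.1, 0.01, 0.001],
    "aggregation": ["mean", "max", "sum"],
}


def design_space(choices):
    """Yield every design as a dict mapping each choice name to one option."""
    names = list(choices)
    for values in product(*(choices[name] for name in names)):
        yield dict(zip(names, values))


def design_space_size(choices):
    """The space size is the product of the number of options per choice."""
    return prod(len(options) for options in choices.values())


print(design_space_size(DESIGN_CHOICES))  # 1620
```

Adding ten more binary choices to this toy space would multiply its size by 2^10 = 1024, giving 1,658,880 designs, which shows how quickly realistic design spaces reach the millions.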
However, these methods do not model the effects of design choices, even though such modeling provides strong inductive biases for finding the best-performing model. By "inductive bias", we refer to patterns in which multiple design variables interact, which can occur in multiple parts of the design space. An efficient search strategy should therefore leverage such learned inductive biases to rapidly rule out large subsets of the design space with potentially poor performance.

Proposed approach. To overcome these limitations, we propose FALCON, an AutoML framework for graph tasks that achieves state-of-the-art sample efficiency and performance by leveraging model design insights. Our key insight is to build a design graph over the design space of architecture and hyper-parameter choices. FALCON extracts model design insights by learning a meta-model that captures the relation between the design graph and model performance, and uses it to inform a sample-efficient search strategy. FALCON consists of the following two novel components.

Design space as a graph. Previous works view the model design space as a high-dimensional space of isolated design choices (You et al., 2020b), which offers few insights into the relations between different design choices. For example, if trial runs reveal that models with more than 3 layers do not work well without batch normalization, this knowledge can help us shrink the search space by excluding all designs with more than 3 layers and batch normalization disabled. While such insights are hard to obtain with existing AutoML algorithms (Liu et al., 2019; Pham et al., 2018; Gao et al., 2019; Zoph & Le, 2017; Cai et al., 2019), FALCON obtains them by constructing a graph representation, the design graph, over all the design choices. Figure 1 (a) shows a visualization of a design graph, where each node represents a candidate design and the edges denote the similarity between designs.
See Section 3.1 for details on the similarity and graph construction.

Search by navigating the design graph. Given the design graph, FALCON deploys a Graph Neural Network predictor, called the meta-GNN, which is supervised by the performances of the explored nodes and learns to predict the performance of a design given its corresponding node in the design graph. The meta-GNN consists of 1) a task-agnostic module, which performs message passing on the design graph, and 2) a task-specific module, which conducts label propagation of the known model performance information on the design graph. Furthermore, we propose a search strategy that uses meta-GNN predictions to navigate the design graph efficiently.

Experiments. We conduct extensive experiments on 27 graph datasets, covering node- and graph-level tasks with distinct distributions. Moreover, we demonstrate FALCON's potential applicability to image datasets through experiments on the CIFAR-10 image dataset. Our code is available at https://anonymous.4open.science/r/Falcon.
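The task-specific module is described as label propagation of known performances over the design graph. The minimal Python sketch below illustrates that idea under two assumptions of ours: designs are linked when they differ in exactly one design choice (a stand-in for the similarity defined in Section 3.1), and propagation uses the classic clamped-averaging scheme, in which explored designs keep their observed performance while every other design repeatedly takes the mean of its neighbors' scores. The function names `build_design_graph` and `propagate_labels` are hypothetical, and the sketch omits the task-agnostic meta-GNN that FALCON combines with this module.

```python
from itertools import product


def build_design_graph(choices):
    """Nodes are designs, encoded as tuples of option indices. As an assumed
    similarity, two designs are linked if they differ in exactly one choice."""
    sizes = [len(options) for options in choices.values()]
    nodes = list(product(*(range(size) for size in sizes)))
    adjacency = {node: [] for node in nodes}
    for node in nodes:
        for pos, size in enumerate(sizes):
            for alt in range(size):
                if alt != node[pos]:
                    # Change a single design choice to reach a neighbor.
                    adjacency[node].append(node[:pos] + (alt,) + node[pos + 1:])
    return adjacency


def propagate_labels(adjacency, known, num_iters=50):
    """Clamped label propagation: explored designs keep their observed
    performance; every other design repeatedly takes its neighbors' mean."""
    start = sum(known.values()) / len(known)  # initial guess for unexplored nodes
    scores = {node: known.get(node, start) for node in adjacency}
    for _ in range(num_iters):
        updated = {}
        for node, neighbors in adjacency.items():
            if node in known:
                updated[node] = known[node]  # clamp observed performance
            elif neighbors:
                updated[node] = sum(scores[n] for n in neighbors) / len(neighbors)
            else:
                updated[node] = scores[node]  # isolated node: keep its guess
        scores = updated
    return scores


choices = {"num_layers": [2, 4], "batch_norm": [True, False]}
graph = build_design_graph(choices)
observed = {(0, 0): 0.80, (1, 1): 0.60}  # toy accuracies of two explored designs
print(propagate_labels(graph, observed))
```

In this 2 × 2 toy space, each unexplored design has both explored designs as neighbors, so it receives an estimate of about 0.7; in a larger graph, designs closer to well-performing explored designs inherit higher scores.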

2. RELATED WORK

Automated machine learning (AutoML) is a cornerstone for discovering state-of-the-art model designs without massive human effort. We introduce four types of related work below.



Figure 1: Overview of FALCON. (a) Design graph example. We present a small design graph on the TU-COX2 graph classification dataset. The design choices are shown in the table; #pre, #mp, and #post denote the numbers of pre-processing, message passing, and post-processing layers, respectively. Darker node colors indicate better design performance. (b) FALCON search strategy. Red: explored nodes. Green: candidate nodes to be sampled from. Blue: the best node. Gray: other nodes. Locally, FALCON extends the design subgraph via the search strategy detailed in Section 3.3. Globally, FALCON approaches the optimal design, guided by the inductive bias of the design relations.
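The local expansion in Figure 1 (b) can be illustrated with a greedy subgraph search. This is a minimal Python sketch under our own assumptions, not the procedure of Section 3.3: candidates (green) are taken to be the unexplored neighbors of the explored subgraph (red), the next design is the candidate with the highest predicted performance, and the best explored design (blue) is returned once a fixed budget is spent. The `predict` and `evaluate` callables are hypothetical hooks standing in for the meta-GNN and for actually training a model.

```python
def candidate_nodes(adjacency, explored):
    """Unexplored designs adjacent to the explored subgraph (green nodes)."""
    return {nb for node in explored for nb in adjacency[node] if nb not in explored}


def greedy_search(adjacency, predict, evaluate, start, budget):
    """Hypothetical greedy simplification of the search: expand the explored
    subgraph one design at a time, always taking the candidate with the
    highest predicted performance, until `budget` designs are explored."""
    explored = {start: evaluate(start)}  # design -> observed performance
    while len(explored) < budget:
        candidates = candidate_nodes(adjacency, explored)
        if not candidates:
            break
        # `predict` may use all observations so far, e.g. a refitted meta-model.
        next_design = max(candidates, key=lambda design: predict(design, explored))
        explored[next_design] = evaluate(next_design)  # e.g. train the model
    best = max(explored, key=explored.get)  # blue node
    return best, explored


# Toy design graph and performances; `predict` is an oracle here purely
# for illustration, whereas FALCON would use meta-GNN predictions.
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
true_perf = {0: 0.50, 1: 0.60, 2: 0.55, 3: 0.90, 4: 0.58}
best, explored = greedy_search(
    adjacency,
    predict=lambda design, observed: true_perf[design],
    evaluate=true_perf.__getitem__,
    start=0,
    budget=3,
)
print(best, sorted(explored))  # 3 [0, 1, 3]
```

Design 2 is never trained because the predictor ranks design 1 higher, which mirrors how predictions let the search skip unpromising regions. Since the caption describes candidates as nodes "to be sampled from", the actual strategy may sample among candidates rather than always taking the maximum.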

