EFFICIENT AUTOMATIC GRAPH LEARNING VIA DESIGN RELATIONS

Anonymous authors
Paper under double-blind review

Abstract

Despite the success of automated machine learning (AutoML), which aims to find the best design, including the architecture of neural networks and hyper-parameters, conventional AutoML methods are computationally expensive and provide little insight into the relations among different model design choices. This work focuses on AutoML for graph tasks. To tackle these challenges, we propose FALCON, an efficient sample-based method that searches for the optimal model design on graph tasks. Our key insight is to model the space of possible model designs as a design graph, where nodes represent design choices and edges denote design similarities. FALCON features 1) a task-agnostic module, which performs message passing on the design graph via a Graph Neural Network (GNN), and 2) a task-specific module, which conducts label propagation of the known model performance information on the design graph. The two modules are combined to predict the performance of designs in the design space, guiding the search direction. We conduct extensive experiments on 27 node and graph classification tasks from various application domains. We empirically show that FALCON can efficiently obtain well-performing designs for each task by exploring only 30 nodes of the design graph. Specifically, FALCON incurs a time cost comparable to one-shot approaches while achieving an average improvement of 3.3% over the best baselines.

1. INTRODUCTION

Automated machine learning (AutoML) (Liu et al., 2019; Pham et al., 2018; Bender et al., 2018; Real et al., 2019; Zoph & Le, 2017; Cai et al., 2019; 2021; Gao et al., 2019; You et al., 2020b; Zhang et al., 2021) has demonstrated great success in various domains including computer vision (Chu et al., 2020; Ghiasi et al., 2019; Chen et al., 2019), language modeling (Zoph & Le, 2017; So et al., 2019), and recommender systems (Chen et al., 2022). It is an essential component of state-of-the-art deep learning models (Liu et al., 2018; Baker et al., 2017; Xu et al., 2020; Chen et al., 2021). Given a graph learning task, e.g., a node/graph classification task on graphs, the goal of AutoML is to search for a model architecture and hyper-parameter setting from a design space that results in the best test performance on the task. Following previous work (You et al., 2020b), we define a design as a set of architecture and hyper-parameter choices (e.g., 3 layers, 64 embedding dimensions, batch normalization, skip connections between consecutive layers), and define the design space as the space of all possible designs for a given task. However, AutoML is very computationally intensive: the design space of interest often involves millions of possible designs (Elsken et al.; You et al., 2020a). Sample-based AutoML (Zoph & Le, 2017; Gao et al., 2019; Bergstra et al., 2011; Liu et al., 2017; Luo et al., 2018) performs the search by sampling candidate designs from the design space to explore. A central challenge for existing sample-based AutoML solutions is their sample efficiency: they need to train as few models as possible to identify the best-performing model in the vast design space. To improve efficiency, existing research focuses on developing good search algorithms to navigate the design space (White et al., 2021; Shi et al., 2020; Ma et al., 2019).
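To make the notions of a design space and a design graph concrete, the following sketch enumerates a toy design space and connects designs that differ in exactly one choice. The specific choices (layer counts, hidden dimensions, etc.) and the one-choice-difference similarity rule are illustrative assumptions, not the paper's actual space or edge definition.

```python
from itertools import product

# Illustrative design space: each key is one design choice.
# These particular values are examples, not FALCON's actual space.
DESIGN_SPACE = {
    "num_layers": [2, 3, 4],
    "hidden_dim": [32, 64, 128],
    "batch_norm": [True, False],
    "skip_connection": [True, False],
}

def enumerate_designs(space):
    """Enumerate every design as a dict mapping choice -> value."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*space.values())]

def build_design_graph(designs):
    """Add an edge between two designs that differ in exactly one
    choice -- a simple, assumed notion of 'design similarity'."""
    edges = []
    for i, a in enumerate(designs):
        for j in range(i + 1, len(designs)):
            b = designs[j]
            if sum(a[k] != b[k] for k in a) == 1:
                edges.append((i, j))
    return edges

designs = enumerate_designs(DESIGN_SPACE)
edges = build_design_graph(designs)
print(len(designs))  # 3 * 3 * 2 * 2 = 36 designs
```

Even this tiny space has 36 designs; realistic spaces with many more axes quickly grow into the millions, which is why exhaustively training every design is infeasible and sample efficiency matters.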
However, these methods do not model the effect of model design choices, which could provide strong inductive biases for searching for the best-performing model. By "inductive bias", we refer to the patterns of multiple variables interacting together, which can happen in multiple parts of the design space.
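The task-specific module described in the abstract propagates known performance values over the design graph. A minimal illustration of that idea is sketched below: performances of explored designs are clamped, and unexplored designs repeatedly average their neighbors' estimates. This is a generic label-propagation scheme under assumed hyper-parameters, not FALCON's exact formulation.

```python
import numpy as np

def label_propagate(adj, perf, known_mask, num_iters=20, alpha=0.9):
    """Simple label propagation on a design graph.

    adj        : (n, n) adjacency matrix of the design graph
    perf       : (n,) performance values (only entries where
                 known_mask is True are meaningful)
    known_mask : (n,) boolean mask of explored designs

    Known performances stay fixed; unknown nodes iteratively mix in
    the degree-normalized average of their neighbors' estimates.
    """
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    P = adj / deg                                  # row-normalized transitions
    f = np.where(known_mask, perf, perf[known_mask].mean())
    for _ in range(num_iters):
        f = alpha * (P @ f) + (1 - alpha) * f      # smooth over neighbors
        f = np.where(known_mask, perf, f)          # clamp explored designs
    return f

# Toy example: a 3-node path graph, with performance known at the endpoints.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
perf = np.array([0.9, 0.0, 0.5])
known = np.array([True, False, True])
pred = label_propagate(adj, perf, known)
print(pred[1])  # the middle design is estimated near 0.7
```

In FALCON this task-specific signal is combined with a task-agnostic GNN over the design graph; here only the propagation step is sketched.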

