AUTOTRANSFER: AUTOML WITH KNOWLEDGE TRANSFER - AN APPLICATION TO GRAPH NEURAL NETWORKS

Abstract

AutoML has demonstrated remarkable success in finding an effective neural architecture for a given machine learning task defined by a specific dataset and an evaluation metric. However, most existing AutoML techniques consider each task independently and from scratch, which requires exploring many architectures and leads to high computational costs. Here we propose AUTOTRANSFER, an AutoML solution that improves search efficiency by transferring prior architectural design knowledge to the novel task of interest. Our key innovations include a task-model bank that captures model performance over a diverse set of GNN architectures and tasks, and a computationally efficient task embedding that can accurately measure the similarity among different tasks. Based on the task-model bank and the task embeddings, we estimate the design priors of desirable models for the novel task by aggregating a similarity-weighted sum of the top-K design distributions on tasks that are similar to the task of interest. The computed design priors can be used with any AutoML search algorithm. We evaluate AUTOTRANSFER on six datasets in the graph machine learning domain. Experiments demonstrate that (i) our proposed task embedding can be computed efficiently, and that tasks with similar embeddings have similar best-performing architectures; (ii) AUTOTRANSFER significantly improves search efficiency with the transferred design priors, reducing the number of explored architectures by an order of magnitude. Finally, we release GNN-BANK-101, a large-scale dataset of detailed GNN training information for 120,000 task-model combinations, to facilitate and inspire future research.

1. INTRODUCTION

Deep neural networks are highly modular, requiring many design decisions to be made regarding network architecture and hyperparameters. These design decisions form a search space that is nonconvex and costly even for experts to optimize over, especially when the optimization must be repeated from scratch for each new use case. Automated machine learning (AutoML) is an active research area that aims to reduce the human effort required for architecture design; it usually covers hyperparameter optimization and neural architecture search. AutoML has demonstrated success (Zoph and Le, 2016; Pham et al., 2018; Zoph et al., 2018; Cai et al., 2018; He et al., 2018; Guo et al., 2020; Erickson et al., 2020; LeDell and Poirier, 2020) in many application domains. Finding a reasonably good model for a new learning task[1] in a computationally efficient manner is crucial for making deep learning accessible to domain experts with diverse backgrounds. Efficient AutoML is especially important in domains where the best architectures/hyperparameters are highly sensitive to the task. A notable example is the domain of graph learning[2]. First, graph learning methods receive input data composed of a variety of data types and optimize over tasks that span an equally diverse set of domains and modalities, such as recommendation (Ying et al., 2018; He et al., 2020), physical simulation (Sanchez-Gonzalez et al., 2020; Pfaff et al., 2020), and bioinformatics (Zitnik et al., 2018). This differs from computer vision and natural language processing, where the input data has a predefined, fixed structure that can be shared across different neural architectures. Second, neural networks that operate on graphs come with a rich set of design choices and a large set of parameters to explore.
However, unlike other domains where a few pre-trained architectures such as ResNet (He et al., 2016) and GPT-3 (Brown et al., 2020) dominate the benchmarks, it has been shown that the best graph neural network (GNN) design is highly task-dependent (You et al., 2020). Although AutoML as a research domain is evolving fast, existing AutoML solutions have massive computational overhead when the goal is finding a good model for a new learning task. Most existing AutoML techniques consider each task independently and in isolation, and therefore require redoing the search from scratch for each new task. This approach ignores the potentially valuable architectural design knowledge obtained from previous tasks, and inevitably leads to a high computational cost. The issue is especially significant in the graph learning domain (Gao et al., 2019; Zhou et al., 2019), due to the challenges of diverse task types and the huge design space discussed above. Here we propose AUTOTRANSFER[3], an AutoML solution that drastically improves AutoML architecture search by transferring previous architectural design knowledge to the task of interest. Our key innovation is to introduce a task-model bank that stores the performance of a diverse set of GNN architectures and tasks to guide the search algorithm. To enable knowledge transfer, we define a task embedding space such that tasks close in the embedding space have similar corresponding top-performing architectures. The challenge here is that the task embedding needs to capture the performance rankings of different architectures on different datasets, while being efficient to compute. Our innovation here is to embed a task using the condition numbers of the Fisher Information Matrices of various randomly initialized models, together with a learning scheme that has an empirical generalization guarantee. This way we implicitly capture the properties of the learning task, while being orders of magnitude faster (within seconds).
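To make the embedding idea above concrete, the following is a minimal sketch of computing per-task features from the condition number of an (approximate) Fisher Information Matrix of randomly initialized probe models. It is not the authors' released implementation: the logistic-regression probe, the diagonal Fisher approximation, and the names `fisher_condition_number` and `task_embedding` are our own illustrative assumptions.

```python
import numpy as np

def fisher_condition_number(w, X, y, eps=1e-12):
    """Condition number of a diagonal approximation of the empirical
    Fisher Information Matrix for a logistic-regression probe model.

    Each example contributes the squared gradient of its log-likelihood
    with respect to the weights w (a common diagonal Fisher estimate)."""
    diag = np.zeros_like(w)
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))  # predicted probability
        grad = (p - yi) * xi               # gradient of the negative log-likelihood
        diag += grad ** 2
    diag /= len(X)
    # For a diagonal matrix the condition number is the max/min eigenvalue ratio.
    return diag.max() / (diag.min() + eps)

def task_embedding(X, y, n_models=8, seed=0):
    """Embed a task as the vector of log condition numbers obtained from
    several randomly initialized probe models (one scalar per init)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    return np.array([np.log(fisher_condition_number(rng.normal(size=d), X, y))
                     for _ in range(n_models)])
```

The key property this sketch illustrates is that no probe model is ever trained: only forward passes and per-example gradients at random initialization are needed, which is why the embedding can be computed within seconds.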
We then estimate the design prior of desirable models for the new task by aggregating design distributions on tasks that are close to the task of interest. Finally, we initialize a hyperparameter search algorithm with the computed task-informed design prior. We evaluate AUTOTRANSFER on six datasets, including both node classification and graph classification tasks. We show that our proposed task embeddings can be computed efficiently, and that the distance measured between tasks correlates highly (0.43 Kendall correlation) with model performance rankings. Furthermore, we show that AUTOTRANSFER significantly improves search efficiency when using the transferred design prior: it reduces the number of explored architectures needed to reach a target accuracy by an order of magnitude compared to SOTA. Finally, we release GNN-BANK-101, the first large-scale database containing detailed performance records for 120,000 task-model combinations (trained with 16,128 GPU hours), to facilitate future research.
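The prior-construction step described above, a similarity-weighted sum of the top-K neighbors' design distributions, can be sketched as follows. The bank format, the exponential similarity weighting, and the function name `transfer_design_prior` are our own illustrative assumptions rather than the released implementation.

```python
import numpy as np

def transfer_design_prior(task_emb, bank_embs, bank_priors, k=3, temp=1.0):
    """Aggregate design priors from the k bank tasks closest to task_emb.

    bank_embs:   (n_tasks, d) array of task embeddings in the task-model bank.
    bank_priors: per-task dicts mapping each design choice to a categorical
                 distribution over its options, e.g.
                 {"layer_type": {"GCN": 0.6, "GAT": 0.4}}.
    Returns a similarity-weighted mixture of the top-k design distributions."""
    dists = np.linalg.norm(bank_embs - task_emb, axis=1)
    top_k = np.argsort(dists)[:k]
    sims = np.exp(-dists[top_k] / temp)  # closer tasks receive larger weight
    weights = sims / sims.sum()
    prior = {}
    for w, idx in zip(weights, top_k):
        for choice, dist in bank_priors[idx].items():
            slot = prior.setdefault(choice, {})
            for option, p in dist.items():
                slot[option] = slot.get(option, 0.0) + w * p
    return prior
```

Any search algorithm can then draw its initial candidates from this mixture instead of a uniform distribution over the design space, which is what makes the prior compatible with arbitrary AutoML search strategies.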

2. RELATED WORK

In this section, we summarize related work on AutoML regarding its applications to GNNs, the common search algorithms, and pioneering work on transfer learning and task embeddings.

AutoML for GNNs. Neural architecture search (NAS), a unique and popular form of AutoML for deep learning, can be divided into two categories: multi-trial NAS and one-shot NAS. In multi-trial NAS, each sampled architecture is trained separately. GraphNAS (Gao et al., 2020) and Auto-GNN (Zhou et al., 2019) are typical multi-trial NAS algorithms for GNNs; both adopt an RNN controller that learns to suggest better sets of configurations through reinforcement learning. One-shot NAS (e.g., (Liu et al., 2018; Qin et al., 2021; Li et al., 2021)) encapsulates the entire model space in one super-model, trains the super-model once, and then iteratively samples sub-models from the super-model to find the best one. In addition, there is work that explicitly studies fine-grained design choices such as data augmentation (You et al., 2021), message passing layer type (Cai et al., 2021; Ding et al., 2021; Zhao et al., 2021), and graph pooling (Wei et al., 2021). Notably, AUTOTRANSFER is the first AutoML solution for GNNs that efficiently transfers design knowledge across tasks.

HPO Algorithms. Hyperparameter optimization (HPO) algorithms search for the optimal model hyperparameters by iteratively suggesting a set of hyperparameters and evaluating their performance. Random search samples hyperparameters from the search space with equal probability. Despite not
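Random search, the simplest HPO baseline mentioned above, can be sketched in a few lines; for illustration, the same sampler can also accept a non-uniform prior over options, which is how a transferred design prior would plug in. The search-space format and the name `random_search` are our own assumptions.

```python
import random

def random_search(space, evaluate, n_trials=20, prior=None, seed=0):
    """Sample configurations and keep the best according to `evaluate`.

    space: dict mapping each hyperparameter to its list of options.
    prior: optional dict of categorical distributions over the same
           options; if omitted, every option is equally likely."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {}
        for name, options in space.items():
            if prior and name in prior:
                opts = list(prior[name])
                weights = [prior[name][o] for o in opts]
                cfg[name] = rng.choices(opts, weights=weights)[0]
            else:
                cfg[name] = rng.choice(options)  # uniform sampling
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

With a uniform prior this is plain random search; concentrating the prior on options that performed well on similar tasks shrinks the number of trials needed to hit a good configuration, which is the efficiency gain that knowledge transfer targets.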



[1] In this paper, we refer to a task as a given dataset with an evaluation metric/loss, e.g., cross-entropy loss on node classification on the Cora dataset.
[2] We focus on the graph learning domain in this paper. AUTOTRANSFER can be generalized to other domains.
[3] Source code is available at https://github.com/snap-stanford/AutoTransfer.

