DO NOT TRAIN IT: A LINEAR NEURAL ARCHITECTURE SEARCH OF GRAPH NEURAL NETWORKS

Abstract

Neural architecture search (NAS) for graph neural networks (GNNs), called NAS-GNNs, has achieved significant performance gains over manually designed GNN architectures. However, these methods inherit issues from conventional NAS methods, such as high computational cost and optimization difficulty. More importantly, previous NAS methods have ignored the uniqueness of GNNs, where non-linearity has a limited effect. Motivated by this, we are the first to theoretically prove that a GNN with fixed random weights can obtain optimal outputs under mild conditions. With randomly initialized weights, we can then seek the optimal architecture parameters via a sparse coding objective, yielding a novel NAS-GNN method, namely neural architecture coding (NAC). Consequently, our NAC requires no weight updates on GNNs and runs in linear time. Empirical evaluations on multiple GNN benchmark datasets demonstrate that our approach achieves state-of-the-art performance, being up to 200× faster and up to 18.8% more accurate than strong baselines.

1. INTRODUCTION

The remarkable progress of graph neural networks (GNNs) has boosted research in various domains, such as traffic prediction and recommender systems, as summarized in (Wu et al., 2021). The central paradigm of GNNs is to generate node embeddings through the message-passing mechanism (Hamilton, 2020), which passes, transforms, and aggregates node features across the input graph. Despite their effectiveness, designing GNNs requires laborious effort to choose and tune neural architectures for different tasks and datasets (You et al., 2020), which limits the usability of GNNs. To automate this process, researchers have leveraged neural architecture search (NAS) (Liu et al., 2019a; Zhang et al., 2021b) for GNNs, including GraphNAS (Gao et al., 2020), Auto-GNN (Zhou et al., 2019), PDNAS (Zhao et al., 2020), and SANE (Zhao et al., 2021b). In this work, we refer to the problem of NAS for GNNs as NAS-GNNs.

While NAS-GNNs have shown promising results, they inherit issues from general NAS methods and fail to account for the unique properties of GNN operators. It is therefore important to understand the difficulty in general NAS training (e.g., architecture searching and weight evaluation). Based on the search strategy, NAS methods can be categorized into three types: reinforcement learning-based methods (Zoph & Le, 2017), evolutionary algorithm-based methods (Jozefowicz et al., 2015), and differential-based methods (Liu et al., 2019a; Wu et al., 2019a). Both reinforcement learning-based and evolutionary algorithm-based methods suffer from high computational costs due to the need to re-train sampled architectures from scratch. In contrast, the weight-sharing differential-based paradigm reuses neural weights to reduce the search effort and produces the optimal sub-architecture directly, without excessive processes such as sampling, leading to significant computational cost reduction and becoming the new frontier of NAS.
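For concreteness, a single message-passing layer (pass, aggregate, transform) can be sketched as below. This is an illustrative GCN-style layer with symmetric normalization and a ReLU; the function name and these design choices are assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

def message_passing_layer(A, X, W):
    """One GCN-style message-passing layer (illustrative sketch).

    A: (n, n) adjacency matrix; X: (n, d_in) node features;
    W: (d_in, d_out) trainable transform.
    """
    n = A.shape[0]
    A_hat = A + np.eye(n)                      # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization
    # Pass + aggregate neighbor features (A_norm @ X), then transform (@ W).
    return np.maximum(A_norm @ X @ W, 0.0)     # ReLU non-linearity

# Toy path graph with 3 nodes and 2-dim features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2))
W = rng.normal(size=(2, 4))
H = message_passing_layer(A, X, W)
print(H.shape)  # (3, 4)
```

Stacking such layers propagates information over multi-hop neighborhoods, which is the mechanism NAS-GNN methods search over when choosing aggregation and transformation operators.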
However, the weight-sharing paradigm requires the neural weights to reach optimality in order to obtain the optimal sub-architecture under its bi-level optimization (BLO) strategy (Liu et al., 2019a), which alternately optimizes the network weights (outputs of operators) and the architecture parameters (importance of operators). First, it is generally hard to achieve optimal neural weights due to the curse of dimensionality in deep learning, leading to unstable search results, also called the optimization gap (Xie et al., 2022). Second, this paradigm often shows a sloppy gradient estimation (Bi et al., 2020a; b; Guo et al., 2020b) due to the alternating optimization, softmax-based
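As a minimal illustration of this alternating bi-level scheme, the toy sketch below mixes two candidate scalar operators via softmax-relaxed architecture parameters and alternates gradient steps: operator weights on a training loss, architecture parameters on a validation loss. The toy task (fitting y = 2x), the operator form, and all learning rates are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

rng = np.random.default_rng(0)
# Toy data: the target function is y = 2x; disjoint train/val splits.
x_tr, x_va = rng.normal(size=50), rng.normal(size=50)
y_tr, y_va = 2 * x_tr, 2 * x_va

w = np.array([0.1, -0.1])   # operator weights (inner level)
alpha = np.zeros(2)         # architecture parameters (outer level)
lr_w, lr_a = 0.05, 0.05

for step in range(200):
    # Inner step: update operator weights on the training loss.
    p = softmax(alpha)
    r_tr = (p * w).sum() * x_tr - y_tr
    grad_w = np.array([np.mean(2 * r_tr * p[k] * x_tr) for k in range(2)])
    w -= lr_w * grad_w
    # Outer step: update architecture parameters on the validation loss.
    p = softmax(alpha)
    r_va = (p * w).sum() * x_va - y_va
    g = np.array([np.mean(2 * r_va * w[k] * x_va) for k in range(2)])
    grad_a = p * (g - (p * g).sum())   # softmax Jacobian applied to g
    alpha -= lr_a * grad_a

print(softmax(alpha), (softmax(alpha) * w).sum())
```

Even in this two-operator toy, the two gradient estimates are coupled through the shared mixture, hinting at why the alternating updates can produce noisy architecture gradients in full-scale supernets.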

