INTERPRETABLE NEURAL ARCHITECTURE SEARCH VIA BAYESIAN OPTIMISATION WITH WEISFEILER-LEHMAN KERNELS

Abstract

Current neural architecture search (NAS) strategies focus only on finding a single good architecture. They offer little insight into why a specific network performs well, or how we should modify the architecture if we want further improvements. We propose a Bayesian optimisation (BO) approach for NAS that combines the Weisfeiler-Lehman graph kernel with a Gaussian process surrogate. Our method optimises the architecture in a highly data-efficient manner: it captures the topological structure of architectures and is scalable to large graphs, thus making the high-dimensional, graph-like search spaces amenable to BO. More importantly, our method affords interpretability by discovering useful network features and their corresponding impact on network performance. Indeed, we demonstrate empirically that our surrogate model can identify useful motifs which can guide the generation of new architectures. We finally show that our method outperforms existing NAS approaches, achieving state-of-the-art results on both closed- and open-domain search spaces.

1. INTRODUCTION

Neural architecture search (NAS) aims to automate the design of good neural network architectures for a given task and dataset. Although different NAS strategies have led to state-of-the-art neural architectures that outperform human experts' designs on a variety of tasks (Real et al., 2017; Zoph and Le, 2017; Cai et al., 2018; Liu et al., 2018a;b; Luo et al., 2018; Pham et al., 2018; Real et al., 2018; Zoph et al., 2018a; Xie et al., 2018), these strategies behave in a black-box fashion and return little design insight beyond the final architecture for deployment. In this paper, we introduce the idea of interpretable NAS, extending the learning scope from simply the optimal architecture to interpretable features. These features can help explain the performance of the searched networks and guide future architecture design. We make the first attempt at interpretable NAS by proposing a new NAS method, NAS-BOWL; our method combines a Gaussian process (GP) surrogate with the Weisfeiler-Lehman (WL) subtree graph kernel (we term this surrogate GPWL) and applies it within the Bayesian optimisation (BO) framework to efficiently query the search space. During the search, we harness the interpretable architecture features extracted by the WL kernel and learn their corresponding effects on the network performance from the surrogate's gradient information.

Besides offering a new perspective on interpretability, our method also improves over existing BO-based NAS approaches. To accommodate the popular cell-based search spaces, which are non-continuous and graph-like (Zoph et al., 2018a; Ying et al., 2019; Dong and Yang, 2020), current approaches rely either on encoding schemes (Ying et al., 2019; White et al., 2019) or on manually designed similarity metrics (Kandasamy et al., 2018); neither scales to large architectures, and both ignore the important topological structure of architectures.
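To make the idea concrete, the following is a minimal sketch of a WL subtree kernel between two labelled cell graphs, of the kind that could back a GP surrogate. This is an illustration of the general WL technique, not the authors' implementation: the graph encoding, the helper names (`wl_features`, `wl_kernel`), and the toy operation labels are all hypothetical, and the final normalisation is a common convention rather than a detail taken from the paper.

```python
import numpy as np
from collections import Counter

def wl_features(adj, labels, h=2):
    """Weisfeiler-Lehman subtree features for a labelled graph.

    adj: list of neighbour-index lists; labels: list of node label strings.
    Returns a Counter of subtree-pattern counts over h refinement rounds.
    """
    feats = Counter(labels)          # round 0: raw node labels
    cur = list(labels)
    for _ in range(h):
        new = []
        for v, nbrs in enumerate(adj):
            # compress (own label, sorted multiset of neighbour labels)
            sig = cur[v] + "|" + ",".join(sorted(cur[u] for u in nbrs))
            new.append(sig)
        cur = new
        feats.update(cur)            # accumulate patterns from this round
    return feats

def wl_kernel(fa, fb):
    """Dot product of WL feature counts (unnormalised WL subtree kernel)."""
    return sum(c * fb.get(p, 0) for p, c in fa.items())

# Two toy cells: nodes labelled by operation, directed edges via adjacency.
g1 = ([[1, 2], [2], []], ["input", "conv3x3", "output"])
g2 = ([[1, 2], [2], []], ["input", "maxpool", "output"])

f1, f2 = wl_features(*g1), wl_features(*g2)
K = np.array([[wl_kernel(f1, f1), wl_kernel(f1, f2)],
              [wl_kernel(f2, f1), wl_kernel(f2, f2)]], dtype=float)
# Normalise so k(g, g) = 1 before using K as a GP covariance matrix.
d = np.sqrt(np.diag(K))
K_norm = K / np.outer(d, d)
```

Because the feature map is an explicit count vector over subtree patterns, each entry of a fitted GP's gradient with respect to these counts can be read as the estimated effect of a concrete network motif, which is the kind of interpretability the surrogate exploits.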
Another line of work employs graph neural networks (GNNs) to construct the BO surrogate (Ma et al., 2019; Zhang et al., 2019; Shi et al., 2019); however, the GNN design introduces additional hyperparameters to tune, and training the GNN also requires a large amount of architecture data, which is particularly

Code is available at https://github.com/xingchenwan/nasbowl

