FASTER HYPERPARAMETER SEARCH FOR GNNS VIA CALIBRATED DATASET CONDENSATION

Abstract

Dataset condensation aims to reduce the computational cost of training multiple models on a large dataset by condensing the training set into a small synthetic one. State-of-the-art approaches rely on matching model gradients between the real and synthetic data, and have recently been applied to condense large-scale graphs for node-classification tasks. Although dataset condensation could make training multiple models for hyperparameter optimization more efficient, there is no theoretical guarantee on the generalizability of the condensed data: in practice, condensed data often generalize poorly across hyperparameters and architectures, and we show, both empirically and theoretically, that this overfitting is far more severe on graphs. In this paper, we consider a different condensation objective geared specifically toward hyperparameter search. We aim to generate the synthetic dataset such that the validation-performance rankings of models with different hyperparameters are comparable on the condensed and the original datasets. We propose a novel hyperparameter-calibrated dataset condensation (HCDC) algorithm, which obtains the synthetic validation data by matching the hyperparameter gradients computed via implicit differentiation and an efficient inverse-Hessian approximation. HCDC employs a supernet with differentiable hyperparameters, making it suitable for modeling GNNs with widely varying convolution filters. Experiments demonstrate that the proposed framework effectively maintains the validation-performance rankings of GNNs and speeds up hyperparameter/architecture search on graphs.

1. INTRODUCTION

Graph neural networks (GNNs) have found remarkable success in tackling a variety of graph-related tasks (Hamilton, 2020). However, the prevalence of large-scale graphs in real-world contexts, such as social, information, and biological networks (Hu et al., 2020), which frequently scale up to millions or billions of nodes and edges, poses significant computational challenges for training GNNs. While training a single model can be expensive, designing deep learning models for new tasks requires substantially more computation, as it involves training many models on the same dataset to verify design choices (e.g., the architecture and hyperparameter choices (Elsken et al., 2019)). Motivated by this, we ask the following question: how can we reduce the computational cost of training multiple models on the same dataset for hyperparameter search/optimization? Natural approaches to reducing the training set size include graph coreset selection (Baker et al., 2020), graph sparsification (Batson et al., 2013), graph coarsening (Loukas, 2019), and graph sampling (Zeng et al., 2019). However, all of these methods select samples from the given training set, which limits their performance. A more effective alternative is to synthesize informative samples rather than select from the given ones. Dataset condensation (Zhao et al., 2020) has emerged as an effective mechanism for synthesizing such data, with promising results. It aims to produce a small synthetic training set such that a model trained on the synthetic set obtains testing accuracy comparable to one trained on the original training set.
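To make the gradient-matching objective behind dataset condensation concrete, the following toy sketch (a one-parameter linear model with illustrative numbers, not the method of Zhao et al. at scale) learns a single synthetic point whose training gradient matches that of a larger real dataset across several probe weights:

```python
# Toy gradient-matching condensation for the model f(x) = w * x (illustrative only).
def grad_w(w, xs, ys):
    """Gradient of the MSE training loss wrt the weight w on dataset (xs, ys)."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# "Large" real training set: y = 3x.
real_x = [1.0, 2.0, 3.0, 4.0, 5.0]
real_y = [3.0 * x for x in real_x]

# A single synthetic point (sx, sy), optimized so the gradient it induces
# matches the real-data gradient at several probe weight values.
sx, sy = 1.0, 0.0
lr, eps = 1e-4, 1e-5
probe_ws = [-1.0, 0.0, 1.0, 2.0, 4.0]

for _ in range(4000):
    for w in probe_ws:
        g_real = grad_w(w, real_x, real_y)
        match = lambda a, b: (g_real - grad_w(w, [a], [b])) ** 2
        # Finite-difference gradients of the matching loss wrt (sx, sy).
        d_sx = (match(sx + eps, sy) - match(sx - eps, sy)) / (2 * eps)
        d_sy = (match(sx, sy + eps) - match(sx, sy - eps)) / (2 * eps)
        sx -= lr * d_sx
        sy -= lr * d_sy

# Training on the single synthetic point now recovers the slope of the
# real data: sy / sx is close to 3.
```

The key property is that a model trained from scratch on the one synthetic point converges to (nearly) the same weight as a model trained on the full real dataset, because the two induce matching loss gradients.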
Although dataset condensation achieves state-of-the-art performance for neural networks trained on condensed samples, the technique is inadequate for accelerating hyperparameter search/optimization, as: (1) theoretically, dataset condensation obtains synthetic samples that minimize the performance drop of one specific model, and there is no performance guarantee when the condensed data are used to train other models; and (2) in practice, it is unclear how condensation
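The implicit-differentiation machinery that the abstract's hyperparameter-gradient matching builds on can be illustrated in a few lines. The sketch below is a toy ridge-regression example (the model, data, and step sizes are illustrative assumptions, not the paper's implementation): it computes the gradient of a validation loss with respect to a regularization hyperparameter via the implicit function theorem, approximating the inverse training-loss Hessian with a truncated Neumann series, as efficient large-scale IFT methods do:

```python
def w_star(lam, xs, ys):
    """Closed-form ridge solution for f(x) = w*x: argmin_w mean((w*x - y)^2) + lam*w^2."""
    mxx = sum(x * x for x in xs) / len(xs)
    mxy = sum(x * y for x, y in zip(xs, ys)) / len(xs)
    return mxy / (mxx + lam)

def val_loss(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.2, 3.9, 6.1, 8.0]
val_x, val_y = [5.0, 6.0], [10.1, 12.2]

lam = 0.5
w = w_star(lam, train_x, train_y)

# Implicit function theorem: dw*/dlam = -H^{-1} * d^2 L_train / (dw dlam),
# where H = d^2 L_train / dw^2 (a scalar in this one-parameter model).
mxx = sum(x * x for x in train_x) / len(train_x)
H = 2 * mxx + 2 * lam          # training-loss Hessian
mixed = 2 * w                  # mixed partial d^2 L_train / (dw dlam)

# Neumann-series approximation of H^{-1}: alpha * sum_j (I - alpha*H)^j,
# exact as K grows; requires alpha < 2 / H for convergence.
alpha, K = 0.05, 100
inv_H = sum(alpha * (1 - alpha * H) ** j for j in range(K))

dval_dw = 2 * sum((w * x - y) * x for x, y in zip(val_x, val_y)) / len(val_x)
hypergrad = dval_dw * (-inv_H * mixed)   # dL_val/dlam via the chain rule
```

The resulting hypergradient agrees with a finite-difference estimate obtained by re-solving the inner ridge problem at perturbed values of lam, which is the property that lets condensation objectives match hyperparameter gradients without retraining.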

