GRAPH NEURAL NETWORK-INSPIRED KERNELS FOR GAUSSIAN PROCESSES IN SEMI-SUPERVISED LEARNING

Abstract

Gaussian processes (GPs) are an attractive class of machine learning models because of their simplicity and flexibility as building blocks of more complex Bayesian models. Meanwhile, graph neural networks (GNNs) recently emerged as a promising class of models for graph-structured data in semi-supervised learning and beyond. Their competitive performance is often attributed to their ability to properly capture the graph inductive bias. In this work, we introduce this inductive bias into GPs to improve their predictive performance for graph-structured data. We show that a prominent example of GNNs, the graph convolutional network, is equivalent to a GP when its layers are infinitely wide; and we analyze the kernel universality and the limiting behavior in depth. We further present a programmable procedure to compose covariance kernels inspired by this equivalence and derive example kernels corresponding to several interesting members of the GNN family. We also propose a computationally efficient approximation of the covariance matrix for scalable posterior inference with large-scale data. We demonstrate that these graph-based kernels lead to competitive classification and regression performance, as well as advantages in computation time, compared with the respective GNNs.

1. INTRODUCTION

Gaussian processes (GPs) (Rasmussen & Williams, 2006) are widely used in machine learning, uncertainty quantification, and global optimization. In the Bayesian setting, a GP serves as a prior probability distribution over functions, characterized by a mean (often taken to be zero for simplicity) and a covariance. Conditioned on observed data with a Gaussian likelihood, the random function admits a posterior distribution that is also Gaussian, whose mean is used for prediction and whose variance serves as an uncertainty measure. This closed-form posterior allows for exact Bayesian inference and is a key reason why GPs are so attractive and widely used. The success of GPs in practice depends on two factors: the observations (training data) and the covariance kernel.

We are interested in semi-supervised learning, where only a small amount of data is labeled while a large amount of unlabeled data can be used together with it for training (Zhu, 2008). In recent years, graph neural networks (GNNs) (Zhou et al., 2020; Wu et al., 2021) emerged as a promising class of models for this problem when the labeled and unlabeled data are connected by a graph. The graph structure provides an important inductive bias that underlies the success of GNNs. This inductive bias inspires us to design a GP model for the setting of limited observations by building the graph structure into the covariance kernel.

An intimate relationship between neural networks and GPs is known: a neural network with fully connected layers, equipped with a prior probability distribution on the weights and biases, converges to a GP when each of its layers is infinitely wide (Lee et al., 2018; de G. Matthews et al., 2018). This result follows from the central limit theorem (Neal, 1994; Williams, 1996), and the covariance of the limiting GP is determined by the network architecture.
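
For reference, the closed-form posterior mentioned above is the standard GP regression result (Rasmussen & Williams, 2006), recorded here in common notation. With a zero-mean prior $f \sim \mathcal{GP}(0, k)$, training inputs $X$, noisy labels $y = f(X) + \epsilon$ where $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$, and test inputs $X_*$, the posterior of $f(X_*)$ is Gaussian with
\[
\mu_* = K_{*X} \, (K_{XX} + \sigma^2 I)^{-1} y, \qquad
\Sigma_* = K_{**} - K_{*X} \, (K_{XX} + \sigma^2 I)^{-1} K_{X*},
\]
where $K_{AB}$ denotes the matrix of kernel evaluations $k(a, b)$ for $a \in A$ and $b \in B$.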


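To make the infinite-width correspondence above concrete, the following is a minimal sketch (Python with NumPy; the function name nngp_kernel and the hyperparameters sigma_w, sigma_b are illustrative choices, not notation from this paper) of the layer-wise covariance recursion for a fully connected ReLU network, using the well-known closed-form expression for the ReLU expectation (the arc-cosine kernel of Cho & Saul, 2009), as in Lee et al. (2018). It illustrates only the standard fully connected construction; the graph-based kernels developed in this work build the graph structure into an analogous layer-wise construction.

import numpy as np

def nngp_kernel(X, depth=3, sigma_w=1.0, sigma_b=0.1):
    # Limiting covariance of an infinitely wide fully connected ReLU network,
    # computed layer by layer. X is an (n, d) array; an (n, n) matrix is returned.
    d = X.shape[1]
    K = sigma_b**2 + sigma_w**2 * (X @ X.T) / d          # input-layer covariance
    for _ in range(depth):
        std = np.sqrt(np.diag(K))
        norm = np.outer(std, std)
        theta = np.arccos(np.clip(K / norm, -1.0, 1.0))
        # E[relu(u) relu(v)] for (u, v) jointly Gaussian with covariance K
        expectation = norm * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2.0 * np.pi)
        K = sigma_b**2 + sigma_w**2 * expectation
    return K

The resulting matrix can then be substituted for $K_{XX}$, $K_{*X}$, and $K_{**}$ in the posterior formulas above, in place of a hand-designed kernel.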