LEARNING PARAMETRISED GRAPH SHIFT OPERATORS

Abstract

In many domains data is naturally represented as graphs, and graph representations of data are therefore of growing importance in machine learning. Network data is, implicitly or explicitly, always represented using a graph shift operator (GSO), with the most common choices being the adjacency and Laplacian matrices and their normalisations. In this paper, a novel parametrised GSO (PGSO) is proposed, where specific parameter values recover the most commonly used GSOs and message-passing operators in graph neural network (GNN) frameworks. The PGSO is suggested as a replacement for the standard GSOs used in state-of-the-art GNN architectures, and the optimisation of the PGSO parameters is seamlessly included in the model training. It is proved that the PGSO has real eigenvalues and a set of real eigenvectors independent of the parameter values, and spectral bounds on the PGSO are derived. The PGSO parameters are shown to adapt to the sparsity of the graph structure in a study on stochastic blockmodel networks, where they are found to automatically replicate the GSO regularisation found in the literature. On several real-world datasets the accuracy of state-of-the-art GNN architectures is improved by the inclusion of the PGSO in both node- and graph-classification tasks.

1. INTRODUCTION

Real-world data and applications often involve significant structural complexity, and as a consequence graph representation learning attracts great research interest (Hamilton et al., 2017b; Wu et al., 2020). The topology of the observations plays a central role when performing machine learning tasks on graph-structured data. A variety of supervised, semi-supervised and unsupervised graph learning algorithms employ different forms of operators that encode the topology of these observations. The most commonly used operators are the adjacency matrix, the Laplacian matrix and their normalised variants. All of these matrices belong to a general set of linear operators, the Graph Shift Operators (GSOs) (Sandryhaila & Moura, 2013; Mateos et al., 2019). Graph Neural Networks (GNNs), the main application domain in this paper, are representative cases of algorithms that use chosen GSOs to encode the graph structure, i.e., to encode the neighbourhoods used in the aggregation operators. Several GNN models (Kipf & Welling, 2017; Hamilton et al., 2017a; Xu et al., 2019) choose different variants of normalised adjacency matrices as GSOs. Interestingly, in a variety of tasks and datasets, the incorporation of explicit structural information of neighbourhoods into the model is found to improve results (Pei et al., 2020; Zhang & Chen, 2018; You et al., 2019), leading us to conclude that the chosen GSO does not entirely capture the information of the data topology. In most of these approaches, the GSO is chosen without an analysis of the impact of this choice of representation. From this observation arise our two research questions.

Question 1: Is there a single optimal representation to encode graph structures, or is the optimal representation task- and data-dependent? On different tasks and datasets, the choice between the representations encoded by the different graph shift operator matrices has been shown to be a consequential decision. Given the past successful approaches that use different GSOs for different tasks and datasets, it is natural to assume that there is no single optimal representation for all scenarios. Finding an optimal representation of network data could contribute positively to a range of learning tasks such as node and graph classification or community detection. Fundamental to this search is an answer to Question 1.

In addition, we pose the following second research question. Question 2: Can we learn such an optimal representation to encode graph structure in a numerically stable and computationally efficient way? The utilisation of a GSO as a topology representation currently amounts to a hand-engineered choice among normalised variants of the adjacency matrix. The learnable representation of node interactions is thus transferred into either convolutional filters (Kipf & Welling, 2017; Hamilton et al., 2017a) or attention weights (Veličković et al., 2018), while the GSO itself is kept constant. In this work, we suggest a parametrisation of the GSO. Specific parameter values in our proposed parametrised (and differentiable) GSO result in the most commonly used GSOs, namely the adjacency, unnormalised Laplacian and both normalised Laplacian matrices, as well as GNN aggregation functions, e.g., the averaging and summation message-passing operations. The beauty of this innovation is that it can be seamlessly included in both message-passing and convolutional GNN architectures. Optimising the operator parameters will allow us to find answers to our two research questions.
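The following PyTorch sketch illustrates what such a learnable operator can look like. Since the precise definition only appears in Section 3, the functional form used here, gamma(A) = m1 D_a^e1 + m2 D_a^e2 (A + aI) D_a^e3 + m3 I with D_a the degree matrix of A + aI, should be read as an illustrative assumption, chosen so that particular parameter values recover the operators listed above.

```python
import torch
import torch.nn as nn


class PGSO(nn.Module):
    """Learnable graph shift operator (illustrative sketch).

    Assumed parametrisation (the exact definition is given in Section 3):
        gamma(A) = m1 * D_a^e1 + m2 * D_a^e2 @ (A + a*I) @ D_a^e3 + m3 * I,
    where D_a is the degree matrix of A + a*I. All seven scalars are
    learnable, so the operator is optimised jointly with the GNN weights.
    """

    def __init__(self):
        super().__init__()
        # Initialise at the plain adjacency matrix, gamma(A) = A.
        init = {"m1": 0.0, "m2": 1.0, "m3": 0.0,
                "e1": 1.0, "e2": 0.0, "e3": 0.0, "a": 0.0}
        for name, value in init.items():
            self.register_parameter(name, nn.Parameter(torch.tensor(value)))

    def forward(self, adj: torch.Tensor) -> torch.Tensor:
        n = adj.shape[0]
        eye = torch.eye(n, device=adj.device)
        adj_a = adj + self.a * eye                    # A + a*I
        # Clamping keeps the fractional powers of the degrees well defined.
        deg_a = adj_a.sum(dim=1).clamp(min=1e-6)
        d1, d2, d3 = (torch.diag(deg_a ** e)
                      for e in (self.e1, self.e2, self.e3))
        return self.m1 * d1 + self.m2 * (d2 @ adj_a @ d3) + self.m3 * eye
```

Under this assumed form, setting m1 = 1, e1 = 1 and m2 = -1 (with the remaining parameters at zero) yields the unnormalised Laplacian D - A, while m2 = 1, e2 = e3 = -1/2 and a = 1 yield the self-loop-augmented symmetric normalisation used by GCNs.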
The remainder of this paper is organised as follows. In Section 2, we give an overview of related work in the literature. Then in Section 3, we define our parametrised graph shift operator (PGSO) and discuss how it can be incorporated into many state-of-the-art GNN architectures. This is followed by a spectral analysis of our PGSO in Section 4, where we observe good numerical stability in practice. In Section 5, we analyse the performance of GNN architectures augmented with the PGSO in a node-classification task on a set of stochastic blockmodel graphs with varying sparsity, as well as on learning tasks performed on several real-world datasets.
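As a hypothetical illustration of the kind of incorporation discussed in Section 3, the sketch below replaces the fixed normalised adjacency of a GCN-style layer with the learnable operator, reusing the PGSO module from the previous listing; the layer structure and class name are ours, not the paper's.

```python
import torch
import torch.nn as nn


class PGSOConvLayer(nn.Module):
    """Hypothetical GCN-style layer in which the fixed normalised
    adjacency is replaced by the learnable PGSO sketched above."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.gso = PGSO()                  # learnable shift operator
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, adj: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # gamma(A) X W, followed by a pointwise non-linearity; the PGSO
        # parameters receive gradients through this product.
        return torch.relu(self.gso(adj) @ self.lin(x))
```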

2. RELATED WORK

GSOs emerge in different research fields, such as physics, network science, computer science and mathematics, usually taking the form of either graph Laplacian normalisations or variants of the adjacency matrix. The expressivity of GSOs is exploited in an abundance of machine learning applications, e.g., in unsupervised learning (von Luxburg, 2007; Kim et al., 2008), semi-supervised node classification on graph-structured data (Kipf & Welling, 2017; Schlichtkrull et al., 2018) and supervised learning on computer vision tasks (Chang & Yeung, 2006). The majority of these works assume a specified normalised version of the Laplacian that encodes the structural information of the problem, and these versions usually differ depending on the analysed dataset and the end-user task. Recently, new findings on the impact of the chosen Laplacian representation have emerged that highlight the contribution of Laplacian regularisation (Dall'Amico et al., 2020; Saade et al., 2014; Dall'Amico et al., 2019). The different GSO choices in different tasks indicate a data-dependent relation between the structure of the data and its optimal GSO representation. This observation motivates us to investigate how beneficial a well-chosen GSO can be for a learning task on structured data.

GNNs use a variety of GSOs to encode neighbourhood topologies, either normalisations of the adjacency matrix (Xu et al., 2019; Hamilton et al., 2017a) or normalisations of the graph Laplacian (Kipf & Welling, 2017; Wu et al., 2019). Due to the efficiency and the predictive performance of GNNs, research interest has recently emerged in their expressive power. One of the examined aspects is the equivalence of the GNNs' expressive power with that of the Weisfeiler-Lehman graph isomorphism test (Dasoulas et al., 2020; Maron et al., 2019; Morris et al., 2019; Xu et al., 2019). Another research direction is the analysis of the depth and the width of GNNs, moving one step closer to the design of deep GNNs (Loukas, 2020; Li et al., 2018; Liu et al., 2020; Alon & Yahav, 2020). These analyses study the phenomena of Laplacian over-smoothing and combinatorial over-squashing, which harm the expressiveness of GNNs. In most of these approaches, however, the GSO used is fixed without a motivation of this choice. We hope that the parametrised GSO presented in this work can contribute positively to the expressivity analysis of GNNs.
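For concreteness, the following sketch assembles, from a dense adjacency matrix, the fixed GSO choices recurring in the works cited above. The function name is ours, and the graph is assumed to have no isolated nodes so that the degree inverses exist.

```python
import torch


def common_gsos(adj: torch.Tensor) -> dict:
    """Standard fixed GSO choices for a dense adjacency matrix A,
    assuming a graph without isolated (zero-degree) nodes."""
    n = adj.shape[0]
    eye = torch.eye(n)
    deg = adj.sum(dim=1)                          # node degrees
    d = torch.diag(deg)                           # degree matrix D
    d_inv = torch.diag(1.0 / deg)                 # D^{-1}
    d_inv_sqrt = torch.diag(deg ** -0.5)          # D^{-1/2}
    adj_sl = adj + eye                            # A + I, self-loops as in GCN
    d_sl_inv_sqrt = torch.diag(adj_sl.sum(dim=1) ** -0.5)
    return {
        "adjacency": adj,                                        # sum aggregation
        "mean_aggregation": d_inv @ adj,                         # D^{-1} A
        "laplacian": d - adj,                                    # L = D - A
        "sym_norm_laplacian": eye - d_inv_sqrt @ adj @ d_inv_sqrt,
        "rw_norm_laplacian": eye - d_inv @ adj,
        "gcn_operator": d_sl_inv_sqrt @ adj_sl @ d_sl_inv_sqrt,  # Kipf & Welling (2017)
    }
```

Each of these matrices is a hand-engineered point in the space of possible GSOs; the PGSO of Section 3 instead lets the training procedure select among (and interpolate between) such points.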

