REVISITING EMBEDDINGS FOR GRAPH NEURAL NETWORKS

Abstract

Current graph representation learning techniques use Graph Neural Networks (GNNs) to extract features from dataset embeddings. In this work, we examine the quality of these embeddings and assess how changing them affects the accuracy of GNNs. We explore different embedding extraction techniques for both images and text, and find that the choice of embedding biases the performance of different GNN architectures; the choice of embedding therefore influences the selection of a GNN regardless of the underlying dataset. In addition, only some GNN models improve in accuracy over models trained from scratch or fine-tuned on the underlying data without utilising the graph connections. As an alternative, we propose Graph-connected Network (GraNet) layers to better leverage existing unconnected models within a GNN. These layers augment existing language and vision models with neighbourhood aggregation, giving the model the chance to reuse pre-trained weights where available. We demonstrate that this approach improves accuracy compared to traditional GNNs: on Flickr v2, GraNet beats GAT2 and GraphSAGE by 7.7% and 1.7% respectively.

1. INTRODUCTION

Graph Neural Networks (GNNs) have been successful on a wide array of applications, ranging from computational biology (Zitnik & Leskovec, 2017) to social networks (Hamilton et al., 2017). The input for GNNs, although sourced from many different domains, is often data that has been preprocessed into a computationally digestible format, commonly known as an embedding. Currently, improvements to GNN architectures are tested against these embeddings, and the state of the art is determined based on those results. However, this does not necessarily correlate with a GNN's accuracy on the underlying dataset, and it ignores the influence that the source and style of these embeddings have on the performance of particular GNN architectures.

To test existing GNN architectures, and to demonstrate the importance of the embeddings used in training them, we provide three new datasets, each with a set of embeddings generated using different methods. We further analyse the benefit of using GNNs on fixed embeddings. We compare GNNs to standard models that have been trained or fine-tuned on the target raw data; these models treat each data point as unconnected, ignoring the underlying graph information in the data. Surprisingly, this simple unconnected baseline outperforms some strong GNN models. This prompts the question: will mixing the two approaches unlock the classification power of existing unconnected models by allowing them to utilise the graph structure in our data?

Motivated by this question, we propose a new method of mixing GNNs with unconnected models, allowing them to train simultaneously. To achieve this, we introduce a variation of the standard message passing framework in which a subset of the unconnected model's layers can each be graph-connected, exploiting useful graph structure information during the forward pass.
We demonstrate that this new approach improves on the accuracy of using only a pre-trained or fine-tuned model and outperforms a stand-alone GNN on a fixed embedding. We call this new approach GraNet (Graph-connected Network). In summary, this paper makes the following contributions:

• We provide new datasets and a rich set of accompanying embeddings to better test the performance of GNNs.

• We empirically demonstrate that only some existing GNNs improve on unconnected model accuracy, and that which ones do varies depending on the embeddings used. We urge that unconnected models be used as a baseline for assessing GNN performance.

• We provide a new method, named GraNet, that combines GNNs with models (fine-tuned or trained from scratch) to efficiently exploit the graph structure in raw data.

• We empirically show that GraNet outperforms both unconnected models (the strong baseline) and GNNs on a range of datasets and accompanying embeddings.
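The graph-connected layer idea described above can be illustrated with a minimal sketch: each such layer combines a standard dense transform of a node's own features with a transform of its aggregated neighbour features, so an unconnected model's layer gains access to the graph during the forward pass. The function and weight names below are illustrative assumptions, not the paper's actual implementation, and mean aggregation is only one possible choice of aggregator.

```python
import numpy as np

def graph_connected_layer(X, A, W_self, W_neigh):
    """Hypothetical sketch of one graph-connected layer.

    X:        (n, d) node features from the previous (unconnected) layer
    A:        (n, n) adjacency matrix of the graph
    W_self:   (d, h) weights for the node's own features
    W_neigh:  (d, h) weights for the aggregated neighbour features
    """
    deg = A.sum(axis=1, keepdims=True)
    deg = np.where(deg == 0, 1.0, deg)   # isolated nodes: avoid divide-by-zero
    neigh_mean = (A @ X) / deg           # mean-aggregate neighbour features
    # Standard dense transform plus a neighbourhood term, then ReLU.
    return np.maximum(0.0, X @ W_self + neigh_mean @ W_neigh)
```

In this sketch, setting `W_neigh` to zero recovers an ordinary dense layer, which is what allows pre-trained (unconnected) weights to be reused for `W_self` while the neighbourhood term is learned on top.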

2. RELATED WORK

Graph Augmented Networks. Chen et al. (2021) introduce Graph-augmented MultiLayer Perceptrons (GA-MLPs) as a simplified alternative to Graph Neural Networks (GNNs). These models involve a two-step process: augmenting the node features of the graph based on the topology, and then applying a learnable function to these node features at the node level. This allows a fixed graph operator and two sets of MultiLayer Perceptrons (MLPs) to be used to extract features from the graph, and is related to similar simplified GNN techniques (Wu et al., 2019; Nt & Maehara, 2019). The paper proves that this simplified approach is not as expressive as standard GNNs under the Weisfeiler-Lehman test for distinguishing non-isomorphic graphs. This suggests that GNNs are well suited to inferring information based on graph structure, but the paper does not comment on which approach is best in practice. We differ in our approach to augmenting networks with graph structure by using existing GNNs, and we do not attempt to simplify the network. We do provide a graph-connected MLP, but this adds message passing to MLPs rather than applying separate functions to the graph data.

Effect of training on GNN performance. Shchur et al. (2018) study the effect of hyperparameters and training on GNNs and show that these have a dramatic effect on model ordering. Simply changing the split of a dataset caused large changes both in accuracy and in which GNN performed best, even though the hyperparameters of the GNNs remained constant. We show similarly large differences when considering different embeddings, with the same splits across embeddings.

Ablation studies on GNNs. Further to these discoveries, Nt & Maehara (2019) demonstrate that GNNs only utilise the graph structure to de-noise already highly informative features. They go as far as to demonstrate that, in certain conditions, GNNs and MLPs perform the same. Chen et al. (2019) demonstrate that linearising the graph filter stage of GNNs does not hinder but actually increases performance. Similarly, Wu et al. (2019) simplify GNNs by removing the non-linearity between layers, allowing the k message passes to be pre-computed. This reduces graph representation learning to a simple linear regression. In all of these cases, the authors demonstrate that the major contribution of GNNs lies in their graph structure capabilities. We do not analyse these aspects, but instead look at how this capability can be used in existing unconnected networks. We compare our new method (GraNet) against standard Graph Neural Networks to demonstrate the improvements that GraNet makes in classifying datasets.
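The simplification of Wu et al. (2019) can be made concrete: with the non-linearities between layers removed, the k message passes collapse into a single fixed transform of the input features, which can be precomputed once before training a plain linear (or logistic-regression) classifier. The sketch below assumes the symmetrically normalised adjacency with self-loops used by SGC; the function name is illustrative.

```python
import numpy as np

def sgc_features(X, A, k=2):
    """Precompute k propagation steps with the symmetrically normalised
    adjacency (with self-loops), in the style of SGC (Wu et al., 2019).
    A linear classifier is then trained on the returned features."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)                    # degrees (>= 1 after self-loops)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ A_hat @ D_inv_sqrt      # fixed propagation matrix
    for _ in range(k):                       # k linear message passes
        X = S @ X
    return X
```

Because `S` is fixed, the entire graph component reduces to a preprocessing step, which is exactly why the resulting model is no more than linear regression on smoothed features.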



Table: An overview of popular datasets

