THE GRAPH LEARNING ATTENTION MECHANISM: LEARNABLE SPARSIFICATION WITHOUT HEURISTICS

Abstract

Graph Neural Networks (GNNs) are local aggregators that derive their expressive power from their sensitivity to network structure. However, this sensitivity comes at a cost: noisy edges degrade performance. In response, many GNNs include edge-weighting mechanisms that scale the contribution of each edge in the aggregation step. However, to account for neighborhoods of varying size, node-embedding mechanisms must normalize these edge weights across each neighborhood. As such, the impact of noisy edges cannot be eliminated without removing those edges altogether. Motivated by this issue, we introduce the Graph Learning Attention Mechanism (GLAM): a drop-in, differentiable structure learning layer for GNNs that separates the distinct tasks of structure learning and node embedding. In contrast to existing graph learning approaches, GLAM does not require the addition of exogenous structural regularizers or edge-selection heuristics to learn optimal graph structures. In experiments on citation and co-purchase datasets, we demonstrate that our approach can match state of the art semi-supervised node classification accuracies while inducing an order of magnitude greater sparsity than existing graph learning methods.

1. INTRODUCTION

Local interactions govern the properties of nearly all complex systems, from protein folding and cellular proliferation to group dynamics and financial markets (Stocker et al., 1996; Doyle et al., 1997; Mathur, 2006; Özgür, 2011; Jiang et al., 2014). When modeling such systems, representing interactions explicitly in the form of a graph can improve model performance dramatically, both at the local and the global level. Graph Neural Networks (GNNs) are designed to operate on such graph-structured data and have quickly become state of the art in a host of structured domains (Wu et al., 2019). However, GNN models rely heavily on the provided graph structure representing meaningful relations, for example, the bonds between atoms in a molecule (Fang et al., 2022). Additionally, to generate useful node embeddings, GNNs employ permutation-invariant neighborhood aggregation functions, which implicitly assume that neighborhoods satisfy certain homogeneity properties (Zhu et al., 2020). If noisy edges are introduced, or if these neighborhood assumptions are not met, GNN performance suffers.

To address both issues simultaneously, many GNNs include mechanisms for learning edge weights that scale the influence of the features of neighboring nodes in the aggregation step. The Graph Attention Network (GAT) (Veličković et al., 2017), for example, adapts the standard attention mechanism (Vaswani et al., 2017) to the graph setting, learning attention coefficients between adjacent nodes in a graph rather than between tokens in a sequence. As we will show in Section 3, the demands of edge weighting (or structure learning) inherently conflict with those of node embedding, and edge-weighting mechanisms that are coupled to node-embedding mechanisms cannot eliminate the negative impact of noisy edges on their own. In this paper, we introduce a method for separating the distinct tasks of structure learning and node embedding in GNNs.
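To make the normalization issue concrete, the following is a minimal sketch (not the GLAM implementation) of a GAT-style softmax over one neighborhood. The logit values are hypothetical; the point is that softmax outputs are strictly positive, so even an edge with a very low attention score retains a nonzero weight in the aggregation:

```python
import numpy as np

# Hypothetical attention logits for a node with four neighbors,
# the last of which is a "noisy" edge scored very low by the model.
logits = np.array([2.0, 1.5, 1.8, -5.0])

# GAT-style normalization: softmax across the neighborhood so that
# the edge weights sum to 1 regardless of neighborhood size.
alpha = np.exp(logits) / np.exp(logits).sum()

# Softmax outputs are strictly positive: the noisy edge's weight can
# shrink toward zero but never reach it, so its features still
# contribute to the aggregated embedding.
print(alpha)
print(alpha.min() > 0.0)  # True
```

This is why down-weighting alone cannot eliminate a noisy edge's influence: removing it requires changing the graph structure itself, which is the role of the structure learning layer introduced here.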
Our method takes the form of a structure learning layer that can be placed in front of existing GNN layers to learn task-informed graph structures that optimize performance on the downstream task. Our primary contributions are as follows:

