EXTENDING GRAPH TRANSFORMERS WITH QUANTUM COMPUTED AGGREGATION

Abstract

Recently, efforts have been made in the community to design new Graph Neural Networks (GNN), as the limitations of Message Passing Neural Networks became more apparent. This led to the appearance of Graph Transformers using global graph features such as Laplacian Eigenmaps. In this paper, we introduce a GNN architecture in which the aggregation weights are computed from the long-range correlations of a quantum system. These correlations are generated by translating the graph topology into the interactions of a set of qubits in a quantum computer. The recent development of quantum processing units enables the computation of a new family of global graph features that would otherwise be out of reach for classical hardware. We give theoretical insights into the potential benefits of this approach, and benchmark our algorithm on standard datasets. Although not suited to all datasets, our model performs similarly to standard GNN architectures, and points to a promising future for quantum-enhanced GNNs.

1. INTRODUCTION

Graph machine learning is an expanding field of research with applications in chemistry (Gilmer et al., 2017), biology (Zitnik et al., 2018), drug design (Konaklieva, 2014), social networks (Scott, 2011), computer vision (Harchaoui & Bach, 2007), and science (Sanchez-Gonzalez et al., 2020). In the past few years, much effort has been put into the design of Graph Neural Networks (GNN) (Hamilton). The goal is to learn a vector representation of the nodes while incorporating information about the graph. The learned representation is then processed according to the original problem.

The dominant approach for designing GNNs has been Message Passing Neural Networks (MPNN) (Gilmer et al., 2017). At each layer of an MPNN, a linear layer is applied to the node features; then, for each node, the feature vectors of its direct neighbors are aggregated and the result is added to the node's own feature vector. The way the aggregation is performed differentiates a variety of architectures, among which GCN (Kipf & Welling, 2016), SAGE (Hamilton et al., 2018), GAT (Veličković et al., 2018), and GIN (Xu et al., 2018).

Despite some successes, it has been shown that MPNNs suffer from several flaws. First and foremost, their theoretical expressivity is bounded by the Weisfeiler-Lehman (WL) test: two graphs that are indistinguishable by the WL test (example in Figure 1a) lead to the same MPNN output (Morris et al., 2019). This is problematic because two different substructures will not be differentiated, which matters especially in chemistry, where the graphs represent different molecules (the graphs in Figure 1a could represent two distinct molecules). MPNNs also perform best on homophilic data and tend to fail on heterophilic graphs (Zhu et al., 2020). In homophilic graphs, nodes that are close to each other tend to share the same label, which is not necessarily the case in general.
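The message-passing scheme and its WL limitation can be made concrete with a small sketch. The snippet below is a minimal, illustrative sum-aggregation layer (not the exact GCN/SAGE/GAT/GIN architectures cited above), applied to two graphs that the 1-WL test cannot distinguish: the 6-cycle and two disjoint triangles. Both are 2-regular, so with identical initial node features the layer produces identical outputs for both graphs.

```python
import numpy as np

def mpnn_layer(A, H, W):
    """One message-passing layer: apply a linear map to the node features,
    then sum-aggregate each node's direct neighbours and add the result
    to the node's own transformed features."""
    M = H @ W          # linear layer applied to node features
    return M + A @ M   # add the sum of neighbour messages

def cycle(n):
    """Adjacency matrix of the n-node cycle graph."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

# Hexagon C6 vs. two disjoint triangles (2 x C3): both are 2-regular
# graphs on 6 nodes and are indistinguishable by the 1-WL test.
A_hex = cycle(6)
A_tri = np.block([[cycle(3), np.zeros((3, 3))],
                  [np.zeros((3, 3)), cycle(3)]])

rng = np.random.default_rng(0)
H = np.ones((6, 4))          # identical initial node features
W = rng.normal(size=(4, 4))  # shared layer weights

out_hex = mpnn_layer(A_hex, H, W)
out_tri = mpnn_layer(A_tri, H, W)
print(np.allclose(out_hex, out_tri))  # True: the layer cannot separate them
```

Stacking more such layers does not help: the node features remain uniform within each graph at every depth, so any number of message-passing rounds yields the same output for both graphs, mirroring the WL-equivalence result of Morris et al. (2019).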
Finally, MPNNs suffer from oversmoothing (Chen et al., 2020) and oversquashing (Topping et al., 2021). Oversmoothing means that the output features of all nodes converge to the same value as the number of layers increases. Oversquashing occurs when a few links in the graph separate two dense clusters of nodes: the information that circulates through these links is an aggregation over many nodes and is much poorer than the information initially present. Solutions to circumvent these issues are actively investigated by the community. The main idea is to not limit the aggregation to the neighbors, but to include the whole graph, or a larger part of it. Graph Transformers were created in this spirit, with success on standard benchmarks (Ying et al., 2021; Rampášek et al., 2022). Similarly to the famous transformer architecture, an aggregation

