EXTENDING GRAPH TRANSFORMERS WITH QUANTUM COMPUTED AGGREGATION

Abstract

Recently, efforts have been made in the community to design new Graph Neural Networks (GNN), as the limitations of Message Passing Neural Networks became more apparent. This led to the appearance of Graph Transformers, which use global graph features such as Laplacian Eigenmaps. In this paper, we introduce a GNN architecture where the aggregation weights are computed using the long-range correlations of a quantum system. These correlations are generated by translating the graph topology into the interactions of a set of qubits in a quantum computer. The recent development of quantum processing units enables the computation of a new family of global graph features that would otherwise be out of reach for classical hardware. We give some theoretical insights about the potential benefits of this approach, and benchmark our algorithm on standard datasets. Although it is not suited to all datasets, our model performs similarly to standard GNN architectures and opens a promising path for quantum-enhanced GNNs.

1. INTRODUCTION

Graph machine learning is an expanding field of research with applications in chemistry (Gilmer et al., 2017), biology (Zitnik et al., 2018), drug design (Konaklieva, 2014), social networks (Scott, 2011), computer vision (Harchaoui & Bach, 2007), and science (Sanchez-Gonzalez et al., 2020). In the past few years, much effort has been put into the design of Graph Neural Networks (GNN) (Hamilton). The goal is to learn a vector representation of the nodes while incorporating information about the graph. The learned information is then processed according to the original problem.

The dominant approach for designing GNNs has been Message Passing Neural Networks (MPNN) (Gilmer et al., 2017). At each layer of an MPNN, a linear layer is applied to the node features; then, for each node, the feature vectors of the direct neighbors are aggregated and the result is added to the node feature vector. The way the aggregation is performed differentiates a variety of architectures, among which are GCN (Kipf & Welling, 2016), SAGE (Hamilton et al., 2018), GAT (Veličković et al., 2018), and GIN (Xu et al., 2018).

Despite some successes, it has been shown that MPNNs suffer from several flaws. First and foremost, their theoretical expressivity is bounded by the Weisfeiler-Lehman (WL) test: two graphs that are indistinguishable via the WL test (example in 1a) will lead to the same MPNN output (Morris et al., 2019). This is a problem whenever two different substructures need to be told apart, especially in chemistry, where the graphs represent different molecules (the graphs in 1a could represent two molecules). MPNNs also perform best with homophilic data and seem to fail on heterophilic graphs (Zhu et al., 2020). In a homophilic graph, nodes that are close to each other tend to share the same label, an assumption that does not hold for every dataset. Finally, MPNNs suffer from oversmoothing (Chen et al., 2020) and oversquashing (Topping et al., 2021). Oversmoothing means that the output features of all nodes converge to the same value as the number of layers increases. Oversquashing occurs when only a few links of the graph separate two dense clusters of nodes. The information that circulates through these links is then an aggregation over many nodes and is much poorer than the information initially present.

Solutions to circumvent those issues are currently being investigated by the community. The main idea is not to limit the aggregation to the neighbors, but to include the whole graph, or a larger part of it. Graph Transformers were created in this spirit, with success on standard benchmarks (Ying et al., 2021; Rampášek et al., 2022). Similarly to the well-known transformer architecture, an aggregation rule is defined for every pair of nodes in the graph, and it incorporates global structural features. An example of such global structural features is Laplacian Eigenmaps (Kreuzer et al., 2021), that is, eigenvectors of the Laplacian matrix.

The goal of this work is to explore new types of global structural features emerging from quantum physics that can be added to a GNN. The rapid development of quantum computers in recent years provides the opportunity to compute features that would otherwise be intractable. These features contain complex topological characteristics of the graph, and including them could improve the quality of the model, the training or inference time, or the energy consumption.

The paper is organized as follows. Section 2 provides elements of quantum mechanics for the unfamiliar reader and details how to construct a quantum state from a graph. Section 3 provides theoretical insights on why quantum states can provide relevant information that is hard to compute with a classical computer. Section 4 details our main proposal, a GNN architecture with quantum correlations. Section 5 provides a summary of how our work fits in the current literature. Section 6 describes the numerical experiments.
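
To make the contrast between local aggregation and global structural features concrete, the following minimal NumPy sketch may help. It is an illustration only, not the architecture proposed in this paper: the toy path graph, the sum aggregation, the omission of any non-linearity, and the function names `mpnn_layer` and `laplacian_eigenvectors` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy graph (not from the paper): the path 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)


def mpnn_layer(A, H, W):
    """One message-passing step with sum aggregation (an assumed choice)."""
    M = H @ W          # linear layer applied to every node feature vector
    return M + A @ M   # each node adds the summed features of its direct neighbors


def laplacian_eigenvectors(A, k):
    """Return the k lowest non-trivial eigenpairs of L = D - A.

    Assumes a connected, undirected graph, so that only the first
    eigenvalue is zero (constant eigenvector) and can be skipped.
    """
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)  # ascending eigenvalues; L is symmetric
    return eigvals[1:k + 1], eigvecs[:, 1:k + 1]


rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))   # 4 nodes, 3 input features (arbitrary sizes)
W = rng.normal(size=(3, 2))   # weights of the linear layer (random, untrained)

H_next = mpnn_layer(A, H, W)                        # local: uses direct neighbors only
eigvals, pos_enc = laplacian_eigenvectors(A, k=2)   # global: depends on the whole graph
```

Each row of `H_next` depends only on a node and its direct neighbors, whereas each row of `pos_enc` depends on the full adjacency matrix, which is the kind of global information Graph Transformers add to the node features. Note that eigenvectors are only defined up to a sign, so the columns of `pos_enc` may be flipped from one run or library to another.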

2.1. QUANTUM INFORMATION PROCESSING

We provide in this subsection the basics of quantum information processing and quantum dynamics for the unfamiliar reader. More details can be found in (Nielsen & Chuang, 2002; Henriet et al., 2020).

A quantum state of a system is a complex vector of unit norm; the squared modulus of each entry is the probability of finding the system in the corresponding basis state. The systems most often considered are sets of individual two-level systems called qubits, whose basis states are denoted $|0\rangle$ and $|1\rangle$. The state of an individual qubit is represented by the complex vector $\begin{pmatrix} \alpha \\ \beta \end{pmatrix}$ with $|\alpha|^2 + |\beta|^2 = 1$, also written $\alpha|0\rangle + \beta|1\rangle$. The state of a system of $N$ qubits is a vector of $\mathbb{C}^{2^N}$ written $|\psi\rangle = \sum_{i=0}^{2^N-1} a_i |i\rangle$, each index $i$ being associated with a bitstring of size $N$. The family $(|i\rangle)_i$ is referred to as the computational basis of the system. The conjugate transpose of $|\psi\rangle$ is written $\langle\psi|$. A quantum state can be modified by an operator $U$, which is a complex unitary matrix of size $2^N \times 2^N$.

Quantum dynamics follow the Schrödinger equation (in units where $\hbar = 1$)

$$ i\,\frac{d|\psi\rangle}{dt} = \hat{H}(t)\,|\psi\rangle, $$

where $\hat{H}(t)$ is the Hamiltonian of the dynamics, a complex Hermitian matrix of size $2^N \times 2^N$. The solution of the equation at time $T$ is $|\psi(T)\rangle = U(T)\,|\psi(0)\rangle$, where

$$ U(T) = \exp\left[-i \int_0^T \hat{H}(t)\,dt\right] $$

is the time evolution operator (understood as a time-ordered exponential when $\hat{H}(t)$ does not commute with itself at different times).

Classical information can be extracted from a quantum state by measuring the expectation value of an observable. An observable $\hat{O}$ is a complex Hermitian matrix of size $2^N \times 2^N$, and its expectation value on the quantum state $|\psi\rangle$ is the scalar $\langle\psi|\hat{O}|\psi\rangle$.

We introduce some notation that will be used throughout the paper. The Pauli matrices are the following:

$$ I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. $$

A Pauli string of size $N$ is a Hermitian complex matrix of size $2^N \times 2^N$ equal to the tensor product, or Kronecker product, of $N$ Pauli matrices. In the rest of this paper, we denote Pauli strings by their non-trivial Pauli operations and the qubits they act on, with qubits indexed from right to left starting at 0. For instance, in a system of 5 qubits, $X_2 Y_1 = I \otimes I \otimes X \otimes Y \otimes I$.
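
The definitions of this subsection can be checked numerically on small systems. The sketch below is a minimal NumPy illustration under stated assumptions: the helper names `pauli_string`, `evolve` and `expectation` are hypothetical, the Hamiltonian is taken to be time-independent so that the time evolution operator reduces to $\exp(-i\hat{H}T)$, and dense matrices are used, which is only feasible for a handful of qubits.

```python
import numpy as np
from functools import reduce

# Pauli matrices as defined in the text.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def pauli_string(ops, n_qubits):
    """Build a Pauli string from a dict {qubit index: 2x2 matrix}.

    Qubits are indexed from right to left, so qubit 0 is the rightmost
    factor of the Kronecker product; missing qubits get the identity.
    """
    factors = [ops.get(q, I2) for q in reversed(range(n_qubits))]
    return reduce(np.kron, factors)


def evolve(H, psi0, T):
    """Apply U(T) = exp(-i H T), assuming a time-independent Hermitian H."""
    energies, V = np.linalg.eigh(H)  # H = V diag(energies) V^dagger
    U = V @ np.diag(np.exp(-1j * energies * T)) @ V.conj().T
    return U @ psi0


def expectation(O, psi):
    """Return <psi|O|psi>, which is real for a Hermitian observable O."""
    return np.vdot(psi, O @ psi).real


# The example of the text: X_2 Y_1 on 5 qubits.
P = pauli_string({2: X, 1: Y}, n_qubits=5)
expected = np.kron(np.kron(np.kron(np.kron(I2, I2), X), Y), I2)
print(np.allclose(P, expected))

# One qubit starting in |0>, evolved under H = X for T = pi/2.
psi0 = np.array([1, 0], dtype=complex)
psi = evolve(X, psi0, np.pi / 2)
probs = np.abs(psi) ** 2        # measurement probabilities in the computational basis
z_after = expectation(Z, psi)   # expectation value of the observable Z
```

In the single-qubit example, $\exp(-iX\pi/2) = -iX$, so the state is flipped from $|0\rangle$ to $|1\rangle$ up to a global phase and the expectation value of $Z$ goes from $+1$ to $-1$.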

