GRAPH MLP-MIXER

Abstract

Graph Neural Networks (GNNs) have shown great potential in the field of graph representation learning. Standard GNNs define a local message-passing mechanism that propagates information over the whole graph domain by stacking multiple layers. This paradigm suffers from two major limitations, over-squashing and poor capture of long-range dependencies, which can be solved with global attention but at the price of quadratic computational complexity. In this work, we consider an alternative approach to overcome these structural limitations while keeping a low complexity cost. Motivated by the recent MLP-Mixer architecture introduced in computer vision, we propose to generalize this network to graphs. The resulting model, namely Graph MLP-Mixer, can make long-range connections without over-squashing or high complexity, thanks to the mixer layers applied to graph patches extracted from the original graph. This architecture exhibits promising results when comparing standard GNNs and Graph MLP-Mixers on benchmark graph datasets.

1. BACKGROUND AND MOTIVATION

In this section, we review the main classes of GNNs with their advantages and their limitations. Then, we introduce the ViT/MLP-Mixer architectures from computer vision which have motivated us to design a new graph network architecture.

Message-Passing GNNs (MP-GNNs). GNNs have become the standard learning architectures for graphs based on their flexibility to work with complex data domains s.a. recommendation (Monti et al., 2017; van den Berg et al., 2018), chemistry (Duvenaud et al., 2015; Gilmer et al., 2017), physics (Cranmer et al., 2019; Bapst et al., 2020), transportation (Derrow-Pinion et al., 2021), vision (Han et al., 2022), NLP (Wu et al., 2021a), knowledge graphs (Schlichtkrull et al., 2018), drug design (Stokes et al., 2020; Gaudelet et al., 2020) and the medical domain (Li et al., 2020b; 2021). Most GNNs are designed to have two core components. First, a structural message-passing mechanism s.a. Defferrard et al. (2016); Kipf & Welling (2017); Hamilton et al. (2017); Monti et al. (2017); Bresson & Laurent (2017); Gilmer et al. (2017); Veličković et al. (2018) that computes node representations by aggregating the local 1-hop neighborhood information. Second, a stack of L layers that aggregates L-hop neighborhood nodes to increase the expressivity of the network and transmit information between nodes that are L hops apart.

Weisfeiler-Leman GNNs (WL-GNNs). One of the major limitations of MP-GNNs is their inability to distinguish (simple) non-isomorphic graphs. This limited expressivity can be formally analyzed with the Weisfeiler-Leman graph isomorphism test (Weisfeiler & Leman, 1968), as first proposed in Xu et al. (2019); Morris et al. (2019). Later on, Maron et al. (2018) introduced a general class of k-order WL-GNNs that can be proved to universally represent any class of k-WL graphs (Maron et al., 2019; Chen et al., 2019). But to achieve such expressivity, this class of GNNs requires using k-tuples of nodes with memory and speed complexities of O(N^k), with N being the number of nodes and k ≥ 3. Although these complexities can be reduced to O(N^2) and O(N^3) respectively (Maron et al., 2019; Chen et al., 2019; Azizian & Lelarge, 2020), they are still computationally costly compared to the linear complexity O(E) of MP-GNNs, which often reduces to O(N) for real-world graphs that exhibit sparse structures s.a. molecules, knowledge graphs, transportation networks, gene regulatory networks, to name a few.

In order to reduce the memory and speed complexities of WL-GNNs while keeping high expressivity, several works have focused on designing graph networks from their sub-structures s.a. sub-graph isomorphism (Bouritsas et al., 2022), sub-graph routing mechanisms (Alsentzer et al., 2020), cellular WL sub-graphs (Bodnar et al., 2021), expressive sub-
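To make the two MP-GNN components concrete, here is a minimal sketch (not the paper's own model) of one message-passing layer with mean aggregation over 1-hop neighbors; the function and weight names (`mp_layer`, `W_self`, `W_neigh`) are illustrative. Note the single pass over the edge list, which is the O(E) cost mentioned above.

```python
import numpy as np

def mp_layer(h, edges, W_self, W_neigh):
    """One message-passing layer (illustrative sketch).

    Each node aggregates its 1-hop neighbors' features (mean) and
    combines them with its own features. h: (N, d) node features;
    edges: list of directed (src, dst) pairs. Cost is O(E): one
    message per edge.
    """
    N, _ = h.shape
    agg = np.zeros_like(h)
    deg = np.zeros(N)
    for src, dst in edges:                  # one message per edge
        agg[dst] += h[src]
        deg[dst] += 1
    agg /= np.maximum(deg, 1)[:, None]      # mean over neighbors
    return np.tanh(h @ W_self + agg @ W_neigh)
```

Stacking L such layers lets information travel L hops, which is exactly the second component: nodes L hops apart only communicate after L layers.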

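The expressivity limitation discussed above can be seen with a short sketch of 1-WL color refinement (an illustrative implementation, not the paper's): two triangles and a 6-cycle are non-isomorphic, but both graphs are 2-regular, so the refinement assigns identical color multisets and 1-WL, and hence any standard MP-GNN, cannot distinguish them.

```python
def wl_colors(adj, rounds=3):
    """1-WL color refinement (sketch): repeatedly relabel each node
    by hashing its color together with the sorted multiset of its
    neighbors' colors; return the graph-level color multiset."""
    n = len(adj)
    colors = [0] * n                        # start with uniform colors
    for _ in range(rounds):
        sigs = [(colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in range(n)]
        relabel = {s: i for i, s in enumerate(sorted(set(sigs)))}
        colors = [relabel[s] for s in sigs]
    return sorted(colors)

# Two triangles vs. a 6-cycle: non-isomorphic, yet both 2-regular,
# so 1-WL produces the same color multiset for both graphs.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
assert wl_colors(two_triangles) == wl_colors(six_cycle)
```

The k-order WL-GNNs cited above strengthen this test by operating on k-tuples of nodes, which is the source of their O(N^k) memory and speed cost.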
