ASYNCHRONOUS MESSAGE PASSING: A NEW FRAMEWORK FOR LEARNING IN GRAPHS

Anonymous authors
Paper under double-blind review

Abstract

This paper studies asynchronous message passing (AMP), a new framework for applying neural networks to graphs. Existing graph neural networks (GNNs) use the message passing framework, which is based on the synchronous model of distributed computing. In traditional GNNs, nodes aggregate messages from all their neighbors in each round, which causes problems such as oversmoothing and limited expressiveness. In contrast, our AMP framework is based on the asynchronous model, where nodes react to messages from their neighbors individually. We prove that (i) AMP is at least as powerful as the message passing framework, (ii) AMP is more powerful than the 1-WL test for graph isomorphism, an important benchmark for message passing GNNs, and (iii) in theory, AMP can even separate any pair of non-isomorphic graphs and thus decide graph isomorphism. We experimentally validate our findings on AMP's expressiveness and show that AMP might be better suited to propagate messages over large distances in graphs. We also demonstrate that AMP performs well on several graph classification benchmarks.

1. INTRODUCTION

Graph Neural Networks (GNNs) have become the de-facto standard model for applying neural networks to graphs in many domains (Bian et al., 2020; Gilmer et al., 2017; Hamilton et al., 2017; Jumper et al., 2021; Kipf & Welling, 2017; Veličković et al., 2018; Wu et al., 2020). Internally, GNNs use the message passing framework, i.e., nodes communicate with their neighboring nodes for multiple synchronous rounds. We believe that this style of communication is not always ideal. In GNNs, all nodes speak concurrently, and a node does not listen to individual neighbors but only to an aggregated message of all neighbors. In contrast, humans politely listen when a neighbor speaks, then decide whether the information was relevant and what information to pass on. The way humans communicate is in line with the asynchronous communication model (Peleg, 2000). In the asynchronous model, nodes do not communicate concurrently. In fact, a node only acts when it receives a message (or when it is initialized). If a node receives a new message from one of its neighbors, it updates its state and then potentially sends a message of its own. This allows nodes to listen to individual neighbors and not only to aggregations. Figure 1 illustrates how this interaction can play out.

Figure 1: Detection of an alcohol (a C atom with an OH group) with AMP. H atoms learned to initially send a message to their neighbors. Every node can choose to ignore the message or react to it. The C atom is not interested in H neighbors and discards the message. On the other hand, the O atom reacts and sends a message of its own. This message is now relevant to the C atom.

We make the following contributions.

• We introduce AMP, a new framework for learning neural architectures on graphs. Instead of nodes acting synchronously in rounds, nodes in AMP interact asynchronously by exchanging and reacting to individual messages.
• We theoretically examine AMP and prove that the AMP framework is at least as powerful as synchronous message passing networks. Furthermore, we show that AMP can separate graphs beyond the 1-WL test; conceptually, AMP can solve graph isomorphism.

• We examine how AMP can transmit information to far-away nodes. Since AMP handles messages individually and is not limited by a fixed number of communication rounds, AMP can combat the underreaching, oversmoothing, and oversquashing problems that traditional GNNs encounter when propagating information over long distances (many layers).

• We run experiments on (i) established GNN expressiveness benchmarks to demonstrate that AMP outperforms all existing methods in distinguishing graphs beyond the 1-WL algorithm. We introduce (ii) synthetic datasets to show that AMP is well suited to propagate information over large distances. Finally, we study (iii) established graph classification benchmarks to show that AMP performs comparably to existing GNNs.
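To make the asynchronous interaction pattern concrete, the following is a minimal sketch of an event-driven message passing loop. It is our own illustration, not the paper's implementation: the function names (`amp_inference`, `update_fn`, `send_fn`) are invented here, and in a learned AMP model the update and send functions would be neural networks with a learned or scheduled termination criterion.

```python
from collections import deque

def amp_inference(graph, init_states, update_fn, send_fn, max_events=1000):
    """Event-driven (asynchronous) message passing sketch.

    graph:       dict mapping node -> list of neighbor nodes
    init_states: dict mapping node -> initial state
    update_fn(state, msg) -> new state for the receiving node
    send_fn(state)        -> message to forward, or None to stay silent
    """
    states = dict(init_states)
    queue = deque()  # pending (sender, receiver, message) events

    # Initialization: every node may emit one initial message.
    for v in graph:
        msg = send_fn(states[v])
        if msg is not None:
            for u in graph[v]:
                queue.append((v, u, msg))

    events = 0
    while queue and events < max_events:
        sender, receiver, msg = queue.popleft()
        # The node reacts to ONE individual message, not to an aggregate.
        states[receiver] = update_fn(states[receiver], msg)
        out = send_fn(states[receiver])
        if out is not None:  # the node may choose to discard / stay silent
            for u in graph[receiver]:
                if u != sender:  # don't immediately echo back to the sender
                    queue.append((receiver, u, out))
        events += 1
    return states
```

For instance, with `send_fn` emitting only from freshly initialized nodes, each node ends up reacting once per neighbor, one message at a time, which is exactly the behavior Figure 1 depicts for the C, O, and H atoms.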

2. RELATED WORK

Apart from some domains with special graph families (for example, processing directed acyclic graphs sequentially (Amizadeh et al., 2018)), virtually all GNNs follow the synchronous message passing framework of distributed computing, first suggested by Gilmer et al. (2017) and Battaglia et al. (2018). The underlying idea is that nodes have an embedding and operate in rounds. In each round, every node computes a message and passes the message to every adjacent node. Then, every node aggregates the messages it receives and uses the aggregation to update its embedding. There exist variations of this framework. For example, edges can also have embeddings, or one can add a global sharing node to allow far-away nodes to directly share information (Battaglia et al., 2018). Following the initial work of Scarselli et al. (2008), different implementations for the individual steps in the message passing framework exist, e.g., (Brody et al., 2022; Hamilton et al., 2017; Kipf & Welling, 2017; Niepert et al., 2016; Veličković et al., 2018; Xu et al., 2018; 2019). However, these GNN architectures all share common problems:

Oversmoothing. A problem that quickly emerged with GNNs is that we cannot stack many GNN layers (Li et al., 2019; 2018). Each layer averages and hence smooths the neighborhood information and the node's features. This effect leads to node features converging after some layers (Oono & Suzuki, 2020), which is known as the oversmoothing problem. Several works address the oversmoothing problem, for example by sampling nodes and edges to use in message passing (Feng et al., 2020; Hasanzadeh et al., 2020; Rong et al., 2020), leveraging skip connections (Chen et al., 2020b; Xu et al., 2018), or adding regularization terms (Chen et al., 2020a; Zhao & Akoglu, 2020; Zhou et al., 2020). Thanks to its asynchrony, AMP does not average over neighborhood messages. This helps preserve the identity of individual messages and makes AMP more resilient to the oversmoothing problem.

Underreaching. Using normal GNN layers, a GNN with k layers only learns about nodes at most k hops away. A node cannot act correctly if it needs information that is k + 1 hops away. This problem is called underreaching (Barceló et al., 2020). There exist countermeasures, for example, a global exchange of features (Gilmer et al., 2017; Wu et al., 2021) or spreading information via diffusion processes (Klicpera et al., 2019; Scarselli et al., 2008). Methods that help against oversmoothing are usually also applied against underreaching, since they allow more layers and thus a larger neighborhood size. In AMP, because of asynchrony, some nodes can participate in the computation and communication much more often than others; this helps AMP gather information from further away, a countermeasure against underreaching.

Oversquashing. In many graphs, the size of the k-hop neighborhood grows substantially with k. This requires squashing more and more information into a node embedding of fixed size. Eventually, this leads to the congestion problem (too much information having to pass through a bottleneck) that is well known in distributed computing (e.g., Sarma et al., 2012) and goes by the name of oversquashing for GNNs (Alon & Yahav, 2021; Topping et al., 2022). One approach to solve oversquashing is
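For contrast with the asynchronous model, the synchronous framework described at the start of this section can be sketched as below. This is a schematic illustration, not any particular architecture: the function names are our own, and the numeric sum stands in for a learned aggregator over neighborhood embeddings. Note how, after k rounds, a node's state can only depend on nodes at most k hops away (underreaching), and the entire neighborhood is compressed into one aggregate per round (the root of oversmoothing and oversquashing).

```python
def sync_message_passing(graph, init_states, message_fn, update_fn, rounds):
    """Synchronous message passing sketch: in each round, every node sends a
    message to every neighbor, then aggregates (here: sums) ALL incoming
    messages at once to update its state."""
    states = dict(init_states)
    for _ in range(rounds):
        # All nodes compute and send concurrently, based on the OLD states.
        inbox = {v: [] for v in graph}
        for v in graph:
            msg = message_fn(states[v])
            for u in graph[v]:
                inbox[u].append(msg)
        # Every node sees only one aggregate of its whole neighborhood.
        states = {v: update_fn(states[v], sum(inbox[v])) for v in graph}
    return states
```

On a path graph 1-2-3 with information initially only at node 1, a single round leaves node 3 unchanged; only after two rounds does node 1's information reach it, which is the underreaching behavior discussed above.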

