ACMP: ALLEN-CAHN MESSAGE PASSING WITH ATTRACTIVE AND REPULSIVE FORCES FOR GRAPH NEURAL NETWORKS

Abstract

Neural message passing is a basic feature extraction unit for graph-structured data that takes neighboring node features into account when propagating from one network layer to the next. We model this process by an interacting particle system with attractive and repulsive forces and the Allen-Cahn force arising in the modeling of phase transitions. The dynamics of the system is a reaction-diffusion process which can separate particles without blowing up. This induces an Allen-Cahn message passing (ACMP) for graph neural networks, where the numerical iteration for the particle system solution constitutes the message passing propagation. ACMP, which has a simple implementation with a neural ODE solver, can propel the network depth up to one hundred layers with a theoretically proven strictly positive lower bound on the Dirichlet energy. It thus provides a deep model of GNNs that circumvents the common GNN problem of oversmoothing. GNNs with ACMP achieve state-of-the-art performance for real-world node classification tasks on both homophilic and heterophilic datasets.

1. INTRODUCTION

Graph neural networks (GNNs) have received great attention in the past five years due to their powerful expressiveness for learning graph-structured data, with broad applications from recommendation systems to drug and protein design (Atz et al., 2021; Baek et al., 2021; Bronstein et al., 2021; 2017; Gainza et al., 2020; Wu et al., 2020). Neural message passing (Gilmer et al., 2017) serves as a fundamental feature extraction unit for graph-structured data that aggregates the features of neighbors in network propagation. We develop a GNN message passing scheme, called Allen-Cahn message passing (ACMP), using interacting particle dynamics, where nodes are particles and edges represent the interactions between particles. The system is driven by both attractive and repulsive forces, plus the Allen-Cahn double-well potential from phase transition modeling. This model is motivated by particle systems that describe collective behaviors common in nature and human society, for example, insects forming swarms to work, birds forming flocks to migrate, and humans forming parties to express public opinions. Various mathematical models have been proposed for these behaviors (Albi et al., 2019; Motsch & Tadmor, 2011; Castellano et al., 2009; Proskurnikov & Tempo, 2017; Degond & Motsch, 2008). There are two major components in this model. First, while the attractive force pulls all particles into one cluster, the repulsive forces allow particles to separate into two different clusters, which is essential to avoid oversmoothing. However, repulsive forces could make the Dirichlet energy diverge. Second, we therefore augment the model with the Allen-Cahn (Allen & Cahn, 1979) term (or Rayleigh friction (Rayleigh, 1894)), which is crucial in preventing the Dirichlet energy from becoming unbounded during the evolution, allowing us to prove mathematically that the lower bound of the Dirichlet energy is strictly greater than zero, hence avoiding oversmoothing.
Specifically, we will prove that under suitable conditions on the parameters, the dynamics of the ACMP particle system will time-asymptotically form 2^d different clusters and the Dirichlet energy has a strictly positive lower bound. The structure of ACMP can handle two problems in GNNs: oversmoothing and heterophily. Oversmoothing (Nt & Maehara, 2019; Oono & Suzuki, 2019; Konstantin Rusch et al., 2022) means that all node features become indistinguishable or, equivalently, in the formulation of particle systems, that features form only one consensus. The heterophily problem means that GNNs perform worse on heterophilic graphs (Lim et al., 2021; Yan et al., 2021). This is because neighboring nodes of different classes are mistaken for the same class in GNNs like GCN and GAT. The presence of repulsion in ACMP, however, makes particles separate into two different clusters, and hence provides a simple and neat solution for prediction tasks affected by both problems. Overall, the benefit of Allen-Cahn message passing with repulsion is manifold. 1) It circumvents the oversmoothing issue, namely the Dirichlet energy is bounded from below. 2) The network is stable in the sense that features and Dirichlet energy are bounded from above. 3) Feature smoothness (energy decreasing) and the balance between node features and edge features can be adjusted by network parameters that control the attraction, repulsion and phase transition. The model can then reach an acceptable trade-off between self-features and neighbor effects, as shown in Figure 1. Our model can thus handle node classification tasks for both homophilic and heterophilic datasets by using only one-hop neighbor information. 4) The proposed model can be implemented by neural ODE solvers for the system with attractive and repulsive forces.
In theory, we prove that the Dirichlet energy of GNNs with ACMP has a lower bound above zero (limiting oversmoothing), as well as an upper bound (circumventing blow-up) under specific conditions. This agrees with the experimental results (Section 6). We also prove that ACMP drives the features to form clusters thanks to the double-well potential, which provides an interpretable theory for node classification.
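
The interplay of attraction, repulsion, and the double-well term described above can be illustrated with a small numerical sketch. It is a toy under stated assumptions, not the model defined later in the paper. The assumed dynamics are dx_i/dt = alpha * sum_{j in N(i)} (a_ij - beta)(x_j - x_i) + delta * x_i (1 - x_i^2), integrated with a fixed-step explicit Euler scheme. The symbols alpha, beta, and delta, the (1/n)-normalized Dirichlet energy over unweighted edges, the four-node path graph, and its edge weights are all hypothetical choices made for illustration.

```python
import numpy as np

def dirichlet_energy(x, adj):
    """(1/n) * sum over directed edges (i, j) of ||x_i - x_j||^2 (assumed normalization)."""
    edge = (adj > 0).astype(float)                 # unweighted edge indicator
    diff = x[:, None, :] - x[None, :, :]           # (n, n, d) pairwise differences
    return float(np.sum(edge * np.sum(diff ** 2, axis=-1)) / x.shape[0])

def euler_step(x, adj, alpha, beta, delta, dt):
    """One explicit Euler step of the assumed attraction/repulsion + double-well dynamics."""
    # Shifted weights on existing edges: positive entries attract, negative entries repel.
    w = np.where(adj > 0, adj - beta, 0.0)
    # sum_j w_ij (x_j - x_i), written with a matrix product.
    coupling = w @ x - w.sum(axis=1, keepdims=True) * x
    # Negative gradient of the double-well potential (1/4)(1 - x^2)^2, applied per channel.
    double_well = x * (1.0 - x ** 2)
    return x + dt * (alpha * coupling + delta * double_well)

def simulate(x0, adj, alpha, beta, delta, dt=0.1, steps=2000):
    """Iterate the Euler step; each step plays the role of one propagation layer."""
    x = x0.copy()
    for _ in range(steps):
        x = euler_step(x, adj, alpha, beta, delta, dt)
    return x

# Hypothetical path graph 0-1-2-3: strong edges inside each pair, one weak edge between pairs.
adj = np.array([[0.0, 1.0, 0.0, 0.0],
                [1.0, 0.0, 0.2, 0.0],
                [0.0, 0.2, 0.0, 1.0],
                [0.0, 0.0, 1.0, 0.0]])
x0 = np.array([[0.5], [0.3], [-0.2], [-0.4]])     # n = 4 nodes, d = 1 feature channel

# Pure attraction (beta = 0, delta = 0) is plain graph diffusion.
x_diffusion = simulate(x0, adj, alpha=1.0, beta=0.0, delta=0.0)
# With beta = 0.5 the weak edge becomes repulsive; delta = 1 switches on the double well.
x_acmp = simulate(x0, adj, alpha=1.0, beta=0.5, delta=1.0)

e_diffusion = dirichlet_energy(x_diffusion, adj)
e_acmp = dirichlet_energy(x_acmp, adj)
print("pure attraction:", e_diffusion)
print("attraction + repulsion + double well:", e_acmp)
```

In this toy setting, pure attraction drives all features to a single consensus, so the Dirichlet energy collapses toward zero. With the shifted weights and the double-well term, the two node pairs settle on opposite signs with bounded magnitude, so the energy stays away from zero without blowing up.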

2. BACKGROUND

Message Passing in GNNs. Graph neural networks are a class of deep neural networks that take graph data as input. Neural message passing (MP) (Gilmer et al., 2017; Battaglia et al., 2018) is the most widely used propagator for node feature updates in GNNs, which takes the following form: for



Figure 1: An illustration of one-step ACMP. In the graph G_t with features x(t), the nodes in the purple and green blocks are treated differently, by attraction or by repulsion. The same color indicates similar node features. The node feature x(t) is updated by one step to x(t + ∆t) via an ODE solver. Nodes in the green block tend to attract each other, while in the other block nodes of different colors repel each other, so both colors are strengthened during propagation. This gives rise to bi-cluster flocking. The double-well potential turns features darker under gradient flow to circumvent blow-up of the energy.

Availability

Code is available at https://github.com/ykiiiiii/ACMP.

