ANTI-SYMMETRIC DGN: A STABLE ARCHITECTURE FOR DEEP GRAPH NETWORKS

Abstract

Deep Graph Networks (DGNs) currently dominate the research landscape of learning from graphs, due to their efficiency and ability to implement an adaptive message-passing scheme between the nodes. However, DGNs are typically limited in their ability to propagate and preserve long-term dependencies between nodes, i.e., they suffer from the over-squashing phenomenon. This reduces their effectiveness, since predictive problems may require capturing interactions at different, and possibly large, radii in order to be solved effectively. In this work, we present Anti-Symmetric Deep Graph Networks (A-DGNs), a framework for stable and non-dissipative DGN design, conceived through the lens of ordinary differential equations. We prove theoretically that our method is stable and non-dissipative, which leads to two key results: long-range information between nodes is preserved, and no gradient vanishing or explosion occurs during training. We empirically validate the proposed approach on several graph benchmarks, showing that A-DGN improves performance and enables effective learning even when dozens of layers are used.

1. INTRODUCTION

Representation learning for graphs has become one of the most prominent fields in machine learning. Such popularity derives from the ubiquity of graphs. Indeed, graphs are an extremely powerful tool to represent systems of relations and interactions, and they are extensively employed in many domains (Battaglia et al., 2016; Gilmer et al., 2017; Zitnik et al., 2018; Monti et al., 2019; Derrow-Pinion et al., 2021). For example, they can model social networks, molecular structures, protein-protein interaction networks, recommender systems, and traffic networks. The primary challenge in this field is how to capture and encode structural information in the learning model. Common approaches to representation learning for graphs employ Deep Graph Networks (DGNs) (Bacciu et al., 2020; Wu et al., 2021). DGNs are a family of learning models that learn a mapping function compressing the complex relational information encoded in a graph into an information-rich feature vector, reflecting both the topological and the label information in the original graph. As is common with neural networks, DGNs consist of multiple layers. Each layer updates the node representations by aggregating each node's previous state with those of its neighbors, following a message-passing paradigm (a generic formulation is sketched below). However, in some problems, exploiting local interactions between nodes is not enough to learn representative embeddings. In such scenarios, the DGN often needs to capture information about interactions between nodes that are far apart in the graph, which is achieved by stacking multiple layers. A specific predictive problem typically needs to consider a specific range of node interactions in order to be solved effectively, hence requiring a specific, and possibly large, number of DGN layers.
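As a concrete illustration (notation ours, following the general message-passing scheme of Gilmer et al. (2017); the specific message and update functions vary across DGN instantiations), a generic layer can be written as

x_u^{(\ell)} = \phi^{(\ell)}\!\left( x_u^{(\ell-1)},\; \bigoplus_{v \in \mathcal{N}(u)} \psi^{(\ell)}\!\left( x_u^{(\ell-1)}, x_v^{(\ell-1)} \right) \right)

where x_u^{(\ell)} is the representation of node u at layer \ell, \mathcal{N}(u) is the neighborhood of u, \bigoplus is a permutation-invariant aggregator (e.g., sum or mean), and \psi^{(\ell)}, \phi^{(\ell)} are learnable message and update functions. Under this scheme, stacking \ell layers lets information from nodes up to \ell hops away reach u, which is why the required depth grows with the range of node interactions the task demands.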

