AN EXPLORATION OF CONDITIONING METHODS IN GRAPH NEURAL NETWORKS

Abstract

The flexibility and effectiveness of message-passing graph neural networks (GNNs) have driven considerable advances in deep learning on graph-structured data. In such approaches, GNNs recursively update node representations based on those of their neighbors, and they gain expressivity through the use of node and edge attribute vectors. For example, in computational physics and chemistry tasks, edge attributes such as relative position or distance have proved essential. In this work, we address not what kind of attributes to use, but how to condition on this information to improve model performance. We consider three types of conditioning: weak, strong, and pure, which correspond to concatenation-based conditioning, gating, and transformations that are causally dependent on the attributes, respectively. This categorization provides a unifying viewpoint on different classes of GNNs, from separable convolutions to various forms of message passing networks. We provide an empirical study of the effect of these conditioning methods on several tasks in computational chemistry.

1. INTRODUCTION

Graph neural networks (GNNs) are a family of neural networks that can learn from graph-structured data. Starting with the success of GCN (Kipf & Welling, 2016) in achieving state-of-the-art performance on semi-supervised classification, several GNN variants have been developed, including GraphSAGE (Hamilton et al., 2017), GAT (Veličković et al., 2017), GATv2 (Brody et al., 2021), and EGNN (Satorras et al., 2021), to name a few recent ones. Most models based on the message-passing framework use conditional linear layers. We define "conditioning" as using additional information alongside the feature vectors of a node's neighbors. For example, EGNN (Satorras et al., 2021) conditions message vectors on the distance between two nodes, and DimeNet (Gasteiger et al., 2020b) additionally utilizes angle information. Many neural network models use conditioning in their layers without exploring its different variants; improving the type of conditioning could therefore still improve most state-of-the-art models. To the best of our knowledge, this is the first work that analyzes different conditioning methods in GNNs. In this paper, we categorize three conditioning methods: weak, strong, and pure. They differ in their level of dependency on a given quantity, such as an edge attribute, and in their computational complexity. A message passing neural network (MPNN) using the weak conditioning method concatenates attributes with node features. In this scenario, linear layers effectively gain an attribute-dependent bias, which we consider a weak type of conditioning because it does not guarantee that the attribute is actually utilized, i.e., it could be ignored. At the other extreme, the pure conditioning method forces the model to always use the attributes by letting them causally parametrize the transformation matrices.
From a practical perspective, however, pure conditioning is computationally expensive, and it can be simplified to the strong conditioning method, which corresponds to an attribute-dependent gating of the outputs of linear layers. We experiment with these three conditioning methods in variations of the EGNN model (Satorras et al., 2021) on the computational chemistry datasets QM9 (Ramakrishnan et al., 2014) and MD17 (Chmiela et al., 2017), and show the advantage of strong conditioning over weak conditioning in performance, and over pure conditioning in training time. The main contributions of this paper are:
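To make the distinction concrete, the three conditioning methods can be sketched as follows for a single linear layer acting on node features h with an edge attribute a. This is a minimal illustrative sketch, not the paper's implementation: the dimensions, the sigmoid gate, and the hypernetwork form of pure conditioning are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

d_h, d_a = 4, 2  # illustrative node-feature and edge-attribute dimensions
h = rng.standard_normal(d_h)  # node / message features
a = rng.standard_normal(d_a)  # edge attribute (e.g., an embedded distance)

# Weak conditioning: concatenate the attribute with the features before a
# linear map. The attribute only adds an attribute-dependent bias to the
# output, so the network is free to ignore it.
W_weak = rng.standard_normal((d_h, d_h + d_a))
out_weak = W_weak @ np.concatenate([h, a])

# Strong conditioning: gate the output of a linear layer with an
# attribute-dependent vector, so the attribute multiplicatively
# modulates every output unit.
W = rng.standard_normal((d_h, d_h))
W_gate = rng.standard_normal((d_h, d_a))
gate = 1.0 / (1.0 + np.exp(-(W_gate @ a)))  # sigmoid gate in (0, 1)
out_strong = gate * (W @ h)

# Pure conditioning: the attribute parametrizes the transformation matrix
# itself (here via a simple linear hypernetwork), so the whole map is
# causally dependent on the attribute -- but at d_h * d_h * d_a cost.
W_hyper = rng.standard_normal((d_h * d_h, d_a))
W_cond = (W_hyper @ a).reshape(d_h, d_h)
out_pure = W_cond @ h

print(out_weak.shape, out_strong.shape, out_pure.shape)
```

The growing parameter count from weak to pure conditioning mirrors the trade-off discussed above: pure conditioning guarantees attribute dependence but scales with the product of the feature dimensions, which is what motivates the cheaper gating form.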

