AN EXPLORATION OF CONDITIONING METHODS IN GRAPH NEURAL NETWORKS

Abstract

The flexibility and effectiveness of message-passing-based graph neural networks (GNNs) have driven considerable advances in deep learning on graph-structured data. In such approaches, GNNs recursively update node representations based on their neighbors, and they gain expressivity through the use of node and edge attribute vectors. For example, in computational physics and chemistry, edge attributes such as relative position or distance have proved essential. In this work, we address not what kind of attributes to use, but how to condition on this information to improve model performance. We consider three types of conditioning: weak, strong, and pure, which respectively correspond to concatenation-based conditioning, gating, and transformations that are causally dependent on the attributes. This categorization provides a unifying viewpoint on different classes of GNNs, from separable convolutions to various forms of message passing networks. We provide an empirical study of the effect of these conditioning methods on several tasks in computational chemistry.

1. INTRODUCTION

Graph neural networks (GNNs) are a family of neural networks that can learn from graph-structured data. Starting with the success of GCN (Kipf & Welling, 2016) in achieving state-of-the-art performance on semi-supervised classification, several variants of GNNs have been developed for this task, including GraphSAGE (Hamilton et al., 2017), GAT (Veličković et al., 2017), GATv2 (Brody et al., 2021), and EGNN (Satorras et al., 2021), to name a few recent ones. Most models based on the message-passing framework utilize conditional linear layers. We define "conditioning" as using additional information, such as edge attributes, alongside the feature vectors of a node's neighbors. For example, EGNN (Satorras et al., 2021) conditions message vectors on the distance between two nodes, while DimeNet (Gasteiger et al., 2020b) additionally utilizes angle information. Many neural network models use conditioning in their layers without exploring its different variants; improving the type of conditioning could therefore still improve most state-of-the-art models. To our knowledge, this is the first work that analyzes different conditioning methods in GNNs. In this paper, we categorize three conditioning methods: weak, strong, and pure. They differ in their level of dependency on a given quantity, such as an edge attribute, and in their complexity. A message passing neural network (MPNN) using the weak conditioning method concatenates attributes with node features. In this scenario, linear layers effectively gain an attribute-dependent bias, which we consider a weak type of conditioning because it does not guarantee that the attribute is actually utilized, i.e., it could be ignored. At the other extreme, the pure conditioning method forces the model to always use the attributes by letting them causally parametrize the transformation matrices.
However, from a practical perspective, pure conditioning is computationally expensive; it can be simplified to the strong conditioning method, which corresponds to an attribute-dependent gating of the outputs of linear layers. We experiment with these three conditioning methods in variations of the EGNN model (Satorras et al., 2021) on the computational chemistry datasets QM9 (Ramakrishnan et al., 2014) and MD17 (Chmiela et al., 2017), and show the advantage of strong conditioning over weak conditioning in performance, and over pure conditioning in training time. The main contributions of this paper are: (i) A unifying analysis of geometric message passing by formulating conditional transformations in terms of various forms of conditional linear layers. (ii) An intuitive exposition of different conditioning methods in the context of convolutional message passing. (iii) Empirical studies that show the benefit of strong conditioning methods, as well as the benefit of deep conditioning in multi-layer perceptron-based message functions. In this work, we address not what kind of attributes to use, but how to condition on this information to improve model performance. As such, we focus on an intuitive analysis, and in the experimental section we do not aim for the best possible performance but instead focus on ablation studies in order to obtain general take-home messages.
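To make the three categories concrete, the following is a minimal NumPy sketch of how a single linear layer can be conditioned on an edge attribute in each of the three ways described above. All weight matrices and dimensions here are illustrative assumptions, not the paper's actual parametrization.

```python
import numpy as np

rng = np.random.default_rng(0)
d, da = 4, 2                      # illustrative feature and attribute dims
h = rng.normal(size=d)            # node/message features
a = rng.normal(size=da)           # edge attribute (e.g. distance embedding)

# Weak conditioning: concatenate the attribute with the features before the
# linear layer. This is equivalent to an attribute-dependent bias, so the
# layer may learn to ignore `a` entirely.
W_weak = rng.normal(size=(d, d + da))
out_weak = W_weak @ np.concatenate([h, a])

# Strong conditioning: gate the linear layer's output with an
# attribute-dependent vector, so every output channel is scaled by
# a function of `a`.
W = rng.normal(size=(d, d))
W_gate = rng.normal(size=(d, da))
gate = 1.0 / (1.0 + np.exp(-(W_gate @ a)))   # sigmoid gate from attribute
out_strong = (W @ h) * gate

# Pure conditioning: the transformation matrix itself is generated from the
# attribute, so the output causally depends on `a` by construction.
W_hyper = rng.normal(size=(d * d, da))
W_a = (W_hyper @ a).reshape(d, d)            # attribute-parametrized weights
out_pure = W_a @ h

for out in (out_weak, out_strong, out_pure):
    assert out.shape == (d,)
```

Note how the cost grows across the three variants: weak conditioning adds only `da` input columns, strong conditioning adds a small gating head, while pure conditioning must generate `d * d` weights per edge, which is why it is the most expensive in practice.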

2. PRELIMINARIES

In this section, we introduce the relevant background on graph neural networks, on top of which we will later build our analysis and definitions of conditioning methods.

2.1. GRAPH NEURAL NETWORK

In this work, we consider the graph regression task as an example. A graph is represented by $G = (V, E)$ with nodes $v_i \in V$ and edges $e_{ij} \in E$. A typical message passing layer (Gilmer et al., 2017) is defined as:

$m_{ij} = \phi_e(h_i^l, h_j^l, a_{ij})$, $\quad m_i = \sum_{j \in N(i)} m_{ij}$, $\quad h_i^{l+1} = \phi_h(h_i^l, m_i)$,

where $h_i^l$ is the embedding of node $v_i$ at layer $l$, $a_{ij}$ is the edge attribute of nodes $v_i$ and $v_j$, and $N(i)$ is the set of neighbors of node $v_i$. Finally, $\phi_e$ and $\phi_h$ are the message (edge) and update (node) functions, respectively, which are commonly parametrized by multilayer perceptrons (MLPs).
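The message passing scheme above can be sketched in a few lines of NumPy. The toy graph, random MLP weights, and dimensions below are assumptions for illustration only; the scheme itself follows the three equations of Gilmer et al. (2017).

```python
import numpy as np

def mlp(sizes, rng):
    """Random MLP: linear layers with ReLU between them (weights untrained)."""
    Ws = [rng.normal(scale=0.1, size=(o, i)) for i, o in zip(sizes, sizes[1:])]
    def forward(x):
        for W in Ws[:-1]:
            x = np.maximum(W @ x, 0.0)
        return Ws[-1] @ x
    return forward

rng = np.random.default_rng(0)
d, da = 4, 1                              # feature and edge-attribute dims
phi_e = mlp([2 * d + da, 16, d], rng)     # message function phi_e(h_i, h_j, a_ij)
phi_h = mlp([2 * d, 16, d], rng)          # update function phi_h(h_i, m_i)

# toy graph: 3 nodes, directed edges with scalar attributes
h = rng.normal(size=(3, d))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
attr = {e: rng.normal(size=da) for e in edges}

# one message passing layer: compute messages, sum over neighbors, update
m = np.zeros_like(h)
for i, j in edges:
    m[i] += phi_e(np.concatenate([h[i], h[j], attr[(i, j)]]))
h_new = np.array([phi_h(np.concatenate([h[i], m[i]])) for i in range(3)])
```

Here the edge attribute enters `phi_e` by concatenation, i.e., the weak conditioning variant; the strong and pure variants replace this concatenation inside the message function.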

2.2. GEOMETRIC GRAPH NEURAL NETWORKS

When the graphs have an embedding in Euclidean space, i.e., each node $v_i$ has an associated position $x_i \in \mathbb{R}^n$, we want to leverage this geometric information whilst preserving stability/invariance to rigid-body transformations. That is, many tasks are invariant to distance-preserving transformations in $E(n)$. For example, the predicted energy of a system of atoms is invariant to its global position and orientation in space. Several works have shown how to build equivariant message-passing-based graph neural networks for such geometric graphs. Central to those works is the conditioning of the message and update functions on invariant geometric attributes, such as the pairwise distance $a_{ij} = \|x_j - x_i\|$, as popularized in (Satorras et al., 2021), or covariant spherical/circular harmonic embeddings of relative position $a_{ij} = Y(x_j - x_i)$, as is common in steerable group convolution-based graph NNs (Brandstetter et al., 2021). Here we refer to attributes that transform predictably via representations of $E(n)$ as covariants, and those that remain unchanged as invariants. Covariants typically carry more (directional) information, but require specialized operations such as the Clebsch-Gordan tensor product (Thomas et al., 2018; Anderson et al., 2019) in order to preserve equivariance of the graph NNs. Satorras et al. (2021) show that with a simple recipe based on invariant attributes, one can often obtain equally powerful graph NNs. As such, we focus this paper on the use of $a_{ij} = \|x_j - x_i\|$ as a sufficiently expressive attribute, and model the message and update functions $\phi_e$ and $\phi_h$ as regular MLPs. Our objective then is to understand the most effective way of utilizing attributes in geometric graph NNs.
To make this notion of conditioning explicit, we will denote the message and update functions as $\phi_e(h_i^l, h_j^l \mid a_{ij})$ and $\phi_h(h_i^l \mid a_i)$, where we note that, although uncommon, it is possible to define invariant or covariant geometric node attributes $a_i$ (Brandstetter et al., 2021).
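The invariance of the pairwise distance attribute under rigid-body transformations can be checked numerically. The sketch below (positions, rotation, and translation are all illustrative) applies a random rotation and translation from $E(3)$ and confirms that $a_{ij} = \|x_j - x_i\|$ is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))               # node positions in R^3

def dist(x, i, j):
    """Invariant edge attribute: a_ij = ||x_j - x_i||."""
    return np.linalg.norm(x[j] - x[i])

# random rigid-body transform: orthogonal rotation (via QR) plus translation
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)
x_t = x @ Q.T + t

# pairwise distances are preserved under any such E(3) transformation
for i in range(5):
    for j in range(5):
        assert np.isclose(dist(x, i, j), dist(x_t, i, j))
```

A covariant attribute such as the relative position $x_j - x_i$ would instead rotate with $Q$, which is exactly why covariants need equivariance-preserving operations while invariants can be fed directly into ordinary MLPs.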

