STABLE, EFFICIENT, AND FLEXIBLE MONOTONE OPERATOR IMPLICIT GRAPH NEURAL NETWORKS

Abstract

Implicit graph neural networks (IGNNs), which solve a fixed-point equilibrium equation for representation learning, can learn long-range dependencies (LRD) in the underlying graphs and show remarkable performance on various graph learning tasks. However, the expressivity of IGNNs is limited by the constraints imposed to guarantee their well-posedness. Moreover, as IGNNs become effective at learning LRD, the relevant eigenvalues approach values that slow down convergence, and their performance becomes unstable across tasks. In this paper, we provide a new well-posedness condition for IGNNs by leveraging monotone operator theory. The new well-posedness characterization guides the design of effective parameterizations that improve the accuracy, efficiency, and stability of IGNNs. Leveraging accelerated operator splitting schemes and graph diffusion convolution, we design efficient and flexible implementations of monotone operator IGNNs that are significantly faster and more accurate than existing IGNNs.

1. INTRODUCTION

Implicit graph neural networks (IGNNs), which solve a fixed-point equilibrium equation for graph representation learning, can learn long-range dependencies (LRD) in the underlying graphs, showing remarkable performance on various tasks [69; 39; 58; 63; 22]. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ represent a graph, where $\mathcal{V}$ is the set of nodes and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges. The connectivity of $\mathcal{G}$ can be represented by the adjacency matrix $A \in \mathbb{R}^{n \times n}$, with $A_{ij} = 1$ if there is an edge connecting nodes $i, j \in \mathcal{V}$ and $A_{ij} = 0$ otherwise. Let $X \in \mathbb{R}^{d \times n}$ be the initial node features, whose $i$-th column $x_i \in \mathbb{R}^d$ is the initial feature of the $i$-th node. IGNN [39] learns the node representation by finding the fixed point, denoted $Z^*$, of the Picard iteration

$$Z^{(k+1)} = \sigma\big(W Z^{(k)} G + g_B(X)\big), \quad \text{for } k = 0, 1, 2, \ldots,$$

where $\sigma$ is the nonlinearity (e.g., ReLU), $g_B$ is a function parameterized by $B$ (e.g., $g_B(X) = BXG$), $W$ and $B \in \mathbb{R}^{d \times d}$ are learnable weights, and $G$ is a graph-related matrix. In IGNN, $G$ is chosen as $\hat{A} := D^{-1/2}(I + A)D^{-1/2}$, where $I$ is the identity matrix and $D$ is the degree matrix with $D_{ii} = 1 + \sum_{j=1}^{n} A_{ij}$. IGNN constrains $W$ via a tractable projected gradient descent method to ensure the well-posedness of the Picard iteration, at the cost of limiting the expressivity of IGNNs. The prediction of IGNN is given by $f_\Theta(Z^*)$, a function parameterized by $\Theta$. IGNNs have several merits: 1) the depth of an IGNN adapts to the particular data and task rather than being fixed; 2) training IGNNs requires constant memory, independent of depth, by leveraging implicit differentiation [66; 2; 51; 13]; 3) IGNNs have greater potential to capture LRD of the underlying graph than existing GNNs, including GCN [75], GAT [73], SSE [23], and SGC [79]. The latter GNNs lack the capability to learn LRD because they suffer from over-smoothing [56; 84; 62; 20].

Several methods have been proposed to alleviate over-smoothing, and hence improve the learning of LRD, by adding residual connections [37; 21; 55], by geometric aggregation [65], by adding a fully-adjacent layer [3], by improving breadth-wise backpropagation [59], and by adding oscillatory layers [27; 67].
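The forward pass of IGNN described above can be sketched as a naive fixed-point solver in NumPy. This is a minimal illustration, not the authors' implementation: it assumes $\sigma = \mathrm{ReLU}$ and $g_B(X) = BXG$ as in the example above, and the `tol` and `max_iter` stopping parameters are hypothetical choices; in practice $W$ must be constrained (e.g., by projection) so that the iteration converges.

```python
import numpy as np

def normalized_adjacency(A):
    """G = D^{-1/2} (I + A) D^{-1/2}, with D_ii = 1 + sum_j A_ij."""
    n = A.shape[0]
    d = 1.0 + A.sum(axis=1)                 # degrees of I + A
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ (np.eye(n) + A) @ D_inv_sqrt

def ignn_fixed_point(W, B, X, A, tol=1e-6, max_iter=500):
    """Run the Picard iteration Z <- ReLU(W Z G + B X G) until the
    update is smaller than `tol` (assumes W is contractive enough)."""
    G = normalized_adjacency(A)
    bias = B @ X @ G                        # g_B(X) = B X G, fixed across iterations
    Z = np.zeros_like(bias)                 # Z^{(0)} = 0
    for _ in range(max_iter):
        Z_next = np.maximum(W @ Z @ G + bias, 0.0)   # sigma = ReLU
        if np.linalg.norm(Z_next - Z) < tol:
            return Z_next
        Z = Z_next
    return Z
```

For a small enough spectral norm of $W$ (here, scaling the identity by 0.2), the map is a contraction and the iteration converges to the equilibrium $Z^*$, which then feeds the prediction head $f_\Theta(Z^*)$.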



Figure 1: Epoch vs. training, validation, and test accuracy of IGNN for classifying directed chains. First row: binary chains of length 100 (left) and 250 (right). Second row: three-class chains of length 80 (left) and 100 (right).

