UNBIASED STOCHASTIC PROXIMAL SOLVER FOR GRAPH NEURAL NETWORKS WITH EQUILIBRIUM STATES

Abstract

Graph Neural Networks (GNNs) are widely used deep learning models that extract meaningful representations from graph data and have achieved great success in many machine learning tasks. Among them, graph neural networks with iterative updates, such as unfolded GNNs and implicit GNNs, can effectively capture long-range dependencies and demonstrate superior performance on large graphs, since they mathematically guarantee convergence to a non-trivial solution after many aggregation steps. However, the aggregation in such models is expensive because every update requires aggregating over the full graph, which limits the scalability of implicit graph models. To tackle this limitation, we propose two unbiased stochastic proximal solvers, called the USP and USP-VR solvers, inspired by stochastic proximal gradient descent and its variance-reduced variant. From the viewpoint of stochastic optimization, we theoretically prove that our solvers are unbiased and converge to the same solutions as the original solvers for unfolded GNNs and implicit GNNs. Furthermore, the computational complexity of unfolded GNNs and implicit GNNs with our proposed solvers is significantly lower than that of their vanilla versions. Experiments on various large graph datasets show that our proposed solvers are more efficient and achieve state-of-the-art performance.

1. INTRODUCTION

Graph Neural Networks (GNNs) (Zhou et al., 2020; Wu et al., 2020) aggregate information from each node's neighbors to encode graph structure into meaningful representations, and have been widely used to learn node representations on graph-structured data. Graph Convolutional Networks (GCNs) (Kipf & Welling, 2016) introduce a convolutional structure into GNNs and drastically improve performance on a wide range of tasks, such as computer vision (Xu et al., 2020b), recommender systems (He et al., 2020; Zhang et al., 2020b), and biochemical research (Mincheva & Roussel, 2007; Wan et al., 2019). Due to these results, GCN models have attracted much attention, and various techniques have been proposed, including graph attention (Veličković et al., 2017), normalization (Zhao & Akoglu, 2019), linearization (Wu et al., 2019; Li et al., 2022), and others (Klicpera et al., 2018; Rong et al., 2020). Current GNN models usually capture T-hop topological information by performing T iterations of graph aggregation. However, T cannot be large: otherwise, the outputs may degenerate to trivial points, a phenomenon known as over-smoothing (Yang et al., 2020; Li et al., 2019). Therefore, traditional GNNs cannot discover longer-range dependencies. To tackle these problems, researchers have proposed graph neural networks with iterative update algorithms (Yang et al., 2021a; b); implicit graph neural networks (IGNNs) (Gu et al., 2020) are another model of this type. Since these models eventually converge to an equilibrium state (stationary points or fixed points), we call them graph equilibrium models in the remainder of the paper. Graph equilibrium models excel at capturing long-range information because their forward procedure implicitly performs a "huge-hop" aggregation.
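To make the equilibrium view concrete, the following is a minimal sketch of the deterministic forward pass of an implicit GNN, iterating a contraction map until a fixed point is reached. The function name, shapes, and the tanh/normalized-adjacency choices are illustrative assumptions, not the exact formulation of any cited model.

```python
import numpy as np

def ignn_forward(A_hat, X, W, U, tol=1e-6, max_iter=500):
    """Deterministic forward pass of an implicit GNN: iterate
    Z <- tanh(W @ Z @ A_hat + U @ X) until the fixed point Z* is reached.
    A_hat is a normalized adjacency; if ||W||_2 * ||A_hat||_2 < 1 the map
    is a contraction (tanh is 1-Lipschitz), so the iteration converges."""
    B = U @ X                        # input injection, computed once
    Z = np.zeros_like(B)
    for _ in range(max_iter):
        Z_next = np.tanh(W @ Z @ A_hat + B)
        if np.linalg.norm(Z_next - Z) < tol:   # reached equilibrium
            return Z_next
        Z = Z_next
    return Z
```

Note that every iteration multiplies by the full A_hat, i.e., aggregates the entire graph; this is exactly the cost that motivates the stochastic solvers below.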
However, graph equilibrium models must recursively aggregate node neighborhoods, since solving for their equilibrium state requires iteratively aggregating over the full graph. They therefore incur expensive computation on large graphs, especially dense ones. Although many works (Chen et al., 2018; Hamilton et al., 2017) propose aggregation schemes based on node sampling for traditional graph models, there are no guarantees of convergence or unbiased approximation when applying these schemes to graph equilibrium models. For these reasons, how to efficiently obtain the outputs of graph equilibrium models is a problem worth exploring. Inspired by the works of Yang et al. (2021b), Zhu et al. (2021), and Zhang et al. (2020a), which reveal the connections between implicit and unfolded graph neural network architectures and learnable graph denoising problems, we study the efficiency of these models from the optimization view. We then propose two stochastic solvers for graph equilibrium models with convergence guarantees, inspired by stochastic proximal gradient descent algorithms. Since the forward procedure only needs to aggregate subgraphs, the proposed solvers are much more efficient than the vanilla deterministic solvers based on gradient descent or fixed-point iteration. Furthermore, we theoretically prove that our solvers produce unbiased outputs matching those of the vanilla deterministic solvers.
Our Contributions. We summarize the contributions of our methods as follows:
• By splitting the graph denoising optimization underlying graph equilibrium models into several sub-optimization problems, we treat their forward procedure as solving a suitable finite-sum optimization problem. We then propose two stochastic solvers for graph equilibrium models: the Unbiased Stochastic Proximal solver (USP) and its variance-reduced variant (USP-VR).
• Compared with the vanilla deterministic solvers, which aggregate the full graph in the forward procedure of graph equilibrium models, our USP solver and its variant only need to aggregate subgraphs to reach the equilibrium. Graph equilibrium models therefore become more efficient with our stochastic solvers.
• We theoretically prove that the USP solvers converge, in expectation, to the same outputs as the vanilla deterministic forward procedure. Furthermore, we empirically demonstrate our proposed method's advantages through extensive experiments.
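The core idea can be sketched on a toy version of the underlying denoising objective. The splitting into subgraph Laplacians, the step-size schedule, and all names below are illustrative assumptions, not the paper's exact algorithm: each step samples one subgraph, takes an unbiased stochastic gradient step on the smoothness term, and then applies the closed-form proximal operator of the data-fit term.

```python
import numpy as np

def usp_sketch(L_parts, B, lam=0.1, n_steps=5000, rng=None):
    """Illustrative stochastic proximal solver for the denoising problem
        min_Z 0.5*||Z - B||_F^2 + 0.5*lam*tr(Z @ L @ Z.T),
    with the graph Laplacian split as L = sum_i L_i (one L_i per
    subgraph), so each iteration touches only one subgraph."""
    rng = rng or np.random.default_rng(0)
    m = len(L_parts)
    Z = B.copy()
    for t in range(n_steps):
        eta = 1.0 / (1.0 + t)            # diminishing step size
        i = rng.integers(m)              # sample one subgraph uniformly
        G = lam * m * Z @ L_parts[i]     # E[G] = lam * Z @ L  (unbiased)
        Y = Z - eta * G                  # stochastic gradient step
        Z = (Y + eta * B) / (1.0 + eta)  # prox of 0.5*||Z - B||_F^2
    return Z
```

Only one subgraph Laplacian is multiplied per step, which is the source of the claimed savings over full-graph aggregation; a variance-reduced (USP-VR-style) variant would additionally maintain a periodically refreshed full-graph gradient snapshot, in the spirit of SVRG.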

2. RELATED WORKS

2.1 GRAPH NEURAL NETWORKS

Most GNNs (Kipf & Welling, 2016; Veličković et al., 2017; Xu et al., 2018; Li et al., 2022) aggregate graph information a finite number of times due to the over-smoothing problem, so they can hardly capture very long-range dependencies. In contrast, implicit graph models (Liu et al., 2021a; Gu et al., 2020; Park et al., 2021) aggregate graph information for many or "infinite" iterations while guaranteeing theoretically non-trivial equilibrium outputs. Moreover, recent works explore the connections between graph neural models and graph denoising optimization problems. Some works (Zhu et al., 2021; Zhang et al., 2020a) recover different graph models from various graph denoising problems; Zhu et al. (2021) also proposed two types of models by reformulating graph denoising problems from the spectral filtering perspective. Other researchers interpret existing graph models by redesigning the corresponding graph denoising objectives (Yang et al., 2021a; Ma et al., 2021; Yang et al., 2021b). GNNs have also been studied from further perspectives, for example robust graph neural networks (Jin et al., 2020; Luo et al., 2021), pretraining graph neural networks (Hu et al., 2019b; Qiu et al., 2020), explanations for graph networks (Ying et al., 2019; Yuan et al., 2020), and connections to differential systems (Xu et al., 2020a; Chamberlain et al., 2021; Wang et al., 2021).
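The denoising connection above can be made concrete with a standard derivation, following the unified views in the works cited here. A common form of the graph signal denoising problem is

```latex
\min_{Z}\; \ell(Z) \;=\; \|Z - X\|_F^2 \;+\; \lambda\, \mathrm{tr}\!\left(Z^{\top} \tilde{L} Z\right),
\qquad \tilde{L} \;=\; I - \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2},
```

where X holds the input features and L̃ is the symmetrically normalized Laplacian. Since ∇ℓ(Z) = 2(Z − X) + 2λL̃Z, a single gradient step from Z⁽⁰⁾ = X with step size 1/(2λ) yields Z⁽¹⁾ = (I − L̃)X = D̃^{-1/2}ÃD̃^{-1/2}X, i.e., one GCN-style aggregation, while the exact minimizer Z* = (I + λL̃)^{-1}X is the equilibrium state that implicit and unfolded models approximate by iterating to convergence.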




