SIGNED GRAPH DIFFUSION NETWORK

Abstract

Given a signed social graph, how can we learn appropriate node representations to infer the signs of missing edges? Signed social graphs have received considerable attention as a means of modeling trust relationships. Learning node representations is crucial to effectively analyze graph data, and various techniques such as network embedding and graph convolutional networks (GCNs) have been proposed for learning on signed graphs. However, traditional network embedding methods are not end-to-end for a specific task such as link sign prediction, and GCN-based methods suffer from performance degradation as their depth increases. In this paper, we propose SIGNED GRAPH DIFFUSION NETWORK (SGDNET), a novel graph neural network that achieves end-to-end node representation learning for link sign prediction in signed social graphs. We propose a random walk technique specially designed for signed graphs so that SGDNET effectively diffuses hidden node features. Through extensive experiments, we demonstrate that SGDNET outperforms state-of-the-art models in terms of link sign prediction accuracy.

1. INTRODUCTION

Given a signed social graph, how can we learn appropriate node representations to infer the signs of missing edges? Signed social graphs model trust relationships between people with positive (trust) and negative (distrust) edges. Many online social services such as Epinions (Guha et al., 2004) and Slashdot (Kunegis et al., 2009) that allow users to express their opinions are naturally represented as signed social graphs. Such graphs have attracted considerable attention for diverse applications including link sign prediction (Leskovec et al., 2010a; Kumar et al., 2016), node ranking (Jung et al., 2016; Li et al., 2019b), community analysis (Yang et al., 2007; Chu et al., 2016), graph generation (Derr et al., 2018a; Jung et al., 2020), and anomaly detection (Kumar et al., 2014). Node representation learning is a fundamental building block for analyzing graph data, and many researchers have put tremendous effort into developing effective models for unsigned graphs. Graph convolutional networks (GCNs) and their variants (Kipf & Welling, 2017; Velickovic et al., 2018) have attracted great attention in the machine learning community, and recent works (Klicpera et al., 2019; Li et al., 2019a) have demonstrated stunning progress by handling the performance degradation caused by over-smoothing (Li et al., 2018; Oono & Suzuki, 2020) (i.e., node representations become indistinguishable as the number of propagation steps increases) or by the vanishing gradient problem (Li et al., 2019a) in the first generation of GCN models. However, all of these models have limited performance on node representation learning in signed graphs since they only consider unsigned edges under the homophily assumption (Kipf & Welling, 2017). Many studies have recently been conducted to handle signed edges, and they are categorized into network embedding and GCN-based models.
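The over-smoothing effect mentioned above can be illustrated with a toy numerical sketch (our own illustration, not part of the paper's model): repeatedly applying the symmetrically normalized propagation D^(-1/2)(A + I)D^(-1/2) used by standard GCNs drives node features toward a common profile, shrinking the gap between distinct nodes.

```python
import numpy as np

# Toy illustration of over-smoothing: repeated propagation with the
# symmetric normalization D^{-1/2} (A + I) D^{-1/2} makes initially
# distinct node features nearly indistinguishable.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # small undirected graph
A_hat = A + np.eye(4)                        # add self-loops
d = A_hat.sum(axis=1)
P = A_hat / np.sqrt(np.outer(d, d))          # D^{-1/2} A_hat D^{-1/2}

X = np.array([[1.0], [0.0], [0.0], [1.0]])   # distinct 1-d node features
H = X.copy()
for _ in range(50):                          # 50 propagation steps
    H = P @ H                                # pure neighborhood averaging

spread_before = X.max() - X.min()
spread_after = H.max() - H.min()
print(spread_before, spread_after)           # the spread shrinks sharply
```

After many steps the features collapse toward the graph's principal eigenvector (roughly proportional to the square root of node degree), so nodes with equal degree become indistinguishable regardless of their initial features.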
Network embedding methods (Kim et al., 2018; Xu et al., 2019b) learn the representations of nodes by optimizing an unsupervised loss that primarily aims to locate the embeddings of two nodes close together (or far apart) if they are positively (or negatively) connected. However, these methods are not trained jointly with a specific task in an end-to-end manner, i.e., the latent features and the task are trained separately. Thus, their performance is limited unless each component is tuned delicately. GCN-based models (Derr et al., 2018b; Li et al., 2020) have extended graph convolutions to signed graphs using balance theory (Holland & Leinhardt, 1971) in order to properly propagate node features along signed edges. However, these models are directly extended from existing GCNs without consideration of the over-smoothing problem that degrades their performance. This problem hinders them from exploiting information from multi-hop neighbors when learning node features in signed graphs.

We propose SGDNET (SIGNED GRAPH DIFFUSION NETWORK), a novel graph neural network for node representation learning in signed graphs. Our main contributions are summarized as follows:

• End-to-end learning. We design SGDNET to perform end-to-end node representation learning. Given a signed graph, SGDNET produces node embeddings through multiple signed graph diffusion (SGD) layers (Figure 1(a)), which are fed into the loss function of a specific task such as link sign prediction.

• Novel feature diffusion. We propose a signed random walk diffusion method that propagates node embeddings along signed edges via sign-aware random walks, and injects local features at each step (Figure 1(c)). This enables SGDNET to learn distinguishable node representations that incorporate multi-hop neighbors while preserving local information.

• Experiments. Extensive experiments show that SGDNET effectively learns node representations of signed social graphs for link sign prediction, achieving at least 3.9% higher accuracy than state-of-the-art models on real-world datasets (Table 2).
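The sign-aware diffusion idea can be sketched as follows. This is a hedged, simplified sketch in the spirit of Figure 1(c), not SGDNET's exact update: the restart weight `c`, the normalization, and the function name are our illustrative assumptions. A positive edge contributes a neighbor's feature as-is, a negative edge contributes its negation, and the local feature is re-injected at every step so that aggregated features stay distinguishable.

```python
import numpy as np

def signed_diffusion_step(A_signed, H, X, c=0.15):
    """One illustrative sign-aware diffusion step (not SGDNET's exact rule).

    A_signed: (n, n) matrix with +1 / -1 / 0 entries for signed edges.
    H: current node features. X: local (input) features re-injected each step.
    """
    deg = np.abs(A_signed).sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                       # guard against isolated nodes
    P = A_signed / deg                        # sign-aware row normalization
    return (1.0 - c) * (P @ H) + c * X        # diffuse, then inject locals

# Tiny example: node v(0) trusts u(1) and distrusts t(2), as in Figure 1(c).
A = np.array([[ 0, 1, -1],
              [ 1, 0,  0],
              [-1, 0,  0]], dtype=float)
X = np.eye(3)                                 # one-hot local features
H = X.copy()
for _ in range(10):                           # multi-hop signed diffusion
    H = signed_diffusion_step(A, H, X)
```

After diffusion, v's embedding ends up aligned with its positively linked neighbor u and opposed to its negatively linked neighbor t, which is exactly the behavior the diffusion module is designed to produce.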

2. RELATED WORK

2.1 GRAPH CONVOLUTIONAL NETWORKS ON UNSIGNED GRAPHS

Graph convolutional network (GCN) (Kipf & Welling, 2017) models the latent representation of a node by applying a convolutional operation to the features of its neighbors. Various GCN-based approaches (Kipf & Welling, 2017; Velickovic et al., 2018; Hamilton et al., 2017) have attracted considerable attention since they enable diverse supervised graph tasks (Kipf & Welling, 2017; Yao et al., 2019; Xu et al., 2019a) to be performed concisely under an end-to-end framework. However, the first generation of GCN models exhibits performance degradation due to the over-smoothing and vanishing gradient problems. Several works (Li et al., 2018; Oono & Suzuki, 2020) have theoretically analyzed the over-smoothing problem. Also, Li et al. (2019a) have empirically shown that stacking more GCN layers leads to the vanishing gradient problem, as in convolutional neural networks (He et al., 2016). Consequently, most GCN-based models (Kipf & Welling, 2017; Velickovic et al., 2018; Hamilton et al., 2017) are shallow; i.e., they do not use the feature information of faraway nodes when modeling node embeddings. A recent research direction aims at resolving this limitation. Klicpera et al. (2019) proposed APPNP, which exploits Personalized PageRank (Jeh & Widom, 2003) to propagate hidden node embeddings far while preserving local features, thereby preventing the aggregated features from being over-smoothed. Li et al. (2019a) proposed ResGCN, which adds skip connections between GCN layers, as in ResNet (He et al., 2016). However, none of these models supports signed edges since they are based on the homophily assumption (Kipf & Welling, 2017), i.e., the assumption that connected nodes have similar features.
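APPNP's propagation scheme, which SGDNET's local-feature injection is conceptually related to, iterates H^(k+1) = (1 − α) Â H^(k) + α H^(0), where Â is the symmetrically normalized adjacency matrix with self-loops and α is the restart (teleport) probability. A minimal NumPy sketch (our own simplification; the function name and toy graph are illustrative):

```python
import numpy as np

def appnp_propagate(A, H0, alpha=0.1, K=10):
    """APPNP-style propagation: H <- (1-alpha) * A_hat @ H + alpha * H0.

    The restart term alpha * H0 keeps a fraction of each node's own
    prediction at every step, which counteracts over-smoothing.
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    P = A_hat / np.sqrt(np.outer(d, d))       # D^{-1/2} A_hat D^{-1/2}
    H = H0.copy()
    for _ in range(K):                        # K power-iteration steps
        H = (1.0 - alpha) * (P @ H) + alpha * H0
    return H

# Toy path graph 0 - 1 - 2 with one-hot initial predictions.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = appnp_propagate(A, np.eye(3))
```

Unlike plain repeated convolution, each node's final embedding here remains dominated by its own initial feature even after K steps, because the restart term anchors the diffusion to H^(0).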



Figure 1: Overall architecture of SGDNET. (a) Given a signed graph G and initial node features X, SGDNET with multiple SGD layers produces the final embeddings H(L), which are fed into a loss function under an end-to-end framework. (b) A single SGD layer learns node embeddings based on signed random walk diffusion. (c) Our diffusion module aggregates the features of node v so that they are similar to those of nodes connected by positive edges (e.g., node u), and different from those of nodes connected by negative edges (e.g., node t). It also injects the local feature (i.e., the input feature of each module) of node v at each aggregation to keep the aggregated features distinguishable.

