GRAPHCGAN: CONVOLUTIONAL GRAPH NEURAL NETWORK WITH GENERATIVE ADVERSARIAL NETWORKS

Abstract

Graph convolutional networks (GCNs) have achieved superior performance in graph-based semi-supervised learning (SSL) tasks. Generative adversarial networks (GANs) have also been shown to improve performance in SSL. However, there is still no principled way to combine GANs and GCNs in graph-based SSL tasks. In this work, we present GraphCGAN, a novel framework that incorporates adversarial learning into convolution-based graph neural networks operating on graph-structured data. In GraphCGAN, we show that the generator can jointly generate the topology structure and the attributes/features of fake nodes, boosting the performance of the convolution-based graph neural network classifier. In a number of experiments on benchmark datasets, we show that the proposed GraphCGAN outperforms the reference methods by a significant margin.

1. INTRODUCTION

Graph-based semi-supervised learning (SSL) aims to classify the nodes of a graph when only a small fraction of them are labeled, since label collection is expensive and time-consuming. To solve this task, various graph neural networks (GNNs) have been proposed that adapt the idea of convolutional neural networks (CNNs) to implicitly propagate information from labeled nodes to unlabeled nodes through the links between them (Kipf & Welling, 2016; Veličković et al., 2017; Hamilton et al., 2017). These convolution-based graph neural networks have achieved superior performance on multiple benchmark datasets in graph-based SSL tasks (Wu et al., 2019).

Recently, generative adversarial networks (GANs) (Goodfellow et al., 2014) have been shown to improve performance on image-based SSL problems (Odena, 2016; Salimans et al., 2016; Li et al., 2019b). In semi-GAN (Salimans et al., 2016), the authors converted the M-class classification task into an (M+1)-class problem, where the synthetic (M+1)-th class is produced by the GAN's generator. Later, Dai et al. provided theoretical insight that the generated data can boost the performance of the classifier under certain assumptions. Our work is motivated by semi-GAN. GraphSGAN (Ding et al., 2018) first investigated adversarial learning over graphs: the graph is embedded into an embedding space, synthetic data are generated in that space, and a multi-layer perceptron (MLP) is trained as the classifier on the embedding vectors. However, to our knowledge, no existing method combines adversarial learning with convolution-based GNNs on graph-based SSL tasks. In this work, we explore the potential of combining convolution-based GNNs and GANs.
The challenges of constructing a general framework are threefold. First, attributed graph data are non-Euclidean; their distribution encodes both the graph topology and the node attributes, so it is not trivial to construct a generator that models this distribution. Second, even if the generator can model the graph's distribution, it must be trained properly to boost the performance of the classifier: a poor-quality generator would introduce noise into the existing graph and harm the classifier. Third, new variants of GCN continue to be proposed, so the framework should be flexible enough to adapt to different convolution-based GNNs.

We construct a novel approach called GraphCGAN to address these challenges. First, to model the distribution of the graph, the generator is built sequentially from two sub-generators: one models the attribute information (the node's attributes) and the other models the graph topology structure (the node's adjacency relations). Details can be found in Section 3.1. Second, in GraphCGAN the generator is trained with the feature matching technique (Salimans et al., 2016), which minimizes the distance between generated nodes and real nodes in a constructed feature space; this technique has shown good performance on SSL tasks in practice. The construction of the loss functions is detailed in Section 3.3. Third, in a GCN the node attributes are aggregated convolutionally over multiple layers, and the representation in the last layer is usually taken as the label prediction; variants of GCN differ mainly in their layer-aggregation strategy (Hamilton et al., 2017). In our framework, we use the second-to-last layer of the convolution-based GNN as the feature matching function, so the framework extends easily to variants of GCN. More discussion can be found in Section 3.2.
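The two-sub-generator decomposition above can be sketched in a few lines. This is a minimal illustration only: the single-layer parameterizations, weight names (`W_f`, `W_t`), and the bilinear adjacency scoring are hypothetical placeholders, not the architecture of Section 3.1.

```python
import numpy as np

rng = np.random.default_rng(0)

def attribute_subgenerator(z, W_f):
    # First sub-generator: maps a noise vector z to a fake node's
    # attribute vector (hypothetical single-layer parameterization).
    return np.tanh(z @ W_f)

def topology_subgenerator(x_fake, X_real, W_t):
    # Second sub-generator: given the generated attributes, scores a soft
    # adjacency relation to each real node; thresholding the scores
    # yields the fake node's adjacency row.
    logits = X_real @ W_t @ x_fake
    return 1.0 / (1.0 + np.exp(-logits))

# Toy setup: n real nodes, d features, k-dimensional noise (all hypothetical).
n, d, k = 6, 4, 3
X_real = rng.normal(size=(n, d))
W_f, W_t = rng.normal(size=(k, d)), rng.normal(size=(d, d))

z = rng.normal(size=k)
x_fake = attribute_subgenerator(z, W_f)                     # fake attributes
a_fake = topology_subgenerator(x_fake, X_real, W_t) > 0.5   # adjacency row
```

The sequential design matters: the topology sub-generator conditions on the generated attributes, so the fake node's links are consistent with its features.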

2. PRELIMINARY

We first introduce notation for graphs. Let G = (V, E) denote a graph, where V is the set of nodes with |V| = n and E ⊆ V × V is the set of edges with |E| = m. The adjacency matrix A ∈ R^{|V|×|V|} is defined by A_ij = 1 if nodes v_i and v_j are connected by an edge, and A_ij = 0 otherwise. Suppose each node v_i has a d-dimensional feature vector x_i ∈ R^d and a single label y_i ∈ {1, 2, ..., M}. In the semi-supervised setting, the nodes are partitioned into disjoint sets, V = V_L ∪ V_U, such that the label of v_i is known for v_i ∈ V_L and unknown for v_j ∈ V_U. The distributions of nodes in the labeled set V_L and the unlabeled set V_U are denoted p_{V_L} and p_{V_U}, respectively. Semi-supervised learning seeks the labels of the unlabeled set {y_j | v_j ∈ V_U} given the adjacency matrix A, the feature matrix X = [x_i]_{v_i ∈ V}, and the labels of the labeled set {y_i | v_i ∈ V_L}.
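To make the notation concrete, the following sketch builds the objects above for a tiny hypothetical graph; the particular edges and labels are invented for illustration.

```python
import numpy as np

# Toy undirected graph: n = 4 nodes, edge set E (hypothetical data).
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0   # A_ij = 1 iff v_i and v_j share an edge

# Semi-supervised split V = V_L ∪ V_U: labels in {1, ..., M} are known
# only on V_L; -1 marks the unknown labels on V_U.
y = np.array([1, 2, -1, -1])
labeled_mask = y > 0          # True on V_L, False on V_U
```

Here the task is to predict `y` on the nodes where `labeled_mask` is `False`, given `A`, the feature matrix `X`, and the observed labels.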

2.1. CONVOLUTION BASED GRAPH NEURAL NETWORK CLASSIFIER

Based on Laplacian smoothing, convolution-based GNN models propagate node feature information across each node's neighbors in every layer. Specifically, in GCN the layer-wise propagation rule is defined as

H^{(l+1)} = σ(D^{-1} A H^{(l)} W^{(l)} + b^{(l)}),  l = 0, 1, ..., L-1,   (1)

where W^{(l)} and b^{(l)} are the layer-specific trainable weight matrix and bias, respectively, and σ(·) is an activation function. D is the diagonal degree matrix with D_ii = Σ_j A_ij; hence D^{-1}A is a normalization of the adjacency matrix A. The initial layer H^{(0)} is the feature matrix X. The final layer H^{(L)}, followed by a softmax layer, can be viewed as the prediction of the one-hot representation of the true label y. Recently, many variants of the GCN layer-wise propagation rule have been proposed, including the graph attention network and Cluster-GCN (Veličković et al., 2017; Chiang et al., 2019), which achieve state-of-the-art performance on many benchmark datasets.
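The propagation rule in Eq. (1) is straightforward to write down directly. The sketch below implements one layer with σ = ReLU and runs a two-layer forward pass on random toy data; the dimensions and the fully connected toy graph are illustrative assumptions, and isolated nodes (zero degree) are assumed absent so that D is invertible.

```python
import numpy as np

def gcn_layer(A, H, W, b):
    # One step of Eq. (1): H^{(l+1)} = sigma(D^{-1} A H^{(l)} W^{(l)} + b^{(l)}),
    # here with sigma = ReLU. Assumes every node has at least one neighbor,
    # so the diagonal degree matrix D is invertible.
    D_inv = np.diag(1.0 / A.sum(axis=1))
    return np.maximum(D_inv @ A @ H @ W + b, 0.0)

def softmax(Z):
    # Row-wise softmax over class logits, shifted for numerical stability.
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy forward pass: n nodes, d input features, h hidden units, M classes.
rng = np.random.default_rng(0)
n, d, h, M = 4, 5, 8, 3
A = np.ones((n, n)) - np.eye(n)          # fully connected toy graph
X = rng.normal(size=(n, d))              # H^{(0)} = X

H1 = gcn_layer(A, X, rng.normal(size=(d, h)), np.zeros(h))        # H^{(1)}
logits = gcn_layer(A, H1, rng.normal(size=(h, M)), np.zeros(M))   # H^{(2)}
probs = softmax(logits)                  # row i predicts node v_i's label
```

Each row of `probs` is a distribution over the M classes for one node, matching the softmax-on-H^{(L)} reading of Eq. (1).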

2.2. GENERATIVE ADVERSARIAL NETWORK BASED SEMI-SUPERVISED LEARNING

In semi-GAN, the classifier C and the generator G play a non-cooperative game: the classifier aims to classify the unlabeled data and to distinguish generated data from real data, while the generator attempts to match the features of generated data to those of real data. Therefore, the objective function for the classifier can be divided into two parts (Salimans et al., 2016). The first part is the supervised loss

L_sup = E_{v,y ∼ p_{V_L}} log P_C(y | v, y ≤ M),

the log probability of the node label given a real labeled node. The second part is the unsupervised loss

L_unsup = E_{v ∼ p_{V_U}} log P_C(y ≤ M | v) + E_{v ∼ p_{V_G}} log P_C(y = M+1 | v),

the sum of the log probability of the first M classes for real nodes and the log probability of the (M+1)-th class for generated nodes V_G. The classifier C can be trained by maximizing the objective function

L_C = L_sup + L_unsup.   (2)
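The objective in Eq. (2) can be sketched as a function of the classifier's (M+1)-class logits. This is an illustrative NumPy version, assuming the convention that the last (0-based index M) column is the fake class; the function and argument names are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def classifier_objective(logits_lab, y_lab, logits_unlab, logits_fake, M):
    # Logits have M + 1 columns; column M (0-based) is the fake class.
    # Supervised term: mean log P_C(y | v, y <= M) over labeled real nodes.
    lp = log_softmax(logits_lab)
    L_sup = lp[np.arange(len(y_lab)), y_lab].mean()

    # Unsupervised term, first part: log P_C(y <= M | v) for unlabeled
    # real nodes, i.e. the total probability mass on the M real classes.
    lp_u = log_softmax(logits_unlab)
    real_part = np.log(np.exp(lp_u[:, :M]).sum(axis=1)).mean()

    # Unsupervised term, second part: log P_C(y = M + 1 | v) for fakes.
    fake_part = log_softmax(logits_fake)[:, M].mean()

    return L_sup + real_part + fake_part   # L_C in Eq. (2), to be maximized

# Toy usage with random logits for labeled, unlabeled, and generated nodes.
rng = np.random.default_rng(1)
M = 3
L_C = classifier_objective(rng.normal(size=(5, M + 1)),
                           rng.integers(0, M, size=5),
                           rng.normal(size=(7, M + 1)),
                           rng.normal(size=(4, M + 1)), M)
```

In practice the expectations are estimated by batch averages, as the `.mean()` calls above suggest, and the optimizer ascends (or descends the negated) `L_C`.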

