GRAPH VIEW-CONSISTENT LEARNING NETWORK

Anonymous authors
Paper under double-blind review

Abstract

In recent years, neural-network-based methods have achieved great success in solving large and complex graph problems. However, the effectiveness of these methods depends on large training and validation sets, while the acquisition of ground-truth labels is expensive and time-consuming. In this paper, a graph view-consistent learning network (GVCLN) is specially designed for semi-supervised learning when the number of labeled samples is very small. We fully exploit the neighborhood-aggregation capability of GVCLN and use dual views to obtain different representations. Although the two views observe from different angles, they observe the same objects, so their representations should be consistent. To obtain view-consistent representations, two loss functions are designed besides a supervised loss: the supervised loss uses the known labeled set; a view-consistency loss is applied to the two views to align their representations; and a pseudo-label loss is built from the common high-confidence predictions of the two views. Trained with these loss functions, GVCLN obtains view-consistent representations of the original features. We also find that preprocessing the node features with a specific filter before training benefits subsequent classification tasks. Experiments are conducted on the three citation-network datasets Cora, Citeseer, and PubMed. On several node classification tasks, GVCLN achieves state-of-the-art performance.

1. INTRODUCTION

Convolutional neural networks (CNNs) (Krizhevsky et al., 2012) have performed outstandingly on problems such as image classification (Rawat & Wang, 2017), semantic segmentation (Kampffmeyer et al., 2016), and machine translation (Cho et al., 2014). This is because CNNs can effectively reuse convolution kernels and train optimal parameters from the given input. The original data in the above problems all have a grid-like structure, that is, they are Euclidean data. In reality, there are also many kinds of non-Euclidean data, such as social networks, telecommunication networks, biological networks, and brain connectivity structures. These data are usually represented as graphs, where every node represents a single individual. Graph problems can be roughly divided into three directions: link prediction (Zhang & Chen, 2018), graph classification (Zhang et al., 2018a), and node classification (Kipf & Welling, 2016). In this paper, we focus on semi-supervised node classification when the label rate is very low. Many methods have been proposed to generalize the convolution operation to arbitrary graphs for node classification. These methods can be divided into spatial and spectral convolution methods (Zhang et al., 2018b). Spatial methods define graph convolution directly by designing operations on a node's neighbors. For example, Duvenaud et al.
(2015) propose a convolutional neural network that can operate directly on graph data and provides end-to-end feature learning; Atwood & Towsley (2016) propose the diffusion-convolutional neural network (DCNN), which introduces graph diffusion to incorporate contextual node information into graph node classification; the Graph Attention Network (GAT) (Veličković et al., 2017) introduces the attention mechanism into graph data processing to construct attention layers for semi-supervised learning. Spectral methods generally define the graph convolution operation on a spectral representation of the graph. For example, Bruna et al. (2013) propose that graph convolution can be defined in the Fourier domain based on the eigendecomposition of the graph Laplacian matrix; Defferrard et al. (2016) propose to approximate spectral-domain filtering with a Chebyshev expansion of the graph Laplacian, avoiding the high computational complexity of eigendecomposition; Kipf & Welling (2016) propose the simpler Graph Convolutional Network (GCN) for semi-supervised learning, which achieves high classification accuracy with a simple two-layer network. However, these methods require large training and validation sets to accomplish effective classification, while obtaining true labels is time-consuming, laborious, and costly. Moreover, they feed the original graph node features directly into the network for training, although these features contain much redundant information. In order to train an efficient model with only a few labeled nodes and even without validation, we put forward our own method, the graph view-consistent learning network (GVCLN), which constructs a node classification network based on the consistency between two views. First, we independently train two view encoders (which can differ) to obtain two representations of every node.
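The two-layer GCN of Kipf & Welling (2016) mentioned above computes softmax(Â · ReLU(Â X W⁽⁰⁾) · W⁽¹⁾), where Â = D̃^{−1/2} Ã D̃^{−1/2} is the symmetrically normalized self-loop adjacency. A minimal NumPy sketch of this forward pass follows; the toy graph, feature dimensions, and random weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalize the self-loop adjacency: D̃^{-1/2} Ã D̃^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    d = A_tilde.sum(axis=1)                   # degrees are >= 1, so no zero division
    D_inv_sqrt = np.diag(d ** -0.5)
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_forward(A, X, W0, W1):
    """Two-layer GCN forward pass: softmax(Â · ReLU(Â X W0) · W1)."""
    A_hat = normalize_adj(A)
    H = np.maximum(A_hat @ X @ W0, 0)         # first layer + ReLU
    Z = A_hat @ H @ W1                        # second layer (class logits)
    Z = np.exp(Z - Z.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)   # row-wise softmax

# Toy example: a 4-node path graph, 3-dim features, 2 classes (all hypothetical).
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))
Z = gcn_forward(A, X, rng.normal(size=(3, 8)), rng.normal(size=(8, 2)))
```

Each row of `Z` is a per-node class distribution; in a trained model, `W0` and `W1` would be learned rather than random.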
The encoders convert high-dimensional node features into low-dimensional embeddings (Zhu et al., 2020). The clustering hypothesis (Vandenberg & Matthias, 1977) states that examples in the same cluster are more likely to share the same label. According to this hypothesis, the decision boundary should pass through regions where the data is relatively sparse, so as to avoid splitting the points of a dense cluster across both sides of the boundary. Although the two views observe from different angles, they observe the same objects, so their results should be consistent. Therefore, the features produced by the two encoders should both drive the decision boundary through sparse regions; that is, there should be consistency between the two views. We then design three loss functions: a supervised loss, a consistency loss, and a pseudo-label loss. The supervised loss uses the known labeled set; the view-consistency loss is applied to the two views to align their representations; and the pseudo-label loss uses the common high-confidence predictions of the two views as pseudo-labels. Trained with these loss functions, GVCLN obtains a view-consistent representation of the original features. Our contributions are summarized as follows:

• We propose a graph view-consistent learning framework for semi-supervised node classification, which fully demonstrates the theoretical structure of graph view-consistency.

• We design GVCLN to successfully tackle label insufficiency in semi-supervised learning.

• We demonstrate the high efficacy and efficiency of the proposed method on various semi-supervised node classification tasks.
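The three-term objective described above can be sketched as follows. This is only an illustrative reading of the text, not the paper's exact formulation: the concrete forms of the consistency term (mean-squared disagreement here), the confidence threshold `tau`, and the weights `lam_cons` and `lam_pl` are all assumptions.

```python
import numpy as np

def cross_entropy(P, y):
    """Mean negative log-likelihood of labels y under row-wise predictions P."""
    return -np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))

def gvcln_loss(P1, P2, y, labeled, lam_cons=1.0, lam_pl=0.5, tau=0.9):
    """Sketch of the three-term objective on the two views' predictions P1, P2.

    labeled  : boolean mask of nodes whose labels y are known.
    tau      : confidence threshold for pseudo-labels (hypothetical value).
    """
    # 1) Supervised loss on the labeled set, averaged over both views.
    l_sup = 0.5 * (cross_entropy(P1[labeled], y[labeled])
                   + cross_entropy(P2[labeled], y[labeled]))
    # 2) View-consistency loss: the two views should agree on every node.
    l_cons = np.mean((P1 - P2) ** 2)
    # 3) Pseudo-label loss on unlabeled nodes where both views confidently agree.
    agree = ((P1.argmax(1) == P2.argmax(1))
             & (P1.max(1) > tau) & (P2.max(1) > tau) & ~labeled)
    l_pl = 0.0
    if agree.any():
        pseudo = P1[agree].argmax(1)          # common prediction as pseudo-label
        l_pl = 0.5 * (cross_entropy(P1[agree], pseudo)
                      + cross_entropy(P2[agree], pseudo))
    return l_sup + lam_cons * l_cons + lam_pl * l_pl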

2.1. NOTATIONS

A graph contains two parts: nodes and edges. Each node represents an individual, such as a paper or a person; an edge indicates a connection between two nodes. If the edges are directional, the graph is directed; otherwise it is undirected. A simple, connected, undirected graph can be written as G = (V, E), where V is the node set and E is the edge set, and n = |V| denotes the number of nodes in G. Considering that a node itself has a great influence on the graph structure, the graph used in the network computation is generally the self-loop graph G̃ = (V, Ẽ), which attaches a self-loop to each node of G. Let A denote the adjacency matrix of G and D its diagonal degree matrix. The adjacency matrix and diagonal degree matrix of G̃ are then Ã = A + I and D̃, respectively, where I is the identity matrix. The node feature matrix is X ∈ R^{n×d}, in which each node i is associated with a d-dimensional feature vector x_i. The normalized graph Laplacian is defined as L = I − D^{−1/2} A D^{−1/2}, a symmetric positive semidefinite matrix with eigendecomposition L = UΛU^⊤, where Λ is the diagonal matrix of eigenvalues of L and U ∈ R^{n×n} is the unitary matrix whose columns are the eigenvectors of L. The graph convolution between a signal x and a filter g_γ(Λ) = diag(γ) is defined as g_γ(L) ∗ x = U g_γ(Λ) U^⊤ x, where the parameter γ ∈ R^n is a vector of spectral filter coefficients.
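These definitions can be checked numerically. The sketch below builds the normalized Laplacian of a small undirected graph, eigendecomposes it, and applies a spectral filter U · diag(γ) · U^⊤ x; the 4-node graph, the test signal, and the low-pass choice γ = exp(−λ) are illustrative assumptions. Note that γ = 1 (all ones) makes the filter the identity, since UU^⊤ = I.

```python
import numpy as np

# Normalized Laplacian of a small undirected graph (no isolated nodes).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt    # L = I - D^{-1/2} A D^{-1/2}

# L is symmetric PSD, so eigh returns real eigenvalues lam and orthonormal U.
lam, U = np.linalg.eigh(L)

def spectral_filter(x, gamma):
    """g_γ(L) * x = U · diag(γ) · Uᵀ x: filtering in the spectral domain."""
    return U @ np.diag(gamma) @ U.T @ x

x = np.array([1.0, -2.0, 0.5, 3.0])
x_lowpass = spectral_filter(x, np.exp(-lam))   # attenuates high-frequency components
x_identity = spectral_filter(x, np.ones(4))    # γ = 1 reproduces x exactly
```

Spectral GCN variants differ precisely in how they parameterize γ, e.g. the Chebyshev expansion of Defferrard et al. (2016) avoids computing U at all.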

